iroh on QUIC Multipath

by Floris Bruynooghe

[Image: a phone and a laptop connected by cloud, cellular, and wifi paths through the internet]

Iroh is a library to establish direct peer-to-peer QUIC connections. This means iroh does NAT traversal, colloquially known as holepunching.

The basic idea is that two endpoints, both behind a NAT, establish a connection via a relay server. Once the connection to the relay server is established they can do two things:

  • Exchange QUIC datagrams via the relay connection.
  • Coordinate holepunching to establish a direct connection.

And once you have holepunched, you can move the QUIC datagrams to the direct connection and stop relying on the relay server. Simple.

Relay Servers

An iroh relay server is a classical piece of server software, running in a datacenter. It exists even though we want p2p connections, because in today's internet we cannot have direct connections without holepunching. And you cannot have holepunching without being able to coordinate. Thus, the relay server.

Because we would like this relay server to essentially always work, it uses the most common protocol on the internet: HTTP/1.1 inside a TLS stream. Endpoints establish an entirely normal HTTPS connection to the relay server and then upgrade it to a WebSocket connection.[1] This works even in many places where the TLS connection is Machine-In-The-Middled by inserting new "trusted" root certs because of "security". As long as an endpoint keeps this WebSocket connection open, it can use the relay server.

The relay server itself is the simplest thing we can get away with. It forwards UDP datagrams from one endpoint to another, tunneling them inside the HTTP connections. Since iroh endpoints are identified by a NodeId, an endpoint sends the relay server a destination NodeId together with each datagram. The relay server then does one of two things:

  • Drop the datagram on the floor, because the destination endpoint is not connected to this relay server.

  • Forward the datagram to the destination.

The relay server does not need to know what is in the datagram. In fact, iroh makes sure it cannot know what is inside: the payload is always encrypted to the destination endpoint.[2] The relay server is nothing more than another network path along which UDP datagrams can travel between iroh nodes.
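The forward-or-drop behaviour described above can be sketched in a few lines of Rust. This is only a toy model, not the real relay server: the `Relay` type, the `Outcome` enum, the in-memory queues and the 32-byte `NodeId` alias are all assumptions made for illustration.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for an iroh NodeId.
type NodeId = [u8; 32];

/// What the toy relay did with a datagram.
#[derive(Debug, PartialEq, Eq)]
enum Outcome {
    Forwarded,
    Dropped,
}

/// Toy relay: one queue of (source, payload) pairs per connected endpoint.
#[derive(Default)]
struct Relay {
    connected: HashMap<NodeId, Vec<(NodeId, Vec<u8>)>>,
}

impl Relay {
    /// Register an endpoint as connected to this relay.
    fn connect(&mut self, id: NodeId) {
        self.connected.entry(id).or_default();
    }

    /// Forward an opaque payload to `dst`, or drop it if `dst` is not connected.
    /// The relay never inspects `payload`; it is already encrypted end to end.
    fn send(&mut self, src: NodeId, dst: NodeId, payload: Vec<u8>) -> Outcome {
        match self.connected.get_mut(&dst) {
            Some(queue) => {
                queue.push((src, payload));
                Outcome::Forwarded
            }
            None => Outcome::Dropped,
        }
    }
}
```

Note that the sketch has no error path back to the sender: a datagram for an unknown destination simply disappears, just like a UDP datagram on any other lossy network path.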

Holepunching

UDP holepunching is simple really.[3] All you need is for each endpoint to send a UDP datagram to the other at the same time. The NAT routers will think the incoming datagrams are responses to the outgoing ones and treat the exchange as a connection. Now you have a holepunched, direct connection.

To do this an endpoint needs to:

  • Know which IP addresses it might be reachable on. Sometime we'll write this up in its own blog post; for now I'll just assume the endpoints know this.

  • Send these IP address candidates to the remote endpoint via the relay server.

  • Once both endpoints have the peer's candidate addresses, send "ping" datagrams to each candidate address of the peer. Both endpoints do this at the same time.

  • If a "ping" datagram is received, respond with "yay, we holepunched!". Typically this will succeed on only one IP path out of all the candidates, though more and more these days it succeeds for both an IPv4 and an IPv6 path.

If you followed carefully you'll have counted 3 special messages that need to be sent to the peer endpoint:

  1. IP address candidates. These are sent via the relay server.

  2. Pings. These are sent on the non-relayed IP paths.

  3. Pongs. These are also sent on the non-relayed IP paths.

They need to be sent as UDP datagrams, over the same paths that the QUIC datagrams are being sent on: the relay path and any direct paths.
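To make the three messages concrete, here is a minimal Rust sketch of how an endpoint might react to each of them. The `HolepunchMsg` and `Path` types, the `on_message` function and the `tx_id` field used to match pongs to pings are all invented for this example and are not iroh's actual wire format.

```rust
use std::net::SocketAddr;

/// The three special messages from the list above (hypothetical types).
#[derive(Debug, Clone, PartialEq, Eq)]
enum HolepunchMsg {
    /// Addresses the sender might be reachable on.
    Candidates(Vec<SocketAddr>),
    /// Probe sent to one candidate address.
    Ping { tx_id: u64 },
    /// Reply confirming that a ping arrived.
    Pong { tx_id: u64 },
}

/// Which network path a message travels over.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Path {
    Relay,
    Direct(SocketAddr),
}

/// React to a holepunching message; returns the messages to send in response.
fn on_message(
    msg: HolepunchMsg,
    from: Path,
    next_tx_id: &mut u64,
) -> Vec<(Path, HolepunchMsg)> {
    match (msg, from) {
        // Candidates arrive via the relay: ping every candidate directly.
        (HolepunchMsg::Candidates(addrs), Path::Relay) => addrs
            .into_iter()
            .map(|addr| {
                let tx_id = *next_tx_id;
                *next_tx_id += 1;
                (Path::Direct(addr), HolepunchMsg::Ping { tx_id })
            })
            .collect(),
        // A ping on a direct path: answer with a pong on that same path.
        (HolepunchMsg::Ping { tx_id }, Path::Direct(addr)) => {
            vec![(Path::Direct(addr), HolepunchMsg::Pong { tx_id })]
        }
        // A pong means the direct path works; anything else is ignored here.
        _ => Vec::new(),
    }
}
```

A real implementation also needs timeouts and retries, since any of these datagrams can be lost, which is exactly where the trouble discussed further down starts.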

Multiplexing UDP datagrams

Iroh stands on the shoulders of giants, and it looked carefully at ZeroTier and Tailscale. In particular it borrowed a lot from the DERP design from Tailscale. From the above holepunching description we get two kinds of packets:

  • Application payload. For iroh these are QUIC datagrams.
  • Holepunching datagrams.

Both of these need to be sent and received from the same socket, because holepunching a different socket than the one your application data uses is not that helpful. So when an iroh endpoint receives a packet, it first needs to figure out which kind of packet this is: a QUIC datagram, or a holepunching datagram? If it is a QUIC datagram it is passed on to the QUIC stack.[4] If it is a holepunching datagram it needs to be handled by iroh itself, by a component we call the magic socket. This is done using the "QUIC bit", a bit in the first byte of the datagram which is defined as always set to 1 in QUIC version 1.[5] For holepunching datagrams we set this bit to 0.
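As a rough sketch, the check on the first byte could look like this in Rust. The QUIC bit is the second-most-significant bit (0x40) of the first byte in QUIC version 1 packets; the `Kind` enum and the `classify` function are names made up for this example, not iroh's actual code.

```rust
/// The "QUIC bit": second-most-significant bit of the first byte,
/// set to 1 in QUIC version 1 long and short header packets.
const QUIC_BIT: u8 = 0x40;

/// Where an incoming datagram should be routed (hypothetical names).
#[derive(Debug, PartialEq, Eq)]
enum Kind {
    /// Hand the datagram to the QUIC stack.
    Quic,
    /// Handle the datagram in the magic socket.
    Holepunch,
    /// Zero-length datagram: nothing to classify.
    Empty,
}

/// Decide who handles a datagram by looking only at its first byte.
fn classify(datagram: &[u8]) -> Kind {
    match datagram.first() {
        None => Kind::Empty,
        Some(first) if (*first & QUIC_BIT) != 0 => Kind::Quic,
        Some(_) => Kind::Holepunch,
    }
}
```

The appeal of this design is that the demultiplexer never has to parse anything beyond a single byte, and the QUIC stack never sees a datagram it would have to reject.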

IP Congestion Control

This system works great and is what powers iroh today. However, it also has its limitations. One interesting aspect of the internet is congestion control. Basically, IP packets get sent around the internet from router to router, and each hop has its own speed and capacity. If you send too many packets the pipes will clog up and start to slow down. If you send yet more packets, routers will start dropping them.

Congestion control is tasked with walking the fine line of sending as many packets as fast as possible between two endpoints, without adversely affecting latency and packet loss. This is difficult because there are many independent endpoints using all those links between routers at the same time. But it has also had a few decades of research, so we achieve reasonably decent results by now.

Each TCP connection has its own congestion controllers, one per endpoint. And the same goes for each QUIC connection. Unfortunately, our holepunching packets live outside of the QUIC connection, so they do not know about the QUIC congestion controller. What is worse: when holepunching succeeds, the iroh endpoint will route the QUIC datagrams via a different path than before; they will stop flowing over the relay connection and start using the direct path.

But the QUIC stack is entirely unaware of this! It has no idea which destination its packets actually get sent to. Iroh completely lies to the QUIC stack and tells it to send packets to some private IPv6 range,[6] routing them to the correct path on the way out and rewriting received packets so that they appear to come from this address.

This is not great for the congestion controller, so iroh somehow coerces the QUIC congestion controller to restart whenever iroh chooses a new path.
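The address trick can be pictured as a small lookup table. The sketch below is an illustration only: the `AddrMap` type, its methods, and the `fd00:1234:5678::/48` prefix are all made up here, and iroh's real Unique Local Address Global ID is different.

```rust
use std::collections::HashMap;
use std::net::{Ipv6Addr, SocketAddr};

/// Hypothetical stand-in for an iroh NodeId.
type NodeId = [u8; 32];

/// The real path behind a fake address.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum RealPath {
    Relay,
    Direct(SocketAddr),
}

/// Maps each peer to a stable fake IPv6 address shown to the QUIC stack.
struct AddrMap {
    fake_by_node: HashMap<NodeId, Ipv6Addr>,
    path_by_fake: HashMap<Ipv6Addr, RealPath>,
}

impl AddrMap {
    fn new() -> Self {
        AddrMap { fake_by_node: HashMap::new(), path_by_fake: HashMap::new() }
    }

    /// The fake address for `node`, allocated on first use from a
    /// made-up ULA prefix. New peers start out on the relay path.
    fn fake_addr(&mut self, node: NodeId) -> Ipv6Addr {
        if let Some(addr) = self.fake_by_node.get(&node) {
            return *addr;
        }
        let n = self.fake_by_node.len() as u64 + 1;
        let addr = Ipv6Addr::new(
            0xfd00, 0x1234, 0x5678, 0,
            (n >> 48) as u16, (n >> 32) as u16, (n >> 16) as u16, n as u16,
        );
        self.fake_by_node.insert(node, addr);
        self.path_by_fake.insert(addr, RealPath::Relay);
        addr
    }

    /// Switch the real path; the address QUIC sees does not change.
    fn set_path(&mut self, node: NodeId, path: RealPath) {
        let addr = self.fake_addr(node);
        self.path_by_fake.insert(addr, path);
    }

    /// On the way out: where should a packet for `fake` really go?
    fn route(&self, fake: Ipv6Addr) -> Option<RealPath> {
        self.path_by_fake.get(&fake).copied()
    }
}
```

The important property is visible in `set_path`: the path changes underneath a fixed address, so from the QUIC stack's point of view nothing happened, even though RTT and capacity may have changed completely.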

Multiple Paths

By now I've talked several times about a "relay path" and a "direct path". A typical iroh connection probably has quite a few possible paths available between the two endpoints. A typical set would be:

  • The path via the relay server.[7]
  • An IPv4 path over the WiFi interface.
  • An IPv6 path over the WiFi interface.
  • An IPv4 path over the mobile data interface.
  • An IPv6 path over the mobile data interface.

The entire point of the relay path is to be able to start communicating without needing holepunching. So that path just works. But generally you'd expect the other four paths to need holepunching. And currently iroh chooses the path with the lowest latency after holepunching. But what if iroh were aware of all those paths all the time?
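For comparison with what follows, the current "pick the lowest latency" behaviour might be sketched like this. The `Candidate` type and the `pick` function are hypothetical, and the rule of preferring any working direct path over the relay is an assumption of this sketch rather than a description of iroh's exact policy.

```rust
use std::time::Duration;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Kind {
    Relay,
    Direct,
}

/// One possible path to the peer (hypothetical type).
#[derive(Debug, Clone, PartialEq, Eq)]
struct Candidate {
    label: &'static str,
    kind: Kind,
    /// Measured latency; `None` while the path is not (yet) usable.
    latency: Option<Duration>,
}

/// Lowest-latency usable path of the given kind, if any.
fn fastest(paths: &[Candidate], kind: Kind) -> Option<&Candidate> {
    paths
        .iter()
        .filter(|p| p.kind == kind && p.latency.is_some())
        .min_by_key(|p| p.latency)
}

/// Assumed policy: the fastest holepunched direct path, else the relay.
fn pick(paths: &[Candidate]) -> Option<&Candidate> {
    fastest(paths, Kind::Direct).or_else(|| fastest(paths, Kind::Relay))
}
```

Only one path is ever in use at a time here; everything the endpoint learned about the other paths is thrown away as soon as the winner is chosen.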

QUIC Multipath

Let's forget holepunching for a minute, and assume we can establish all those paths without any firewall getting in the way. Would it not be great if our QUIC stack was aware of these multiple paths? For example, it could keep a congestion controller for each path separately. Each path would also have its own Round Trip Time (RTT). So you can make an educated guess on which path you'd like to send new packets without them being blocked, dropped or slowed down.[8]
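To illustrate the idea, here is a minimal Rust sketch of per-path state and a very naive scheduler. The `PathState` fields and the `schedule` function are assumptions made up for this example; they are not Quinn's API, and the scheduling rule (lowest RTT among paths with room in their congestion window) is just one simple choice.

```rust
use std::time::Duration;

/// Per-path state a multipath-aware stack could keep (hypothetical).
#[derive(Debug)]
struct PathState {
    id: u32,
    /// Smoothed round trip time measured on this path only.
    srtt: Duration,
    /// Congestion window of this path's own controller, in bytes.
    cwnd: u64,
    /// Bytes sent on this path and not yet acknowledged.
    in_flight: u64,
}

impl PathState {
    /// Does this path's congestion window have room for `size` more bytes?
    fn can_send(&self, size: u64) -> bool {
        self.in_flight + size <= self.cwnd
    }
}

/// Naive scheduler: among paths with congestion window space, pick the
/// one with the lowest RTT and account for the packet on that path.
fn schedule(paths: &mut [PathState], size: u64) -> Option<u32> {
    let path = paths
        .iter_mut()
        .filter(|p| p.can_send(size))
        .min_by_key(|p| p.srtt)?;
    path.in_flight += size;
    Some(path.id)
}
```

Because each path carries its own window and RTT, filling up the fast path does not stall the connection: the next packet simply goes out on the next-best path that still has room.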

This is exactly what the QUIC-MULTIPATH IETF draft has been figuring out: allow QUIC endpoints to use multiple paths at the same time. And we totally want to use this in iroh. We can have a world where we have several possible paths, select one as primary and others as backup paths and seamlessly transition between them as your endpoint moves through the network and paths appear and disappear.[9]

There are a lot of details to get right to make QUIC-MULTIPATH work, and adding this functionality to Quinn has been a major undertaking. But the branch is becoming functional at last.

Multipath Holepunching

If you've paid attention you'll have noticed that so far this still doesn't solve some of our issues: the holepunching datagrams still live outside of the QUIC stack. This means we send them at whatever time, not paying attention to the congestion controller. That's fine under light load, but under heavy load it often results in lost packets, which in turn means we have to retry sending them. Preferably we do that without accidentally DoSing an innocent UDP socket that is just quietly hanging out on the internet, on an IP address that we mistakenly thought might belong to the remote endpoint.

So the next step we would like to take with the iroh multipath project is to move the holepunching logic itself into QUIC. We're not the first to consider this: Marten Seemann and Christian Huitema have been thinking about this as well and wrote down some thoughts in a blog post. More importantly, they started the QUIC-NAT-TRAVERSAL draft, which conceptually does a simple thing: move the holepunching packets into QUIC packets.

While QUIC-NAT-TRAVERSAL is highly experimental and, as of the time of writing, we don't expect to follow it exactly, moving holepunching into QUIC does have a number of benefits:

  • The QUIC packets are already encrypted, so we no longer need to manage our own encryption layer separately.

  • QUIC already has very advanced packet acknowledgement and loss recovery mechanisms, including congestion control. Essentially QUIC is a reliable transport, which holepunching gets to benefit from.

  • QUIC already has robust protection against sending too much data to unsuspecting hosts on the internet.

  • In combination with QUIC-MULTIPATH we get a very robust and flexible system, allowing us to always schedule packets on the best possible path, to react to path changes in a timely manner, and to restart holepunching when needed.

Another consideration is that QUIC is already extensible. Notice that both QUIC-MULTIPATH and QUIC-NAT-TRAVERSAL are negotiated at connection setup. This is a robust mechanism that allows us to be confident that in the future we'll be able to improve on these mechanisms.

Work In Progress

Integrating QUIC-MULTIPATH and QUIC-NAT-TRAVERSAL into iroh changes the wire-protocol. That is part of the reason we want this done before our 1.0 release: once we release 1.0 we promise to keep our wire-protocol backwards compatible. Right now we're hard at work building the pieces needed to make these improvements. And sometime soon-ish they will start landing in the 0.9x releases.

We aim for iroh to become even more reliable for folks who push the limits, thanks to moving all the holepunching logic right into the QUIC stack.


Footnotes

  1. What's that? You're still using iroh < 0.91? Ok fine, maybe your relay server still uses a custom upgrade protocol instead of WebSockets.

  2. Almost: The QUIC handshake has to establish a TLS connection. This means it has to send the TLS ClientHello message in clear text like any other TLS connection on the internet. Yes, we know about ECH. One thing at a time.

  3. Of course it isn't. But as already said, the word count of this post is finite.

  4. iroh uses Quinn for the QUIC stack, an excellent project.

  5. Since then, RFC 9287 has been published, which advocates "greasing" this bit: effectively toggling it randomly. This is an attempt to stop middleboxes from ossifying the protocol by starting to recognize this bit. Iroh not being able to grease this bit right now is not ideal either.

  6. Using our own IPv6 Unique Local Address Global ID.

  7. While this is currently a single relay path, you can easily imagine how you could expand this to a number of relay server paths. Patience. The future.

  8. But hey! Some of these paths share at least the first and last hop. So they are not independent! Indeed, they are not. Congestion control is still a research area, especially for multiple paths with shared bottlenecks. Note, though, that this already happens a lot on the internet: your laptop or phone probably has many TCP and/or QUIC connections to several servers right now, and these definitely share hops. Yet the congestion controllers do somehow figure out how to make this work, at least to some reasonable degree.

  9. Wait, doesn't iroh already say it can do this? Indeed, indeed. Though if you've tried this you'd have noticed your application did experience some hiccups for a few seconds as iroh was figuring out where traffic needs to go. In theory we can do better with multipath, though it'll take some tweaking and tuning.
