The Sad Story of TCP Fast Open

I’m very interested in performance. If there’s a way to make something fast, you’ve got my attention. Especially when there’s a way to make a lot of things fast with a simple change – and that’s what TCP Fast Open (TFO) promises to do.

TFO (RFC 7413) started out in 2011 as a way to eliminate one of the round trips involved in opening a TCP connection. In early testing discussed at the 2011 Linux Plumbers Conference, Google found that TFO reduced page load times by 4-40%. The slowest, highest-latency connections would benefit the most – TFO promised to be a great improvement for many users.

Support for this performance-improving technology grew rapidly. In 2012, Linux 3.7 gained support for client and server TFO. In 2013, Android gained support when KitKat (4.4) was released using the Linux 3.10 kernel. In 2015, iOS gained support. In 2016, Windows 10 got support in the Anniversary Update. Even load balancers, such as F5, added support.

And yet, today, no major browser uses it by default: Chrome, Firefox, and Edge all ship with TFO disabled.

What happened to this technology that once sounded so promising?

Initial Optimism Meets Hard Reality

I attribute the failure to achieve widespread adoption of TCP Fast Open to four factors:

  1. Imperfect initial planning
  2. Middleboxes
  3. Tracking concerns
  4. Other performance improvements

Factor 1: Imperfect Initial Planning

TCP Fast Open was in trouble from its initial conception because it is an operating system change that had to be done perfectly from the very beginning. Operating systems have very long lifespans – updates happen slowly, backwards compatibility is paramount, and changes are (rightfully so) difficult to make. So when the TFO specification wasn’t perfect the first time, that was a major blow to the chances of ever achieving widespread adoption.

TFO requires the allocation of a new, dedicated TCP Option Kind Number. Since TFO was experimental when it started out, it used a number (254 with magic 0xF989) from the experimental allocation as described in RFC 4727. That experimental number quickly became ingrained in Windows, iOS, Linux, and more. As the saying goes, “nothing is as permanent as a temporary solution.”

So when TFO left experimental status with RFC 7413, the document stated: “Existing implementations that are using experimental option 254 per [RFC6994] with magic number 0xF989 (16 bits) as allocated in the IANA ‘TCP Experimental Option Experiment Identifiers (TCP ExIDs)’ registry by this document, SHOULD migrate to use this new option (34) by default.”

Did all implementations migrate? If they did, they would lose compatibility with those that didn’t migrate.

So all systems must now support both the experimental TCP Option Kind Number and the permanent one.
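To make the dual support concrete, here is a minimal sketch of how a receiver might look for a TFO cookie under either option number. The function name `find_tfo_cookie` and the idea of handing it the raw TCP options bytes are my own illustration, not code taken from any real TCP stack.

```python
TFO_KIND = 34             # permanent option kind assigned in RFC 7413
EXPERIMENTAL_KIND = 254   # shared experimental option kind
TFO_MAGIC = b"\xf9\x89"   # 16-bit magic number used by experimental TFO


def find_tfo_cookie(options: bytes):
    """Return the TFO cookie (possibly empty) or None if no TFO option is present.

    `options` is assumed to be the raw options area of a TCP header.
    """
    i = 0
    while i < len(options):
        kind = options[i]
        if kind == 0:          # End of Option List
            break
        if kind == 1:          # No-Operation: a single byte with no length field
            i += 1
            continue
        if i + 1 >= len(options):
            break              # truncated option, stop parsing
        length = options[i + 1]
        if length < 2 or i + length > len(options):
            break              # malformed length, stop parsing
        body = options[i + 2 : i + length]
        if kind == TFO_KIND:
            return body
        if kind == EXPERIMENTAL_KIND and body[:2] == TFO_MAGIC:
            return body[2:]    # strip the magic number, keep the cookie
        i += length
    return None
```

Both encodings carry the same cookie; the experimental one simply spends two extra bytes on the magic number.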

This issue isn’t a deal breaker – but it certainly wasn’t a great way to start out.

Factor 2: Middleboxes

Middleboxes are the appliances that sit between the end user and the server they’re trying to reach. They’re firewalls, proxies, routers, caches, security devices, and more. They tend to be very rarely updated, very expensive, and running proprietary software. Middleboxes are, in short, why almost everything runs over HTTP today and not other protocols as the original design for the Internet envisioned.

The first sign of trouble appeared in the initial report from Google in 2011 regarding TFO. As reported by LWN, “about 5% of the systems on the net will drop SYN packets containing unknown options or data. There is little to be done in this situation; TCP fast open simply will not work. The client must thus remember cases where the fast-open SYN packet did not get through and just use ordinary opens in the future.”
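The “remember and fall back” behavior described there can be sketched as a small per-host blocklist. This is only an illustration: the class name `TfoBlocklist`, the `retry_after` expiry, and keying by host name are assumptions of mine, and real clients track this state in their own ways.

```python
import time


class TfoBlocklist:
    """Remembers hosts where a fast-open SYN did not get through (hypothetical helper)."""

    def __init__(self, retry_after=3600.0, clock=time.monotonic):
        self._retry_after = retry_after  # assumed: seconds before TFO is tried again
        self._clock = clock
        self._failed = {}                # host -> time of the last TFO failure

    def should_try_tfo(self, host):
        failed_at = self._failed.get(host)
        if failed_at is None:
            return True
        if self._clock() - failed_at >= self._retry_after:
            del self._failed[host]       # the failure is old enough to retry
            return True
        return False

    def record_failure(self, host):
        self._failed[host] = self._clock()
```

A client would call `should_try_tfo(host)` before connecting and `record_failure(host)` when a fast-open SYN goes unanswered and an ordinary open then succeeds.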

Over the years, Google and Mozilla did much more testing and found that TFO caused more trouble than it was worth. Clients that initiated TFO connections hit failures frequently enough that, on average, TFO wasn’t worth it. In some networks, TFO never works – for example, China Mobile’s firewall consistently fails to accept TFO, requiring every connection to be retried without it, so TFO actually increases round trips.

Middleboxes are probably the fatal blow for TFO: the existing devices won’t be replaced for (many) years, and the new replacement devices may have the same problems.

Factor 3: Tracking Concerns

During the initial connection to a host, TFO negotiates a unique cookie; on subsequent connections to the same host, the client uses the cookie to eliminate one round trip. This unique cookie allows servers using TFO to track users. For example, if a user browses to a site, then opens an incognito window and goes to the same site, the same TFO cookie would be used in both windows. Furthermore, if a user goes to a site at work, then uses the same browser to visit that site from a coffee shop, the same TFO cookie would be used in both cases, allowing the site to know it’s the same user.

In 2011, tracking by governments and corporations wasn’t nearly as much of a concern as it is today. It would still be two years before Edward Snowden would release documents describing the US government’s massive surveillance programs.

But, in 2019, tracking concerns are real. TFO’s potential to be used for user tracking makes it unacceptable for most use cases.

One way to mitigate tracking concerns would be for the TFO cookie cache to be cleared whenever the active network changes. Windows, Linux, macOS, FreeBSD, and other operating systems should consider clearing the OS’s TFO cookie cache when changing networks. See this discussion on curl’s issue tracker for more.
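As a rough model of that mitigation, here is a cookie cache that forgets everything when the network changes. Real operating systems keep this cache inside the kernel, so the class `TfoCookieCache`, its methods, and the `network_id` value (say, an identifier for the current Wi-Fi network) are all hypothetical names I made up for illustration.

```python
class TfoCookieCache:
    """Toy model of a TFO cookie cache that is cleared on network change."""

    def __init__(self):
        self._network_id = None
        self._cookies = {}   # server address -> cookie bytes

    def on_network_change(self, network_id):
        # A different network means a server could otherwise link the two
        # locations through the cookie, so drop every cached cookie.
        if network_id != self._network_id:
            self._cookies.clear()
            self._network_id = network_id

    def store(self, server, cookie):
        self._cookies[server] = cookie

    def lookup(self, server):
        # None means no cookie: the client falls back to a regular handshake
        # and requests a fresh cookie.
        return self._cookies.get(server)
```

The cost is one full handshake per server after each network change, which is the price of not being recognizable across networks.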

Factor 4: Other Performance Improvements

When TFO started out, HTTP/2 was not yet in use – in fact, HTTP/2’s precursor, SPDY, didn’t have a draft until 2012. With HTTP/1, a client would make many connections to the same server to make parallel requests. With HTTP/2, clients can make parallel requests over the same TCP connection. Therefore, since it sets up far fewer TCP connections, HTTP/2 benefits much less than HTTP/1 from TFO.

HTTP/3 even plans to use UDP instead of TCP to reduce connection setup round trips, gaining the same performance advantage as TFO but without its problems.

TLS 1.3 offers another improvement that reduces round trips, called 0-RTT.

In the end, performance has been improving without incurring TFO’s drawbacks and costs.

The Future of TFO

TFO may never be universally used, but it still has its place. To summarize, the best use case for TFO is one with relatively new clients and servers, connected by a network with either no middleboxes or only middleboxes that don’t interfere with TFO, where user tracking isn’t a concern.

DNS is such a use case. DNS is very latency sensitive – eliminating the latency from one round trip would give a perceivable improvement to users. The same TCP connections are made from the same clients to the same servers repeatedly, which is TFO’s best case scenario. And there’s no tracking concern since many DNS clients and servers don’t move around (there’s no “incognito” mode for DNS). Stubby, Unbound, dnsmasq, BIND, and PowerDNS, for example, include or are currently working on support for TFO.
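For a sense of how small the server-side change is, here is a minimal sketch of a TCP listener that asks for TFO. It assumes Linux, where Python exposes the `TCP_FASTOPEN` socket option; the function name `listen_with_tfo` and the queue length of 16 are arbitrary choices of mine, and none of the DNS servers above necessarily does it this way.

```python
import socket


def listen_with_tfo(host="127.0.0.1", port=0, queue_len=16):
    """Open a listening TCP socket and try to enable TFO on it.

    Returns (socket, accepted), where `accepted` says whether the OS
    accepted the TCP_FASTOPEN option.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind((host, port))

    accepted = False
    # The constant is missing on platforms without TFO support.
    opt = getattr(socket, "TCP_FASTOPEN", None)
    if opt is not None:
        try:
            # On Linux the value is the maximum number of pending TFO requests.
            sock.setsockopt(socket.IPPROTO_TCP, opt, queue_len)
            accepted = True
        except OSError:
            pass  # fall back to an ordinary listener

    sock.listen(128)
    return sock, accepted
```

On Linux the `net.ipv4.tcp_fastopen` sysctl must also have its server bit (value 2) set before the kernel will actually answer fast-open SYNs, so an accepted option does not by itself prove TFO is in use.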

CC BY-SA 4.0 The Sad Story of TCP Fast Open by Craig Andrews is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
