
Archive for March, 2019

Performance Testing WebDAV Clients

March 18th, 2019

Part of migrating applications from on-premises hosting to cloud hosting (AWS, Azure, etc.) involves re-evaluating how users access their data. A recent migration involved users running Windows 10 accessing a Windows file share using the SMB protocol. Since SMB isn’t safe to run directly over the Internet (it’s usually not encrypted and it has a long history of security vulnerabilities), the potential options included tunneling SMB within a VPN or moving away from the SMB protocol entirely. One such alternative is WebDAV.

Web Distributed Authoring and Versioning (WebDAV) provides the same basic functionality as NFS, SMB, or FTP over the familiar, widely deployed, well supported HTTP protocol. It was literally designed for this use case.

Testing Methodology

I evaluated 7 approaches for clients to access a WebDAV share, testing the performance of each and noting pros and cons.

Testing was performed in March 2019. The same client, server, and connection were used for each test.

The client is a Dell Latitude 7490 running Fedora 29. Windows 10 (version 1809, build 17763.348) ran as a VM under KVM, using user mode networking to connect to the Internet. For the smb+davfs2 testing, Samba and davfs2 ran in Docker on the Linux host, and the Windows 10 VM connected to that SMB share.

The WebDAV server is running Apache 2.4.38 and Linux kernel 5.0.0-rc8. The WebDAV site is served over HTTPS.

Ping time between them averages 18ms; httping to the WebDAV server URL averages 140ms.

Windows 10’s Redirector over HTTP/1.1

WebDAV Redirector is the name of the built-in WebDAV client in Windows. It’s accessed from Windows Explorer by using the “Map network drive…” option and selecting “Connect to a web site that you can use to store your documents and pictures.”
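
The same mapping can also be done from a command prompt via the WebClient service; a minimal sketch (the drive letter here is arbitrary):

net use Z: https://www.integralblue.com/cacdav /persistent:no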

Recent versions of Windows 10 support HTTP/2, so for this test, the “h2” protocol was disabled on the Apache server.

Windows 10’s Redirector over HTTP/2

This test is the same as the previous one except the “h2” protocol is enabled on the Apache server.
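
For reference, the only server-side difference between these two Redirector tests is Apache’s Protocols directive (a sketch, assuming mod_http2 is loaded):

# HTTP/1.1 only (the “h2” protocol disabled)
Protocols http/1.1
# HTTP/2 enabled alongside HTTP/1.1
Protocols h2 http/1.1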

WebDrive

WebDrive (version 2019 build 5305) was used with “Basic Cache Settings” set to “Multi-User.” All other settings were left at their defaults.

WebDrive does not currently support TLS 1.3 or HTTP/2; I suspect these would improve performance.

davfs2 on Linux

davfs2 is a Linux file system driver which allows mounting WebDAV shares just like any other file system.

For this testing, the client is Linux, not Windows 10, so this test isn’t a real solution on its own (the clients do need to run Windows 10). It’s included for comparison and to test one component of a possible solution (Samba + davfs2).

davfs2 does not support HTTP/2.

davfs2 offers a number of configuration options; here’s what was used for this testing:

/home/candrews/.davfs2/davfs2.conf
[/home/candrews/webdavpoc]
use_locks 1
delay_upload 0 # disable write cache so users know that when the UI tells them the upload is done it’s actually done

And the mount command:

sudo mount -t davfs https://www.integralblue.com/cacdav /home/candrews/webdavpoc/ -ouid=1000,gid=1000,conf=/home/candrews/.davfs2/davfs2.conf

The davfs2 version is 1.5.4.

davfs2 with locks disabled (using a Linux client)

This test is the same as the previous one, but with WebDAV locks disabled by setting “use_locks 0” in davfs2.conf.
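
The davfs2.conf section for this test is the same as before, with only the use_locks value changed:

[/home/candrews/webdavpoc]
use_locks 0
delay_upload 0 # disable write cache so users know that when the UI tells them the upload is done it’s actually done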

Samba + davfs2

This test uses Samba to expose the davfs2 mount from the davfs2 test and uses Windows 10 to access the resulting SMB share.

The davfs2 mount was exposed over SMB with the dperson/samba Docker image using:

docker run --name smb -e USERID=1000 -e GROUPID=1000 --security-opt label:disable -v /home/candrews/webdavpoc:/webdavpoc:Z --rm -it -p 139:139 -p 445:445 -d dperson/samba -s "webdavpoc;/webdavpoc;yes;no;no;testuser" -u "testuser;testpass" -g "vfs objects ="

The biggest challenge with this approach, which I didn’t address, is security. The davfs2 mount that Samba exposes uses fixed credentials, so all Windows 10 users using the smb+davfs2 proxy appear to the WebDAV server as the same user. There is no way to pass credentials from the Windows 10 client through Samba and davfs2 to the WebDAV server. I imagine this limitation may disqualify this solution.

Samba + davfs2 with locks disabled

This test is the same as the previous one but WebDAV locks are disabled in davfs2.

Results

Conclusion

Key takeaways from the results:

  • The built-in Windows Redirector (which is free and the easiest solution) is by far the slowest.
  • WebDAV benefits greatly from HTTP/2. HTTP/2 was designed to improve the performance of multiple concurrent small requests (as is the case for HTML with references to CSS, JavaScript, and images), and that’s exactly how WebDAV works. Each download is a PROPFIND and a GET, and each upload is a PROPFIND, PROPPATCH, PUT, and, if locking is enabled, LOCK and UNLOCK (see the rough curl sketch after this list).
  • Disabling WebDAV locks improves performance. By eliminating the LOCK/UNLOCK requests for every file, upload performance doubled. Unfortunately, disabling WebDAV locks for the built-in Windows Redirector requires a registry change, a non-starter for many organizations. WebDrive has locks disabled by default. davfs2 only uses locks when editing files (when a program actually has the file open for writing), not when uploading a new file.
  • Surprisingly, the overall fastest (by far) approach involves a middle box running Linux using smb+davfs2. Adding a hop and another protocol usually slows things down, but not in this case.
  • davfs2 is fast because it opens more HTTP connections, allowing it to make more requests in parallel. WebDrive supports adjusting the concurrency too; in “Connection Settings” there are settings for “Active Connections Limit” (defaults to 4) and “Active Upload Limit” (defaults to 2). Raising these limits would likely improve performance further.
  • The client/server connection was high bandwidth (~100 Mbps) and low latency (~18ms). Lower bandwidth and higher latency would result in even greater benefits from using HTTP/2 and disabling locking.
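
To make that request sequence concrete, here is a rough curl approximation of a single locked upload; the XML bodies (lockinfo.xml, proppatch.xml), credentials, and lock token are placeholders, and a real client would also parse each response:

# discover whether the file already exists and fetch its properties
curl -u user:pass -X PROPFIND -H "Depth: 0" https://www.integralblue.com/cacdav/file.txt
# take a write lock (only when locking is enabled)
curl -u user:pass -X LOCK -H "Content-Type: text/xml" -d @lockinfo.xml https://www.integralblue.com/cacdav/file.txt
# upload the file contents
curl -u user:pass -T file.txt https://www.integralblue.com/cacdav/file.txt
# set properties such as the modification time
curl -u user:pass -X PROPPATCH -H "Content-Type: text/xml" -d @proppatch.xml https://www.integralblue.com/cacdav/file.txt
# release the lock
curl -u user:pass -X UNLOCK -H "Lock-Token: <opaquelocktoken:...>" https://www.integralblue.com/cacdav/file.txt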

Considerations for future research / testing include:

  • AWS Storage Gateway may be a solution. Concerns for this approach include:
    • Incompatible with MITM HTTPS-decrypting security devices
    • Uses a local, on premises cache (called an “upload buffer” in the documentation) so changes are not immediately available globally
    • Integrates with Active Directory for authentication/authorization, but setting up for a large number of users is onerous.
    • Requires the use of S3 for file storage, which may not be compatible with the servers in AWS
  • Various other WebDAV clients for Windows, such as DokanCloudFS, NetDrive, DirectNet Drive, ExpanDrive, and Cyberduck / Mountain Duck.
  • Further tweaks to configuration options, such as the concurrency settings in WebDrive and davfs2

The Sad Story of TCP Fast Open

March 13th, 2019

I’m very interested in performance. If there’s a way to make something fast, you’ve got my attention. Especially when there’s a way to make a lot of things fast with a simple change – and that’s what TCP Fast Open (TFO) promises to do.

TFO (RFC 7413) started out in 2011 as a way to eliminate one of the round trips involved in opening a TCP connection. In early testing discussed at the 2011 Linux Plumbers Conference, Google found that TFO reduced page load times by 4-40%. The slowest, highest latency connections would benefit the most – TFO promised to be a great improvement for many users.

Support for this performance improving technology rapidly grew. In 2012, Linux 3.7 gained support for client and server TFO. In 2013, Android gained support when KitKat (4.4) was released using the Linux 3.10 kernel. In 2015, iOS gained support. In 2016, Windows 10 got support in the Anniversary Update. Even load balancers, such as F5, added support.
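
On Linux, for example, enabling TFO is a one-line sysctl change, and clients such as curl can ask for it explicitly (a minimal sketch; the value 3 enables both the client and server sides, and the URL is a placeholder):

# enable TFO for outgoing (client) and incoming (server) connections
sudo sysctl -w net.ipv4.tcp_fastopen=3
# ask curl to attempt TCP Fast Open for this request
curl --tcp-fastopen https://example.com/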

And yet, today, no browser uses it: Chrome, Firefox, and Edge all have TFO disabled by default.

What happened to this technology that once sounded so promising?

Initial Optimism Meets Hard Reality

I attribute the failure to achieve widespread adoption of TCP Fast Open to four factors:

  1. Imperfect initial planning
  2. Middleboxes
  3. Tracking concerns
  4. Other performance improvements

Factor 1: Imperfect Initial Planning

TCP Fast Open was in trouble from its initial conception because it is an operating system change that had to be done perfectly from the very beginning. Operating systems have very long lifespans – updates happen slowly, backwards compatibility is paramount, and changes are (rightfully so) difficult to make. So when the TFO specification wasn’t perfect the first time, that was a major blow to its chances of ever achieving widespread adoption.

TFO requires the allocation of a new, dedicated TCP Option Kind Number. Since TFO was experimental when it started out, it used a number (254, with magic 0xF989) from the experimental allocation described in RFC 4727. That number quickly got ingrained in Windows, iOS, Linux, and more. As the saying goes, “nothing is as permanent as a temporary solution.”

So when TFO left experimental status with RFC 7413, the document states: “Existing implementations that are using experimental option 254 per [RFC6994] with magic number 0xF989 (16 bits) as allocated in the IANA “TCP Experimental Option Experiment Identifiers (TCP ExIDs)” registry by this document, SHOULD migrate to use this new option (34) by default.”

Did all implementations migrate? Any that switched exclusively to the new option would lose compatibility with those that hadn’t migrated yet.

So all systems must now support both the experimental TCP Option Kind Number and the permanent one.

This issue isn’t a deal breaker – but it certainly wasn’t a great way to start out.

Factor 2: Middleboxes

Middleboxes are the appliances that sit between the end user and the server they’re trying to reach. They’re firewalls, proxies, routers, caches, security devices, and more. They tend to be very rarely updated, very expensive, and run proprietary software. Middleboxes are, in short, why almost everything runs over HTTP today and not other protocols as the original design for the Internet envisioned.

The first sign of trouble appeared in the initial report from Google in 2011 regarding TFO. As reported by LWN, “about 5% of the systems on the net will drop SYN packets containing unknown options or data. There is little to be done in this situation; TCP fast open simply will not work. The client must thus remember cases where the fast-open SYN packet did not get through and just use ordinary opens in the future.”

Over the years, Google and Mozilla did much more testing and found that TFO caused more trouble than it was worth. Clients that initiated TFO connections encountered failures frequently enough that, on average, TFO wasn’t worth it. In some networks, TFO never works – for example, China Mobile’s firewall consistently fails to accept TFO, requiring every connection to be retried without it and leading to TFO actually increasing round trips.

Middleboxes are probably the fatal blow for TFO: the existing devices won’t be replaced for (many) years, and the new replacement devices may have the same problems.

Factor 3: Tracking Concerns

During initial connection to a host, TFO negotiates a unique cookie; on subsequent connections to the same host, the client uses the cookie to eliminate one round trip. This unique cookie allows servers using TFO to track users. For example, if a user browses to a site, then opens an incognito window and goes to the same site, the same TFO cookie would be used in both windows. Furthermore, if a user goes to a site at work, then uses the same browser to visit that site from a coffee shop, the same TFO cookie would be used in both cases, allowing the site to know it’s the same user.

In 2011, tracking by governments and corporations wasn’t nearly as much of a concern as it is today. It would still be 2 years before Edward Snowden would release documents describing the US government’s massive surveillance programs.

But, in 2019, tracking concerns are real. TFO’s potential to be used for user tracking makes it unacceptable for most use cases.

One way to mitigate tracking concerns would be for the TFO cookie cache to be cleared whenever the active network changes. Windows, Linux, macOS, FreeBSD, and other operating systems should consider clearing the OS’s TFO cookie cache when changing networks. See this discussion on curl’s issue tracker for more.
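
On Linux, the cookies are stored in the kernel’s TCP metrics cache, so a mitigation along these lines can also be approximated by hand (a sketch using iproute2; exact output fields vary by version):

# show cached per-destination TCP metrics, including stored TFO cookies (fo_cookie)
ip tcp_metrics show
# flush the cache, discarding any stored TFO cookies
sudo ip tcp_metrics flush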

Factor 4: Other Performance Improvements

When TFO started out, HTTP/2 was not yet in use – in fact, HTTP/2’s precursor, SPDY, didn’t have a draft until 2012. With HTTP/1, a client would make many connections to the same server in order to make parallel requests. With HTTP/2, clients can make parallel requests over the same TCP connection. Therefore, since it sets up far fewer TCP connections, HTTP/2 benefits much less than HTTP/1 from TFO.

HTTP/3 even plans to use UDP, instead of TCP, to reduce connection setup round trips, gaining the same performance advantage as TFO but without its problems.

TLS 1.3 offers another improvement that reduces round trips, called 0-RTT.

In the end, performance has been improving without requiring TFO’s drawbacks/costs.

The Future of TFO

TFO may never be universally used, but it still has its place. To summarize, the best use case for TFO is relatively new clients and servers, connected by a network with either no middleboxes or only middleboxes that don’t interfere with TFO, in a situation where user tracking isn’t a concern.

DNS is such a use case. DNS is very latency sensitive – eliminating the latency from one round trip would give a perceivable improvement to users. The same TCP connections are made from the same clients to the same servers repeatedly, which is TFO’s best case scenario. And there’s no tracking concern since many DNS clients and servers don’t move around (there’s no “incognito” mode for DNS). Stubby, Unbound, dnsmasq, BIND, and PowerDNS, for example, include or are currently working on support for TFO.
