Improving Performance of Spring’s ShallowEtagHeaderFilter by 50%

I led the team developing a new web presence for Mackenzie Financial, an effort which involved performance testing (which is always a good practice). During that testing, I discovered that ShallowEtagHeaderFilter was generating a lot of garbage and a lot of time was being spent in it. That didn’t seem ideal so I dove in to see if I could improve that situation.

Spoiler: I ended up improving ShallowEtagHeaderFilter’s performance by ~50% and the improvement is included in Spring 4.2.0 and later for everyone’s benefit.

What is ShallowEtagHeaderFilter?

ShallowEtagHeaderFilter is a way to improve performance by using ETags to reduce the amount of data transferred from the server to the browser. When a browser requests a URL, it optionally includes an If-None-Match header in the HTTP request. When the server receives this request, ShallowEtagHeaderFilter buffers the response the server generates, and when the server is done generating the response, ShallowEtagHeaderFilter generates an ETag by hashing the response body. If the request contains an If-None-Match HTTP header and the value of that header matches the ETag generated by ShallowEtagHeaderFilter, ShallowEtagHeaderFilter will send HTTP 304 (Not Modified) and no response body; this saves bandwidth, speeding up the experience for the browser. If the request doesn’t contain an If-None-Match header or if the value of that header doesn’t match the generated ETag, then ShallowEtagHeaderFilter returns HTTP 200 with the response body as well as an ETag header that the browser can use next time in an If-None-Match header.
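The decision described above can be sketched in a few lines of plain Java. This is only an illustration under stated assumptions, not Spring's code: the class and method names (ShallowEtagSketch, etagFor, statusFor) are hypothetical, the exact ETag format Spring emits may differ, and a real filter would also have to deal with response wrapping and headers.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical illustration of the shallow ETag decision; not Spring's actual code.
final class ShallowEtagSketch {

    /** Hashes the already-buffered body with MD5 and returns the hex digest wrapped in quotes. */
    static String etagFor(byte[] body) {
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            StringBuilder etag = new StringBuilder("\"");
            for (byte b : md5.digest(body)) {
                etag.append(String.format("%02x", b));
            }
            return etag.append('"').toString();
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to provide MD5.
            throw new IllegalStateException(e);
        }
    }

    /** Returns 304 when the client's If-None-Match value equals the generated ETag, otherwise 200. */
    static int statusFor(String ifNoneMatch, byte[] body) {
        return etagFor(body).equals(ifNoneMatch) ? 304 : 200;
    }

    public static void main(String[] args) {
        byte[] body = "<html>hello</html>".getBytes(StandardCharsets.UTF_8);
        String etag = etagFor(body);
        System.out.println(statusFor(null, body)); // first visit, no header: prints 200
        System.out.println(statusFor(etag, body)); // revisit with the same ETag: prints 304
    }
}
```

Note that etagFor needs the complete body before it can answer, which is why the filter has to buffer the whole response.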

The ETag mechanism can really improve performance for end users. Notably, since the response is still generated, the server still has to do all the work (that’s why this approach is called “Shallow” ETags); it benefits only the client. Particularly for bandwidth-limited mobile devices, ETags can mean the difference between a usable experience and a painfully slow one.

Changing How ShallowEtagHeaderFilter Buffers the Response Body

The profiler showed that ShallowEtagHeaderFilter allocates a lot of memory, which makes sense as it needs to buffer the entirety of the response body. If the size of the response body is known ahead of time, it can allocate a buffer of the exact size. Otherwise, it needs to take a guess, and if the buffer fills up, it needs to grow the buffer. The interesting part is how to grow the buffer.

ShallowEtagHeaderFilter buffers the response in a ResizableByteArrayOutputStream instance. When it needs to grow the response buffer, it uses ResizableByteArrayOutputStream.resize(int). ResizableByteArrayOutputStream then creates a new buffer, copies the old buffer into the new buffer, then (implicitly) releases the old buffer. This process has two major drawbacks: it needs to allocate large chunks of memory frequently and it copies a lot of data. Here’s how it looks:

  1. Create a buffer of size 1024 bytes
  2. When the buffer is full, a new buffer of 2048 bytes is allocated, then the 1024 from the old buffer is copied to the new buffer
  3. When that buffer is full, a new buffer of 4096 bytes is allocated, then the 2048 from the old buffer is copied to the new buffer
  4. When that buffer is full, a new buffer of 8192 bytes is allocated, then the 4096 from the old buffer is copied to the new buffer
  5. When that buffer is full, a new buffer of 16384 bytes is allocated, then the 8192 from the old buffer is copied to the new buffer
  6. When that buffer is full, a new buffer of 32768 bytes is allocated, then the 16384 from the old buffer is copied to the new buffer
  7. When that buffer is full, a new buffer of 65536 bytes is allocated, then the 32768 from the old buffer is copied to the new buffer
  8. When that buffer is full, a new buffer of 131072 bytes is allocated, then the 65536 from the old buffer is copied to the new buffer
  9. Now that the response has been written, the ETag itself is calculated using DigestUtils.appendMd5DigestAsHex, which allocates a 4096 byte buffer. Then, the 99,000 bytes from the response are copied into that buffer 4096 bytes at a time.

So, when all is said and done, (1024+2048+4096+8192+16384+32768+65536+131072) = 261120 bytes of buffers were allocated. The resizes copied (1024+2048+4096+8192+16384+32768+65536) = 130048 bytes, and computing the digest copied the 99,000 bytes of the response once more, for 229048 bytes copied in total. That’s a lot of memory allocation and copying.
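To check this kind of arithmetic, the resize steps can be tallied with a small model. This is a sketch under stated assumptions: a 1024-byte initial buffer that doubles whenever it is full, a 99,000-byte response, and exact powers of two for every buffer size (16384 and 32768). The class name GrowByCopyTally is hypothetical, and the model only counts bytes; it does not store any data or include the digest step.

```java
// Hypothetical accounting model of a copy-on-grow buffer; it tallies bytes rather than storing them.
final class GrowByCopyTally {
    long allocated; // total bytes of backing arrays ever allocated
    long copied;    // total bytes moved from an old array into its replacement
    int resizes;    // number of times the buffer had to grow

    /** Simulates writing responseSize bytes into a buffer that doubles every time it fills up. */
    static GrowByCopyTally simulate(int initialCapacity, int responseSize) {
        GrowByCopyTally tally = new GrowByCopyTally();
        int capacity = initialCapacity;
        tally.allocated = capacity;
        while (capacity < responseSize) {
            tally.copied += capacity;    // the full old array is copied into the new one
            capacity *= 2;
            tally.allocated += capacity; // the new, larger array
            tally.resizes++;
        }
        return tally;
    }

    public static void main(String[] args) {
        GrowByCopyTally tally = simulate(1024, 99_000);
        System.out.println(tally.allocated); // prints 261120
        System.out.println(tally.copied);    // prints 130048
        System.out.println(tally.resizes);   // prints 7
    }
}
```

The copied total here covers only the resizes; hashing the finished response adds one more pass over its bytes.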

Instead of ResizableByteArrayOutputStream, ShallowEtagHeaderFilter could use a different buffer implementation, which I called FastByteArrayOutputStream. The buffer could start with an initial byte array, then, when that is filled, allocate a new byte array but only store the new bytes in it and keep the old bytes in the old array. Here’s how this approach looks:

  1. Create a buffer of size 1024 bytes
  2. When the buffer is full, a new buffer of 2048 bytes is allocated (no copying is done)
  3. When the buffer is full, a new buffer of 4096 bytes is allocated (no copying is done)
  4. When the buffer is full, a new buffer of 8192 bytes is allocated (no copying is done)
  5. When the buffer is full, a new buffer of 16384 bytes is allocated (no copying is done)
  6. When the buffer is full, a new buffer of 32768 bytes is allocated (no copying is done)
  7. When the buffer is full, a new buffer of 65536 bytes is allocated (no copying is done)
  8. Now that the response has been written (the 99,000 bytes are spread across the 7 buffers), the ETag itself is calculated. No buffers are allocated; the bytes are passed straight from the existing buffers to the MessageDigest.

So, when all is said and done, (1024+2048+4096+8192+16384+32768+65536) = 130048 bytes of buffers were allocated. Only the 99,000 bytes of the response were copied, once, into the MessageDigest.
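The idea can be sketched as a list of byte arrays that is only ever appended to. This is a minimal illustration of the technique, not the FastByteArrayOutputStream source: the class name ChunkedBufferSketch and its methods are hypothetical, and it assumes each new chunk is twice the size of the previous one, as in the steps above.

```java
import java.io.OutputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a segmented buffer; not the FastByteArrayOutputStream source.
final class ChunkedBufferSketch extends OutputStream {
    private final List<byte[]> fullChunks = new ArrayList<>(); // filled chunks, never copied again
    private byte[] current;                                    // chunk currently being written
    private int position;                                      // next free index in current
    private int size;                                          // total bytes written so far

    ChunkedBufferSketch(int initialCapacity) {
        current = new byte[initialCapacity];
    }

    @Override
    public void write(int b) {
        if (position == current.length) {
            // Keep the full chunk as it is and start a new, larger one: no copying.
            fullChunks.add(current);
            current = new byte[current.length * 2];
            position = 0;
        }
        current[position++] = (byte) b;
        size++;
    }

    int size() {
        return size;
    }

    int chunkCount() {
        return fullChunks.size() + 1;
    }

    /** Feeds each chunk to the digest in order, without assembling one contiguous array first. */
    String md5Hex() {
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            for (byte[] chunk : fullChunks) {
                md5.update(chunk);
            }
            md5.update(current, 0, position);
            StringBuilder hex = new StringBuilder();
            for (byte b : md5.digest()) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to provide MD5.
            throw new IllegalStateException(e);
        }
    }
}
```

The trade-off of this layout is that the contents are no longer one contiguous array, so anything that needs the bytes has to walk the chunks in order, as md5Hex does here.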

In summary, this change:

  • Reduces the work the GC has to do by reducing the number of objects allocated
  • Copies fewer bytes, improving overall performance by doing less work
  • Requires 131072 bytes less memory to process each request, allowing for greater concurrency
  • Makes fewer large allocations, reducing heap fragmentation (in this example, the large 131072 byte allocation isn’t done at all)

Now Included in Spring

After some reviews by the Spring team and JMH benchmarking showing that the change really does improve performance by ~50%, Spring accepted this change and included it in Spring 4.2.0 and later.

The improvement was reflected by improved real world performance experienced by users of the web site. I like to think that this work helped contribute to Mackenzie Financial winning the 2015 Kasina Award as Top Canadian Website for Financial Advisors.

CC BY-SA 4.0 Improving Performance of Spring’s ShallowEtagHeaderFilter by 50% by Craig Andrews is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
