The One and the Many

lighttpd response streaming

Today I updated my HTTP server, lighttpd, from version 1.4.39 to 1.4.43. This came through Debian repository updates. I use Debian testing (currently stretch), so I don't expect extreme stability, but I thought today's update revealed a bug. In the end, it turned out to be a change in lighttpd's default behaviour rather than a bug. This post is about how I figured this out. I also explain how to change the behaviour back.

I recently built an application to stream audio via HTTP (called audiostreamer because I like imaginative names). I tested it today because I was going to send someone a link to it, and I found it wasn't working. I thought I must have introduced a bug, and began debugging.

The request/response path to the application looks like this:

HTTP client <--- HTTP ---> lighttpd <--- FastCGI ---> application

A client makes an HTTP request to lighttpd, which in turn talks to the application using the FastCGI protocol. A key point to understand about this application is that its response uses Transfer-Encoding: chunked, and the response never stops (until the client closes its connection). The idea is to stream audio from a live source, forever!
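For anyone unfamiliar with chunked transfer encoding, here is a minimal Python sketch of the framing (not audiostreamer's actual code; the names encode_chunk, write_chunks and LAST_CHUNK are hypothetical). Each chunk is its length in hexadecimal, CRLF, the bytes themselves, then CRLF. A body only ends with a zero-length chunk, which a never-ending live stream simply never sends.

```python
import io
from typing import BinaryIO, Iterable

# The zero-length chunk that marks the end of a chunked body.
# A live stream like audiostreamer's never sends it.
LAST_CHUNK = b"0\r\n\r\n"


def encode_chunk(data: bytes) -> bytes:
    """Frame one block as an HTTP/1.1 chunk: hex length, CRLF, data, CRLF."""
    if not data:
        # An empty chunk would be read as the terminator and end the response.
        raise ValueError("an empty chunk would end the response")
    return b"%x\r\n" % len(data) + data + b"\r\n"


def write_chunks(out: BinaryIO, blocks: Iterable[bytes]) -> None:
    """Write each non-empty block as a chunk, flushing after every one."""
    for block in blocks:
        if block:
            out.write(encode_chunk(block))
            out.flush()  # push the chunk out now instead of letting it sit in a buffer


if __name__ == "__main__":
    buf = io.BytesIO()
    write_chunks(buf, [b"ID3", b"\xff\xfb" * 8])
    print(buf.getvalue())
```

Flushing after every chunk matters here: a client can only play audio it has actually received, so anything buffered along the way (in the application or in a proxy in front of it) delays playback.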

I found that when I made the request through lighttpd, zero bytes came through to the client. If I made the request to the application directly, I saw a stream of data (MP3 data).

I still believed my application was the problem. I added debug logs to the application, and saw that from its perspective it was sending plenty of data to the client. I began to question whether lighttpd had started handling long-running responses differently since the update, so I patched my application to send a small amount of data and then end the response. When the response ended, the client received all the data at once. It looked like lighttpd was buffering the data from the application, something it hadn't done before.

I thought I might have a lighttpd bug on my hands, specifically a bug in its FastCGI implementation. But I was not sure it was a bug. Could the old behaviour have been violating the FastCGI specification? I read through the specification to see if it said anything about this. As far as I can tell, it does not. I wondered why it had changed in lighttpd.

I re-installed the old version of lighttpd and tested it to verify my theory that response buffering had changed. The older version streamed data to the client before the response ended; the new one did not. As a final test, I wrote a small test application that sends large chunks of data slowly via FastCGI and verified that the change in behaviour showed up with this application as well. A key to reproducing this is to flush the buffer on the application side so lighttpd sees the data immediately.
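The test application isn't shown here, so the following is only a hypothetical Python sketch of the framing such an application relies on. Under FastCGI, response bytes travel to the web server as FCGI_STDOUT records (type 6), each with an 8-byte header: version, type, 16-bit request ID, 16-bit content length, padding length and a reserved byte. The helper names stdout_records and send_slowly are made up for illustration, and reading the request and sending the final FCGI_END_REQUEST record are left out.

```python
import io
import struct
from typing import BinaryIO

FCGI_VERSION_1 = 1
FCGI_STDOUT = 6
MAX_CONTENT = 0xFFFF  # contentLength is a 16-bit field


def stdout_records(request_id: int, data: bytes) -> bytes:
    """Split data into FCGI_STDOUT records (8-byte header + content + padding).

    Empty data produces no records: an empty FCGI_STDOUT record would tell
    the web server the output stream has ended.
    """
    out = bytearray()
    for start in range(0, len(data), MAX_CONTENT):
        content = data[start:start + MAX_CONTENT]
        padding = -len(content) % 8  # optional padding to a multiple of 8 bytes
        # >BBHHBx: version, type, requestId, contentLength, paddingLength, reserved
        out += struct.pack(">BBHHBx", FCGI_VERSION_1, FCGI_STDOUT,
                           request_id, len(content), padding)
        out += content + b"\x00" * padding
    return bytes(out)


def send_slowly(conn: BinaryIO, request_id: int, block: bytes) -> None:
    """Write one block of response data and flush so the server sees it now."""
    conn.write(stdout_records(request_id, block))
    conn.flush()  # without this, the data can sit in the application's buffer
```

A test application along these lines would call send_slowly in a loop with a pause (for example time.sleep) between large blocks. With streaming, the client sees each block as it is sent; with buffering, it sees nothing until the response ends.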

I rubbed my hands together with glee because I had been looking for an excuse to do some lighttpd development. I set up a Docker container (Debian stretch) and built and installed lighttpd 1.4.39. I configured it to talk to my test application with FastCGI, and then verified the streaming behaviour. The HTTP client saw the data immediately as it went out of the application.

I then built and installed lighttpd versions 1.4.44 down to 1.4.40 and tested each one. I wanted to find the version in which the buffering change was introduced so I could run a diff. In each of these versions, lighttpd buffered the response and the HTTP client saw nothing until the response completed. So the behaviour changed in 1.4.40.
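Walking through five releases one by one is quick. Over a longer range of versions, the same search can be done as a binary search, the idea `git bisect` automates for commits. Here is a minimal Python sketch of that alternative, assuming the behaviour flips exactly once (streaming before, buffering after) and that `buffers` is a hypothetical callback that builds, installs and tests a given version.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def first_changed(versions: Sequence[T], buffers: Callable[[T], bool]) -> T:
    """Return the earliest version for which buffers(version) is True.

    Assumes versions are sorted oldest to newest, the oldest one streams,
    the newest one buffers, and once buffering appears it stays.
    """
    lo, hi = 0, len(versions) - 1  # invariant: versions[lo] streams, versions[hi] buffers
    if buffers(versions[lo]) or not buffers(versions[hi]):
        raise ValueError("endpoints do not bracket the change")
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if buffers(versions[mid]):
            hi = mid
        else:
            lo = mid
    return versions[hi]
```

The search needs about log2(n) builds instead of up to n, at the cost of relying on the single-flip assumption; if the behaviour had flipped back and forth between releases, a linear walk like the one above would be the safer choice.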

I examined the change log for 1.4.40. Under the heading "selected new features", I noticed this entry: "server.stream-response-body to control streaming, buffering of response". This sounded like an option controlling exactly the behaviour I was interested in, so I went to the lighttpd documentation for it.

The default (0) says "buffer entire response body before sending to client". This is what I saw starting in version 1.4.40. Another value (2) says "stream response body to client; minimal buffering might block backend". Changing this option to 2 reverted the behaviour to how it was in 1.4.39. The comment about "might block backend" is concerning, though!
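The change itself is a single line of lighttpd configuration. A minimal sketch, assuming the option is set globally in the main configuration file (on Debian that is /etc/lighttpd/lighttpd.conf) and that lighttpd is restarted afterwards:

```
# Stream response bodies to the client as the backend produces them
# (the 1.4.39 behaviour) instead of buffering the whole body first.
server.stream-response-body = 2
```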

In the end, the behaviour turned out to be an intentional change rather than a bug. While I didn't get to contribute any changes back, I was glad I could easily toggle an option to restore the behaviour I wanted.