nginx+gzip module might silently corrupt data upon backend failure
2011/11/04 § 3 Comments
There are several elements that make absolutely certain the page you’re reading in your browser is an accurate representation of the resource the HTTP server meant to send you1. Disregarding caching for a minute, we have two elements making sure the representation you get is protected from errors. The first protecting element is, of course, TCP, making sure that if the server wrote two-hundred bytes in a particular order, either they’ll all arrive to your end (in order and without errors) or your TCP stack will realize something bad happened and give your user-agent (your browser) a chance to cope with the error. The need for the second protecting element is a bit more sneaky: TCP will guarantee everything the server wrote will arrive, i.e., bytes for which the server called write(2) or equivalent will arrive (or you’ll know something went wrong). But what about bytes the server should have written but didn’t write all – for example, because some component on the server’s side failed?
The original HTTP (HTTP 0.9, 1996 time) didn’t cope with this situation at all. The signal to the client that the server finished talking was to disconnect the TCP session, which, from the client’s side, is a vague signal. Did the TCP server disconnect because it finished or because it ran into trouble (software fault, sysadmin action, kernel behaviour due to memory pressure or even a bug, etc)? Thankfully, current HTTP kicks in to complement TCP, allowing the server to do one of several things in order to make sure you’ll at least know you didn’t receive the whole picture. By far the two most common thing the server will do are to specify a Content-Length in the response’s header or to use a Transfer-Encoding, most probably chunked transfer encoding.
Content length is simple to grasp. The server wishes to say 200 bytes. It explicitly says: “I will say 200 bytes” in the response header. If the user-agent didn’t receive 200 bytes of response, it knows something went wrong. Chunked transfer encoding is only slightly more complex – the server will send the response in chunks, each chunks prefixed by the length of the chunk. The end of the document is marked by a zero-length chunk. So if the user-agent saw a chunk cut in the middle, or didn’t receive a zero-length chunk, it also knows something went wrong and has a chance to decide what to do about it. For example, when faced with incorrect content length, Chrome displays an ERR_CONNECTION_CLOSED error, whereas Firefox would display the portion of the page it did receive. Different behaviour, yes, but at least both user-agents in this example had a chance realize the response they received is partial. Which is really, really important, you know why? I’ll tell you why.
Enter caching. HTTP caching is a non-trivial matter with many unexpected gotchas and pitfalls, and I can’t cover it all here (why the complexity? I think it’s because all caching is an intentional form of data/state repetition, and repetition is something that in my experience humans often have difficulty reasoning about). By far the best document I know about HTTP caching is this splendid guide, but if you’re in a hurry or impatient, let me summarize the points interesting for this particular post. First, caches might exist in many places, some of them might be surprising, some of them might be slightly broken or at least very aggressive (ISP transparent caches, mutter mutter cough cough). Second, among many other things, HTTP caching lets a server give a client a token together resource, telling the client “next time you request this resource, tell me you have this token; maybe I’ll just tell you that the representation you got with this token is still fresh, without transferring it all over again”. This is called an ETag, and the response that says “just use what you have in your cache” is called HTTP 304 NOT MODIFIED.
Bottom line is, it’s really important to know when a representation of a resource is broken. Which is why I was quite amazed to learn that my HTTP server of choice, nginx, doesn’t validate the Content-Length it receives from its upstreams and is simply unaware when the response it received from an upstream server is chopped off. If your response specifies a content length but closes the connection without delivering enough bytes, nginx will simply stall the request for a long time without closing the connection downstream, even though it has no hope of receiving additional data to push downstream. I tried this both with proxy_pass and uwsgi_pass, but I’m quite confident it’s true for other backends (fastcgi_pass, scgi_pass, etc). This is bad, but not as bad as the case where you want an nginx module to manipulate your content, removing existing content length/transfer encoding and applying its own (the gzip module indeed does that). If a backend error occurs while content-length-oblivious-nginx is altering the data, the content altering module will apply what it applies to the bytes it received, add new content-length/transfer-encoding, assuring everyone the response is OK, and entice user-agents or even proxies to enter the almost-never-recover bad cache scenario I described in the previews paragraph. Ouch!
The proper way to fix this, IMHO, is that nginx simply must start looking at the upstream’s content length (or transfer encoding, once nginx starts using chunked responses with its upstreams). Part of the reason I’m writing this post is that Maxim Dounin, venerable nginx comitter and an OK chap overall, told me he doesn’t consider this a top priority at the moment, but I humbly disagree with his assessment of how serious the issue is. Until such a time as nginx is fixed about this, I think you must disable all content-manipulating nginx modules and instead handle all message length affecting work in your upstream (compression, addition, etc). This is what I opted to do with my django based web app, I replaced nginx’s gzip module with Django’s GZipMiddleware. It’s a terrible shame though. It’s doing the job of nginx for it, probably in a lesser fashion than how nginx could, it violates a must not clause in Python’s WSGI PEP333, and I have empiric proof that Tim Berners-Lee chokes a kitten every time you do it.
But what’s the alternative? Risk invisibly cached corrupt data for an undetermined length of time? Ditch nginx, which I think is the best HTTP server on this planet despite this debacle? Nah. Both are unacceptable.