A few days ago I ran into an interesting post by Ian Bicking about a benchmark that Nicholas Piël ran on WSGI servers. Go ahead and read the original posts, but the skinny is that Nicholas’ (and many others’) focus was on performance, whereas Ian (and I…) feel more attention should be given to the ‘robustness’ of the server, so to speak.
What’s robustness, really? Well, worker-code can misbehave in a lot of ways (and for a lot of reasons), and a robust server will deal gracefully with these misbehaviours. The puritans in the crowd may well argue that the proper solution is to fix the worker (or any of its hierarchy of dependencies). However, all too often purity is a luxury that a web programmer can’t afford (and maybe shouldn’t bother to afford, but maybe we can talk about that some other time).
So I sat down and jotted down something called Labour, a library/tool to ‘benchmark’ WSGI servers not on the merit of how many jiffies they take to parse HTTP headers or what kind of scheduling algorithm they use, but rather on how they handle bad workers (socialist pun intended but not necessarily achieved). Labour lets a test author write elegant (ahem) code to declare a certain mixture of bad worker behaviour, and then run it against any one of several WSGI servers.
Let’s look at the simplest example (you can find a few more in Labour’s repository):
[sourcecode language=”python” wraplines=”false”]
from labour import servers
from labour import client
from labour import behaviours
from labour import report
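# The client drives the requests; weights set each behaviour's relative share of the mix.
driver = client.Client()
driver.add_behaviour(behaviours.PlainResponse(), weight=99)
driver.add_behaviour(behaviours.Sleeping(sleep_duration=1),
                     weight=1)
# The server under test runs for the duration of the with block.
with servers.WSGIRef() as server:
    driver.execute(iterations=1000)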
report.trivial_report(driver)
[/sourcecode]
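To make the weights a bit more concrete: one plausible reading is that each request picks a behaviour at random, in proportion to its weight, so roughly 1 request in 100 would land on the sleeping worker. The snippet below is only a sketch of that idea, not Labour's actual code; `pick_behaviour` and the plain string names are hypothetical stand-ins.

```python
import random
from collections import Counter


def pick_behaviour(weighted_behaviours, rng):
    """Pick one behaviour name at random, proportionally to its weight.

    Hypothetical helper for illustration; not part of Labour's API.
    """
    names = [name for name, _ in weighted_behaviours]
    weights = [weight for _, weight in weighted_behaviours]
    return rng.choices(names, weights=weights, k=1)[0]


# The same 99:1 mixture as in the example above, with strings as stand-ins.
mix = [("PlainResponse", 99), ("Sleeping", 1)]
rng = random.Random(0)  # seeded so repeated runs draw the same sequence
counts = Counter(pick_behaviour(mix, rng) for _ in range(10000))
print(counts)
```

With a mixture like this, the vast majority of requests are boring, and the interesting question is how the server copes with the rare bad one.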
At its current early stage, Labour supports these (self-explanatory) behaviours: PlainResponse, Sleeping, IOBound, LeakMemory, PythonWedge and SIGSEGV; more can be added with ease. It also supports the following servers (at least to some extent): WSGIRef, Twisted, CherryPy, Cogen, FAPWS3, Gevent and Paste. Significant work remains to be done in several areas: forking test clients in parallel (the library is nearly meaningless without this), supporting more WSGI servers and making better use of those already supported, richer and better reporting, and cleaning up the potential mess after a test.
That long list of caveats aside, it is actually working code, and it’s currently on GitHub (link above). So either grab it while it’s hot or follow this space to see updates in the (near) future. I’d love to hear what you think about it.
Comments
3 responses to “Workers of the world (wide web), unite!”
Hi Yaniv,
I applaud your efforts and think that this is a nice first approach. It is true, that in my benchmark the focus was on performance. However, this also said something about the robustness of the server under severe load, which showed itself in successful reply rate, error rate and memory consumption.
I think that for this library to be successful it will not only need to collect this kind of information, but it will also need to be able to outperform the server. You have to be careful that you are not going to test client performance instead of server performance. Getting a client that will not only be able to fire up lots of concurrent requests but also verify the replies and generate a useful report is a lot of work. So I completely agree with your conclusion that, if you want to write all that code from scratch, it is a significant amount of work.
But you don’t have to, you can make code that wraps around the ‘multi-protocol distributed load testing tool’ Tsung for example. Not only does this already have great reporting tools, easy manipulation of client behavior, verification of server results, automatic management of distributed load generation but it is also able to generate high loads. To be honest, I doubt if firing requests with urllib from child processes is going to generate any stress on the server at all. If it does, one could wonder how realistic the misbehaving server code is. For example, in my benchmark I had a difficult time pushing the top WSGI servers to their limit and this with optimized tools explicitly made for that goal. I had to rely on distributing the request load over multiple machines.
Anyway, I will be looking forward to the results!
Cheers,
Nicholas Piël
Thanks for the input! I also doubt that a few Python processes running on one machine and armed with urllib will be able to generate significant load against a serious server.
That said, and speaking off the top of my head and without inspecting any of the servers (yet), I think that at least for some of the tests, it’s not really necessary to load the server. A server that doesn’t recycle leaking workers will perform poorly at 100rps just as well as at 10,000rps. The converse isn’t true, of course (performing well at 100 won’t guarantee good behaviour at 10,000).
However, don’t get me wrong, I do agree that Labour forking a ‘serious’ (read: optimized, probably like Tsung) client is an eventual requirement and will yield better results – I just think that significant results, at least for some worker behaviours, can be reached even with a rather crippled client, even a urllib one.
[…] under: Labour | Tags: python, web | I’ve had more time to work on Labour (originally posted here), the WSGI Server Durability Benchmark. I’m relatively happy with progress so far, and will […]