Labour Updated

I’ve had more time to work on Labour (originally posted here, inspired by this and that), the WSGI Server Durability Benchmark. I’m relatively happy with progress so far, and will probably try to ‘market’ Labour a bit shortly. Two things that are still sorely missing from Labour (and are blocking me from posting some news of it to mailing lists et al) are better reporting facilities (read: flashy graphs) and mod_wsgi support.

That said, Labour can already benchmark nine WSGI Servers with a single commandline just seconds after you pull it off the sourcecode repository, and that’s something (do note that you’d have to have the servers installed, Labour can benchmark all installed servers but won’t install them for you). The output is still not pretty, but you can unfold the item below to see sample results from an invocation of Labour. Do note that being a library, this is just one test configuration out of several. So while it’s possible to see numbers-divided-by-second-real-quick for all the benchmark junkies out there, if you want to do anything meaningful, you better read the README file (aww…).
[sourcecode wraplines=”false” collapse=”true” gutter=”false”]
4 test(s) against 9 server(s) finished.

We see that Labour benchmarked nine servers with four different tests. Each test starts the server and hits it with a certain number of HTTP Requests. The requests cause the Labour WSGI Application (contained in the server) to do all sorts of trouble. For example, the Plain test in the table above is comprised of 99% requests that will return normally (200 OK and a body) and 1% of requests that will return 404 NOT FOUND. The SIGSEGV test, on the other hand, is made of 50% OK requests and 50% requests that cause a SIGSEGV signal to be sent to whichever process running out application. The two sleeping tests cause the application to sleep(), eating up server workers, the heavy test more often and for longer times than the lighter variant.

Each cell in the result table contains two numbers – the percentage of ‘failed’ responses accumulated in the test, and the number of responses per second. In a Labout test, a “failed” response is an HTTP response which doesn’t match what’s expected from the Behaviour specified in the HTTP request that triggered it. For example, if we asked the server to respond with 404 NOT FOUND and the server responded 200 OK, that’s considered a failure. Some Behaviours allow quite a broad range of responses (I don’t expect anything from a SIGSEGV Behaviour, even a socket error would be counted as OK), as they are issued to measure their effect on other requests (maybe some WSGI Servers will insulate other requests from a SIGSEGV‘s). More than 10% of failed responses in a test will consider the whole test as failed.

Do note that the data in the table above should not be used to infer anything about the servers listed. First, it was created using only 1,000 requests per test (hey, I’m on vacation and all I got is a crummy lil’ Netbook). Second, so far Labour does nothing to try and tune the various servers with similar parameters. So, for example, maybe some of these servers have multiprocessing support which I didn’t enable, while others forked 16 kids. Third and last, I didn’t think much about the tests I’ve created so far, and am mostly focused with developing Labour itself rather than producing quality comparisons (yet).

That’s it, for now. You’re welcome to try Labour, and by all means I’d be happy to hear comments about this benchmarking methodology, the kind of test profiles you’d like to see constructed (try looking into labour.tester.factories and labour.behaviours, the syntax of test-construction is self-teaching), etc. I won’t wreak havoc on the sources on purpose, but no API is stable until declared otherwise (…).

Comments

One response to “Labour Updated”