Heroku is great! However…

2012/03/09 § 10 Comments

I like Heroku. We’ve recently made our first deployment on it, and all things considered, I don’t think we could’ve made a better platform choice at this time. Deploying to Heroku taught me quite a few things, easily the most important of which was the 12 Factor App methodology. If you haven’t heard of it1, it’s basically a way to design web applications so that deploying them will be less of a pain in the arse. The methodology’s manifesto was written by @hirodusk, Heroku’s CTO, who probably knows something about cloud deployments. The best thing about it is that it’s elegantly platform/language neutral – so theoretically you don’t need to deploy a 12 Factor App to Heroku, you can deploy to any 12 Factor App compatible PaaS (had there been any), or even roll your own 12 Factor App platform and deploy there. Then again, why not focus on your core business and pay Heroku to use their awesome 12 Factor App platform? Well, ugh. Like I said, if you’re thinking about deploying to Heroku then you probably should, but I’d like to warn you that I don’t think Heroku is an excellent implementation of a 12 Factor App platform.

It’s a good implementation, yes, sometimes maybe just good enough. It saddens me to say it, but I feel there’s significant disparity between the force and clarity I see in the 12 Factor App theory and what I perceive as murky, surprising or faulty implementation details in the Heroku practice. I will try to illustrate exactly what I mean in this post, after these public service announcements: (1) I’ll be using 12 Factor & Heroku jargon freely, so you’d really gain the most from this if you’re familiar with both; (2) the Heroku deployment I did is in Python, so the only stack I know is Celadon Cedar; (3) some of the pain points I mention are related and may be fixed by a single Heroku change, but they still hurt in different ways and finally (4) I’m not looking for cheap shots: I know Heroku’s been grilled for their recent outages, but if you dig in my Twitter feed you’ll see proof that this post has been in the making for a while now, and has nothing to do with this or that outage (maybe a topic in itself, but not now).

Handling of static assets is a broken mess

Other people wrote enough about strategies to deal with the fact that Cedar has no ‘official’ way to serve static files as Bamboo had. I can’t say I’m happy with Heroku’s own documentation about this important issue – for example, the introduction to Django document conveniently sidesteps the issue altogether. But of all the various solutions out there, I didn’t find one that alleviates the much uglier (from a 12 Factor App perspective) underlying issue. See, your statics will either reside in your slug or elsewhere. If you plan on putting them in the slug, you can either compile2 them during slug creation (with a custom buildpack, what else), or locally on your computer and push them. Slug creation time is at best an annoying environment to work with and debug on, if not for anything else (and there is a lot of else), then because of the lack of Polyglotism and the prevalence of multi language asset management tools (also, good luck building Flash/Silverlight/other necessary evils during slug creation).

So you opt to build locally, ignoring the alarm bells as you shatter factors I, V and X (at least). How will you get the built statics to Heroku? You’ll commit and push build products to your VCS, right <puke/>? And who guarantees two deployments will be identical, since you’re developing on Linux and your coworker on OSX? This goes on. The ramifications of serving your statics from an external storage service are the same, but now you also need to get your files to the storage service. But how? You aren’t supposed to have your S3 secrets on your dev machine, and Heroku’s support for configuration-during-slug-creation has serious caveats and a frightening warning attached to it. This is a serious wart, and I don’t think writing documentation will solve it (it will be better than nothing!), because I don’t see how this can be solved with the building blocks Heroku offers today (the buildpack mechanism as it is, git transport, etc). A nontrivial change is needed for truly awesome 12 Factor static support. I don’t always expect much from my PaaS, but when I do, it’s because I pay five cents per dyno hour.

There’s no buffering proxy in Cedar

This one isn’t even funny. If you’re not sure what a buffering reverse proxy is or why you need one if your server uses sync workers, you must read this. Summarizing for the impatient, sync workers are resource hogs, so you never want them idling about waiting on clients (which might be slow). A common pattern is to have a cheap async “thing” buffering between your sync workers and the wild Internet. As far as “async things” go, nginx is a great choice, and indeed Heroku’s previous stack, Bamboo, used to use it. However, in Cedar it disappeared, with very serious ramifications. Don’t get me wrong: I’m not too happy with Bamboo’s behaviour, either. It’s very possible an app wouldn’t want buffering: sometimes you want to read the request as it comes along (for example, for various Comet techniques), and you’re designed to handle it (mostly by making the app itself your “async thing”). Bottom line, a one-HTTP-layer-fits-all treatment is at best an inconvenience, and possibly something much worse.

Indeed, you could solve this by simply using async workers, but one does not “simply” use async workers (there are serious ramifications which are out of scope for now). Choosing your threading model based on a missing feature in your platform sucks, not to mention if you already have a working and field-proven codebase you’re migrating to this platform. I guess the best approach to solve this would be to have a Routefile at the root of the project describing the routes to reach your app: this URL over buffered HTTP, this URL over raw HTTP, maybe even that port over plain old raw UDP. But even if you don’t have this feature, I’d expect warnings, explanations and best-practices to be sprayed all over the documentation. Alas, the documentation doesn’t help even one bit (on the contrary sometimes; I judge this FAQ answer to be somewhere between misleading and plain wrong). To top it all, it seems that activating Heroku’s SSL support suddenly makes you go through something which smells to me like remnants from Bamboo, with sudden appearance of nginx and buffered requests. Again, all in utter silence from the documentation.

Polyglot programming’s great, but not on Heroku

I hear the term polyglot programming in relation to Heroku all the time. “A true polyglot platform”, they say. I don’t get these claims. Heroku isn’t polyglot. The buildpack mechanism (a good idea in itself) is built in such a way that buildpacks are given a chance to detect if your push matches their language, but the first matching buildpack is used to create your app’s slug, ignoring all other buildpacks and potential languages. So your app really has to be in one language. The workarounds are horrendous; the least horrific one is done by the official Ruby buildpack, which “vendors” node.js into the slug if a certain gem (execjs) is required in the Gemfile. Of course, it’s just a package specific hack, it inflates the slugs needlessly, and since it’s just a hack, similar support isn’t built into the Python buildpack for the PyExecJS Python package. Admittedly, for the latter point, the brilliance of custom buildpacks eases the pain, but it sure as heck doesn’t cure it.
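
To make the “first match wins” behaviour concrete, here is a toy model of it in Python. It is not Heroku’s code: the buildpack order and the one-file detection rules are assumptions made up for the illustration, and the names pick_buildpack and BUILDPACKS are hypothetical.

```python
def pick_buildpack(buildpacks, pushed_files):
    """Return the name of the first buildpack whose detect step matches."""
    for name, detect in buildpacks:
        if detect(pushed_files):
            # Selection stops here; later buildpacks never get a look.
            return name
    return None


# Hypothetical detection rules and ordering, for illustration only.
BUILDPACKS = [
    ("ruby", lambda files: "Gemfile" in files),
    ("nodejs", lambda files: "package.json" in files),
    ("python", lambda files: "requirements.txt" in files),
]

# A push containing both a Python and a node.js project file
# still gets exactly one language.
chosen = pick_buildpack(BUILDPACKS, {"requirements.txt", "package.json"})
```

In this model the push is treated as a node.js app and the Python half is simply ignored, which is the situation the paragraph above complains about.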

A thousand cuts: the smaller things

  • I think separating you from your application (i.e., you’re unable to ssh into a running slug) is brilliant and leads to better design overall. But when I worked on XIV’s grid storage product, I fought to keep controlled ssh access to the discrete storage modules open (for us developers, not for customers), and I think Heroku should provide the same. Show me a big warning, let me click through whatever, but then let me bloody attach my strace/pdb/etc to my running instance.
  • I applaud factor X, dev/prod parity, but Heroku and especially its add-ons force me to integrate with backing services with unknown and possibly changing configuration. I think a Vagrant box providing a Cedar-like dev environment with some add-ons could help here. This bit me once as I misused the parsed results of REDISTOGO_URL and ignored the password (it was a bit more complex than that, but I see you yawning). A trivial bug really, but it was time consuming to diagnose, because it worked fine against my password-less development Redis (which now does require a password, thank you very much).
  • Factor III, configuration, says you should store configuration in the environment. But Heroku offers nothing to help you set this environment up during development, and share it across a team of five developers. Sure it’s a small thing to arrange on your own, but together with other small cuts regarding local development, I feel Heroku does too little to accommodate multi-developer teams, period (update: since I first wrote this, @kennethreitz wrote autoenv, which is a good first step in the right direction).
  • Putting everything (your language’s runtime, “vendored” binaries, possibly static assets) in your slug makes it easy to go past the 25mb warning mark, and forces you to be tight fisted about slug size. I understand the cause of this limitation, but if every dyno had more stuff built into it, things could be easier and I wouldn’t be counting mere megabytes of disk space in 2012 (just node, Python, Ruby, some C libraries… don’t think I’m asking for much).
  • I’ve been pulling my hair out wondering why my video processing worker dynos are getting R14 “Memory Quota” exceeded warnings, and only after I instrumented my code to take memory snapshots (ps -ef + cat /proc/meminfo at relevant timings) I got to suspecting that buffer cache is counted against me in my memory quota (why isn’t this documented? why can’t Heroku’s support resolve my ticket asking for explanations regarding my ps -ef outputs for a few weeks now?). We get false R14 positives all the time.
  • git is a terrific DVCS, but a poor deployment transport – see for example above how it complicates things with deploying compiled static files, or how it makes pushing private submodules impossible (at least now there’s some labs support for submodules).
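
The REDISTOGO_URL bug mentioned in the list above is easy to reproduce, so here is a minimal sketch of reading that variable from the environment (factor III) without dropping the password. It uses only Python 3’s standard library; the function names and the localhost fallback are my own assumptions, not anything Heroku or Redis To Go provide.

```python
import os
from urllib.parse import urlparse


def redis_settings(url):
    """Split a redis:// URL into connection settings, password included."""
    parsed = urlparse(url)
    return {
        "host": parsed.hostname or "localhost",
        "port": parsed.port or 6379,
        # None for a password-less server; forgetting this key is the
        # kind of bug that only shows up against the production add-on.
        "password": parsed.password,
    }


def redis_settings_from_env(environ=os.environ):
    """Read REDISTOGO_URL, falling back to an assumed local dev server."""
    return redis_settings(environ.get("REDISTOGO_URL", "redis://localhost:6379/"))
```

Passing the resulting dictionary straight to whatever Redis client you use keeps development and production on the same code path, which is the point of factor X.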

There’s more, of course, but this post is already insanely long and I feel I got the main things off my chest. I humbly argue that some of these issues actually have a very material impact on Heroku app development/deployment while being not so hard for Heroku to fix, or at least make much more bearable or even just thoroughly documented. You could say many (all?) of these things aren’t so bad if you’re a single developer developing a very early prototype and expecting little load, but I think these issues quickly become much worse in a complex professional project with multiple developers and even modest-medium load, which is who I imagine is Heroku’s prime target audience, rather than some Jekyll blog that gets 50 hits per day.

I’d like to close this post by repeating that I like Heroku, and at our current size, I definitely like it more than the alternatives (because deployment is such a bitch). They have many awesome things going that I didn’t list here; others wrote enough favorable reviews as it is. But Heroku still has a lot of hard work to do to get their implementation of a 12 Factor App platform on par with the elegance of the theory itself. I get the feeling the “D” stack beyond Cedar is right around the corner; I’ll be so happy to use a better Heroku 12 Factor stack. Or anyone else’s 12 Factor stack, for that matter. I’m an unloyal bastard this way. Challenge accepted, anyone?


1 If you’re not familiar with The 12 Factor App, I wholeheartedly recommend you go read it right now, it’s an important read. If you’ve got your act together with regard to web app deployment you probably do many/all of these things instinctively, but still, the 12 Factor App puts things in terseness and clarity I didn’t see before and I feel insisting on it (almost to the letter) really helped me design a better app.

2 The way I see it, the toughest thing about static assets is the fact that most of them became sneakily dynamic without us watching. Sprites, CoffeeScript, LESS/scss, heck, even .swf if you have to use them (webcams, anyone?) – they’re all compiled assets with a source form and a built form (sometimes more than one; minimized vs unminimized CoffeeScript, for instance). I think this complicates separation of build time vs. run time and maximization of dev/prod parity. I wrote more about this previously.

I wish someone wrote django-static-upstream… maybe even… me!

2011/12/28 § 7 Comments

I used to think serving static files (aka static assets) is really easy: configure nginx to serve a directory and you’re done. But things quickly became more complicated as issues like asset compilation, CDNs/scalability, file-specific custom headers, deployment complexity and development/production parity rear their ugly heads. Judging by the huge number of different asset management packages on djangopackages, it seems like I’m not the only one who ran into this problem and felt not-entirely-satisfied with all the other solutions out there. Things actually took a turn for the worse ever since I started drinking the Heroku Kool-Aid, and for the life of me I just can’t make sense of their best-practice with regard to serving static files. The heroku-django quickstart conveniently sidesteps the issue of statics, and while there are a few support articles that lightly touch on neighboring subjects, nothing I found was spot-on and hands-on (this is an exception to the rule; the Heroku Kool-Aid is otherwise very tasty and easy to drink, to my taste so far). Ugh, why can’t there be a silver bullet to solve all this? Let me tell you about my “wishlist” for the best static serving method evar.

First, I want to be able to take any checkout from my VCS, maybe run an easy bootstrap function, and get a working development environment with statics served. In production, I want to serve my statics from a CDN with aggressive caching, so I need some versioning system, but I’d like to minimize deployment complexity and I want fine-grained cache invalidation of my statics. I want my statics to be served the same way (same headers, same versioning mechanism) in development and production without having to update two different locations (i.e., my S3 syncing script and my development nginx configuration). I also don’t want to have to “garbage-collect” my old statics from S3 every so and so days. Like I said, I’d like some of my statics to be served with some bells and whistles, like various custom headers (Access-Control-Allow-Origin, anyone?) or gzip compression. Speaking of bells and whistles, how about a whole marching band, since I want to serve statics that require compilation (minify/concatenate/compile scss+coffee/spritalize/etc), but I don’t want to have to rerun a ‘build process’ every time I touch a CoffeeScript file in development. Finally, and this isn’t something I’m very adamant about, I prefer my statics to be served from a different subdomain (static, not www): I think it’s cleaner, I don’t need my clients’ cookies with every static request and it allows for some tricks (like using a CDN with support for a custom origin).

And nothing I found does all that, definitely not easily. In my dream, there’s a package called django-static-upstream, which is designed to provide a holistic approach to all these issues. I’m thinking:

  • a pure Python/django static HTTP server (probably just django.views.static with support for the bells and whistles as mentioned above), and yeah, I think I should bloody use this server as a backend to serve my statics in production
  • a “vhost” middleware that will replace request.urlconf based on the Host: header; if the host starts with some prefix (say, static), the request will be served by the webserver above
  • a couple of template tags like {% static "images/logo.png" %} that will create content-hashed links to the static webserver (i.e., //static.example.com/829dd67168a3/images/logo.png); the static server will know to ignore the content-hash bit
  • this isn’t really up to the package, but it should be built to support easily setting a custom-origin supporting CDN (like CloudFront) as the origin URL; this is both to serve the statics from nearby CDN edges but also (maybe more importantly) to serve as a caching reverse proxy so the dynamic server will be fairly idle
  • support for compiling some static types on the fly (coffeescript, scss, etc) and returning the rendered result; the result may have to be cached using django’s cache (keyed by a content hash), but this is more to speed up multi-browser development where there is no CDN to serve as a reverse caching proxy than because I worry about production
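
To flesh out the content-hashed links from the list above, here is a rough, framework-free sketch of what the template tag and the static server could each do. The package doesn’t exist yet, so everything here is hypothetical: the function names, the choice of SHA-1 and the 12-character prefix (picked only to match the shape of the example URL).

```python
import hashlib

STATIC_HOST = "static.example.com"  # assumed host, as in the example link
HASH_LENGTH = 12                    # assumed prefix length


def content_hash(content):
    """Short hex digest of the file's bytes; changes whenever the file does."""
    return hashlib.sha1(content).hexdigest()[:HASH_LENGTH]


def static_url(path, content):
    """What a {% static %} tag could render: a protocol-relative hashed URL."""
    return "//%s/%s/%s" % (STATIC_HOST, content_hash(content), path.lstrip("/"))


def strip_content_hash(url_path):
    """What the static server could do: ignore a leading hash segment."""
    trimmed = url_path.lstrip("/")
    first, _, rest = trimmed.partition("/")
    looks_like_hash = (len(first) == HASH_LENGTH
                       and all(c in "0123456789abcdef" for c in first))
    return rest if looks_like_hash else trimmed
```

Because the hash is derived from the content, an unchanged file keeps its URL across deployments and a changed file gets a new one, which gives per-file cache invalidation without any garbage collection on the CDN side.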

So now I’m thinking maybe I should write something like this. There are two reasons I’m blogging about a package before I even wrote it; first, since I wanted to flesh out in my mind what it is that I want from it. But second, and more importantly, because I’d like to tread carefully (1) before I have the hubris to start yet another assets related django package, (2) before I start serving static content with a dynamic language (what am I, mad?) and (3) before I start compiling static assets at runtime, in violation of the fifth factor in Adam Wiggins’ twelve-factor app manifesto (what, you didn’t read it yet? what’s wrong with you?). These are quite a few warning signs to cross, and I’d like to get some feedback before I go there. But I honestly think I’ll be happier if I had a package to do all this, I don’t think writing it should be so hard and I hope you’d be happy using it, too.

The ball’s in your court, comment away.

Walking Python objects recursively

2011/12/11 § 6 Comments

Here’s a small function that walks over any* Python object and yields the objects contained within (if any) along with the path to reach them. I wrote it and am using it to validate a deserialized datastructure, but you can probably use it for many things. In fact, I’m rather surprised I didn’t find something like this on the web already, and perhaps it should go in itertools.

Edit: Since the original post I added infinite recursion protection following Eli and Greg’s good advice, added Python 3 compatibility and did some refactoring (which means I had to add proper unit tests). You will always be able to get the latest version here, on ActiveState’s Python Cookbook (at least until it makes its way into stdlib, fingers crossed…).

try:
    from collections.abc import Mapping, Set, Sequence  # Python 3.3+
except ImportError:
    from collections import Mapping, Set, Sequence  # Python 2

# dual python 2/3 compatibility, inspired by the "six" library
string_types = (str, unicode) if str is bytes else (str, bytes)
iteritems = lambda mapping: getattr(mapping, 'iteritems', mapping.items)()

def objwalk(obj, path=(), memo=None):
    # memo holds the ids of the containers currently being walked, so a
    # container that (indirectly) contains itself is not descended into again
    if memo is None:
        memo = set()
    iterator = None
    if isinstance(obj, Mapping):
        iterator = iteritems
    elif isinstance(obj, (Sequence, Set)) and not isinstance(obj, string_types):
        iterator = enumerate
    if iterator:
        if id(obj) not in memo:
            memo.add(id(obj))
            for path_component, value in iterator(obj):
                for result in objwalk(value, path + (path_component,), memo):
                    yield result
            memo.remove(id(obj))
    else:
        # not a container (or it is a string): yield the leaf with its path
        yield path, obj

And here’s a little bit of sample usage:

>>> tuple(objwalk(True))
(((), True),)
>>> tuple(objwalk({}))
()
>>> tuple(objwalk([1,2,3]))
(((0,), 1), ((1,), 2), ((2,), 3))
>>> tuple(objwalk({"http": {"port": 80, "interface": "0.0.0.0"}}))
((('http', 'interface'), '0.0.0.0'), (('http', 'port'), 80))
>>> 

* "any" is a strong word and Python is a flexible language; I wrote this function to work with container objects that respect the ABCs in the collections module, which mostly cover the usual builtin types and their subclasses. If there’s something significant I missed, I’d be happy to hear about it.

enqueue: CLI utility to queue command execution

2011/11/19 § 3 Comments

Update: As you can see in the comments below, and as I feared, it turned out that Lluís Batlle i Rossell already implemented something much like Enqueue, only better in many regards. I doubt I’ll keep maintaining enqueue, there’s no reason to. Oh well, it was a nice afternoon project.

Something that always bugged me with my shell workflow is the problem of queueing commands to run one after the other, while adding commands to the queue as the previous commands are being executed.

Take, for example, a simple usecase: we want to move three large files from diskA to diskB. The problem is that you don’t know the name of the files in advance, perhaps because you’re renaming them manually and it takes you time to type, or because you’re hunting for them in the directory tree, or whatever. Here are some solutions to this:

  • Start one command in the background, then do something like fg ; second-command. Then prepare the third command, but only run it after you saw the second finished. Meh. Or,
  • Just let the jobs run concurrently in the background as you run them (using the & control operator). But since each command is maxing out a resource (CPU, disk, etc), this becomes woefully inefficient really quickly. Or,
  • Use a mad concoction of Ctrl-Z, fg/bg, wait n or (if you left the shell and want to add something to the queue) use a madder concoction of while pgrep -f 'mv /disk/A/foo' > /dev/null; do ... (I’ll leave it as an exercise to the frustrated reader to finish that little one liner). But then again, you could also spend that time getting a paper-cut at the edge of your nostril, and it would probably be just as much fun and maybe even less error prone. Or,
  • Start a shell process reading from a named pipe (mkfifo(1)), and write the commands to the named pipe (credit to my friend and colleague m0she for this sneaky idea). In practice, I found it unwieldy at best, and impossible to extend with bells-and-whistles features if you need them, first and foremost, easily listing the queue and your position in it.

I reckon you could think of a few more ways, but I doubt (and hope! :) none would be more convenient than to simply use Enqueue, a simple and hopefully lightweight Python/twisted command line queuer (written today by yours truly). Usage looks a bit like so:

$ alias nq=enqueue
$ nq add mv /disk1/file1 /disk2/file1
$ nq add mv /disk1/file2 /disk2/file2
$ nq add beep
$ nq list
* mv /disk1/file1 /disk2/file1
  mv /disk1/file2 /disk2/file2
  beep
$

Nice and easy. Enqueue is still a bit rough around the edges and not very feature rich, but it does the job for me and I hope you’d like it too. Queues are managed by a twisted daemon that talks to the CLI client over UNIX domain sockets, and the whole thing fits in about 300 lines of Python. Feel free to open issues/send pull requests on GitHub if you find bugs or want to suggest something, I’ll try to keep up. Promise.
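
For the curious, the core idea (one worker draining a queue of commands, strictly one at a time, while new commands keep arriving) fits in a few lines of standard-library Python. To be clear, this is not how Enqueue is implemented: there is no twisted, no daemon and no UNIX socket here, and the class name CommandQueue is made up for this sketch.

```python
import queue
import subprocess
import threading


class CommandQueue:
    """Run queued commands one at a time, in the order they were added."""

    def __init__(self):
        self._pending = queue.Queue()
        self.finished = []  # (argv, exit status) pairs, in completion order
        worker = threading.Thread(target=self._run, daemon=True)
        worker.start()

    def add(self, argv):
        """Queue a command given as a list of arguments; returns at once."""
        self._pending.put(list(argv))

    def wait(self):
        """Block until everything queued so far has finished."""
        self._pending.join()

    def _run(self):
        while True:
            argv = self._pending.get()
            try:
                status = subprocess.call(argv)
            except OSError:
                status = None  # the command could not be started at all
            self.finished.append((argv, status))
            self._pending.task_done()
```

Calling add(["mv", "/disk1/file1", "/disk2/file1"]) while an earlier mv is still running simply parks the new command behind it; the part that takes the other few hundred lines is letting a separate CLI process talk to the queue.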


p.s.: Why is it that every time I dabble in Python packaging I end up (a) horribly frustrated and (b) feeling the result is awfully inadequate? Yes, I know I could probably package enqueue better, I know packaging isn’t Python’s strongest side and I know the future is better than the present, but the present sucks and I just had to say this. Yeah, I also walk around in the summertime sayin’ “how about this heat”.

nginx+gzip module might silently corrupt data upon backend failure

2011/11/04 § 3 Comments

There are several elements that make absolutely certain the page you’re reading in your browser is an accurate representation of the resource the HTTP server meant to send you1. Disregarding caching for a minute, we have two elements making sure the representation you get is protected from errors. The first protecting element is, of course, TCP, making sure that if the server wrote two-hundred bytes in a particular order, either they’ll all arrive to your end (in order and without errors) or your TCP stack will realize something bad happened and give your user-agent (your browser) a chance to cope with the error. The need for the second protecting element is a bit more sneaky: TCP will guarantee everything the server wrote will arrive, i.e., bytes for which the server called write(2) or equivalent will arrive (or you’ll know something went wrong). But what about bytes the server should have written but didn’t write at all – for example, because some component on the server’s side failed?

The original HTTP (HTTP 0.9, circa 1991) didn’t cope with this situation at all. The signal to the client that the server finished talking was to disconnect the TCP session, which, from the client’s side, is a vague signal. Did the server disconnect because it finished or because it ran into trouble (software fault, sysadmin action, kernel behaviour due to memory pressure or even a bug, etc)? Thankfully, current HTTP kicks in to complement TCP, allowing the server to do one of several things in order to make sure you’ll at least know you didn’t receive the whole picture. By far the two most common things the server will do are to specify a Content-Length in the response’s header or to use a Transfer-Encoding, most probably chunked transfer encoding.

Content length is simple to grasp. The server wishes to say 200 bytes. It explicitly says: “I will say 200 bytes” in the response header. If the user-agent didn’t receive 200 bytes of response, it knows something went wrong. Chunked transfer encoding is only slightly more complex – the server will send the response in chunks, each chunk prefixed by the length of the chunk. The end of the document is marked by a zero-length chunk. So if the user-agent saw a chunk cut in the middle, or didn’t receive a zero-length chunk, it also knows something went wrong and has a chance to decide what to do about it. For example, when faced with incorrect content length, Chrome displays an ERR_CONNECTION_CLOSED error, whereas Firefox would display the portion of the page it did receive. Different behaviour, yes, but at least both user-agents in this example had a chance to realize the response they received is partial. Which is really, really important, you know why? I’ll tell you why.
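
Both checks are small enough to sketch in Python. This is a simplified illustration rather than a full HTTP parser: chunk extensions and trailers are not handled, and the names TruncatedResponse, check_content_length and decode_chunked are mine.

```python
class TruncatedResponse(Exception):
    """The body ended before its own framing said it should."""


def check_content_length(declared_length, body):
    """Return body if it is exactly as long as Content-Length promised."""
    if len(body) != declared_length:
        raise TruncatedResponse(
            "expected %d bytes, got %d" % (declared_length, len(body)))
    return body


def decode_chunked(raw):
    """Decode a chunked body, raising if it was cut short."""
    decoded = b""
    position = 0
    while True:
        line_end = raw.find(b"\r\n", position)
        if line_end == -1:
            raise TruncatedResponse("missing chunk size line")
        # Each chunk starts with its length in hexadecimal.
        size = int(raw[position:line_end].decode("ascii"), 16)
        position = line_end + 2
        if size == 0:
            return decoded  # the zero-length chunk marks a complete body
        chunk = raw[position:position + size]
        if len(chunk) < size:
            raise TruncatedResponse("chunk cut in the middle")
        if raw[position + size:position + size + 2] != b"\r\n":
            raise TruncatedResponse("chunk is not terminated")
        decoded += chunk
        position += size + 2
```

Feeding decode_chunked the bytes b"5\r\nhello\r\n0\r\n\r\n" returns b"hello", while the same bytes without the final zero-length chunk raise TruncatedResponse. That exception is exactly the “something went wrong” signal the user-agent needs.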

Enter caching. HTTP caching is a non-trivial matter with many unexpected gotchas and pitfalls, and I can’t cover it all here (why the complexity? I think it’s because all caching is an intentional form of data/state repetition, and repetition is something that in my experience humans often have difficulty reasoning about). By far the best document I know about HTTP caching is this splendid guide, but if you’re in a hurry or impatient, let me summarize the points interesting for this particular post. First, caches might exist in many places, some of them might be surprising, some of them might be slightly broken or at least very aggressive (ISP transparent caches, mutter mutter cough cough). Second, among many other things, HTTP caching lets a server give a client a token together with the resource, telling the client “next time you request this resource, tell me you have this token; maybe I’ll just tell you that the representation you got with this token is still fresh, without transferring it all over again”. This is called an ETag, and the response that says “just use what you have in your cache” is called HTTP 304 NOT MODIFIED.

How is this relevant to HTTP responses cut in the middle? Well, if servers didn’t have a way of telling the user-agent how long the document is, and if the response was cut in the middle due to a server fault as described above, the user-agent/sneaky-caching-proxy might cache incorrect responses. If the server also sends an ETag along with the response, the caching entity will store this ETag along with the invalid cached representation, and even when it’s time to check the representation’s freshness with the origin server, the server will just take a look at the ETag, say “yep, this is fine”, tell the cache to keep using the bad representation and <sinister>never ever let it recover</sinister>. If this happens on a large ISP’s transparent cache, easily tens of thousands of your users could be affected. If the affected resource is a common element in many of your pages with strict syntax checking, like a javascript resource, you’re kinda screwed. The only hope in such a condition is that the client, for some reason, will specify Cache-Control: no-cache in the request, and that caching entities along the path to the server will honour this request. Browsers like caching, so they won’t usually request no-cache, although AFAIK, recently Chrome started sending no-cache when the user explicitly requests a force-reload (Cmd-R on a Mac). Other browsers don’t fare as well, and I think that hoping one of your Chrome users will force reload the bad resource in time to save the day is hardly a sturdy solution.
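
The never-recover scenario is easier to believe once you see it play out, so here is a toy simulation of it. It models no real cache or server: Origin and NaiveCache are invented for this sketch, freshness lifetimes are ignored (every fetch revalidates), and the truncation is injected by hand through the fail_after parameter.

```python
class Origin:
    """A server with one resource and one ETag for it."""

    def __init__(self, body, etag):
        self.body = body
        self.etag = etag

    def get(self, if_none_match=None, fail_after=None):
        if if_none_match == self.etag:
            return 304, self.etag, b""  # "just use what you have"
        # fail_after simulates a backend dying mid-response.
        body = self.body if fail_after is None else self.body[:fail_after]
        return 200, self.etag, body


class NaiveCache:
    """Stores whatever arrives; it cannot tell a short body from a full one."""

    def __init__(self, origin):
        self.origin = origin
        self.stored = None  # (etag, body) once something was cached

    def fetch(self, **origin_options):
        etag = self.stored[0] if self.stored else None
        status, new_etag, body = self.origin.get(
            if_none_match=etag, **origin_options)
        if status == 304:
            return self.stored[1]  # revalidated, so keep serving the copy
        self.stored = (new_etag, body)
        return body


origin = Origin(b"function f() { return 1; }", '"v1"')
cache = NaiveCache(origin)
broken = cache.fetch(fail_after=10)  # the backend fails once...
later = cache.fetch()                # ...and every revalidation blesses it
```

After the single failure, later is still the ten-byte stump even though the origin is perfectly healthy again: the ETag matches, so the cache is told its copy is fine.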

Bottom line is, it’s really important to know when a representation of a resource is broken. Which is why I was quite amazed to learn that my HTTP server of choice, nginx, doesn’t validate the Content-Length it receives from its upstreams and is simply unaware when the response it received from an upstream server is chopped off. If your response specifies a content length but closes the connection without delivering enough bytes, nginx will simply stall the request for a long time without closing the connection downstream, even though it has no hope of receiving additional data to push downstream. I tried this both with proxy_pass and uwsgi_pass, but I’m quite confident it’s true for other backends (fastcgi_pass, scgi_pass, etc). This is bad, but not as bad as the case where you want an nginx module to manipulate your content, removing existing content length/transfer encoding and applying its own (the gzip module indeed does that). If a backend error occurs while content-length-oblivious-nginx is altering the data, the content altering module will apply what it applies to the bytes it received, add new content-length/transfer-encoding, assuring everyone the response is OK, and entice user-agents or even proxies to enter the almost-never-recover bad cache scenario I described in the previous paragraph. Ouch!

The proper way to fix this, IMHO, is that nginx simply must start looking at the upstream’s content length (or transfer encoding, once nginx starts using chunked responses with its upstreams). Part of the reason I’m writing this post is that Maxim Dounin, venerable nginx committer and an OK chap overall, told me he doesn’t consider this a top priority at the moment, but I humbly disagree with his assessment of how serious the issue is. Until such a time as nginx is fixed about this, I think you must disable all content-manipulating nginx modules and instead handle all message length affecting work in your upstream (compression, addition, etc). This is what I opted to do with my django based web app: I replaced nginx’s gzip module with Django’s GZipMiddleware. It’s a terrible shame though. It’s doing the job of nginx for it, probably in a lesser fashion than how nginx could, it violates a must not clause in Python’s WSGI PEP 333, and I have empirical proof that Tim Berners-Lee chokes a kitten every time you do it.

But what’s the alternative? Risk invisibly cached corrupt data for an undetermined length of time? Ditch nginx, which I think is the best HTTP server on this planet despite this debacle? Nah. Both are unacceptable.


1 This post assumes convenient values of “absolutely certain”; also, everything related to security/content tampering is out of scope in this post. I’m talking about possibly misbehaving but certainly well-meaning components.

zsh and virtualenv

2010/10/14 § 8 Comments

A week ago or so I finally got off my arse and did the pragmatic programmer thing, setting aside those measly ten minutes to check out virtualenv (well, I also checked out buildout, but I won’t discuss it in this post). I knew pretty much what to expect, but I wanted to get my hands dirty with them so I could see what I assumed I’d been missing out on for so long (and indeed I promptly kicked myself for not doing it sooner, yada yada, you probably know the drill about well-known-must-know-techniques-and-tools-that-somehow-you-don’t-know). Much as I liked virtualenv, there were two things I didn’t like about environment activation in virtualenv. First, I found typing ‘source bin/activate’ (or similar) cumbersome; I wanted something short and snazzy that worked regardless of where inside the virtualenv I am so long as I’m somewhere in it (it makes sense to me to say that I’m ‘in’ a virtualenv when my current working directory is somewhere under the virtualenv’s directory). Note that being “in” a virtualenv isn’t the same as activating it; you can change directory from virtualenv foo to virtualenv bar, and virtualenv foo will remain active. Indeed, this was the second problem I had: I kept forgetting to activate my virtualenv as I started using it or to deactivate the old one as I switched from one to another.

zsh to the rescue. You may recall that I already mentioned the tinkering I’ve done to make it easier to remember my current DVCS branch. Briefly, I have a function called _rprompt_dvcs which is evaluated whenever zsh displays my prompt and if I’m in a git/Mercurial repository it sets my right prompt to the name of the current branch in blue/green. You may also recall that while I use git itself to tell me if I’m in a git repository at all and which branch I’m at (using git branch --no-color 2> /dev/null | sed -e '/^[^*]/d' -e 's/* \(.*\)/(\1)/'), I had to resort to a small C program (fast_hg_root) in order to decide whether I’m in a Mercurial repository or not and then I manually parse the branch with cat. As I said in the previous post about hg and prompt, I’m not into giving hg grief about speed vs. git, but when it comes to the prompt things are different.

With this background in mind, I was perfectly armed to solve my woes with virtualenv. First, I changed fast_hg_root to be slightly more generic and search for a user-specified “magic” filename upwards from the current working directory (I called the outcome walkup, it’s really simple and nothing to write home about…). For example, to mimic fast_hg_root with walkup, you’d run it like so: $ walkup .hg. Using $ walkup bin/activate to find my current virtualenv (if any at all), I could easily add the following function to my zsh environment:

act () {
        if [ -n "$1" ]
        then
                if [ ! -d "$1" ]
                then
                        echo "act: $1 no such directory"
                        return 1
                fi
                if [ ! -e "$1/bin/activate" ]
                then
                        echo "act: $1 is not a virtualenv"
                        return 1
                fi
                if which deactivate > /dev/null
                then
                        deactivate
                fi
                cd "$1"
                source bin/activate
        else
                virtualenv="$(walkup bin/activate)" 
                if [ $? -eq 1 ]
                then
                        echo "act: not in a virtualenv"
                        return 1
                fi
                source "$virtualenv"/bin/activate
        fi
}

Now I can type $ act anywhere I want in a virtualenv, and that virtualenv will become active; this saves figuring out the path to bin/activate and ending up typing something ugly like $ source ../../bin/activate. If you want something that can work for you without a special binary on your host, here’s a pure-shell version of the same function:

function act() {
    if [ -n "$1" ]; then
        if [ ! -d "$1" ]; then
            echo "act: $1 no such directory"
            return 1
        fi
        if [ ! -e "$1/bin/activate" ]; then
            echo "act: $1 is not a virtualenv"
            return 1
        fi

        if which deactivate > /dev/null; then
            deactivate
        fi
        cd "$1"
        source bin/activate
    else
        stored_dir="$(pwd)"
        while [ ! -f bin/activate ]; do
            if [ "$(pwd)" = / ]; then
                echo "act: not in a virtualenv"
                cd "$stored_dir"
                return 1
            fi
            cd ..
        done
        source bin/activate
        cd "$stored_dir"
    fi
}
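For those who’d rather read the upward search itself than take my word for it, here’s a rough Python sketch of what walkup does. The real thing is a small C program; this is only my illustration of the idea, and the signature (a magic name plus an optional start directory) is an assumption, not its actual interface.

```python
import os

def walkup(magic, start=None):
    """Return the nearest directory at or above `start` that contains
    `magic`, or None if the filesystem root is reached without a match."""
    current = os.path.abspath(start or os.getcwd())
    while True:
        if os.path.exists(os.path.join(current, magic)):
            return current
        parent = os.path.dirname(current)
        if parent == current:
            # The parent of the root is the root itself: nothing was found.
            return None
        current = parent
```

Calling walkup("bin/activate") from somewhere inside a virtualenv would give back the virtualenv’s directory, which is what the act function above relies on.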

This was nice, but only solved half the problem: I still kept forgetting to activate the virtualenv, or moving out of a virtualenv and forgetting that I left it activated (this can cause lots of confusion, for example, if you’re simultaneously trying out this, this, this or that django-facebook integration modules, more than one of them thinks that facebook is a good idea for a namespace to take!). To remind me, I wanted my left prompt to reflect my virtualenv in the following manner (much like my right prompt reflects my current git/hg branch if any):

  1. If I’m not in a virtualenv and no virtualenv is active, do nothing.
  2. If I’m in a virtualenv and it is not active, display its name as part of the prompt in white.
  3. If I’m in a virtualenv and it is active, display its name as part of the prompt in green.
  4. If I’m not in a virtualenv but some virtualenv is active, display its name in yellow.
  5. Finally, if I’m in one virtualenv but another virtualenv is active, display both their names in red.
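These five rules boil down to a small decision table. Here’s a minimal sketch of it in Python rather than zsh, just to pin the logic down; the function name prompt_segment and the (color, text) return shape are made up for illustration, and joining the two names with a colon in the last case is simply one way to show both.

```python
def prompt_segment(enclosing, active):
    """Map the enclosing and active virtualenv names to (color, text).

    Either argument may be None or "" when there is no such virtualenv.
    Returns None when nothing should be displayed.
    """
    if not enclosing and not active:
        return None                               # rule 1
    if not active:
        return ("white", enclosing)               # rule 2
    if not enclosing:
        return ("yellow", active)                 # rule 4
    if enclosing == active:
        return ("green", active)                  # rule 3
    return ("red", active + ":" + enclosing)      # rule 5
```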

So, using walkup, I wrote the virtualenv parsing functions:

function active_virtualenv() {
    if [ -z "$VIRTUAL_ENV" ]; then
        # not in a virtualenv
        return
    fi

    basename "$VIRTUAL_ENV"
}

function enclosing_virtualenv() {
    if ! which walkup > /dev/null; then
        return
    fi
    virtualenv="$(walkup bin/activate)"
    if [ -z "$virtualenv" ]; then
        # not in a virtualenv
        return
    fi

    basename "$(grep VIRTUAL_ENV= "$virtualenv"/bin/activate | sed -E 's/VIRTUAL_ENV="(.*)"$/\1/')"
}

All that remained was to change my lprompt function to look like so (remember I have setopt prompt_subst on):

function _lprompt_env {
    local active="$(active_virtualenv)"
    local enclosing="$(enclosing_virtualenv)"
    if [ -z "$active" -a -z "$enclosing" ]; then
        # no active virtual env, no enclosing virtualenv, just leave
        return
    fi
    if [ -z "$active" ]; then
        local color=white
        local text="$enclosing"
    else
        if [ -z "$enclosing" ]; then
            local color=yellow
            local text="$active"
        elif [ "$enclosing" = "$active" ]; then
            local color=green
            local text="$active"
        else
            local color=red
            local text="$active":"$enclosing"
        fi
    fi
    local result="%{$fg[$color]%}${text}$rst "
    echo -n $result
}

function lprompt {
    local col1 col2 ch1 ch2
    col1="%{%b$fg[$2]%}"
    col2="%{$4$fg[$3]%}"
    ch1=$col1${1[1]}
    ch2=$col1${1[2]}

    local _env='$(_lprompt_env)'

    local col_b col_s
    col_b="%{$fg[green]%}"
    col_s="%{$fg[red]%}"

    PROMPT="\
$bgc$ch1\
$_env\
%{$fg_bold[white]%}%m:\
$bgc$col2%B%1~%b\
$ch2$rst \
$col2%#$rst "
}

A bit lengthy, but not very difficult. I suffered a bit until I figured out that I should escape the result of _lprompt_env using a percent sign (like so: "%{$fg[$color]%}${text}$rst "), or else the ANSI color escapes are counted for cursor positioning purposes and screw up the prompt’s alignment. Meh. Also, remember to set VIRTUAL_ENV_DISABLE_PROMPT=True somewhere, so virtualenv’s simple/default prompt manipulation functionality won’t kick in and screw things up for you, and we’re good to go.

The result looks like so (I still don’t know how to do a terminal-“screenshot”-to-html, here’s a crummy png):

Voila! Feel free to use these snippets, and happy zshelling!

Python’s Innards: Hello, ceval.c!

2010/09/02 § 5 Comments

The “Python’s Innards” series owes its existence, at least in part, to hearing one of the Python-Fu masters in my previous workplace say something about a switch statement so large that it had to be broken up just so some compilers won’t choke on it. I remember thinking then: “Choke the compiler with a switch? Hrmf, let me see that code.” Turns out that this switch can be found in ./Python/ceval.c: PyEval_EvalFrameEx and it switches over the current opcode, invoking its implementation. If I had to summarize all of CPython into one line, I’d probably choose that switch (actually I’d refuse, but humour me by assuming I was at gunpoint or something). This choice is rather subjective, as arguably there are more complex/interesting bits in Python’s object system (explored here and there) or parser/compiler related code. But I can’t help seeing that line, and its surrounding function and file, as the ‘do-work’ heart of CPython.

The reason I didn’t start the series from this heart is that I thought it would be too hard (mostly for the author…). Thanks to what we (well, at least I) learned in the previous posts, I think we can now understand it quite well. I’ll try to link backwards as necessary throughout the article, but if you haven’t followed the series so far, you’d probably do much better if you went back and read some of the previous articles before tackling this one. Also, for brevity’s sake in this post, I won’t qualify the file ./Python/ceval.c and the function PyEval_EvalFrameEx in it. Finally, remember that usually in the series when I quote code, I may note that I edited it, and in that case I often prefer clarity and brevity over accuracy; this is true for this post as well, only much more so, excerpts here might bear only slight resemblance to the real code.

So, where were we… Ah, yes, monstrous switch statement. Well, as I said, this switch can be found in the rather lengthy file ceval.c, in the rather lengthy function PyEval_EvalFrameEx, which takes more than half the file’s lines (it’s roughly 2,250 lines, the file is about 4,400). PyEval_EvalFrameEx implements CPython’s evaluation loop, which is to say that it’s a function that takes a frame object and iterates over each of the opcodes in its associated code object, evaluating (interpreting, executing) each opcode within the context of the given frame (this context is chiefly the associated namespaces and interpreter/thread states). There’s more to ceval.c than PyEval_EvalFrameEx, and we may discuss some of the other bits later in this post (or perhaps a follow-up post), but PyEval_EvalFrameEx is obviously the most important part of it.

Having described the evaluation loop in the previous paragraph, let’s see what it looks like in C (edited):

PyEval_EvalFrameEx(PyFrameObject *f, int throwflag)
{
    /* variable declaration and initialization stuff */
    for (;;) {
        /* do periodic housekeeping once in a few opcodes */
        opcode = NEXTOP();
        if (HAS_ARG(opcode)) oparg = NEXTARG();
        switch (opcode) {
            case NOP:
                goto fast_next_opcode;
            /* lots of more complex opcode implementations */
            default:
                /* become rather unhappy */
        }
        /* handle exceptions or runtime errors, if any */
    }
    /* we are finished, pop the frame stack */
    tstate->frame = f->f_back;
    return retval;
}

As you can see, iteration over opcodes is infinite (forever: fetch next opcode, do stuff), breaking out of the loop must be done explicitly. CPython (reasonably) assumes that evaluated bytecode is correct in the sense that it terminates itself by raising an exception, returning a value, etc. Indeed, if you were to synthesize a code object without a RETURN_VALUE at its end and execute it (exercise to reader: how?1), you’re likely to execute rubbish, reach the default handler (raises a SystemError) or maybe even segfault the interpreter (I didn’t check this thoroughly, but it looks plausible).
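If the C is a bit much, here’s the same shape as a toy in Python. It’s a sketch under loud assumptions: the opcode numbers are arbitrary, the “bytecode” is a flat list with each argument following its opcode, and none of it is CPython’s real machinery. It only shows the fetch, dispatch, and explicit exit.

```python
# Toy opcodes; the numbers are arbitrary and do not match CPython's.
LOAD_CONST, BINARY_SUBTRACT, RETURN_VALUE, NOP = range(4)
HAS_ARG = {LOAD_CONST}

def eval_code(code, consts):
    """Evaluate a flat list of toy opcodes, each followed by its argument
    if it takes one."""
    stack = []   # value stack
    pc = 0       # index of the next instruction
    while True:  # like for (;;): leaving the loop must be explicit
        opcode = code[pc]
        pc += 1
        oparg = None
        if opcode in HAS_ARG:
            oparg = code[pc]
            pc += 1
        if opcode == NOP:
            continue
        elif opcode == LOAD_CONST:
            stack.append(consts[oparg])
        elif opcode == BINARY_SUBTRACT:
            w = stack.pop()
            v = stack.pop()
            stack.append(v - w)
        elif opcode == RETURN_VALUE:
            return stack.pop()
        else:
            # become rather unhappy
            raise SystemError("unknown opcode %r" % (opcode,))
```

Feeding this toy a program with no RETURN_VALUE makes it run off the end of the list and raise IndexError; C has no such bounds check, which is why the real loop may go on to execute rubbish instead.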

The evaluation loop may look fairly simple so far, but I kept back an important piece: I snipped about 1,450 lines of opcode implementations from within that big switch, all of them presumably more complex than a NOP. In order for you to be able to get a feel for what more serious opcode implementations look like, here’s the (edited) implementation of three more opcodes, illustrating a few more principles:

            case BINARY_SUBTRACT:
                w = *--stack_pointer; /* value stack POP */
                v = stack_pointer[-1];
                x = PyNumber_Subtract(v, w);
                stack_pointer[-1] = x; /* value stack SET_TOP */
                if (x != NULL) continue;
                break;
            case LOAD_CONST:
                x = PyTuple_GetItem(f->f_code->co_consts, oparg);
                *stack_pointer++ = x; /* value stack PUSH */
                goto fast_next_opcode;
            case SETUP_LOOP:
            case SETUP_EXCEPT:
            case SETUP_FINALLY:
                PyFrame_BlockSetup(f, opcode, INSTR_OFFSET() + oparg,
                           STACK_LEVEL());
                continue;

We see several things. First, we see a typical value manipulation opcode, BINARY_SUBTRACT. This opcode (and many others) works with values on the value stack as well as with a few temporary variables, using CPython’s C-API abstract object layer (in our case, a function from the number-like object abstraction) to replace the two top values on the value stack with the single value resulting from subtraction. As you can see, a small set of temporary variables, such as v, w and x, are used (and reused, and reused…) as the registers of the CPython VM. The variable stack_pointer represents the current top of the stack (the next free slot in the stack). This variable is initialized at the beginning of the function like so: stack_pointer = f->f_stacktop;. In essence, together with the room reserved in the frame object for that purpose, the value stack is this pointer. To make things simpler and more readable, the real (unedited by me) code of ceval.c defines several value stack manipulation/observation macros, like PUSH, TOP or EMPTY. They do what you imagine from their names.

Next, we see a very simple opcode that loads values from somewhere into the value stack. I chose to quote LOAD_CONST because it’s very brief and simple, although it’s not really a namespace related opcode. “Real” namespace opcodes load values into the value stack from a namespace and store values from the value stack into a namespace; LOAD_CONST loads constants, but doesn’t fetch them from a namespace and has no STORE_CONST counterpart (we explored all this at length in the article about namespaces). The final opcode I chose to show is actually the single implementation of several different control-flow related opcodes (SETUP_LOOP, SETUP_EXCEPT and SETUP_FINALLY), which offload all details of their implementation to the block stack manipulation function PyFrame_BlockSetup; we discussed the block stack in our discussion of interpreter stacks.

Something we can observe looking at these implementations is that different opcodes exit the switch statement differently. Some simply break, and let the code after the switch resume. Some use continue to start the for loop from the beginning. Some goto various labels in the function. Each exit has different semantic meaning. If you break out of the switch (the ‘normal’ route), various checks will be made to see if some special behaviour should be performed – maybe a code block has ended, maybe an exception was raised, maybe we’re ready to return a value. Continuing the loop or going to a label lets certain opcodes take various shortcuts; no use checking for an exception after a NOP or a LOAD_CONST, for instance.

That’s pretty much it. I can’t really say we’re done (not at all), but this is pretty much the gist of PyEval_EvalFrameEx. Simple, eh? Well, yeah, simple, but I lied a bit with the editing to make it simpler. For example, if you look at the code itself, you will see that none of the case expressions for the big switch are really there. The code for the NOP opcode is actually (remember this series is about Python 3.x unless noted otherwise, so this snippet is from Python 3.1.2):

        TARGET(NOP)
            FAST_DISPATCH();

TARGET? FAST_DISPATCH? What are these? Let me explain. Things may become clearer if we’d look for a moment at the implementation of the NOP opcode in ceval.c of Python 2.x. Over there the code for NOP looks more like the samples I’ve shown you so far, and it actually seems to me that the code of ceval.c gets simpler and simpler as we look backwards at older revisions of it. The reason is that although I think PyEval_EvalFrameEx was originally written as a really exceptionally straightforward piece of code, over the years some necessary complexity crept into it as various optimizations and improvements were implemented (I’ll collectively call them ‘additions’ from now on, for lack of a better term).

To further complicate matters, many of these additions are compiled conditionally with preprocessor directives, so several things are implemented in more than one way in the same source file. In the larger code samples I quoted above, I liberally expanded some preprocessor directives using their least complex expansion. However, depending on compilation flags, these and other preprocessor directives might expand to something else, possibly a more complicated something. I can understand trading simplicity to optimize a tight loop which is used very often, and the evaluation loop is probably one of the more used loops in CPython (and probably as tight as its contributors could make it). So while this is all very warranted, it doesn’t help the readability of the code.

Anyway, I’d like to enumerate these additions here explicitly (some in more depth than others); this should aid future discussion of ceval.c, as well as prevent me from feeling like I’m hiding too many important things with my free spirited editing of quoted code. Fortunately, most if not all of these additions are very well commented; actually, some of the explanations below will be just summaries or even taken verbatim from these comments, as I believe that they’re accurate (eek!). So, as you read PyEval_EvalFrameEx (and indeed ceval.c in general), you’re likely to run into any of these:

“Threaded Code” (Computed-GOTOs)

Let’s start with the addition that gave us TARGET, FAST_DISPATCH and a few other macros. The evaluation loop uses a “switch” statement, which decent compilers optimize as a single indirect branch instruction with a lookup table of addresses. Alas, since we’re switching over rapidly changing opcodes (it’s uncommon to have the same opcode repeat), this would have an adverse effect on the success rate of CPU branch prediction. Fortunately gcc supports the use of C-goto labels as values, which you can generally pass around and place in an array (restrictions apply!). Using an array of addresses in memory obtained from labels, as you can see in ./Python/opcode_targets.h, we create an explicit jump table and place an explicit indirect jump instruction at the end of each opcode. This improves the success rate of CPU prediction and can yield as much as 20% boost in performance.

Thus, for example, the NOP opcode is implemented in the code like so:

        TARGET(NOP)
            FAST_DISPATCH();

In the simpler scenario, this would expand to a plain case statement and a goto, like so:

        case NOP:
            goto fast_next_opcode;

But when threaded code is in use, that snippet would expand to the following (the goto *opcode_targets[*next_instr++] line is where we actually move on to the next opcode, using the dispatch table of label-values):

        TARGET_NOP:
            opcode = NOP;
            if (HAS_ARG(NOP))
                oparg = NEXTARG();
        case NOP:
            {
                if (!_Py_TracingPossible) {
                    f->f_lasti = INSTR_OFFSET();
                    goto *opcode_targets[*next_instr++];
                }
                goto fast_next_opcode;
            }

Same behaviour, somewhat more complicated implementation, up to 20% faster Python. Nifty.
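Python has neither goto nor labels-as-values, so the trick itself can’t be shown in Python, and there’s certainly no branch predictor to please here. Still, the shape of it (a table indexed by opcode, with every handler doing its own lookup of the next handler instead of returning to one central switch) can be imitated. The sketch below is only that imitation: the toy opcodes, the state dict and the handler names are all made up.

```python
# Toy opcode numbers, unrelated to CPython's real ones.
NOP, INCR, HALT = 0, 1, 2

def run(code):
    """Run toy bytecode; each handler picks its own successor from a table."""
    state = {"pc": 0, "acc": 0}

    def dispatch():
        # Rough analogue of: goto *opcode_targets[*next_instr++]
        opcode = code[state["pc"]]
        state["pc"] += 1
        return opcode_targets[opcode]

    def target_nop():
        return dispatch()

    def target_incr():
        state["acc"] += 1
        return dispatch()

    def target_halt():
        return None

    # The jump table: indexed by opcode, holding the "address" of each handler.
    opcode_targets = [target_nop, target_incr, target_halt]

    # A trampoline stands in for the jumps Python cannot express directly.
    handler = dispatch()
    while handler is not None:
        handler = handler()
    return state["acc"]
```

Note that the lookup happens at the end of each handler rather than in one shared place, which is the property the real computed-goto version exploits.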

Opcode Prediction

Some opcodes tend to come in pairs. For example, COMPARE_OP is often followed by JUMP_IF_FALSE or JUMP_IF_TRUE, themselves often followed by a POP_TOP. What’s more, there are situations where you can determine that a particular next-opcode can be run immediately after the execution of the current opcode, without going through the ‘outer’ (and expensive) parts of the evaluation loop. PREDICT (and a few others) are a set of macros that explicitly peek at the next opcode and jump to it if possible, shortcutting most of the loop in this fashion (i.e., if (*next_instr == op) goto PRED_##op). Note that there is no relation to real hardware here, these are simply hardcoded conditional jumps, not an exploitation of some mechanism in the underlying CPU (in particular, it has nothing to do with “Threaded Code” described above).
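To see what such a peek buys, here’s a toy sketch in Python. Everything in it is an assumption made for illustration: the opcodes have toy numbers and toy semantics (this JUMP_IF_FALSE pops its operand, for instance), and the trips counter merely stands in for the expensive outer part of the loop.

```python
# Toy opcodes with toy semantics; numbers and behaviour are not CPython's.
LOAD_CONST, COMPARE_LT, JUMP_IF_FALSE, RETURN_VALUE = range(4)

def run(code, consts, predict=True):
    """Evaluate a list of (opcode, oparg) pairs.

    Returns (result, trips), where trips counts full passes through the
    top of the loop.
    """
    stack = []
    pc = 0
    trips = 0
    while True:
        trips += 1
        opcode, oparg = code[pc]
        pc += 1
        if opcode == LOAD_CONST:
            stack.append(consts[oparg])
        elif opcode == COMPARE_LT:
            w = stack.pop()
            v = stack.pop()
            stack.append(v < w)
            # PREDICT(JUMP_IF_FALSE): peek at the next opcode and, if the
            # guess is right, run it right here without another trip.
            if predict and code[pc][0] == JUMP_IF_FALSE:
                target = code[pc][1]
                pc += 1
                if not stack.pop():
                    pc = target
        elif opcode == JUMP_IF_FALSE:
            if not stack.pop():
                pc = oparg
        elif opcode == RETURN_VALUE:
            return stack.pop(), trips
        else:
            raise SystemError("unknown opcode %r" % (opcode,))

# if consts[0] < consts[1]: return consts[2], else: return consts[3]
PROGRAM = [
    (LOAD_CONST, 0),
    (LOAD_CONST, 1),
    (COMPARE_LT, None),
    (JUMP_IF_FALSE, 6),
    (LOAD_CONST, 2),
    (RETURN_VALUE, None),
    (LOAD_CONST, 3),
    (RETURN_VALUE, None),
]
```

Running PROGRAM with prediction on takes one trip fewer than with predict=False, because the jump is handled on the same trip as the comparison.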

Low Level Tracing

An addition primarily geared towards those developing CPython (or suffering from a horrible, horrible bug). Low Level Tracing is controlled by the LLTRACE preprocessor name, which is enabled by default on debug builds of CPython (see --with-pydebug). As explained in ./Misc/SpecialBuilds.txt: when this feature is compiled-in, PyEval_EvalFrameEx checks the frame’s global namespace for the variable __lltrace__. If such a variable is found, mounds of information about what the interpreter is doing are sprayed to stdout, such as every opcode and opcode argument and values pushed onto and popped off the value stack. Not useful very often, but very useful when needed.

This is what the low level trace output looks like (slightly edited):

>>> def f():
...     global a
...     return a - 5
... 
>>> dis(f)
  3           0 LOAD_GLOBAL              0 (a) 
              3 LOAD_CONST               1 (5) 
              6 BINARY_SUBTRACT      
              7 RETURN_VALUE         
>>> exec(f.__code__, {'__lltrace__': 'foo', 'a': 10})
0: 116, 0
push 10
3: 100, 1
push 5
6: 24
pop 5
7: 83
pop 5
# trace of the end of exec() removed
>>> 

As you can guess, you’re seeing a real-time disassembly of what’s going through the VM as well as stack operations. For example, the first line says: at bytecode offset 0, do opcode 116 (LOAD_GLOBAL) with the operand 0 (expands to the global variable a), and so on, and so forth. This is a bit like (well, little more than) adding a bunch of printf calls to the heart of the VM.
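Reading raw opcode numbers gets old quickly, so here’s a small sketch that puts names next to them. The mapping is copied from the trace and disassembly above (so it holds for Python 3.1.2; other CPython versions number their opcodes differently), and the function name annotate is just something I made up.

```python
# Opcode numbers as they appear in the 3.1.2 trace above; other CPython
# versions use different numbers.
OPNAMES = {
    116: "LOAD_GLOBAL",
    100: "LOAD_CONST",
    24: "BINARY_SUBTRACT",
    83: "RETURN_VALUE",
}

def annotate(trace_line):
    """Turn '3: 100, 1' into '3: LOAD_CONST 1'; leave push/pop lines alone."""
    head, sep, rest = trace_line.partition(": ")
    if not sep or not head.isdigit():
        return trace_line
    parts = [p.strip() for p in rest.split(",")]
    # Unknown numbers are kept, wrapped in angle brackets.
    name = OPNAMES.get(int(parts[0]), "<%s>" % parts[0])
    return " ".join([head + ":", name] + parts[1:])
```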

Advanced Profiling

Under this heading I’d like to briefly discuss several profiling related additions. The first relies on the fact that some processors (notably Pentium descendants and at least some PowerPCs) have built-in wall time measurement capabilities which are cheap and precise (correct me if I’m wrong). As an aid in the development of a high-performance CPython implementation, Python 2.4’s ceval.c was instrumented with the ability to collect per-opcode profiling statistics using these counters. This instrumentation is controlled by the somewhat misnamed --with-tsc configuration flag (TSC is an Intel Pentium specific name, and this feature is more general than that). Calling sys.settscdump(True) on an instrumented interpreter will cause the function ./Python/ceval.c: dump_tsc to print these statistics every time the evaluation loop loops.

The second advanced profiling feature is Dynamic Execution Profiling. This is only available if Python was built with the DYNAMIC_EXECUTION_PROFILE preprocessor name. As ./Tools/scripts/analyze_dxp.py says, [this] will tell you which opcodes have been executed most frequently in the current process, and, if Python was also built with -DDXPAIRS, will tell you which instruction _pairs_ were executed most frequently, which may help in choosing new instructions. One last thing to add here is that enabling Dynamic Execution Profiling implicitly disables the “Threaded Code” addition.
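The bookkeeping behind this kind of profile is simple enough to sketch in a few lines of Python. This is not how ceval.c does it (there the counters live in C arrays); it’s only an illustration of counting single opcodes and adjacent pairs, with a made-up function name and opcode names passed in as plain strings.

```python
from collections import Counter

def profile(executed):
    """Count single opcodes and adjacent opcode pairs.

    `executed` is a list of opcode names in the order they were run.
    """
    singles = Counter(executed)
    pairs = Counter(zip(executed, executed[1:]))
    return singles, pairs
```

A pair that keeps topping pairs.most_common() is the sort of thing that might suggest a new combined instruction.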

The third and last addition in this category is function call profiling, controlled by the preprocessor name CALL_PROFILE. Quoting ./Misc/SpecialBuilds.txt again: When this name is defined, the ceval mainloop and helper functions count the number of function calls made. It keeps detailed statistics about what kind of object was called and whether the call hit any of the special fast paths in the code.

Extra Safety Valves

Two preprocessor names, USE_STACKCHECK and CHECKEXC include extra assertions. Testing an interpreter with these enabled may catch a subtle bug or regression, but they are usually disabled as they’re too expensive.

These are the additions I found, grepping ceval.c for #ifdef. I think we’ll call it a day here, although we’re by no means finished. For example, I’d like to devote a separate post to exceptions, which is where we can discuss the tail of the evaluation loop (everything after the big switch and before the end of the big for), which we merely skimmed today. I’d also like to devote a whole post to locking and synchronization (including the GIL), which we touched upon before but never covered properly. Last but really not least, there are about 2,000 other lines in ceval.c which we didn’t cover today; none of them are as important as PyEval_EvalFrameEx, but we need to talk at least about some of them.

All these things taken into account, I think we can say that today we finally conquered the evaluation loop. This isn’t the end of the series, far from it, but I do see it as a milestone. “Hooray”, I believe the saying goes. I hope you’re enjoying the show, thanks for the supportive comments (they keep me going), and I’ll see you in the next post.


I would like to thank Nick Coghlan for reviewing this article; any mistakes that slipped through are my own.

1 Lazy or timid readers may choose to defer to Nick Coghlan’s example of one way he did it; I urge you not to look there and solve it on your own, it’s rather easy.

Python’s Innards: Interpreter Stacks

2010/07/22 § 4 Comments

Those of you who have been paying attention know that this series is spiraling towards what can be considered the core of Python’s Virtual Machine, the “actually do work function” ./Python/ceval.c: PyEval_EvalFrameEx. The (hopefully) last hurdle on our way there is to understand the three significant stack data structures used for CPython’s code evaluation: the call stack, the value stack and the block stack (I’ve called them collectively “Interpreter Stacks” in the title, this isn’t a formal term). All three stacks are tightly coupled with the frame object, which will also be discussed today. If you give me a minute to put on my spectacles, I’ll read to you what Wikipedia says about call stacks in general: In computer science, a call stack is a stack data structure that stores information about the active subroutines of a computer program… A call stack is composed of stack frames (…). These are machine dependent data structures containing subroutine state information. Each stack frame corresponds to a call to a subroutine which has not yet terminated with a return. Hrmf. Jim, I don’t understand… how does this translate to a virtual machine?

Well, since CPython implements a virtual machine, its call stack and stack frames are dependent on this virtual machine, not on the physical machine it’s running on. And also, as Python tends to do, this internal implementation detail is exposed to Python code, either via the C-API or pure Python, as frame objects (./Include/frameobject.h: PyFrameObject). We know that code execution in CPython is really the evaluation (interpretation) of a code object, so every frame represents a currently-being-evaluated code object. We’ll see (and already saw before) that frame objects are linked to one another, thus forming a call stack of frames. Finally, inside each frame object in the call stack there’s a reference to two frame-specific stacks (not directly related to the call stack), they are the value stack and the block stack.

The value stack (you may know this term as an ‘evaluation stack’) is where manipulation of objects happens when object-manipulating opcodes are evaluated. We have seen the value stack before on various occasions, like in the introduction and during our discussion of namespaces. Recalling an example we used before, BINARY_SUBTRACT is an opcode that effectively pops the two top objects in the value stack, performs PyNumber_Subtract on them and sets the new top of the value stack to the result. Namespace related opcodes, like LOAD_FAST or STORE_GLOBAL, load values from a namespace to the stack or store values from the stack to a namespace. Each frame has a value stack of its own (this makes sense in several ways, possibly the most prominent is simplicity of implementation), we’ll see later where in the frame object the value stack is stored.

This leaves us with the block stack, a fairly simple concept with some vaguely defined terminology around it, so pay attention. Python has a notion called a code block, which we have discussed in the article about code objects and which is also explained here. Completely unrelatedly, Python also has a notion of compound statements, which are statements that contain other statements (the language reference defines compound statements here). Compound statements consist of one or more clauses, each made of a header and a suite. Even if the terminology wasn’t known to you until now, I expect this is all instinctively clear to you if you have almost any Python experience: for, try and while are a few compound statements.

So where’s the confusion? In various places throughout the code, a block (sometimes “frame block”, sometimes “basic block”) is used as a loose synonym for a clause or a suite, making it easier to confuse suites and clauses with what’s actually a code block or vice versa. Both the compilation code (./Python/compile.c) and the evaluation code (./Python/ceval.c) are aware of various suites and have (ill-named) data structures to deal with them; but since we’re more interested in evaluation in this series, we won’t discuss the compilation-related details much (or at all). Whenever I think the wording might get confusing, I’ll mention the formal terms of clause or suite alongside whatever code term we’re discussing.

With all this terminology in mind we can look at what’s contained in a frame object. Looking at the declaration of ./Include/frameobject.h: PyFrameObject, we find (comments were trimmed and edited for your viewing pleasure):

typedef struct _frame {
   PyObject_VAR_HEAD
   struct _frame *f_back;   /* previous frame, or NULL */
   PyCodeObject *f_code;    /* code segment */
   PyObject *f_builtins;    /* builtin symbol table */
   PyObject *f_globals;     /* global symbol table */
   PyObject *f_locals;      /* local symbol table */
   PyObject **f_valuestack; /* points after the last local */
   PyObject **f_stacktop;   /* current top of valuestack */
   PyObject *f_trace;       /* trace function */

   /* used for swapping generator exceptions */
   PyObject *f_exc_type, *f_exc_value, *f_exc_traceback;

   PyThreadState *f_tstate; /* call stack's thread state */
   int f_lasti;             /* last instruction if called */
   int f_lineno;            /* current line # (if tracing) */
   int f_iblock;            /* index in f_blockstack */

   /* for try and loop blocks */
   PyTryBlock f_blockstack[CO_MAXBLOCKS];

   /* dynamically: locals, free vars, cells and valuestack */
   PyObject *f_localsplus[1]; /* dynamic portion */
} PyFrameObject;

We see various fields used to store the state of this invocation of the code object as well as maintain the call stack’s structure. Both in the C-API and in Python these fields are all prefixed by f_, though not all the fields of the C structure PyFrameObject are exposed in the pythonic representation. I hope some of the fields are intuitively clear to you, since these fields relate to many topics we have already covered. We already mentioned the relation between frame and code objects, so the f_code field of every frame points to precisely one code object. Insofar as structure goes, frames point backwards so that they create a stack (f_back) as well as point “root-wards” in the interpreter state/thread state/call stack structure by pointing to their thread state (f_tstate), as explained here. Finally, since you always execute Python code in the context of three namespaces (as discussed there), frames have the f_builtins, f_globals and f_locals fields to point to these namespaces. These are the fields (I hope) we already know.
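Since frames are exposed to Python code, the f_back chain can be walked from pure Python. A small sketch follows; sys._getframe is a real function, but it is a CPython implementation detail, and the helper names here (call_stack_names, outer, inner) are mine.

```python
import sys

def call_stack_names():
    """Return the names of the code objects on the call stack, innermost first."""
    names = []
    frame = sys._getframe()   # CPython-specific: the frame being evaluated now
    while frame is not None:
        names.append(frame.f_code.co_name)
        frame = frame.f_back  # previous frame, or None at the bottom
    return names

def outer():
    return inner()

def inner():
    return call_stack_names()
```

Calling outer() gives a list that starts with call_stack_names, inner and outer; whatever follows depends on how the interpreter got to outer in the first place.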

Before we dig into the other fields of a frame object, please notice frames are a variable size Python object (their struct starts with PyObject_VAR_HEAD). The reason is that when a frame object is created it should be dynamically allocated to be large enough to contain references (pointers, really) to the locals, cells and free variables used by its code object, as well as the value stack needed by the code object’s ‘deepest’ branch. Indeed, the last field of the frame object, f_localsplus (locals plus cells plus free variables plus value stack…) is a dynamic array where all these references are stored. PyFrame_New will show you exactly how the size of this array is computed.

If the previous paragraph doesn’t sit well with you, I suggest you read the descriptions I wrote for co_nlocals, co_cellvars, co_freevars and co_stacksize – during evaluation, all these ‘dead’ parts of the inert code object come to ‘life’ in space allocated at the end of the frame. As we’ll probably see in the next article, when the frame is evaluated, these references at the end of the frame will be used to get (or set) “fast” local variables, free variables and cell variables, as well as to the variables on the value stack (“fast” locals was explained when we discussed namespaces). Looking back at the commented declaration above and given what I said here, I believe you should now understand f_valuestack, f_stacktop and f_localsplus.
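As a rough sketch of the sizing described above, the number of slots at the end of a frame can be estimated from the code object’s fields. The helper name localsplus_slots is my own, and the formula mirrors the description in this article (locals plus cells plus free variables plus value stack); other CPython versions may lay frames out differently, so treat the result as an illustration rather than a statement about your interpreter.

```python
def localsplus_slots(code):
    """Estimate the slots in f_localsplus for a given code object.

    Assumption: one slot per local, per cell variable, per free variable,
    plus co_stacksize slots for the value stack.
    """
    return (code.co_nlocals
            + len(code.co_cellvars)
            + len(code.co_freevars)
            + code.co_stacksize)

def make_adder(n):
    def add(x):
        return x + n     # n is a free variable of add
    return add

add = make_adder(1)
print(localsplus_slots(make_adder.__code__))
print(localsplus_slots(add.__code__))
```

No expected numbers are given because co_stacksize is computed by the compiler and varies between versions.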

We can now look at f_blockstack, keeping in mind the terminology clarification from before. As you can maybe imagine, compound statements sometimes require state to be evaluated. If we’re in a loop, we need to know where to go in case of a break or a continue. If we’re raising an exception, we need to know where the innermost enclosing handler is (the suite of the closest except header, in more formal terms). This state is stored in f_blockstack, a fixed size stack of PyTryBlock structures which keeps the current compound statement state for us (PyTryBlock is not just for try blocks; it has a b_type field to let it handle various types of compound statements’ suites). f_iblock is the number of PyTryBlock entries currently in use, so it indexes the next free slot in the stack. If we need to bail out of the current “block” (that is, the current clause), we can pop the block stack and find the new offset in the bytecode from which we should resume evaluation in the popped PyTryBlock (look at its b_handler and b_level fields). A somewhat special case is a raised exception which exhausts the block stack without being caught; as you can imagine, in that case a handler will be sought in the block stacks of the previous frames on the call stack.

All this should easily click into place now if you read three code snippets. First, look at this disassembly of a for statement (this would look strikingly similar for a try statement):

>>> def f():
...     for c in 'string':
...             my_global_list.append(c)
...
>>> diss(f)
 2           0 SETUP_LOOP              27 (to 30)
             3 LOAD_CONST               1 ('string')
             6 GET_ITER
       >>    7 FOR_ITER                19 (to 29)
            10 STORE_FAST               0 (c)

 3          13 LOAD_GLOBAL              0 (my_global_list)
            16 LOAD_ATTR                1 (append)
            19 LOAD_FAST                0 (c)
            22 CALL_FUNCTION            1
            25 POP_TOP
            26 JUMP_ABSOLUTE            7
       >>   29 POP_BLOCK
       >>   30 LOAD_CONST               0 (None)
            33 RETURN_VALUE
>>>

Next, look at how the opcodes SETUP_LOOP and POP_BLOCK are implemented in ./Python/ceval.c. Notice that SETUP_LOOP and SETUP_EXCEPT or SETUP_FINALLY are rather similar: they all push a block matching the relevant suite onto the block stack, and they all utilize the same POP_BLOCK:

       TARGET_WITH_IMPL(SETUP_LOOP, _setup_finally)
       TARGET_WITH_IMPL(SETUP_EXCEPT, _setup_finally)
       TARGET(SETUP_FINALLY)
       _setup_finally:
           PyFrame_BlockSetup(f, opcode, INSTR_OFFSET() + oparg,
                      STACK_LEVEL());
           DISPATCH();

       TARGET(POP_BLOCK)
           {
               PyTryBlock *b = PyFrame_BlockPop(f);
               UNWIND_BLOCK(b);
           }
           DISPATCH();

Finally, look at the actual implementation of ./Objects/frameobject.c: PyFrame_BlockSetup and ./Objects/frameobject.c: PyFrame_BlockPop:

void
PyFrame_BlockSetup(PyFrameObject *f, int type, int handler, int level)
{
   PyTryBlock *b;
   if (f->f_iblock >= CO_MAXBLOCKS)
       Py_FatalError("XXX block stack overflow");
   b = &f->f_blockstack[f->f_iblock++];
   b->b_type = type;
   b->b_level = level;
   b->b_handler = handler;
}

PyTryBlock *
PyFrame_BlockPop(PyFrameObject *f)
{
   PyTryBlock *b;
   if (f->f_iblock <= 0)
       Py_FatalError("XXX block stack underflow");
   b = &f->f_blockstack[--f->f_iblock];
   return b;
}

There, now you’re smart. If you keep the terminology straight, f_blockstack turns out to be rather simple, at least in my book.
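If C isn’t your thing, here is a toy Python model of the same push/pop logic. ToyFrame and Block are hypothetical names of mine, not CPython objects, and CO_MAXBLOCKS is set to 20 on the assumption that this matches the value in ./Include/code.h; the point is only to show how f_iblock moves and how the handler offset is derived.

```python
from collections import namedtuple

CO_MAXBLOCKS = 20  # assumed to match the constant in ./Include/code.h

# Same three fields as PyTryBlock
Block = namedtuple("Block", "b_type b_handler b_level")

class ToyFrame:
    """Hypothetical stand-in for the block stack part of a frame."""

    def __init__(self):
        self.f_blockstack = []

    @property
    def f_iblock(self):
        # number of blocks in use, i.e. index of the next free slot
        return len(self.f_blockstack)

    def block_setup(self, b_type, handler, level):
        if self.f_iblock >= CO_MAXBLOCKS:
            raise RuntimeError("block stack overflow")
        self.f_blockstack.append(Block(b_type, handler, level))

    def block_pop(self):
        if self.f_iblock <= 0:
            raise RuntimeError("block stack underflow")
        return self.f_blockstack.pop()

frame = ToyFrame()
# In the disassembly above SETUP_LOOP sits at offset 0 and occupies three
# bytes, so the next instruction is at 3; with oparg 27 the handler is
# 3 + 27 = 30, which is the "(to 30)" printed by the disassembler.
frame.block_setup("SETUP_LOOP", 3 + 27, 0)
popped = frame.block_pop()
print(popped.b_handler)  # 30
```

The real interpreter kills the process with Py_FatalError on overflow or underflow; raising an exception is just the polite equivalent for a toy.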

We’re left with the rather esoteric fields, some simpler, some a bit more arcane. In the ‘simpler’ range we have f_lasti, an integer offset into the bytecode of the last instruction executed (initialized to -1, i.e., we didn’t execute any instruction yet). This index lets us iterate over the opcodes in the bytecode stream. Heading towards the ‘more arcane’ area we see f_trace and f_lineno. f_trace is a pointer to a tracing function (see sys.settrace; think implementation of a tracer or a debugger). f_lineno contains the line number of the line which caused the generation of the current opcode; it is valid only when tracing (otherwise use PyCode_Addr2Line). Last but not least, we have three exception fields (f_exc_type, f_exc_value and f_exc_traceback), which are rather particular to generators so we’ll discuss them when we discuss that beast (there’s a longer comment about these fields in ./Include/frameobject.h if you’re curious right now).
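For a feel of what f_trace and f_lineno are for, here is a small sketch of a tracer built on sys.settrace. The names record_lines and sample are mine; the tracer simply records, for every 'line' event in one particular function, the line number relative to that function’s def line.

```python
import sys

def record_lines(func):
    """Run func() under a trace function and collect relative line numbers."""
    seen = []

    def tracer(frame, event, arg):
        if frame.f_code is func.__code__ and event == "line":
            seen.append(frame.f_lineno - func.__code__.co_firstlineno)
        return tracer            # keep tracing inside this frame

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        func()
    finally:
        sys.settrace(previous)   # always restore whatever was there before
    return seen

def sample():
    a = 1
    b = 2
    return a + b

print(record_lines(sample))  # [1, 2, 3]
```

Returning the tracer from itself is what installs it as the frame’s local trace function, which is the Python-level face of f_trace.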

On a parting note, we can mention when frames are created. This happens in ./Objects/frameobject.c: PyFrame_New, usually called from ./Python/ceval.c: PyEval_EvalCodeEx (and ./Python/ceval.c: fast_function, a specialized optimization of PyEval_EvalCodeEx). Frame creation occurs whenever a code object should be evaluated, which is to say when a function is called, when a module is imported (the module’s top-level code is executed), whenever a class is defined, for every discrete command entered in the interactive interpreter, when the builtins eval or exec are used and when the -c switch is used (I didn’t absolutely verify this is a 100% exhaustive list, but I think it’s rather complete).
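A few of the cases in that list can be observed directly. In this sketch (CPython only, since it leans on sys._getframe; all names are made up) a helper asks for the name of the code object evaluated by its caller’s frame, and we call it from a function, a class body, exec and eval.

```python
import sys

def caller_name():
    # co_name of the code object being evaluated by our caller's frame
    return sys._getframe(1).f_code.co_name

seen = {}

def some_function():
    return caller_name()

seen["function"] = some_function()

class SomeClass:
    # the class body is a code block of its own, evaluated in its own frame
    seen["class"] = caller_name()

namespace = {"seen": seen, "caller_name": caller_name}
exec("seen['exec'] = caller_name()", namespace)
seen["eval"] = eval("caller_name()", namespace)

for kind in ("function", "class", "exec", "eval"):
    print(kind, seen[kind])
```

Each call reports a different code object name, which is another way of saying each of these constructs got a frame of its own.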

Looking at the list in the previous paragraph, you probably realized frames are created very often, so two optimizations are implemented to make frame creation fast. First, code objects have a field (co_zombieframe) which allows them to remain associated with a ‘zombie’ (dead, unused) frame object even when they’re not evaluated. If a code object was already evaluated once, chances are it will have a zombie frame ready to be reanimated by PyFrame_New and returned instead of a newly allocated frame (trading some memory to reduce the number of allocations). Second, allocated and entirely unused stack frames are kept in a special free-list (./Objects/frameobject.c: free_list); frames from this list will be used if possible, instead of actually allocating a brand new frame. This is all kindly commented in ./Objects/frameobject.c.

That’s it, I think. Oh, wait: if you’d like to play with frames in your interpreter, take a look at the inspect module, maybe especially this part of it. In gdb, I used a rather crude method to look at the call stack (I dereferenced the global variable interp_head and went on from there). There’s probably a better way, but I didn’t bother looking. Now that’s really it. In fact, I believe at last we covered enough material to analyze ./Python/ceval.c: PyEval_EvalFrameEx. Ladies and Gentlemen, we can read it. We have the technology.

But, alas, we’ll only do it in the next post, and who knows when that will arrive. And until it does, do good, avoid doing bad and keep clearing your mind. Siddhārtha Gautama said that, and I tend to think that if that particular bloke lived today he’d have some serious Python-Fu going for him, so heed his words.


I would like to thank Nick Coghlan for reviewing this article; any mistakes that slipped through are my own.

Python’s Innards: for my wife

2010/07/04 § 1 Comment

The other day the best wife I’ll ever have had trouble sleeping and asked me to tell her something to put her to sleep. Since she’s not quite a hacker, I figured some discussion of what I usually write about may do the trick (okay, maybe ‘not quite a hacker’ is an understatement, she’s an un-hacker-hippie married to a hacker and it actually works out great thanks for asking). I don’t think this article is likely to be actually useful to the usual readers of this blog (unless you have a less-computer-initiated spouse with trouble sleeping), but I hope maybe this particular bedtime story might entertain some of you.

Yaniv: What is my blog about… Let’s start at the beginning. See, a computer is a machine, just a machine, you know, like one with gears and levers and stuff. Like all machines, it’s not smart, but unlike many machines, it has lots of pieces and is very very fast. Think about the musical boxes we saw today at the Museum’s souvenir store. They are also machines, but smaller and slower. When you twist the ratchet lever the pin cylinder turns around to make music. But even though the music is great, it isn’t a ‘smart’ machine, it just plays the music it was built to play. A machine with such and such a pin cylinder will play Mozart, and a machine with a different pin cylinder will play Bach. All right?

Inbal: An encouraging nod.

Yaniv: So think about a music box with a replaceable cylinder. Instead of a cylinder with small pins it could have a… eh… a punched card! A long strip of plastic or cardboard you insert into the machine, and this strip has some holes punched in it. When the card is fed into the machine, the music box’s comb’s teeth encounter these holes like they encounter the pins and music is played. Computers are very much like that, only they don’t play music, they punch holes in a fresh card according to the card that was inserted. That’s how you program them, and that’s how they answer. You insert a card with holes arranged in just the right way, and you read the answer encoded as holes in the card they spew out. Not magic, just how the internals of the computer are built, like maybe an abacus controlled with cards. If you restart it ten times and feed it the same card, it will give the same answer each time. It’s a machine.

Inbal: Is that why people like you always tell people like me to restart the computer?

Yaniv: Yes, quite. We want to know exactly what kind of punched cards were fed into the computer from the moment it was last started, so we know what state it is in, so maybe we can actually figure out why it’s behaving like it is. Another encouraging nod, and wide open eyes. My wife shows general interest in my work, but never anything as low-level as this shit. I was wondering how far we could go on this topic. The thing is, you obviously know computers don’t really use punched cards anymore. Many years ago smart people figured out more elegant ways to talk with a computer, so we feed it with keyboards and USB sticks and wifi and we see the output on a screen or a printer. But to the computer, what you type and when you move the mouse and whatever you bring from the Internet or hard drive or USB stick – it all looks like another part of an incredibly long series of holes in a gigantic punched card, and what you see on the screen is really just a series of new holes the computer punched as a result of what you fed it so far.

Inbal: Is this what you look at in the black screen with the green letters? The punched card inside the computer?

Yaniv: Uhm, not usually. I don’t need to. If a neurologist gets a head ache, she doesn’t undergo CT, she takes a pill. But even if I hardly ever look into whatever modern computers use to mimic their huge punched cards, I never forget this is how they are really built, and that all the wonderful things a computer can do are really performed by a giant abacus or music box with a series of holes in a long card instead of a cylinder.

Inbal: And that’s what you write about?

Yaniv: No, not yet. See, different computers, like my netbook or your iPhone react differently to cards punched in the same manner. One wouldn’t work with the other, because their internal machinery is different. You can think of it a bit like two music boxes with differently toothed combs. They’d play different music given the same pin cylinder, or, rather, one would play music and the other is likely to make icky noise, not unlike the noise you get when you tune the TV to the wrong channel.

Inbal: I hate that noise. A music box would never make that noise.

Yaniv: If it spun very quickly and had many many different teeth on the comb and had a bad cylinder inserted to it, then it would sound quite like the TV noise. The point is that different computers have different teeth combs and can play different cylinders. Python is a very special such cylinder. Python is a bit like a music box inside a music box. Python, the ‘inner’ music box affects the behaviour of the ‘outer’ music box, which is the computer. Silence. That didn’t go well. Look, you like the different apps I put on your iPhone, right?

Inbal: (frown) Mostly. Ugh. Come on, it was a farting app, I’m just a guy, how was I possibly supposed to resist it?!

Yaniv: Well, imagine an app that displays a music box on the screen and lets you tap on the pin cylinder to make your own music and play it from the iPhone. A music box inside the iPhone. That’s very imaginable, right?

Inbal: Uhm, yeah… that’s a nice idea.

Yaniv: Well, think about the pin cylinder of the music box in that app. If I would write such a program both for the iPhone and for the netbook, it makes sense that I could make it play the same music for the same pin cylinder, even though it’s running on different computers. Right?

Inbal: Hmm-hmm. Yawn. Yikes, we better cut to the chase.

Yaniv: Well, as I said, and it’s sometimes even hard for me to believe but you have to take my word for it, the iPhone is just a punching machine itself, and the music box app is really a punch card I put in the iPhone. So why not make a punching machine app? You could see it on the screen, and punch holes in a visual card, and drag it into the machine with your finger, and see the result coming out on the screen. A punching machine in a punching machine. I’d make a long punched card, this is the punching machine app, and after I feed it I will put in a second punched card, this is already running on the ‘inner’ punching machine, the app. OK?

Inbal: Mostly closed eyes, nod.

Yaniv: Well, so why not make a ‘first’ card for one machine, and a different ‘first’ card to another machine, such that the same ‘second’ card would work the same on both different machines. Just like I could program different music boxes that would behave the same way on different computers, just like Firefox acts the same on my Linux and on your Windows even though they’re different computers. Well, at least sort-of-ish.

Inbal: Long silence. Closed eyes. Ugh, she fell asleep just when I was really having fun. (sleepily) Only if the first card is really really long and the music box had many many teeth. Look, maybe her choice of words wasn’t so eloquent, but she did say a sentence very much to this effect almost in her sleep! I took that as proof she actually got what I was talking about so far.

Yaniv: (triumphant) Yes! Yes exactly! Well not exactly, but unbelievably close. Python is a very long punched card, and it runs on several different punching machines with many many teeth, but it makes all these machines behave the same way.

Inbal: My excitement must’ve woken her up a bit, because she frowned with closed eyes. All computers behave the same way? Isn’t that a bad thing?

Yaniv: For people like me, it can be a great thing! We can write software once, and run it on many different computers. And the second punch card is much easier to write compared to the first punched card we always reuse. Before things like Python, we had to think harder and punch cards differently for different computers. It wasted lots of time, so we couldn’t advance as rapidly. Today, we can do cool things in weeks. I know this is only very roughly true. But her breathing was getting very regular and I really wanted to finish the explanation. This is what I’m writing about. I’m explaining the structure of the long, long punched card which is Python, how it sets up the music box to be in just a particular way to eat the second punched card, which is a Python program, that’s easier to write and can run on any computer. That’s what my series of articles is all about, it lives happily ever after, the end.

Silence. I was wide awake and very happy. It’s so gratifying, to explain your work to your spouse. I think it’s even more so for those of us who have more arcane and less understood jobs. I stroked her dreadlocks (she puts all kinds of seashells and beads in them, it’s really nice) and finally fell asleep myself.

I can’t say her recollection the next day was perfect (You said something about music boxes… no? I remember it was a beautiful story!), but it was still quite a terrific experience for me. I wrote a very rough draft of this post in the morning when I woke up (the whole thing was a few weeks ago). As soon as I started writing it I realized this story is an obvious (and probably inferior) kin to Ryan Tomayko’s incredible “How I explained REST to my wife”. I dropped him a line to tell him I may write this piece, and he actually said some people don’t believe he had such a conversation with his wife.

Poor tossers! This is probably the most hard-core incident to date, but more often than not Inbal amazes me with how much she really knows about computers just from hanging around an aging geek like yours truly and hearing my occasional rants or explanations. Here is a plea to all you astrophysicists, microbiologists and mathematicians out there: explain your work, art and passion to your dancer, carpenter and education worker spouses! It’s fun (and you can blog about it, to boot)!

Python’s Innards: Code Objects

2010/07/03 § 5 Comments

This article, part of a series of articles about Python’s internals, will continue our preparation to engage the machinery of code evaluation by discussing Code Objects. To those of you who just now joined in and didn’t even read the introduction (but why?!), please note an important disclaimer: while the series as a whole is CPython 3.x centric and might not ‘apply cleanly’ to other Python implementations, matters of bytecode and evaluation (like this article discusses) are even more likely to deviate between implementations. So some of what I say in this post may apply to other implementations, some not – I’m not even checking at the moment; if and when we’ll discuss implementations like PyPy, Jython, IronPython, etc, I’ll highlight some of the differences. With this disclaimer in mind, we can get back to the plot: Code Objects. The compilation of Python source code emits Python bytecode, which is evaluated at runtime to produce whatever behaviour the programmer implemented. I guess you can think of bytecode as ‘machine code for the Python virtual machine’, and indeed if you look at some binary x86 machine code (like this one: 0x55 0x89 0xe5 0xb8 0x2a 0x0 0x0 0x0 0x5d) and some Python bytecode (like that one: 0x64 0x1 0x0 0x53) they look more or less like the same sort of gibberish. Along with the actual bytecode, Python’s compiler emits additional fields, most of them must be coupled with the bytecode (otherwise it would be meaningless). The bytecode and these fields are lumped together in an object called a code object, our subject for this article.

You might initially confuse function objects with code objects, but shouldn’t. Functions are higher level creatures that execute code by relying on a lower level primitive, the code object, but adding more functionality on top of that (in other words, every function has precisely one code object directly associated with it, this is the function’s __code__ attribute, or func_code in Python 2.x). For example, among other things, a function keeps a reference to the global namespace (remember that?) in which it was originally defined, and knows the default values of arguments it receives. You can sometimes execute a code object without a function (see eval and exec), but then you will have to provide it with a namespace or two to work in. Finally, just for accuracy’s sake, please note that tp_call of a function object isn’t exactly like exec or eval; the latter don’t pass in arguments or provide free argument binding (more below on these). If this doesn’t sit well with you yet, don’t panic, it just means functions’ code objects won’t necessarily be executable using eval or exec. I hope we have that settled.
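A short sketch of the split between the two, using made-up names: the function carries the defaults and the reference to its globals, while a bare code object made with compile has to be handed a namespace before exec can run it.

```python
def greet(name, punctuation="!"):
    return "hello " + name + punctuation

# What the function object adds on top of its code object
code = greet.__code__
print(code.co_name)              # greet
print(greet.__defaults__)        # ('!',)
print(greet.__globals__ is globals())

# A code object without a function: we must supply the namespace ourselves
module_code = compile("answer = 6 * 7", "<sketch>", "exec")
namespace = {}
exec(module_code, namespace)
print(namespace["answer"])       # 42
```

Note that the defaults live on the function, not on the code object; two functions could share one code object and still have different defaults.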

Let’s see when code objects are created. Code objects are created whenever a block of Python code is compiled. We have mentioned blocks briefly before, the fine material defines them as “a piece of Python program text that is executed as a unit. The following are blocks: a module, a function body, and a class definition.” (the fine material also lists other but less-interesting-to-us code blocks, like every command in the interactive interpreter, the string passed to Python’s executable’s -c switch, etc). As usual, I don’t want to dig too deeply into compilation, but basically when a code block is encountered, it has to be successfully transformed into an AST (which requires mostly that its syntax will be correct), which is then passed to ./Python/compile.c: PyAST_Compile, the entry point into Python’s compilation machinery. A kind comment in ./Python/compile.c explains the general execution flow of this function.

Next, let’s discuss what is in a code object; I said it has stuff other than bytecode, but what? To whet our appetite about the various fields of a code object, we can look at the compiled Python sample from the first paragraph and disassemble it ourselves; it’s easier if we know beforehand that both samples implement a function which simply returns the value 42. Unlike the x86 machine code sample, which is self-contained and should be ready to run (<cough>assuming I didn’t botch it</cough>) the Python bytecode sample doesn’t include the constant value 42 in it at all. You absolutely can’t run this code meaningfully without its constants, and indeed 42 is referred to by one of the extra fields of the code object. We will best see the interaction between the actual bytecode and the accompanying fields as we do a manual disassembly.

From the interpreter (as usual, slight editing for readability):

# the opcode module has a mapping of opcode
#  byte values to their symbolic names
>>> import opcode
>>> def return42(): return 42
... 
# this is the function's code object
>>> return42.__code__
<code object return42 ... >
# this is the actual bytecode
>>> return42.__code__.co_code
b'd\x01\x00S'
# this is the field holding constants
>>> return42.__code__.co_consts
(None, 42)
# the first opcode is LOAD_CONST
>>> opcode.opname[return42.__code__.co_code[0]]
'LOAD_CONST'
# LOAD_CONST has one word as an operand
#  let's get its value
>>> return42.__code__.co_code[1] + \
... 256 * return42.__code__.co_code[2]
1
# and which constant can we find in offset 1?
>>> return42.__code__.co_consts[1]
42
# finally, the next opcode
>>> opcode.opname[return42.__code__.co_code[3]]
'RETURN_VALUE'
>>> 

I hope this was educational, albeit doing it all the time could get boring. Fortunately, we have dis to do this work for us (>>> from dis import dis, you already saw I aliased and augmented it as diss). In addition to dis, the function show_code from the same module is useful to look at code objects (I aliased and augmented a bit as ssc). So let’s look at return42 with diss and ssc:

>>> diss(return42)
  1           0 LOAD_CONST               1 (42) 
              3 RETURN_VALUE         
>>> ssc(return42)
Name:              return42
Filename:          <stdin>
Argument count:    0
Kw-only arguments: 0
Number of locals:  0
Stack size:        1
Flags:             OPTIMIZED, NEWLOCALS, NOFREE
Constants:
   0: None
   1: 42
>>> 

We see diss and ssc generally agree with our disassembly, though ssc further parsed all sorts of other fields of the code object which we didn’t handle so far (you can run dir on a code object to see them yourself). We have also seen that our value of 42 is indeed referred to by a field of the code object, rather than somehow be encoded in the bytecode.
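If you’d rather not decode operands by hand, and your interpreter is recent enough to have dis.get_instructions (I’m assuming Python 3.4 or later here, which is newer than the version this article was written against), the same information is available as structured data. Opcode names and instruction widths differ between CPython versions, so the sketch avoids promising any particular listing.

```python
import dis

def return42():
    return 42

# Each instruction carries its offset, symbolic name and resolved operand
for ins in dis.get_instructions(return42):
    print(ins.offset, ins.opname, ins.argval)

resolved = [ins.argval for ins in dis.get_instructions(return42)]
print(42 in resolved)  # True
```

The argval attribute is the operand after dis has looked it up for you, which is exactly the lookup we did manually with co_consts.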

Code objects are immutable and their fields don’t hold any references (directly or indirectly) to mutable objects. This immutability is useful in simplifying many things, one of which is the handling of nested code blocks. An example of a nested code block is a class with two methods: the class is built using a code block, and this code block nests two inner code blocks, one for each method. This situation is recursively handled by creating the innermost code objects first and treating them as constants for the enclosing code object (much like an integer or a string literal would be treated). You may be wondering how mutable object literals (a = [1, 2, 3]) are represented in a code object, and the answer is that rather than referring to the mutable object with the code object, the ‘recipe’ to prepare it is kept (try >>> import dis; dis.dis(compile("a=(1,2,[3,4,{5:6}])", "string", "exec")) to make this immediately clear).
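The nesting described above is easy to see for yourself. This sketch (the source string and the helper nested_code are made up) compiles a class with two methods and then fishes the inner code objects out of the constants of the enclosing ones.

```python
import types

source = (
    "class Spam:\n"
    "    def eggs(self): pass\n"
    "    def ham(self): pass\n"
)

def nested_code(code):
    """Return the code objects stored among the constants of `code`."""
    return [c for c in code.co_consts if isinstance(c, types.CodeType)]

module_code = compile(source, "<sketch>", "exec")
class_code = nested_code(module_code)[0]
method_names = sorted(c.co_name for c in nested_code(class_code))

print(class_code.co_name)  # Spam
print(method_names)        # ['eggs', 'ham']
```

Nothing was executed here: compiling the module was enough to produce all three nested code objects.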

Now that we have seen the relation between the bytecode and a code object field (co_consts), let’s take a look at the myriad of other fields in a code object. To be honest, I’m not sure this list would be particularly exciting. Many of these fields are just integer counters or tuples of strings representing how many or which variables of various sorts are used in a code object. But looking to the horizon where ceval.c and frame object evaluation is waiting for us, I can tell you that we need an immediate and crisp understanding of all these fields and their exact meaning, subtleties included. So I’ll (tediously?) list and categorize them all, building on the rather terse description you can find in the standard type hierarchy. If this seems too boring right now, you best skim it now but keep it as a reference for later posts; trust me, it’s useful.

Identity or origin (strings)
co_name
A name (a string) for this code object; for a function this would be the function’s name, for a class this would be the class’ name, etc. The compile builtin doesn’t let you specify this, so all code objects generated with it carry the name <module>.
co_filename
The filename from which the code was compiled. Will be <stdin> for code entered in the interactive interpreter or whatever name is given as the second argument to compile for code objects created with compile.
Different types of names (string tuples)
co_varnames
A tuple containing the names of the local variables (including arguments). To parse this tuple properly you need to look at co_flags and the counter fields listed below, so you’ll know which item in the tuple is what kind of variable. In the ‘richest’ case, co_varnames contains (in order): positional argument names (including optional ones), keyword only argument names (again, both required and optional), varargs argument name (i.e., *args), kwds argument name (i.e., **kwargs), and then any other local variable names. So you need to look at co_argcount, co_kwonlyargcount and co_flags to fully interpret this tuple.
co_cellvars
A tuple containing the names of local variables that are stored in cells (discussed in the previous article) because they are referenced by lexically nested functions.
co_freevars
A tuple containing the names of free variables. Generally, a free variable means a variable which is referenced by an expression but isn’t defined in it. In our case, it means a variable that is referenced in this code object but was defined and will be dereferenced to a cell in another code object (also see co_cellvars above and, again, the previous article).
co_names
A tuple containing the names which aren’t covered by any of the other fields (they are not local variables, they are not free variables, etc) used by the bytecode. This includes names deemed to be in the global or builtin namespace as well as attributes (i.e., if you do foo.bar in a function, bar will be listed in its code object’s names).
Counters and indexes (integers)
co_argcount
The number of positional arguments the code object expects to receive, including those with default values. For example, def foo(a, b, c=3): pass would have a code object with this value set to three. The code object of classes accept one argument which we will explore when we discuss class creation.
co_kwonlyargcount
The number of keyword-only arguments the code object can receive (the ones declared after * or *args in the signature), including those with default values.
co_nlocals
The number of local variables used in the code object (including arguments).
co_firstlineno
The line offset where the code object’s source code began, relative to the module it was defined in, starting from one. In this (and some but not all other regards), each input line typed in the interactive interpreter is a module of its own.
co_stacksize
The maximum size required of the value stack when running this object. This size is statically computed by the compiler (./Python/compile.c: stackdepth) when the code object is created, by looking at all possible flow paths searching for the one that requires the deepest value stack. To illustrate this, look at the diss and ssc outputs for a = 1 and a = [1,2,3]. The former has at most one value on the value stack at a time, the latter has three, because it needs to put all three integer literals on the stack before building the list.
Other stuff (various)
co_code
A bytes string (b'…', as we saw above) representing the sequence of bytecode instructions; it contains a stream of opcodes and their operands (or rather, indexes which are used with other code object fields to represent their operands, as we saw above).
co_consts
A tuple containing the literals used by the bytecode. Remember that everything in a code object must be immutable; to really dig this field, I recommend running diss and ssc on the code snippets a=(1,2,3) versus a=[1,2,3] and yet again versus a=(1,2,3,[4,5,6]).
co_lnotab
A string encoding the mapping from bytecode offsets to line numbers. If you happen to really care how this is encoded you can either look at ./Python/compile.c or ./Lib/dis.py: findlinestarts.
co_flags
An integer encoding a number of flags regarding the way this code object was created (which says something about how it should be evaluated). The possible flags are listed in ./Include/code.h; as a small example I can give CO_NESTED, which marks a code object which was compiled from a lexically nested function. Flags also have an important role in the implementation of the __future__ mechanism, which is still unused in Python 3.1 at the time of this writing, as no “future syntax” exists in Python 3.1. However, even when thinking in Python 3.x terms co_flags is still important as it facilitates the migration from the 2.x branch. In 2.x, __future__ is used when enabling Python 3.x like behaviour (i.e., from __future__ import print_function in Python 2.7 will disable the print statement and add a print function to the builtins module, just like in Python 3.x). If we come across flags from now on (in future posts), I’ll try to mention their relevance in the particular scenario.
co_zombieframe
This field of the PyCodeObject struct is not exposed in the Python object; it (optionally) points to a stack frame object. This can aid performance by maintaining an association between a code object and a stack frame object, so as to avoid reallocation of frames by recycling the frame object used for a code object. There’s a detailed comment in ./Objects/frameobject.c explaining zombie frames and their reanimation, we may mention this issue again when we discuss stack frames.
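Since parsing co_varnames needs three other fields, a sketch may help. The helper classify_varnames is my own; it relies on the CO_VARARGS and CO_VARKEYWORDS constants exported by the inspect module and on the ordering described under co_varnames above.

```python
import inspect

def classify_varnames(code):
    """Split co_varnames using co_argcount, co_kwonlyargcount and co_flags."""
    names = code.co_varnames
    end_positional = code.co_argcount
    end_kwonly = end_positional + code.co_kwonlyargcount
    result = {
        "positional": names[:end_positional],
        "kwonly": names[end_positional:end_kwonly],
        "varargs": None,
        "varkw": None,
    }
    index = end_kwonly
    if code.co_flags & inspect.CO_VARARGS:       # a *args name follows
        result["varargs"] = names[index]
        index += 1
    if code.co_flags & inspect.CO_VARKEYWORDS:   # a **kwargs name follows
        result["varkw"] = names[index]
        index += 1
    result["locals"] = names[index:]             # everything else
    return result

def sample(a, b=2, *args, c, d=4, **kwargs):
    total = a + b
    return total

parts = classify_varnames(sample.__code__)
for key in ("positional", "kwonly", "varargs", "varkw", "locals"):
    print(key, parts[key])
```

For sample this should split the tuple into ('a', 'b'), ('c', 'd'), 'args', 'kwargs' and ('total',), in that order.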

Phew! This is everything in a code object. In making this list I’ve compiled quite a few code blocks, looking at how changes in the Python source change the resulting code object. I recommend you do something similar, and I actually bothered to make it ultra-easy for you to look into how various code blocks affect these fields: in the Mercurial repository I have for this series I created a directory called code_objects, within it you can find a self-explaining little utility that can facilitate looking at a few sample code blocks I wrote alongside with their disassembly and show_code output. Not all fields are necessarily covered in the sample code blocks I provided, you should be able to add a few more samples (if anything intrigues you) yourself and see them disassembled/analyzed. Also, I’m sorry, I’m totally a *NIX bigot (and will erase all flame or even flame-ish comments about that) and this toy might not run on Windows. There’s no good reason for that, I just wanted to use less for pagination, etc, and couldn’t be bothered with achieving the same effect on Windows.
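If you don’t feel like cloning anything, a couple of throwaway functions typed at the prompt already show co_cellvars, co_freevars and co_names in action. The function names below are made up for the occasion.

```python
def make_counter():
    count = 0                 # referenced by bump, so it lives in a cell
    def bump():
        nonlocal count        # a free variable from bump's point of view
        count += 1
        return count
    return bump

def attribute_user(obj):
    # len is a global/builtin name and real_name an attribute name;
    # neither is local, so both end up in co_names
    return len(obj.real_name)

bump = make_counter()
print(make_counter.__code__.co_cellvars)  # ('count',)
print(bump.__code__.co_freevars)          # ('count',)
print(attribute_user.__code__.co_names)
```

The same name, count, shows up as a cell variable in the enclosing code object and as a free variable in the nested one, which is the pairing described under co_cellvars and co_freevars.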

This pretty much sums up what I have to say about code objects at the moment. Time permitting, I sincerely hope we’ll soon reach the next article, where we’ll tackle the final frontier before ./Python/ceval.c: PyEval_EvalFrameEx itself: frame objects. ¡Olé!


I would like to thank Nick Coghlan for reviewing this article; any mistakes that slipped through are my own.
