Python’s Innards: Interpreter Stacks

Those of you who have been paying attention know that this series is spiraling towards what can be considered the core of Python’s Virtual Machine, the “actually do work function” ./Python/ceval.c: PyEval_EvalFrameEx. The (hopefully) last hurdle on our way there is to understand the three significant stack data structures used for CPython’s code evaluation: the call stack, the value stack and the block stack (I’ve collectively called them “Interpreter Stacks” in the title; this isn’t a formal term). All three stacks are tightly coupled with the frame object, which will also be discussed today.

If you give me a minute to put on my spectacles, I’ll read to you what Wikipedia says about call stacks in general:

“In computer science, a call stack is a stack data structure that stores information about the active subroutines of a computer program… A call stack is composed of stack frames (…). These are machine dependent data structures containing subroutine state information. Each stack frame corresponds to a call to a subroutine which has not yet terminated with a return.”

Hrmf. Jim, I don’t understand… how does this translate to a virtual machine?

Well, since CPython implements a virtual machine, its call stack and stack frames are dependent on this virtual machine, not on the physical machine it’s running on. And also, as Python tends to do, this internal implementation detail is exposed to Python code, either via the C-API or pure Python, as frame objects (./Include/frameobject.h: PyFrameObject). We know that code execution in CPython is really the evaluation (interpretation) of a code object, so every frame represents a code object that is currently being evaluated. We’ll see (and already saw before) that frame objects are linked to one another, thus forming a call stack of frames. Finally, each frame object in the call stack holds references to two frame-specific stacks (not directly related to the call stack): the value stack and the block stack.
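To see the call stack from pure Python, here is a small sketch that follows the f_back links from one frame to the next. It relies on sys._getframe, which CPython documents as an implementation detail that other Python implementations need not provide. The function names are made up for the demo.

```python
import sys

def call_stack_names():
    """Return co_name for every frame from our caller up to the outermost one."""
    names = []
    frame = sys._getframe(1)      # 1 = skip this helper's own frame
    while frame is not None:      # the outermost frame has f_back == None
        names.append(frame.f_code.co_name)
        frame = frame.f_back
    return names

def outer():
    return inner()

def inner():
    return call_stack_names()

print(outer())  # ['inner', 'outer', '<module>'] when run directly as a script
```

Each name corresponds to one code object currently being evaluated. The chain ends at the module's top-level frame, whose f_back is None.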

The value stack (you may know this term as an ‘evaluation stack’) is where manipulation of objects happens when object-manipulating opcodes are evaluated. We have seen the value stack before on various occasions, like in the introduction and during our discussion of namespaces. Recalling an example we used before, BINARY_SUBTRACT is an opcode that effectively pops the two top objects off the value stack, performs PyNumber_Subtract on them and sets the new top of the value stack to the result. Namespace-related opcodes, like LOAD_FAST or STORE_GLOBAL, load values from a namespace onto the stack or store values from the stack into a namespace. Each frame has a value stack of its own (this makes sense in several ways, the most prominent possibly being simplicity of implementation); we’ll see later where in the frame object the value stack is stored.
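A toy model may make the value stack more concrete. The sketch below is not CPython's evaluation loop. It is a hypothetical mini-interpreter over (opname, argument) pairs that borrows a few opcode names to show how BINARY_SUBTRACT consumes two values and leaves one behind. Real opcode names and sequences vary between CPython versions; 3.11 and later, for instance, use a generic BINARY_OP.

```python
import operator

def run(instructions, namespace):
    """Evaluate (opname, arg) pairs on a toy value stack; return what's left."""
    stack = []
    for opname, arg in instructions:
        if opname == "LOAD_CONST":
            stack.append(arg)
        elif opname == "LOAD_NAME":            # namespace -> stack
            stack.append(namespace[arg])
        elif opname == "STORE_NAME":           # stack -> namespace
            namespace[arg] = stack.pop()
        elif opname == "BINARY_SUBTRACT":
            right = stack.pop()                # top of stack is the right operand
            left = stack.pop()
            stack.append(operator.sub(left, right))
        else:
            raise ValueError(f"unknown opcode {opname}")
    return stack

ns = {"a": 10}
# Roughly what `b = a - 3` compiled to at module level in older CPythons.
leftover = run([("LOAD_NAME", "a"), ("LOAD_CONST", 3),
                ("BINARY_SUBTRACT", None), ("STORE_NAME", "b")], ns)
print(ns["b"], leftover)  # 7 []
```

The stack ends up empty because every value that was pushed was consumed. A correctly compiled sequence of statements leaves the value stack balanced in the same way.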

This leaves us with the block stack, a fairly simple concept with some vaguely defined terminology around it, so pay attention. Python has a notion called a code block, which we have discussed in the article about code objects and which is also explained here. Completely unrelatedly, Python also has a notion of compound statements, which are statements that contain other statements (the language reference defines compound statements here). Compound statements consist of one or more clauses, each made of a header and a suite. Even if the terminology wasn’t known to you until now, I expect this is all instinctively clear to you if you have almost any Python experience: for, try and while are a few compound statements.

So where’s the confusion? In various places throughout the code, a block (sometimes “frame block”, sometimes “basic block”) is used as a loose synonym for a clause or a suite, making it easy to confuse suites and clauses with what’s actually a code block, or vice versa. Both the compilation code (./Python/compile.c) and the evaluation code (./Python/ceval.c) are aware of various suites and have (ill-named) data structures to deal with them; but since we’re more interested in evaluation in this series, we won’t discuss the compilation-related details much (or at all). Whenever I think the wording might get confusing, I’ll mention the formal terms of clause or suite alongside whatever code term we’re discussing.

With all this terminology in mind we can look at what’s contained in a frame object. Looking at the declaration of ./Include/frameobject.h: PyFrameObject, we find (comments were trimmed and edited for your viewing pleasure):

typedef struct _frame {
   PyObject_VAR_HEAD
   struct _frame *f_back;   /* previous frame, or NULL */
   PyCodeObject *f_code;    /* code segment */
   PyObject *f_builtins;    /* builtin symbol table */
   PyObject *f_globals;     /* global symbol table */
   PyObject *f_locals;      /* local symbol table */
   PyObject **f_valuestack; /* points after the last local */
   PyObject **f_stacktop;   /* current top of valuestack */
   PyObject *f_trace;       /* trace function */
 
   /* used for swapping generator exceptions */
   PyObject *f_exc_type, *f_exc_value, *f_exc_traceback;
 
   PyThreadState *f_tstate; /* call stack's thread state */
   int f_lasti;             /* last instruction if called */
   int f_lineno;            /* current line # (if tracing) */
   int f_iblock;            /* index in f_blockstack */
 
   /* for try and loop blocks */
   PyTryBlock f_blockstack[CO_MAXBLOCKS];
 
   /* dynamically: locals, free vars, cells and valuestack */
   PyObject *f_localsplus[1]; /* dynamic portion */
} PyFrameObject;

We see various fields used to store the state of this invocation of the code object as well as to maintain the call stack’s structure. Both in the C-API and in Python these fields are all prefixed by f_, though not all the fields of the C structure PyFrameObject are exposed in the Pythonic representation. I hope some of the fields are intuitively clear to you, since they relate to many topics we have already covered. We already mentioned the relation between frame and code objects: the f_code field of every frame points to precisely one code object. As far as structure goes, frames point backwards so that they form a stack (f_back), and they also point “root-wards” in the interpreter state/thread state/call stack structure by pointing to their thread state (f_tstate), as explained here. Finally, since you always execute Python code in the context of three namespaces (as discussed there), frames have the f_builtins, f_globals and f_locals fields to point to these namespaces. These are the fields (I hope) we already know.
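Several of these familiar fields are visible from Python on the frame of a running function. The following sketch assumes CPython's sys._getframe, and the function name is invented for the demo. f_tstate has no Python-level counterpart, so it is absent here. Note also that since Python 3.13, f_locals on a function frame is a write-through proxy rather than a plain dict, hence the dict() snapshot.

```python
import sys

def inspect_own_frame(x):
    frame = sys._getframe()  # the frame evaluating this very call
    return {
        "code_is_ours": frame.f_code is inspect_own_frame.__code__,
        "globals_is_module_dict": frame.f_globals is globals(),
        "len_from_builtins": frame.f_builtins["len"] is len,
        "locals": dict(frame.f_locals),   # snapshot of the local namespace
        "caller": frame.f_back.f_code.co_name,
    }

info = inspect_own_frame(5)
print(info["code_is_ours"], info["globals_is_module_dict"], info["len_from_builtins"])
print(info["locals"]["x"], info["caller"])
```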

Before we dig into the other fields of a frame object, please notice that frames are variable-size Python objects (they start with PyObject_VAR_HEAD). The reason is that when a frame object is created it must be dynamically allocated to be large enough to hold references (pointers, really) to the locals, cells and free variables used by its code object, as well as the value stack needed by the code object’s ‘deepest’ branch. Indeed, the last field of the frame object, f_localsplus (locals plus cells plus free variables plus value stack…), is a dynamic array where all these references are stored. PyFrame_New will show you exactly how the size of this array is computed.

If the previous paragraph doesn’t sit well with you, I suggest you read the descriptions I wrote for co_nlocals, co_cellvars, co_freevars and co_stacksize: during evaluation, all these ‘dead’ parts of the inert code object come to ‘life’ in space allocated at the end of the frame. As we’ll probably see in the next article, when the frame is evaluated, these references at the end of the frame will be used to get (or set) “fast” local variables, free variables and cell variables, as well as the objects on the value stack (“fast” locals were explained when we discussed namespaces). Looking back at the commented declaration above and given what I said here, I believe you should now understand f_valuestack, f_stacktop and f_localsplus.
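The following sketch connects those co_* fields to frame size. It computes how many f_localsplus slots a frame for a given code object would need, assuming the formula used by the PyFrame_New of this era: value stack size plus locals plus cell variables plus free variables. slots_needed is a hypothetical helper, not a CPython API. Newer CPython versions lay frames out differently, so treat the number as illustrative.

```python
def slots_needed(code):
    """Hypothetical helper: f_localsplus length per the older PyFrame_New
    formula (co_stacksize + co_nlocals + cell vars + free vars)."""
    return (code.co_stacksize + code.co_nlocals
            + len(code.co_cellvars) + len(code.co_freevars))

def make_counter(start):
    count = start              # a cell variable of make_counter
    def increment(step):
        nonlocal count         # a free variable of increment
        count += step
        return count
    return increment

outer_code = make_counter.__code__
inner_code = make_counter(0).__code__

for code in (outer_code, inner_code):
    print(code.co_name, "nlocals:", code.co_nlocals,
          "cells:", code.co_cellvars, "frees:", code.co_freevars,
          "stacksize:", code.co_stacksize, "=>", slots_needed(code))
```

The same variable, count, shows up as a cell in the outer code object and as a free variable in the inner one. Each of the two frames reserves a slot for it.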

We can now look at f_blockstack, keeping in mind the terminology clarification from before. As you can probably imagine, compound statements sometimes require state to be evaluated. If we’re in a loop, we need to know where to go in case of a break or a continue. If we’re raising an exception, we need to know where the innermost enclosing handler is (the suite of the closest except header, in more formal terms). This state is stored in f_blockstack, a fixed-size stack of PyTryBlock structures which keeps the current compound statement state for us (PyTryBlock is not just for try blocks; it has a b_type field that lets it handle various kinds of compound statement suites). f_iblock is the number of PyTryBlocks currently on the stack, i.e., the index just past the last allocated one. If we need to bail out of the current “block” (that is, the current clause), we can pop the block stack. The popped PyTryBlock tells us the bytecode offset at which to resume evaluation (its b_handler field) and the value stack depth to unwind to (its b_level field). A somewhat special case is a raised exception which exhausts the block stack without being caught. As you can imagine, in that case a handler will be sought in the block stacks of the previous frames on the call stack.

All this should easily click into place now if you read three code snippets. First, look at this disassembly of a for statement (this would look strikingly similar for a try statement):

>>> def f():
...     for c in 'string':
...             my_global_list.append(c)
...
>>> import dis
>>> dis.dis(f)
 2           0 SETUP_LOOP              27 (to 30)
             3 LOAD_CONST               1 ('string')
             6 GET_ITER
       >>    7 FOR_ITER                19 (to 29)
            10 STORE_FAST               0 (c)
 
 3          13 LOAD_GLOBAL              0 (my_global_list)
            16 LOAD_ATTR                1 (append)
            19 LOAD_FAST                0 (c)
            22 CALL_FUNCTION            1
            25 POP_TOP
            26 JUMP_ABSOLUTE            7
       >>   29 POP_BLOCK
       >>   30 LOAD_CONST               0 (None)
            33 RETURN_VALUE
>>>
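You can reproduce this disassembly on your own interpreter. On Python 3.4+, dis.get_instructions gives you the opcode names programmatically. Expect different output from the listing above. As far as I know, SETUP_LOOP was removed in CPython 3.8, and the block stack itself gave way to an exception table in 3.11. A modern interpreter therefore shows GET_ITER and FOR_ITER but no SETUP_LOOP/POP_BLOCK pair.

```python
import dis

my_global_list = []

def f():
    for c in 'string':
        my_global_list.append(c)

opnames = [instr.opname for instr in dis.get_instructions(f)]
print(opnames)
print("SETUP_LOOP present:", "SETUP_LOOP" in opnames)  # depends on the version

f()
print("".join(my_global_list))  # string
```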

Next, look at how the opcodes SETUP_LOOP and POP_BLOCK are implemented in ./Python/ceval.c. Notice that SETUP_LOOP, SETUP_EXCEPT and SETUP_FINALLY are rather similar: they all push a block matching the relevant suite onto the block stack, and they all share the same POP_BLOCK:

TARGET_WITH_IMPL(SETUP_LOOP, _setup_finally)
TARGET_WITH_IMPL(SETUP_EXCEPT, _setup_finally)
TARGET(SETUP_FINALLY)
_setup_finally:
    PyFrame_BlockSetup(f, opcode, INSTR_OFFSET() + oparg,
               STACK_LEVEL());
    DISPATCH();
 
TARGET(POP_BLOCK)
    {
        PyTryBlock *b = PyFrame_BlockPop(f);
        UNWIND_BLOCK(b);
    }
    DISPATCH();

Finally, look at the actual implementation of ./Objects/frameobject.c: PyFrame_BlockSetup and ./Objects/frameobject.c: PyFrame_BlockPop:

void
PyFrame_BlockSetup(PyFrameObject *f, int type, int handler, int level)
{
   PyTryBlock *b;
   if (f->f_iblock >= CO_MAXBLOCKS)
       Py_FatalError("XXX block stack overflow");
   b = &f->f_blockstack[f->f_iblock++];
   b->b_type = type;
   b->b_level = level;
   b->b_handler = handler;
}
 
PyTryBlock *
PyFrame_BlockPop(PyFrameObject *f)
{
   PyTryBlock *b;
   if (f->f_iblock <= 0)
       Py_FatalError("XXX block stack underflow");
   b = &f->f_blockstack[--f->f_iblock];
   return b;
}

There, now you’re smart. If you keep the terminology straight, f_blockstack turns out to be rather simple, at least in my book.
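If you'd rather poke at the mechanics than read C, here is a Python model of the same thing. It's a hypothetical toy: TryBlock, ToyFrame and bail_out are illustration names, not CPython APIs. The bail-out step condenses what ceval.c does when a break unwinds a loop block: pop the block, trim the value stack back to b_level, jump to b_handler. The limit of 20 for CO_MAXBLOCKS is my assumption for this era of CPython.

```python
from dataclasses import dataclass, field

CO_MAXBLOCKS = 20  # assumed to match CPython's limit of this era

@dataclass
class TryBlock:
    b_type: str     # which SETUP_* opcode pushed it, e.g. "SETUP_LOOP"
    b_handler: int  # bytecode offset to jump to when bailing out
    b_level: int    # value stack depth when the block was pushed

@dataclass
class ToyFrame:
    valuestack: list = field(default_factory=list)
    blockstack: list = field(default_factory=list)

    def block_setup(self, b_type, handler):
        # Mirrors PyFrame_BlockSetup, using the current depth as STACK_LEVEL().
        if len(self.blockstack) >= CO_MAXBLOCKS:
            raise RuntimeError("block stack overflow")
        self.blockstack.append(TryBlock(b_type, handler, len(self.valuestack)))

    def block_pop(self):
        # Mirrors PyFrame_BlockPop.
        if not self.blockstack:
            raise RuntimeError("block stack underflow")
        return self.blockstack.pop()

    def bail_out(self):
        """Roughly what a `break` does: pop the innermost block, discard the
        values pushed since it was set up, and return where to resume."""
        block = self.block_pop()
        del self.valuestack[block.b_level:]
        return block.b_handler

# Replay the start of the loop from the disassembly, then break out of it.
frame = ToyFrame()
frame.block_setup("SETUP_LOOP", 30)    # 0 SETUP_LOOP 27 (to 30)
frame.valuestack.append("<iterator>")  # 3 LOAD_CONST + 6 GET_ITER
resume_at = frame.bail_out()
print(resume_at, frame.valuestack, frame.blockstack)  # 30 [] []
```

The iterator was pushed after the block was set up, so unwinding to b_level removes it. Evaluation then resumes at offset 30, just past the POP_BLOCK.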

We’re left with the more esoteric fields, some simpler, some a bit more arcane. In the ‘simpler’ range we have f_lasti, an integer offset into the bytecode of the last instruction executed (initialized to -1, meaning no instruction has been executed yet). This index lets us iterate over the opcodes in the bytecode stream. Heading towards the ‘more arcane’ area we see f_trace and f_lineno. f_trace is a pointer to a tracing function (see sys.settrace; think implementation of a tracer or a debugger). f_lineno contains the line number of the line which caused the generation of the current opcode; it is valid only when tracing (otherwise use PyCode_Addr2Line). Last but not least, we have three exception fields (f_exc_type, f_exc_value and f_exc_traceback), which are rather particular to generators, so we’ll discuss them when we discuss that beast (there’s a longer comment about these fields in ./Include/frameobject.h if you’re curious right now).
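You can watch f_lineno and f_lasti change by installing a trace function with sys.settrace. This is a minimal sketch: it traces only one made-up function, traced. For each "line" event it records the line number relative to the def line, together with f_lasti. The exact f_lasti values depend on the CPython version, so the sketch only relies on them growing through straight-line code.

```python
import sys

def traced():
    a = 1
    b = a + 1
    return b

events = []   # (line relative to the def line, f_lasti) per "line" event

def tracer(frame, event, arg):
    if frame.f_code is not traced.__code__:
        return None                      # don't trace any other frame
    if event == "line":
        relative = frame.f_lineno - frame.f_code.co_firstlineno
        events.append((relative, frame.f_lasti))
    return tracer                        # becomes this frame's f_trace

sys.settrace(tracer)
try:
    traced()
finally:
    sys.settrace(None)                   # always uninstall the tracer

print([line for line, _ in events])      # [1, 2, 3]
```

Returning tracer from the "call" event is what installs it as the frame's local trace function. That is the Python-visible side of f_trace.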

On a parting note, we can mention when frames are created. This happens in ./Objects/frameobject.c: PyFrame_New, usually called from ./Python/ceval.c: PyEval_EvalCodeEx (and ./Python/ceval.c: fast_function, a specialized optimization of PyEval_EvalCodeEx). Frame creation occurs whenever a code object should be evaluated. That is, when a function is called, when a module is imported (the module’s top-level code is executed), whenever a class is defined, for every discrete command entered in the interactive interpreter, when the builtins eval or exec are used and when the -c switch is used (I didn’t verify that this list is exhaustive, but I think it’s fairly complete).
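Here is a quick way to convince yourself that every evaluation gets its own frame. The sketch captures sys._getframe() from a recursive function, a class body and an exec call. As before, sys._getframe is CPython-specific, and the function and class names are made up for the demo.

```python
import sys

def recurse(n, frames):
    frame = sys._getframe()              # a new frame for every call
    frames.append((frame, frame.f_back))
    if n:
        recurse(n - 1, frames)

call_frames = []
recurse(2, call_frames)
print(len({id(f) for f, _ in call_frames}))    # 3: one distinct frame per call
print(call_frames[1][1] is call_frames[0][0])  # True: linked through f_back

class Demo:
    # The class body is a code object of its own, evaluated in its own frame.
    body_name = sys._getframe().f_code.co_name

namespace = {"sys": sys}
exec("name = sys._getframe().f_code.co_name", namespace)

print(Demo.body_name, namespace["name"])  # Demo <module>
```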

Looking at the list in the previous paragraph, you probably realized frames are created very often, so two optimizations are implemented to make frame creation fast. First, code objects have a field (co_zombieframe) which lets them stay associated with a ‘zombie’ (dead, unused) frame object even when they’re not being evaluated. If a code object has already been evaluated once, chances are it will have a zombie frame ready to be reanimated by PyFrame_New and returned instead of a newly allocated frame (trading some memory for fewer allocations). Second, frames that were allocated but are no longer in use are kept in a special free-list (./Objects/frameobject.c: free_list). Frames from this list are reused when possible, instead of allocating a brand new frame. This is all kindly commented in ./Objects/frameobject.c.
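The two strategies compose in a fixed order, which a toy allocator can show. Everything below is hypothetical illustration (ToyCode, ToyFrame, FrameAllocator), and the cap of 200 is my assumption based on PyFrame_MAXFREELIST in older CPython. When allocating, the code object's own zombie is preferred, then the shared free-list, and only then a real allocation. When freeing, a frame first becomes its code object's zombie if that slot is empty, and otherwise goes onto the free-list.

```python
MAXFREELIST = 200  # assumed cap, after PyFrame_MAXFREELIST in older CPython

class ToyCode:
    def __init__(self, name):
        self.name = name
        self.zombieframe = None          # plays the role of co_zombieframe

class ToyFrame:
    def __init__(self, code):
        self.code = code

class FrameAllocator:
    def __init__(self):
        self.free_list = []
        self.fresh_allocations = 0

    def new_frame(self, code):
        if code.zombieframe is not None:  # 1) reanimate this code's zombie
            frame, code.zombieframe = code.zombieframe, None
        elif self.free_list:              # 2) recycle any freed frame
            frame = self.free_list.pop()
            frame.code = code             # (CPython may also resize it)
        else:                             # 3) fall back to a real allocation
            frame = ToyFrame(code)
            self.fresh_allocations += 1
        return frame

    def dealloc(self, frame):
        code = frame.code
        if code.zombieframe is None:      # keep one zombie per code object
            code.zombieframe = frame
        elif len(self.free_list) < MAXFREELIST:
            self.free_list.append(frame)
        # otherwise just drop the frame and let its memory be reclaimed

alloc = FrameAllocator()
code_f, code_g = ToyCode("f"), ToyCode("g")

first = alloc.new_frame(code_f)
alloc.dealloc(first)                     # becomes f's zombie
again = alloc.new_frame(code_f)          # reanimated: no new allocation
print(again is first, alloc.fresh_allocations)    # True 1

second = alloc.new_frame(code_f)         # f's zombie is in use: allocate
alloc.dealloc(again)                     # f's zombie once more
alloc.dealloc(second)                    # zombie slot taken: free-list
reused = alloc.new_frame(code_g)         # g has no zombie, so recycle
print(reused is second, alloc.fresh_allocations)  # True 2
```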

That’s it, I think. Oh, wait: if you’d like to play with frames in your interpreter, take a look at the inspect module, maybe especially this part of it. In gdb, I used a rather crude method to look at the call stack (I dereferenced the global variable interp_head and went on from there). There’s probably a better way, but I didn’t bother looking. Now that’s really it. In fact, I believe at last we covered enough material to analyze ./Python/ceval.c: PyEval_EvalFrameEx. My dear readers, we are ready for it. We have the technology.
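As a starting point with inspect, here is a minimal sketch that asks who called the current function and lists the innermost entries of the call stack. inspect.currentframe() and inspect.stack() are real inspect functions; the function names are made up. The inspect documentation recommends deleting frame references when you're done with them, which is why the del is there: it avoids reference cycles.

```python
import inspect

def who_called_me():
    frame = inspect.currentframe()
    try:
        caller = frame.f_back                  # one step down the call stack
        return caller.f_code.co_name, caller.f_lineno
    finally:
        del frame                              # break the frame <-> local cycle

def some_caller():
    return who_called_me()

def innermost_names(depth):
    # inspect.stack() returns FrameInfo records, innermost first.
    return [record.function for record in inspect.stack()[:depth]]

def a():
    return innermost_names(2)

print(some_caller()[0])  # some_caller
print(a())               # ['innermost_names', 'a']
```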

But, alas, we’ll only do it in the next post, and who knows when that will arrive. And until it does, do good, avoid doing bad and keep clearing your mind. Siddhārtha Gautama said that, and I tend to think that if that particular bloke lived today he’d have some serious Python-Fu going for him, so heed his words.


I would like to thank Nick Coghlan for reviewing this article; any mistakes that slipped through are my own.


Comments

9 responses to “Python’s Innards: Interpreter Stacks”

  1. One Nitpick

    compilation code (./Python/compile.c) and the evaluation code
    (./Python/ceval.c) are aware of various suites and have (ill-named)
    data structures to deal with them; since we’re more interested in
    -compilation than in evaluation in this series, we’ll discuss more of the
    +evaluation than in compilation in this series, we’ll discuss more of the
    latter. Whenever I’ll think wording might get confusing, I’ll mention the
    formal terms of clause or suite alongside whatever code term we’re
    discussing.

    1. Pfft, thanks! Nick actually was on to this confusion in his review of the post, and yet I managed to botch both my original wording and the correction following his comments… I hope now it makes sense, I changed the wording a bit 🙂

  2. “We can read it. We have the technology.”
    Thanks.
    Unfortunately doesn’t answer my question yet, but an interesting read apart from the careless sexism, which, no doubt was tongue in cheek, but every little bit helps.

    1. Assuming the careless sexism was in use of the word “Gentlemen”, it’s now fixed. Sorry; I’m not quite a native English speaker, and Hebrew explicitly refers to third person of unknown gender as male (that in itself can be argued about, but out of scope for me…).

      1. Nothing wrong at all with your English or your tone. Let’s accept that some people will actively look for excuses to be offended and move on to more interesting horizons. Thanks for your article.

  3. Hi there,

    Nice post, congratulations. I have a question regarding the frame object and function parameters. I am trying to catch the input given to built-in C functions (e.g. math.cos or time.sleep). For that I have tried to profile my execution, but C calls work in a different way. Let me give an example:

    I have the following program part:

    import inspect
    import math
    import sys
    import time

    def foo(x):
        y = math.cos(x)
        bar(y+1)
        time.sleep(y+1)

    def bar(g):
        pass

    def profiler(frame, event, arg):
        args, varargs, varkw, value_dict = inspect.getargvalues(frame)
        print(event, frame.f_code.co_name, "->", arg, "f_locals:", frame.f_locals)

    sys.setprofile(profiler)

    foo(100)

    When I run the trace, I get:
    call bar -> None f_locals: {'g': 1.862318872287684}
    c_call foo -> f_locals: {'x': 100, 'y': 0.8623188722876839}

    I don’t understand the difference in parameter passing between my own Python functions and C functions (and I couldn’t find the related part of the code). The example above shows that the bar function gets the evaluated value of y+1, but a C call doesn’t.

    Could you clarify this topic or help where to find an appropriate place to catch the parameters of built-in C calls (for example to print out 1.862.. for time.sleep)?

    Thanks,
    Rich

  4. With Python’s innards blogs, you are doing God’s work. It’s a pleasure reading them.

  5. How was that sexist?

  6. Thanks for this blog post – it’s still helpful in understanding Python innards, even after 8 years 🙂 I’m curious if you considered mentioning the heap in this post.

    Thanks for being receptive to Ann’s comment, Yaniv. Please also consider including non-binary folks in your salutation.