2010/06/19 § 3 Comments
This blog is intended for technical matters, not to tell you about my (ineffably fascinating) life. However, I reckon at times I owe my readers at least a small update to explain some absence. So there: I’m travelling India at the moment, and will keep doing so until early October. Between the horrible heat and the splendid
weed vistas, I haven’t gotten to writing much (truth is, the main reason is that within the regions traveled and the budget of this trip I absolutely can’t find a decent table to type on).
This doesn’t mean this blog and particularly the Python’s Innards series are dead, nor even on-hold until October, it just means they’re taking their time. I hope to publish the next post soon; after all, in this part of the world, सुब कुछ मिलेगा.
2010/06/05 § 13 Comments
Today’s article in our series of articles about Python internals’ will discuss naming, which is the ability to bind names to an object, like we can see in the statement a = 1 (in other words, this article is roughly about what many languages call variables). Naturally, naming is central to Python's behaviour and understanding both its semantics and mechanics are important precursors to our quickly approaching discussions of code evaluation, code objects and stack frames. That said, it is also a delicate subject because anyone with some programming experience knows something about it, at least instinctively (you’ve done something like a = 1 before, now haven’t you?). This rarely-formally-learned familiarity laden with language-specific nuances might lead to false assumptions and gut-feeling-led vagueness, and while Python aims not to surprise, some subtle details which you weren’t aware of might turn out to be surprising nonetheless. So to solidify our foundations, we’ll begin with some formality and terminology.
When we evaluate a = b = c = , we create one list and give it three different names. In formal terms, we’d say that the newly instantiated list object is now bound to three identifiers that refer to it (I also saw and liked the word referent used for the now-referred-to object in the weakref module’s documentation). This distinction between names and the objects bound to them is important. If we evaluate a.append(1), we will see that b and c are also affected; we didn’t mutate a, we mutated its referent, so the mutation is uniformly visible via any name the object was referred to. On the other hand, if we will now do a b = , a and c will not change, since we didn’t actually change the object which b referred to but rather did a re-binding of the name b to a (newly created and empty) list object. Also recall that binding is one of the ways to increase the referent’s reference count, this is worthy of noting even though reference counting isn’t our subject at the moment.
A name binding is commonly created by use of the assignment statement, which is a statement that has an ‘equals’ symbol (=) in the middle, “stuff to assign to” or targets on the left, and “stuff to be assigned” (an expression) on the right. A target can be a name (more formally called an identifier) or a more complex construct, like a sequence of names, an attribute reference (primary_name.attribute) or a subscript (primary_name[subscript]). The highly recommended formal definition of assignment is here, what I present here is a coarse simplification aimed to suit our bare minimum needs for the article. Name binding is undone with the deletion statement del, which is roughly “del followed by comma-separated targets to unbind” (formal specification here). Naturally, much like binding a name to an object increases that object’s reference count, so does deletion decrease it. Finally, note that name binding can be done without an assignment as bindings are also created by def, class, import (and others), this is also of less importance to us now.
Scope is a term relating to the visibility of an identifier throughout a block, or a piece of Python code executed as a unit: a module, a function body and a class definition are blocks (control-blocks like those of if and while are not code blocks in Python). A namespace is an abstract environment where the mapping between names and the objects they refer to is made (incidentally, in current CPython, this is indeed implemented with the dict mapping type). The rules of scoping determine in which namespace will a name be sought after when it is used, or rather resolved. You probably know instinctively that a name bound in function foo isn’t visible in an unrelated function bar, this is because by default names created in a function will be stored in a namespace that will not be looked at when name resolution happens in another, unrelated function. Scope determines not just when a name will be visible as it is resolved or ‘read’ (i.e., if you do spam = eggs, where will eggs come from) but also as it is bound or ‘written’ (i.e., in the same example, where will spam go to). When a namespace will no longer be used (for example, the private namespace of a function which returns) all the names in it are unbound (this triggers reference count decrease and possibly deallocation, but this doesn’t concern us now).
Scoping rules change based on the lexical context in which code is compiled. For example, in simpler terms, code compiled as a plain function’s body will resolve names slightly differently when evaluated when compared with code compiled as part of a module’s initialization code (the module top-level code). Special statements like global and nonlocal exist and can be applied to names thus that resolution rules for these names will change in the current code block, we’ll look into that later. There are many subtleties to the formal rules of scoping and this text does not aim to explain them fully. It’s not that I think that scoping isn’t important enough, but because I’d like us to talk more about the mechanics of scoping (how does it behave like it does) rather than its semantics (what should it behave like). If you want to learn more about the semantics of scoping then by all means read the fine material.
This (important) disclaimer in place, the crash course to scoping is like so: when Python code is evaluated, it is evaluated within three namespaces: locals, globals and builtins. When we resolve a name, it will be sought after in the local scope, then the global scope, then the builtin scope (then a NameError will be raised). When we bind a name with a name binding statement (i.e., an assignment, an import, a def, etc) the name will be bound in the local scope, and hide any existing names in the global or builtin scope. This hiding does not mean the hidden name was changed (formally: the hidden name was not re-bound), it just means it is no longer visible in the current block’s scope because the newly created binding in the local namespace overshadows it. We said scoping changes according to context, and one such case is when functions are lexically nested within one another (that is, a function defined inside the body of another function): resolution of a name from within a nested function will first search in that function’s scope, then in the local scopes of its outer function(s) and only then proceed normally (in the globals and builtins) scope.
Lexical scoping is an interesting behaviour, let’s look at it closely:
$ cat scoping.py ; python3.1 def outer(): a = 1 # creating a lexically nested function bar def inner(): # a is visible from outer's locals return a b = 2 # b is here for an example later on return inner # inner_nonlexical will be called from within # outer_nonlexical but it is not lexically nested def inner_nonlexical(): return a # a is not visible def outer_nonlexical(): a = 1 inner = inner_nonlexical b = 2 # b is here for an example later on return inner_nonlexical >>> from scoping import * >>> outer()() 1 >>> outer_nonlexical()() Traceback (most recent call last): File "<stdin>", line 1, in <module> File "scoping.py", line 13, in inner_nonlexical return a # a is not visible >>>
As the example demonstrates, a is visible in the lexically nested inner but not in the call-stack nested but not lexically nested inner_nonlexical. This is interesting. How is this implemented? I mean, Python is dynamic, everything is runtime, how does inner_nonlexical fail if it has the same Python code and is called in a similar fashion from within a similar environment as the original inner was called? What’s the difference between these two seemingly identical functions? Further more, we can see that inner is actually called after outer has terminated: how can it use a value from a namespace that was already destroyed? These are terrific questions. To answer them, let’s put aside our formal and semantic hat for now, and think about implementation for a bit.
Once again, let’s look at the bytecode emitted for the simple statement spam = eggs - 1:
>>> diss("spam = eggs - 1") 1 0 LOAD_NAME 0 (eggs) 3 LOAD_CONST 0 (1) 6 BINARY_SUBTRACT 7 STORE_NAME 1 (spam) 10 LOAD_CONST 1 (None) 13 RETURN_VALUE >>>
Recall that BINARY_SUBTRACT will pop two arguments from the value-stack and feed them to PyNumber_Subtract, which is a C function that accepts two PyObject * pointers and certainly doesn’t know anything about scoping. What gets the arguments onto the stack are the LOAD_NAME and LOAD_CONST opcodes, and what will take the result out of the stack and into wherever it is heading is the STORE_NAME ocopde. It is opcodes like this that implement the rules of naming and scoping, since the C code implementing them is what will actually look into the dictionaries representing the relevant namespaces trying to resolve the name and bring the resulting object unto the stack, or store whatever object is to be stored into the relevant namespace.
Not every opcode which loads a value into the value stack necessarily has to do with scoping. For example, take LOAD_CONST; this opcode loads a constant value unto the value stack, but it isn’t about scoping (constants don’t have a scope, by definition they aren’t variables and they’re never ‘hidden’). Fortunately for you, I’ve already grepped the sources for ‘suspect’ opcodes ($ egrep -o '(LOAD|STORE)(_[A-Z]+)+' Include/opcode.h | sort) and believe I’ve mapped out the opcodes that actually implement scoping, so we can concentrate on the ones that really implement scoping (if you think I missed something, by all means let me know). Note that among the list of opcodes I chose not to address are the ones that handles attribute reference and subscripting; I chose so since these opcodes rely on a different opcode to get the primary reference (the name before the dot or the square brackets) on the value stack and thus aren’t really about scoping. So, as far as I could see, we should discuss four pairs of opcode:
- LOAD_NAME and STORE_NAME
- LOAD_FAST and STORE_FAST
- LOAD_GLOBAL and STORE_GLOBAL
- LOAD_DEREF and STORE_DEREF
To save me some typing, we can refer to each of these pairs by their suffix, i.e., *_NAME or *_GLOBAL. I suggest we discuss each pair along with the situations in which the compiler chooses to emit an opcode of that pair in order to satisfy the semantics of scoping. This is not necessarily an exhaustive listing of these opcodes’ uses (it might be, I’m not checking if it is or isn’t), but it should develop an understanding of these opcodes’ behaviour and allow us to figure out other cases where the compiler chooses the emit them on our own; so if you ever see any of these in a disassembly, you’ll be covered.
I’d like to begin with the obvious pair, *_NAME; it is simple to understand (and I suspect it was the first to be implemented). Explaining the *_NAME pair of opcodes is easiest by writing rough versions of them in Python-like psuedocode (you can and should read the actual implementation in ./Python/ceval.c: PyEval_EvalFrameEx):
def LOAD_NAME(name): try: return current_stack_frame.locals[name] except KeyError: try: return current_stack_frame.globals[name] except KeyError: try: return current_stack_frame.builtins[name] except KeyError: raise NameError('name %r is not defined' % name) def STORE_NAME(name, value): current_stack_frame.locals[name] = value
It’s easy to see that this is pretty much the vanilla case of scoping as we described it earlier, so there should be no surprises here (precisely when and how are the locals, globals and builtins namespaces created and set on the current stack frame will be covered when we talk about stack frames or code evaluation). While they are the ‘vanilla’ case, *_NAME, in some cases they are not emitted at all as more specialized opcodes can achieve the same functionality in a faster manner. As we explore the other scoping-related opcodes, we will see why.
A commonly used pair of scoping related opcodes is the *_FAST pair, which were originally implemented a long time ago as a speed enhancement over the *_NAME pair. These opcodes are used in cases where compile time analysis can infer that a variable is used strictly in the local namespace. This is possible when compiling code which is a part of a function, rather than, say, at the module level (some subtleties apply about the meaning of ‘function’ in this context, a class’ body may also use these opcodes under some circumstances, but this is of no interest to us at the moment; also see the comments below). If we can decide at compile time which names are used in precisely one namespace, and that namespace is private to one code block, it may be easy to implement a namespace with cheaper machinery than dictionaries. Indeed, these opcodes rely on a local namespace implemented with a statically sized array, which is far faster than a dictionary lookup as in the global namespace and other namespaces. In Python 2.x it was possible to confuse the compiler thus that it will not be able to use these opcodes in a particular function and have to revert to *_NAME, this is no longer possible in Python 3.x (also see the comments).
Let’s look at the two *_GLOBAL opcodes. LOAD_GLOBAL (but not STORE_GLOBAL) is also generated when the compiler can infer that a name is resolved in a function’s body but was never bound inside that body. This behaviour is conceptually similar to the ability to decide when a name is both bound and resolved in a function’s body, causing the generation of the *_FAST opcodes as we’ve seen above. This may become more apparant with a quick (slightly edited for brevity) disassembly:
>>> def func(): ... a = 1 ... a = b ... return a ... >>> diss(func) 2 0 LOAD_CONST 1 (1) 3 STORE_FAST 0 (a) 3 6 LOAD_GLOBAL 0 (b) 9 STORE_FAST 0 (a) 4 12 LOAD_FAST 0 (a) 15 RETURN_VALUE >>>
As described for *_FAST, we can see that a was bound within the function, which places it in the local scope private to this function, which means the *_FAST opcodes can and are used for a. On the other hand, we can see (and the compiler could also see…) that b was resolved before it was ever bound in the function. The compiler figured it must either exist elsewhere or not exist at all, which is exactly what LOAD_GLOBAL does: it bypasses the local namespace and searches only the global and builtin namespaces (and then raises a NameError).
This explanation leaves us with missing functionality: what if you’d like to re-bind a variable in the global scope? Recall that binding a new name normally binds it locally, so if you have a module defining foo = 1, a function setting foo = 2 locally “hides” the global foo. But what if you want to re-bind the global foo? Note this is not to mutate object referred to by foo but rather to bind the name foo in the global scope to a different referent; if you’re not clear on the distinction between the two, skim back in this post until we’re on the same page. To do so, we can use the global statement which we mentioned in passing before; this statement lets you tell the compiler to treat a name always as a global both for resolving and for binding within a particular code block, generating only *_GLOBAL opcodes for manipulation of that name. When binding is required, STORE_GLOBAL performs the new binding (or a re-binding) in the global namespace, thus allowing Python code to explicitly state which variables should be stored and manipulated in the global scope. What happens if you use a variable locally, and then use the global statement to make it global? Let’s look (slightly edited):
>>> def func(): ... a = 1 ... global a ... <stdin>:3: SyntaxWarning: name 'a' is assigned to before global declaration >>> diss(func) 2 0 LOAD_CONST 1 (1) 3 STORE_GLOBAL 0 (a) 3 6 LOAD_CONST 0 (None) 9 RETURN_VALUE >>>
…the compiler still treats the name as a global all through the code block, but warns you not to shoot yourself (and other maintainers of the code) in the foot. Sensible.
We are left only with LOAD_DEREF and STORE_DEREF. To explain these, we have to revisit the notion of lexical scoping, which is what started our inspection of the implementation. Recall that we said that nested functions’ resolution of names tries the namespaces’ of all lexically enclosing functions (in order, innermost outwards) before it hits the global namespace, we also saw an example of that in code. So how did inner return a value resolved from this no-longer-existing namespace of outer? When resolution of names is attempted in the global namespace (or in builtins), the name may or may not be there, but for sure we know that the scope is still there! How do we resolve a name in a scope which doesn’t exist?
The answer is quite nifty, and becomes apparent with a disassembly (slightly edited) of both functions:
# see the example above for the contents of scoping.py >>> from scoping import * # recursion added to 'diss'; you can see metablogging->tools above >>> diss(outer, recurse=True) 2 0 LOAD_CONST 1 (1) 3 STORE_DEREF 0 (a) 3 6 LOAD_CLOSURE 0 (a) 9 BUILD_TUPLE 1 12 LOAD_CONST 2 (<code object inner ...) 15 MAKE_CLOSURE 0 18 STORE_FAST 0 (inner) 5 21 LOAD_CONST 3 (2) 24 STORE_FAST 1 (b) 6 27 LOAD_FAST 0 (inner) 30 RETURN_VALUE recursing into <code object inner ...>: 4 0 LOAD_DEREF 0 (a) 3 RETURN_VALUE >>>
Sweet. We can see that outer (the outer function!) already treats a, the variable which will be used outside of its scope, differently than it treats b, a ‘simple’ variable in its local scope. a is loaded and stored using the *_DEREF variants of the loading and storing opcodes, in both the outer and inner functions. The secret sauce here is that at compilation time, if a variable is seen to be resolved from a lexically nested function, it will not be stored and will not be accessed using the regular naming opcodes. Instead, a special object called a cell is created to store the value of the object. When various code objects (the outer function, the inner function, etc) will access this variable, the use of the *_DEREF opcodes will cause the cell to be accessed rather than the namespace of the accessing code object. Since the cell is actually accessed only after outer has finished executing, you could even define inner before a was defined, and it would still work just the same (!).
This is automagical for name resolution, but for outer scope rebinding the nonlocal statement exists. nonlocal was decreed by PEP 3014 and it is somewhat similar to the global statement. nonlocal explicitly declares a variable to be used from an outer scope rather than locally, both for resolution and re-binding. It is illegal to use nonlocal outside of a lexically nested function, and it must be nested inside a function that defines the identifiers listed by nonlocal. There are several small gotchas about lexical scoping, but overall things behave as you would probably expect (for example, you can’t cause a name to be used locally and as a lexically nested name in the same code block, as the collapsed snippet below demonstrates).
>>> def outer(): ... a = 1 ... def inner(): ... b = a ... a = 1 ... return a,b ... return inner ... >>> outer()() Traceback (most recent call last): File "<stdin>", line 1, in <module> File "<stdin>", line 4, in inner UnboundLocalError: local variable 'a' referenced before assignment >>>
This sums up the mechanics of naming and scoping, I hope you enjoyed the ride. As I warned, the formal description of the semantics is rather sketchy here and shouldn’t be relied upon, though I hope the description of the mechanics was useful to you. If you feel your brain is melting, that’s normal, drink a glass of cold water and be thankful that at least you didn’t have to write this article (not to mention write the implementation of scoping…). Our next article will discuss code objects, revisiting some of the terms we saw today and further beefing us up towards tackling the frame object and the actual evaluation function. For great justice.
I would like to thank Nick Coghlan and Thomas Wouters for reviewing this article; any mistakes that slipped through are my own.
2010/06/05 § Leave a comment
When discussing what I called “Pythonic threads”, which are threads created and managed by Python, which have a PyThreadState structure allocated to them and that are able to call into the Python API and run Python code, I said: “theoretically threads can be created which will not be under the interpreter’s control; these threads won’t have a PyThreadState structure and must never call a Python API; this is not very common“. This is sort of correct, but misleading: such threads are quite obvious to have in a non-Python applications that has a Python interpreter embedded in it (something Graham probably knows a thing or two about :). These ‘foreign’ (I don’t know of formal term) threads indeed can’t call the Python API and run Python code unless they’re ‘migrated’ (don’t know of a formal term) to come under the control of a Python interpreter, having the correct bookkeeping structures allocated and initialized for them.
Graham also noted correctly that such ‘migration’ is easier when there’s just one Python interpreter in the process, and harder if there’s more than one. If you have just one interpreter, you can use the (somewhat confusingly named) PyGILState_Ensure and PyGILState_Release calls to do the job. These calls allow a thread to ensure it has the GIL (blocking if necessary until it can be acquired), but importantly can be called by a thread that isn’t under Python’s control at all; the necessary work to allow the thread to call Python’s API will be done for you. These calls assume there is only one Python interpreter (interp_head points to a PyInterpreterState which points to NULL), hence they mustn’t be used in a multi-interpreter process, if that is your case you’ll have to ‘migrate’ the threat manually with lower level primitives.
As per my errata policy, I’ve chosen to update the original post as if the error never happened and write this post note the mistake for future reference and for those subscribed to the RSS feed of the Errata category.
I’d like to thank Graham for the important correction.