Python’s Innards: Objects 101

2010/05/12 § 16 Comments

As I said in the introduction to this series (which was rather successful; thank you everyone, your hits and comments literally keep me going!) – today’s post will be about Python 3.x’s implementation of objects. When I set out to write this post, I thought we’ll start with objects because it would be a gentle start for our series. After reading all the code I had to read to write this post, I can hardly say Python’s object system is, uhm, ‘gentle’ (and I definitely can’t say I grokked it). However, if anything, I was only further convinced that the implementation of objects is a good place for us to start because as we’ll see in later posts it’s so fundamental to the innards of Python, yet I suspect it’s not very often understood in its full glory even among Python veterans. Objects are not very tightly coupled with anything else in Python (I hardly visited ./Python for this post, I frequented mostly ./Objects and ./Include), so I found it easiest to look at the implementation of objects as if they’re unrelated to the ‘rest’, as if they’re a general purpose C API for creating an object subsystem. Maybe you will benefit from that line of thought, too: remember these are just a bunch of structures and some functions to manipulate them.

Without further ado, let’s start. Mostly everything in Python is an object, from integer to dictionaries, from user defined classes to built-in ones, from stack frames to code objects. Given a pointer to a piece of memory, the very least you must expect of it to treat it as an object are just a couple of fields defined in a C structure called ./Include/object.h: PyObject:

typedef struct _object {
    Py_ssize_t ob_refcnt;
    struct _typeobject *ob_type;
} PyObject;

Many objects extend this structure to accommodate other variables required to represent the object’s value, but these two fields must always exist: a reference count and type (in special debug builds, a couple other esoteric fields are added to track references).

The reference count is an integer which counts how many times the object is referenced. >>> a = b = c = object() instantiates an empty object and binds it to three different names: a, b and c. Each of these names creates another reference to it even though the object is allocated only once. Binding the object to yet another name or adding the object to a list will create another reference – but will not create another object! There is much more to say about reference counting, but that’s less central to the overall object system and more related to Garbage Collection. I’d rather consider writing a separate post about that later than elaborate here, we’ve a heady post ahead of us. However, before we leave this subject for now, I’d just like to note that we can now better understand the ./Include/object.h: Py_DECREF macro we’ve seen used in the introduction and didn’t know how to explain: It simply decrements ob_refcnt (and initiates deallocation, if ob_refcnt hit zero). That’s all we’ll say about reference counting for now.

Which leaves us with ob_type, a pointer to an object’s type, a central piece of Python’s object model (bear this deep in mind: in Python 3, type and class effectively mean the same thing; historic reasons lead to the usage of one name over the other depending on context). Every object has exactly one type, which never changes during the lifetime of the object (under extremely unusual conditions it can be changed, no API exists for that and if you handle type-changing objects you’re not likely to be reading this). Possibly most importantly, the type of an object (and only the type of an object) determines what can be done with an object (see the collapsed snippet at the end of this paragraph for a demonstration in code). Recall from the introduction that when the interpreter evaluates the subtraction opcode, a single C function (PyNumber_Subtract) will be called regardless of whether its operands are an integer and an integer, an integer and a float or even something nonsensical (subtract an exception from a dictionary).

# n2w: the type, not the instance, determines what can be done with an instance
>>> class Foo(object):
...     "I don't have __call__, so I can't be called"
... 
>>> class Bar(object):
...     __call__ = lambda *a, **kw: 42
... 
>>> foo = Foo()
>>> bar = Bar()
>>> foo()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'Foo' object is not callable
>>> bar()
42
# will adding __call__ to foo help?
>>> foo.__call__ = lambda *a, **kw: 42
>>> foo()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'Foo' object is not callable
# how about adding it to Foo?
>>> Foo.__call__ = lambda *a, **kw: 42
>>> foo()
42
>>> 

Initially, this is a peculiar thing. How can a single C function be used to handle any kind of object that is thrown at it? It can receive a void * pointer (actually it receives a PyObject * pointer, which is also opaque insofar as the object’s data is concerned), but how will it know how to manipulate the object it is given? In the object’s type lies the answer. A type is in itself a Python object (it also has a reference count and a type of its own, the type of almost all types is type), but in addition to the refcount and the type of the type, there are many more fields in the C structure describing type objects. This page has some information about types as well as type‘s structure’s definition, which you can also find it at ./Include/object.h: PyTypeObject, I suggest you refer to the definition occasionally as you read this post. Many of the fields a type object has are called slots and they point to functions (or to structures that point to a bunch of related functions). These functions are what will actually be called when Python C-API functions are invoked to operate on an object instantiated from that type. So while you think you’re calling PyNumber_Subtract on both a, say, int and a float, in reality what happens is that the types of it operands are dereferenced and the type-specific subtraction function in the ‘subtraction’ slot is used. So we see that the C-API functions aren’t generic, but rather rely on types to abstract the details away and appear as if they can work on anything (valid work is also just to raise a TypeError).

Let’s see this play out in detail: PyNumber_Subtract calls a generic two-argument function called ./Object/abstract.c: binary_op, and tells it to operate on the number-like slot nb_subtract (similar slots exists for other functionality, like, say, the number-like slot nb_negative or the sequence-like slot sq_length). binary_op is an error-checking wrapper around binary_op1, the real ‘do work’ function. ./Objects/abstract.c: binary_op1 (an eye-opening read in itself) receives BINARY_SUBTRACT‘s operands as v and w, and then tries to dereference v->ob_type->tp_as_number, a structure pointing to many numeric slots which represents how v can be used as a number. binary_op1 will expect to find at tp_as_number->nb_subtract a C function that will either do the subtraction or return the special value Py_NotImplemented, to signal that these operands are ‘insubtracticable’ in relation to one another (this will cause a TypeError exception to be raised).

If you want to change how objects behave, you can write an extension in C which will statically define its own PyObjectType structure in code and fill the slots away as you see fit. But when we create our own types in Python (make no mistake, >>> class Foo(list): pass creates a new type, class and type are the same thing), we don’t manually allocate a C structure and we don’t fill up its slots. How come these types behave just like built-in types? The answer is inheritance, where typing plays a significant role. See, Python arrives with some built-in types, like list or dict. As we said, these types have a certain set of functions populating their slots and thus objects instantiated from them behave in a certain way, like a mutable sequence of values or like a mapping of keys to values. When you define a new type in Python, a new C structure for that type is dynamically allocated on the heap (like any other object) and its slots are filled from whichever type it is inheriting, which is also called its base (what about multiple inheritance, you may ask? some other post, I might answer). Since the slots are copied over, the newly created sub-type has mostly identical functionality to its base. Python also arrives with a featureless base object type called object (PyBaseObject_Type in C), which has mostly null slots and which you can extend without inheriting any particular functionality.

So you never really ‘create’ a type in pure Python, you always inherit one (if you define a class without inheriting anything explicitly, you will implicitly inherit object; in Python 2.x, not inheriting anything explicitly leads to the creation of a so called ‘classic class’, which is out of our scope). Of course, you don’t have to inherit everything. You can, obviously, mutate the behaviour of a type created in pure Python, as I’ve demonstrated in the code snippet earlier in this post. By setting the special method __call__ on our class Bar, I made instances of that class callable. Someone, sometime during the creation of our class, noticed this __call__ method exists and wired it into our newly created type’s tp_call slot. ./Objects/typeobject.c: type_new, an elaborate and central function, is that function. We shall revisit that function, at length, in “Objects 102″ (or 103, or 104…), but for now, let’s look at a small line right at the end, after the new type has been fully created and just before returning: fixup_slot_dispatchers(type);. This function iterates over the correctly named methods defined for the newly created type and wires them to the correct slots in the type’s structure, based on their particular name (where are these methods stored? later!).

Another thing remains unanswered in the sea of small details: we’ve demonstrated already that setting the method __call__ on a type after it’s created will also make objects instantiated from that type callable (even objects already instantiated from that type). How come? With elegance, my friends. Recall that a type is an object, and that the type of a type is type (if your head is spinning, try: >>> class Foo(list): pass ; type(Foo)). So when we do stuff to a class (I could use the word type instead of class just as well, but since we use the word ‘type’ so much in different context, I’ll call our type a class for a bit), like calling a class, or subtracting a class, or, indeed, setting an attribute on a class, what happens is that the class’ object’s ob_type member is dereferenced, finding that the class’ type is type. Then the type->tp_setattro slot is used to do the actual attribute setting. So a class, like an integer or a list can have its own attribute-setting function. And the type-specific attribute-setting function (./Objects/typeobject.c: type_setattro, if you didn’t know its name and want to befriend it in Facebook) calls the very same function that fixup_slot_dispatchers uses to actually do the fixup work (update_one_slot) after it has set a new attribute on a class. Another details sussed out!

This is it as far as our introduction to Python objects is concerned. I hope you enjoyed the ride and that you’re still with me. I have to admit this post was far harder to write than I initially imagined (and without Antoine Pitrou‘s and Mark Dickins‘s jolly help late at night in #python-dev, I might have given up!). I’ve left so much out that I’m appalled: which operand’s slot is used in binary opcodes? what happens in multiple inheritance and all the goryminute details it entails? what about this thing called metaclass? what about __slots__ and weak references? what about the actual implementation of all the built-in objects, how do the dictionary, the list, the set and their friends actually do their functional work? And finally, how about this little gem:

>>> a = object()
>>> class C(object): pass
... 
>>> b = C()
>>> a.foo = 5
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'object' object has no attribute 'foo'
>>> b.foo = 5
>>>

How come I can set an arbitrary attribute to b, which is an instance of C, which is a class inheriting object and not changing anything, and yet I can’t do the same with a, an instance of that very same object? Some wise crackers can say: b has a __dict__ and a doesn’t, and that’s true, but where in Guido’s name did this new (and totally non-trivial!) functionality come from if I didn’t inherit it?!

Ha! I’m delighted you asked! The answers are coming, but at another post.


A short reading list on the matter, for those so inclined: Try the documentation of the data model for an explanation on how to use things from Python; the C-API documentation about abstract and concrete objects for an explanation on using things from C; descrintro, AKA Unifying types and classes in Python 2.2, an archeological finding which is long, mind-boggling and hugely relevant (I argue it should be added as an easter egg to the interpreter, I suggest >>> import THAT); but above all else, read ./Objects/typeobject.c. Read it over and over, until you cry yourself to sleep. Sweet dreams.

About these ads

Tagged: , , ,

§ 16 Responses to Python’s Innards: Objects 101

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

What’s this?

You are currently reading Python’s Innards: Objects 101 at NIL: .to write(1) ~ help:about.

meta

Follow

Get every new post delivered to your Inbox.

Join 33 other followers

%d bloggers like this: