Correction for ‘Python’s Innards: Objects 102’

2010/05/20 § 1 Comment

Alas, it has happened, the first mistake in the ‘Python’s Innards’ series has been found. I was trying to answer a question raised by one of my Reddit readers regarding properties, and realized that I have overlooked a fine point about descriptors in my post. Oops.

As was originally (and correctly) written in the post, descriptors are objects whose type has their tp_descr_get and/or tp_descr_set slots set to non-NULL. However, I also wrote, incorrectly, that descriptors take precedence over regular instance attributes (i.e., attributes in the object’s __dict__). This is partly correct but misleading, as it doesn’t distinguish non-data descriptors from data-descriptors. An object is said to be a data descriptor if its type has its tp_descr_set slot implemented (there’s no particularly special term for a non-data descriptor). Only data descriptors override regular object attributes, non-data descriptors do not.

I apologize for the mistake, but alas can’t promise it will never happen again. I’ve created an Errata Policy page which explains how I intend to treat mistakes that will be found from now on. Briefly, it says that the policy is to correct the original page as if the mistake never existed, and if the mistake is significant enough, also write a separate post (like this one) which explains what happened. I plan not to tag errata posts with all the tags of the original post (I’ve done it with this post as an exception), so unless you explicitly register to the RSS feed of the Errata category in my blog, you won’t get further errata via Planet Python or other aggregation services that rely on the ‘python’ or ‘pythons-innards’ tags.

I hope you will keep enjoying and relying on this blog.

Advertisement

Python’s Innards: Objects 102

2010/05/19 § 4 Comments

Welcome to Object 102, the third post in our series of Python internals and a direct continuation to the earlier post, Objects 101 (reading this post without reading 101 totally voids your warranty, so maybe you should head there first if you haven’t yet). In this post we will touch upon a central subject we have tiptoed around so far – attributes. You have surely used attributes in the past if you’ve done any Python programming: an object’s attributes are other objects related to it and accessible by invoking the . (dot) operator, like so: >>> my_object.attribute_name. Let’s begin with a quick overview of observable attribute access behaviour in Python, a behaviour determined by the accessed object’s type (as we’ve demonstrated ad nauseum about the behaviour of all operations on an object).

A type can define one (or more) specially named methods that will customize attribute access to its instances, these special methods are listed here (and we already know they will be wired into the type’s slots using fixup_slot_dispatchers when the type is created… you did read Objects 101, right?). These methods can do pretty much anything they want; whether writing your type in C or pure Python, you could write such methods that store and retrieve the attributes in some insane backend if you were so inclined, you could radio the attributes to the International Space Station and back or even store them in a relational database. However, under less exciting circumstances these methods simply store the attribute as a key/value pair (attribute name/attribute value) in some object-specific dictionary when an attribute is set and retrieve the attribute from that dictionary when an attribute is get (or raise an AttributeError if the dictionary doesn’t have a key matching the requested attribute’s name). This is so straightforward and lovely, thanks for listening, this concludes Objects 102.

Not. My friends, the fecal material is accelerating rapidly towards the spinning wind generation device, let me tell you that much. Let’s get into trouble together by looking closely at how this works in the interpreter, asking annoying questions as we often do. Below is a collapsed example of simple use of attributes, you can either read it in code (the surprising bits highlighted) or proceed to the English description of it in the next paragraph (the surprising bits are emphasized).

>>> print(object.__dict__)
{'__ne__': <slot wrapper '__ne__' of 'object' objects>, ... , '__ge__': <slot wrapper '__ge__' of 'object' objects>}
>>> object.__ne__ is object.__dict__['__ne__']
True
>>> o = object()
>>> o.__class__
<class 'object'>
>>> o.a = 1
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'object' object has no attribute 'a'
>>> o.__dict__
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'object' object has no attribute '__dict__'
>>> class C:
...     A = 1
... 
>>> C.__dict__['A']
1
>>> C.A
1
>>> o2 = C()
>>> o2.a = 1
>>> o2.__dict__
{'a': 1}
>>> o2.__dict__['a2'] = 2
>>> o2.a2
2
>>> C.__dict__['A2'] = 2
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'dict_proxy' object does not support item assignment
>>> C.A2 = 2
>>> C.__dict__['A2'] is C.A2
True
>>> type(C.__dict__) is type(o2.__dict__)
False
>>> type(C.__dict__)
<class 'dict_proxy'>
>>> type(o2.__dict__)
<class 'dict'>
>>> 

In English: we can see that object (as in, the most basic built-in type which we’ve discussed before) has a private dictionary, and we see that stuff we access on object as an attribute is identical to what we find in object.__dict__. We might have been somewhat surprised to see that instances of object (o, in the example) don’t support arbitrary attribute assignment and don’t have a __dict__ at all, though they do support some attribute access (try o.__class__, o.__hash__, etc; these do return things). After that we created our own class, C, derived from object and adding an attribute A, and saw that A was accessible via C.A and C.__dict__['A'] just the same, as expected. We then instantiated o2 from C, and demonstrated that as expected, attribute assignment on it indeed mutates its __dict__ and vice versa (i.e., mutations to its __dict__ are exposed as attributes). We were then probably more surprised to learn that even though attribute assignment on the class (C.A2) worked fine, our class’ __dict__ is actually read-only. Finally, we saw that our class’ __dict__ is not of the same type as our object’s __dict__, but rather an unfamiliar beast called dict_proxy. And if all that wasn’t enough, recall the mystery from the end of Objects 101: if plain object instances like o have no __dict__, and C extends object without adding anything significant, why do instances of C like o2 suddenly do have a __dict__?

Curiouser and curiouser indeed, but worry not: all will be revealed in due time. First, we shall look at the implementation of a type’s __dict__. Looking at the definition of PyObjectType (a zesty and highly recommended exercise), we see a slot called tp_dict, ready to accept a pointer to a dictionary. All types must have this slot, and all types have a dictionary placed there when ./Objects/typeobject.c: PyType_Ready is called on them, either when the interpreter is first initialized (remember Py_Initialize? It invokes _Py_ReadyTypes which calls PyType_Ready on all known types) or when the type is created dynamically by the user (type_new calls PyType_Ready on the newborn type before returning). In fact, every name you bind within a class statement will turn up in the newly created type’s __dict__ (see ./Objects/typeobject.c: type_new: type->tp_dict = dict = PyDict_Copy(dict);). Recall that types are objects too, so they have a type, their type is type and it has slots with functions that satisfy attribute access in a manner befitting types. These functions use the dictionary each type has and pointed to by tp_dict to store/retrieve the attributes, that is, getting attributes on a type is directly wired to dictionary assignment for the type instance’s private dictionary pointed to by the type’s structure. Reasonably similar stuff happens when attributes are set or deleted (in the eyes of the interpreter, deletion is just a set operation of NULL value, by the way). So far I hope it’s been rather simple, and explains types’ attribute retrieval.

Before we proceed to explain how instances’ attribute access works, allow me to introduce an unfamiliar (but very important!) term: Descriptor. Descriptors play a special role in instances’ attribute access, and I’d like to provide their bare definition here for a moment to simplify things in the next paragraph. An object is said to be a descriptor if it’s type has one or two slots (tp_descr_get and/or tp_descr_set) filled with non-NULL value. These slots are wired to the special method names __get__, __set__ and __delete__, when the type is defined in pure Python (i.e., if you create a class which has a __get__ method it will be wired to its tp_descr_get slot, and if you instantiate an object from that class, the object is a descriptor). Finally, an object is said to be a data descriptor if its type has a non-NULL tp_descr_set slot (there’s no particularly special term for a non-data descriptor). Descriptors play a special role in attribute access, as we shall soon see, I will provide further explanations and relevant links to documentation soon.

So we’ve defined descriptors, and we know how types’ dictionaries and attribute access work. But most objects aren’t types, that is to say, their type isn’t type, it’s something more mundane like int or dict or a user defined class. All these rely on generic attribute access functions, which are either set on the type explicitly or inherited from the type’s base when the type is created (that’s slot inheritance; this was also covered in Object 101). The generic attribute-getting function (PyObject_GenericGetAttr) and its algorithm is like so: (a) search the accessed instance’s type’s dictionary, and then all the type’s bases’ dictionaries. If a data descriptor was found, invoke it’s tp_desr_get function and return the results. If something else is found, set it aside (we’ll call it X). (b) Now search the object’s dictionary, and if something is found, return it. (c) If nothing was found in the object’s dictionary, inspect X, if one was set aside at all; if X is a non-data descriptor, invoke it’s tp_descr_get function and return the result, and if it’s a plain object it returns it. (d) Finally, if nothing was found, it raise an AttributeError exception.

So we learn that descriptors can execute code when they’re accessed as an attribute (so when you do foo = o.a or o.a = foo, a runs code). A powerful notion, that, and it’s used in several cases to implement some of Python’s more ‘magical’ features. Data-descriptors are even more powerful, as they take precedence over instance attributes (if you have an object o of class C, class C has a foo data-descriptor and o has a foo instance attribute, when you do o.foo the descriptor will take precedence). You can read about what are descriptors and how to implement descriptors. The former link (‘what are’) is particularly recommended: while it might seem daunting at first, if you read it slowly and literally you will see it’s actually simple and more concisely written than my drivel. There’s also a terrific article by Raymond Hettinger which describes descriptors in Python 2.x; other than the disappearance of unbound methods, it’s still very relevant to py3k and is highly recommended. While descriptors are really important and you’re advised to take the time to understand them, for brevity and due to the well written resources I’ve just mentioned I will explain them no further, other than show you how they behave in the interpreter (super simple example!):

>>> class ShoutingInteger(int):
...     # __get__ implements the tp_descr_get slot
...     def __get__(self, instance, owner):
...             print('I was gotten from %s (instance of %s)'
...                   % (instance, owner))
...             return self
... 
>>> class Foo:
...     Shouting42 = ShoutingInteger(42)
... 
>>> foo = Foo()
>>> 100 - foo.Shouting42
I was gotten from <__main__.Foo object at 0xb7583c8c> (instance of <class __main__.'foo'>)
58
# Remember: descriptors are only searched on types!
>>> foo.Silent666 = ShoutingInteger(666)
>>> 100 - foo.Silent666
-566
>>>

Let’s take a moment to note that we have completed our understanding of Python’s Object Oriented inheritance: since attributes are searched in the object’s type, and then in all its bases, then we now understand that accessing attribute A on object O instantiated from class C1 which inherits C2 which inherits C3 can return A either from O, C1, C2 or C3, depending on something called the method resolution order and described rather well here. This way of resolving attributes, when coupled with slot inheritance, is enough to explain most of Python’s inheritance functionality (though, as always, the Devil is in the details).

We know quite a lot now, but we still don’t know where the reference to these sneaky instance dictionaries are stored. We’ve seen the definition of PyObject, and it most definitely didn’t have a pointer to a dictionary, so where is the reference the object’s dictionary stored? The answer is rather interesting. If you look closely at the definition of PyTypeObject (it’s good clean and wholesome! read it every day!), you will see a field called tp_dictoffset. This field provides a byte offset into the C-structure allocated for objects instantiated from this type; at this offset, a pointer to a regular Python dictionary should be found. Under normal circumstances, when creating a new type, the size of the memory region necessary to allocate objects of that type will be calculated, and that size will be larger than the size of vanilla PyObject. The extra room will typically be used (among other things) to store the pointer to the dictionary (all this happens in ./Objects/typeobject.c: type_new, see may_add_dict = base->tp_dictoffset == 0; onwards). Using gdb, we can easily see this extra room and the object’s private dictionary:

>>> class C: pass
... 
>>> o = C()
>>> o.foo = 'bar'
>>> o
<__main__.C object at 0x846b06c>
>>>
# break into GDB, see 'metablogging'->'tools' above
Program received signal SIGTRAP, Trace/breakpoint trap.
0x0012d422 in __kernel_vsyscall ()
(gdb) p ((PyObject *)(0x846b06c))->ob_type->tp_dictoffset
$1 = 16
(gdb) p *((PyObject **)(((char *)0x846b06c)+16))
$3 = {u'foo': u'bar'}
(gdb) 

We have created a new class, instantiated an object from it and set some attribute on the object (o.foo = 'bar'), broke into gdb, dereferenced the object’s type (C) and checked its tp_dictoffset (it was 16), and then checked what’s to be found at the address pointed to by the pointer located at 16 bytes’ offset from the object’s C-structure, and indeed we found there a dictionary object with the key foo pointing to the value bar. Of course, if you check tp_dictoffset on a type which doesn’t have a __dict__, like object, you will find that it is zero. You sure have a warm fuzzy feeling now, don’t you.

What could be misleading here is how type dictionaries and the common instance dictionaries look so similar, while in fact they’re rather differently implemented and allocated. Several riddles remain. Let’s recap and see what we’re missing: I define a class C inheriting object and doing nothing much else in Python, and then I instantiate o from that class, causing the extra memory for the dictionary pointer to be allocated at tp_dictoffset (room is reserved from the get-go, but the dictionary itself is allocated on first use of any kind; smart buggers). I then type in my interpreter o.__dict__, which byte-compiles to the LOAD_ATTR opcode, which causes the PyObject_GetAttr function to be called, which dereferences the type of o and finds the slot tp_getattro, which causes the default attribute searching mechanism described earlier in this post and implemented in PyObject_GenericGetAttr. So when all that happens, what returns my object’s dictionary? I know where the dictionary is stored, but I can see that __dict__ isn’t recursively inside itself, so there’s a chicken and egg problem here; who gives me my dictionary when I access __dict__ if it is not in my dictionary?

Someone who has precedence over the object’s dictionary – a descriptor. Check this out:

>>> class C: pass
... 
>>> o = C()
>>> o.__dict__
{}
>>> C.__dict__['__dict__']
<attribute '__dict__' of 'C' objects>
>>> type(C.__dict__['__dict__'])
<class 'getset_descriptor'>
>>> C.__dict__['__dict__'].__get__(o, C)
{}
>>> C.__dict__['__dict__'].__get__(o, C) is o.__dict__
True
>>> 

Gee. Seems like there’s something called getset_descriptor (it’s in ./Objects/typeobject.c), which are groups of functions implementing the descriptor protocol and meant to be attached to an object placed in type’s __dict__. This descriptor will intercept all attribute access to o.__dict__ on instances of this type, and will return whatever it wants, in our case, a reference to the dictionary found at the tp_dictoffset of o. This is also the explanation of the dict_proxy business we’ve seen earlier. If in tp_dict there’s a pointer to a plain dictionary, what causes it to be returned wrapped in this read only proxy, and why? The __dict__ descriptor of the type’s type type does it.

Resuming that example:

>>> type(C)
<class 'type'>
>>> type(C).__dict__['__dict__']
<attribute '__dict__' of 'type' objects>
>>> type(C).__dict__['__dict__'].__get__(C, type)
<dict_proxy object at 0xb767e494>

This descriptor is a function that wraps the dictionary in a simple object that mimics regular dictionaries’ behaviour but only allows read only access to the dictionary it wraps. And why is it so important to prevent people from messing with a type’s __dict__? Because a type’s namespace might hold them specially named methods, like __sub__. When you create a type with these specially named methods or when you set them on the type as an attribute, the function update_one_slot will patch these methods into one of the type’s slots, as we’ve seen in 101 for the subtraction operation. If you were to add these methods straight into the type’s __dict__, they won’t be wired to any slot, and you’ll have a type that looks like it has a certain behaviour (say, has __sub__ in its dictionary), but doesn’t behave that way.

How sad it is that we’re way over the dreaded attention span deadline of 2,000 words, and I still didn’t talk about __slots__, which I really wanted to. How about you go read about them yourself, cowboy? You have at your disposal everything you need to figure them out alone! Read the aforementioned link, play with __slots__ a bit in the interpreter, maybe look at the code or research them in gdb; have fun. In the next post, I think we’ll leave objects alone for a bit and talk about interpreter state and thread state. I hope it sounds interesting, but even if it doesn’t you really should read it – chicks totally dig people who know about these matters, I can certainly promise you that much.

Python’s Innards: Objects 101

2010/05/12 § 16 Comments

As I said in the introduction to this series (which was rather successful; thank you everyone, your hits and comments literally keep me going!) – today’s post will be about Python 3.x’s implementation of objects. When I set out to write this post, I thought we’ll start with objects because it would be a gentle start for our series. After reading all the code I had to read to write this post, I can hardly say Python’s object system is, uhm, ‘gentle’ (and I definitely can’t say I grokked it). However, if anything, I was only further convinced that the implementation of objects is a good place for us to start because as we’ll see in later posts it’s so fundamental to the innards of Python, yet I suspect it’s not very often understood in its full glory even among Python veterans. Objects are not very tightly coupled with anything else in Python (I hardly visited ./Python for this post, I frequented mostly ./Objects and ./Include), so I found it easiest to look at the implementation of objects as if they’re unrelated to the ‘rest’, as if they’re a general purpose C API for creating an object subsystem. Maybe you will benefit from that line of thought, too: remember these are just a bunch of structures and some functions to manipulate them.

Without further ado, let’s start. Mostly everything in Python is an object, from integer to dictionaries, from user defined classes to built-in ones, from stack frames to code objects. Given a pointer to a piece of memory, the very least you must expect of it to treat it as an object are just a couple of fields defined in a C structure called ./Include/object.h: PyObject:

typedef struct _object {
    Py_ssize_t ob_refcnt;
    struct _typeobject *ob_type;
} PyObject;

Many objects extend this structure to accommodate other variables required to represent the object’s value, but these two fields must always exist: a reference count and type (in special debug builds, a couple other esoteric fields are added to track references).

The reference count is an integer which counts how many times the object is referenced. >>> a = b = c = object() instantiates an empty object and binds it to three different names: a, b and c. Each of these names creates another reference to it even though the object is allocated only once. Binding the object to yet another name or adding the object to a list will create another reference – but will not create another object! There is much more to say about reference counting, but that’s less central to the overall object system and more related to Garbage Collection. I’d rather consider writing a separate post about that later than elaborate here, we’ve a heady post ahead of us. However, before we leave this subject for now, I’d just like to note that we can now better understand the ./Include/object.h: Py_DECREF macro we’ve seen used in the introduction and didn’t know how to explain: It simply decrements ob_refcnt (and initiates deallocation, if ob_refcnt hit zero). That’s all we’ll say about reference counting for now.

Which leaves us with ob_type, a pointer to an object’s type, a central piece of Python’s object model (bear this deep in mind: in Python 3, type and class effectively mean the same thing; historic reasons lead to the usage of one name over the other depending on context). Every object has exactly one type, which never changes during the lifetime of the object (under extremely unusual conditions it can be changed, no API exists for that and if you handle type-changing objects you’re not likely to be reading this). Possibly most importantly, the type of an object (and only the type of an object) determines what can be done with an object (see the collapsed snippet at the end of this paragraph for a demonstration in code). Recall from the introduction that when the interpreter evaluates the subtraction opcode, a single C function (PyNumber_Subtract) will be called regardless of whether its operands are an integer and an integer, an integer and a float or even something nonsensical (subtract an exception from a dictionary).

# n2w: the type, not the instance, determines what can be done with an instance
>>> class Foo(object):
...     "I don't have __call__, so I can't be called"
... 
>>> class Bar(object):
...     __call__ = lambda *a, **kw: 42
... 
>>> foo = Foo()
>>> bar = Bar()
>>> foo()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'Foo' object is not callable
>>> bar()
42
# will adding __call__ to foo help?
>>> foo.__call__ = lambda *a, **kw: 42
>>> foo()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'Foo' object is not callable
# how about adding it to Foo?
>>> Foo.__call__ = lambda *a, **kw: 42
>>> foo()
42
>>> 

Initially, this is a peculiar thing. How can a single C function be used to handle any kind of object that is thrown at it? It can receive a void * pointer (actually it receives a PyObject * pointer, which is also opaque insofar as the object’s data is concerned), but how will it know how to manipulate the object it is given? In the object’s type lies the answer. A type is in itself a Python object (it also has a reference count and a type of its own, the type of almost all types is type), but in addition to the refcount and the type of the type, there are many more fields in the C structure describing type objects. This page has some information about types as well as type‘s structure’s definition, which you can also find it at ./Include/object.h: PyTypeObject, I suggest you refer to the definition occasionally as you read this post. Many of the fields a type object has are called slots and they point to functions (or to structures that point to a bunch of related functions). These functions are what will actually be called when Python C-API functions are invoked to operate on an object instantiated from that type. So while you think you’re calling PyNumber_Subtract on both a, say, int and a float, in reality what happens is that the types of it operands are dereferenced and the type-specific subtraction function in the ‘subtraction’ slot is used. So we see that the C-API functions aren’t generic, but rather rely on types to abstract the details away and appear as if they can work on anything (valid work is also just to raise a TypeError).

Let’s see this play out in detail: PyNumber_Subtract calls a generic two-argument function called ./Object/abstract.c: binary_op, and tells it to operate on the number-like slot nb_subtract (similar slots exists for other functionality, like, say, the number-like slot nb_negative or the sequence-like slot sq_length). binary_op is an error-checking wrapper around binary_op1, the real ‘do work’ function. ./Objects/abstract.c: binary_op1 (an eye-opening read in itself) receives BINARY_SUBTRACT‘s operands as v and w, and then tries to dereference v->ob_type->tp_as_number, a structure pointing to many numeric slots which represents how v can be used as a number. binary_op1 will expect to find at tp_as_number->nb_subtract a C function that will either do the subtraction or return the special value Py_NotImplemented, to signal that these operands are ‘insubtracticable’ in relation to one another (this will cause a TypeError exception to be raised).

If you want to change how objects behave, you can write an extension in C which will statically define its own PyObjectType structure in code and fill the slots away as you see fit. But when we create our own types in Python (make no mistake, >>> class Foo(list): pass creates a new type, class and type are the same thing), we don’t manually allocate a C structure and we don’t fill up its slots. How come these types behave just like built-in types? The answer is inheritance, where typing plays a significant role. See, Python arrives with some built-in types, like list or dict. As we said, these types have a certain set of functions populating their slots and thus objects instantiated from them behave in a certain way, like a mutable sequence of values or like a mapping of keys to values. When you define a new type in Python, a new C structure for that type is dynamically allocated on the heap (like any other object) and its slots are filled from whichever type it is inheriting, which is also called its base (what about multiple inheritance, you may ask? some other post, I might answer). Since the slots are copied over, the newly created sub-type has mostly identical functionality to its base. Python also arrives with a featureless base object type called object (PyBaseObject_Type in C), which has mostly null slots and which you can extend without inheriting any particular functionality.

So you never really ‘create’ a type in pure Python, you always inherit one (if you define a class without inheriting anything explicitly, you will implicitly inherit object; in Python 2.x, not inheriting anything explicitly leads to the creation of a so called ‘classic class’, which is out of our scope). Of course, you don’t have to inherit everything. You can, obviously, mutate the behaviour of a type created in pure Python, as I’ve demonstrated in the code snippet earlier in this post. By setting the special method __call__ on our class Bar, I made instances of that class callable. Someone, sometime during the creation of our class, noticed this __call__ method exists and wired it into our newly created type’s tp_call slot. ./Objects/typeobject.c: type_new, an elaborate and central function, is that function. We shall revisit that function, at length, in “Objects 102” (or 103, or 104…), but for now, let’s look at a small line right at the end, after the new type has been fully created and just before returning: fixup_slot_dispatchers(type);. This function iterates over the correctly named methods defined for the newly created type and wires them to the correct slots in the type’s structure, based on their particular name (where are these methods stored? later!).

Another thing remains unanswered in the sea of small details: we’ve demonstrated already that setting the method __call__ on a type after it’s created will also make objects instantiated from that type callable (even objects already instantiated from that type). How come? With elegance, my friends. Recall that a type is an object, and that the type of a type is type (if your head is spinning, try: >>> class Foo(list): pass ; type(Foo)). So when we do stuff to a class (I could use the word type instead of class just as well, but since we use the word ‘type’ so much in different context, I’ll call our type a class for a bit), like calling a class, or subtracting a class, or, indeed, setting an attribute on a class, what happens is that the class’ object’s ob_type member is dereferenced, finding that the class’ type is type. Then the type->tp_setattro slot is used to do the actual attribute setting. So a class, like an integer or a list can have its own attribute-setting function. And the type-specific attribute-setting function (./Objects/typeobject.c: type_setattro, if you didn’t know its name and want to befriend it in Facebook) calls the very same function that fixup_slot_dispatchers uses to actually do the fixup work (update_one_slot) after it has set a new attribute on a class. Another details sussed out!

This is it as far as our introduction to Python objects is concerned. I hope you enjoyed the ride and that you’re still with me. I have to admit this post was far harder to write than I initially imagined (and without Antoine Pitrou‘s and Mark Dickins‘s jolly help late at night in #python-dev, I might have given up!). I’ve left so much out that I’m appalled: which operand’s slot is used in binary opcodes? what happens in multiple inheritance and all the goryminute details it entails? what about this thing called metaclass? what about __slots__ and weak references? what about the actual implementation of all the built-in objects, how do the dictionary, the list, the set and their friends actually do their functional work? And finally, how about this little gem:

>>> a = object()
>>> class C(object): pass
... 
>>> b = C()
>>> a.foo = 5
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'object' object has no attribute 'foo'
>>> b.foo = 5
>>>

How come I can set an arbitrary attribute to b, which is an instance of C, which is a class inheriting object and not changing anything, and yet I can’t do the same with a, an instance of that very same object? Some wise crackers can say: b has a __dict__ and a doesn’t, and that’s true, but where in Guido’s name did this new (and totally non-trivial!) functionality come from if I didn’t inherit it?!

Ha! I’m delighted you asked! The answers are coming, but at another post.


A short reading list on the matter, for those so inclined: Try the documentation of the data model for an explanation on how to use things from Python; the C-API documentation about abstract and concrete objects for an explanation on using things from C; descrintro, AKA Unifying types and classes in Python 2.2, an archeological finding which is long, mind-boggling and hugely relevant (I argue it should be added as an easter egg to the interpreter, I suggest >>> import THAT); but above all else, read ./Objects/typeobject.c. Read it over and over, until you cry yourself to sleep. Sweet dreams.

Where Am I?

You are currently browsing entries tagged with objects at NIL: .to write(1) ~ help:about.