Tools
This page documents some of the tools I find useful as I tinker with whatever it is I blog about. Unless noted otherwise, my platform is Ubuntu, my shell is zsh and my editor is vim (this was written a while ago; I hate admitting it without explaining a bit, but yes, I’m on OSX now). I know that tools are a very personal issue for many a hacker, and I understand that your shell/editor/distribution is far, far better and scales more and has a better license and stuff. No need for these kinds of comments; keep it cool and to the point, please. In the end, hackerdom is also about freedom: if you like the methods I describe below, use them. If not, don’t. And keep using the chomping pieces of binary crap and idiotic methods that you are using today… :)
Use the source, Luke
Obviously, for Python’s Innards, my series on Python internals, I’ve read some C source code. I’m not an expert on the matter, but I’m happy with my source-browsing method at this time and figured I might share it. Suppose I’m reading Python 3.1, and Ubuntu 10.04’s python3.1-dbg package gives me Python 3.1.2. I download the source of 3.1.2 as a tarball and extract it somewhere. Before anything else, I commit that source as-is into a fresh git (hg/bzr/whatever) repository, so if I later change something, I can easily revert it or see what I did. Then I use GNU Global ($ apt-get install global) to cross-reference the source code by running $ gtags in the directory where I extracted the source (remember to add Global’s index files, GTAGS, GRTAGS and GPATH, to your version control system’s ignore file, so you won’t see them popping up in the status all the time).
I’ve also installed the Global/Vim plugin and added these lines to my .vimrc:
" mappings of the global source-tagging system
map <c-n> :cn<cr>
map <c-p> :cp<cr>
map <c-\> :GtagsCursor<cr>
Now, if I want to search for an identifier, I can type :Gtags PyNumber_Subtract in vim and get a quickfix window with all the matching locations. Ctrl-n and Ctrl-p navigate to the next/previous match. Far more useful is Ctrl-\, which does a Gtags search for whatever is under the cursor. Also, if you’ve jumped around chasing some function, you can use Ctrl-o/Ctrl-i to go backwards/forwards through the locations you’ve been to in the code (think browser back/forward). I don’t think I could read C code without GNU Global (or something equivalent).
gdb is your friend
A powerful aid to reading the source is stepping through whatever it is you’re reading with gdb (in my case it’s most commonly Python…). Make sure you install the -dbg packages of whatever you’re tracing, and make very sure the source code you have matches the exact version of what you’re tracing. A very common pattern for me is to run the Python interpreter under gdb, reach some state that interests me and then break back into the debugger. To force my gdb-debugged Python to drop into the gdb prompt, I send it the SIGTRAP signal with kill or pkill. This is so common that I have this alias set up in my shell: $ alias P3D='gdb -quiet --eval-command=run python3.1-dbg' (P3D means “Python 3 Debug”; I have a similar P2D alias). I also have this command assigned to the “multimedia stop” key on my keyboard: $ pkill -SIGTRAP 'python[23]\.[16]-dbg'. So all it takes to get a Python prompt that’s ready to be broken into is $ P3D, and all it takes to break into it is a press of the “stop” key. Here’s a sample session:
[monica:Python] % P3D                                  (1:27:40:teolicy:(master))
Reading symbols from /usr/bin/python3.1-dbg...done.
Starting program: /usr/bin/python3.1-dbg
[Thread debugging using libthread_db enabled]
Python 3.1.2 (r312:79147, Apr 15 2010, 13:09:16)
[GCC 4.4.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> class Foo:
...     def __sub__(self, other):
...         return 42
...
>>> # press "multimedia stop" button
Program received signal SIGTRAP, Trace/breakpoint trap.
0x0012d422 in __kernel_vsyscall ()
(gdb) b abstract.c:PyNumber_Subtract
Breakpoint 1 at 0x81c5450: file ../Objects/abstract.c, line 925.
(gdb) c
Continuing.
# Python is unaware of the gdb session, so hit [Return] to get another prompt
>>> Foo() - 5

Breakpoint 1, PyNumber_Subtract (v=<Foo at remote 0x846d29c>, w=5L) at ../Objects/abstract.c:925
925	BINARY_FUNC(PyNumber_Subtract, nb_subtract, "-")
(gdb) # sweet
It’s often useful to make note of the memory location of an object (remember that types, stack frames and code objects are also objects) so you can look at it from gdb. The id builtin function is specified to return a “unique identifier” that is constant for the object’s lifetime, but at the time of this writing (and in the foreseeable future), CPython’s implementation of id simply returns the address of the object. Thus, you can do >>> a = object() ; hex(id(a)) and get the address of a. Suppose you got back 0x12345678; you can now conveniently do something like this in gdb: (gdb) p ((PyObject *)(0x12345678))->ob_refcnt to get the reference count of the object (or any other field of its C structure or linked structures). Useful.
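The same id-is-an-address trick can be sanity-checked without leaving Python, by reading ob_refcnt through ctypes instead of gdb. This is a minimal sketch assuming a regular (not free-threaded) CPython build, where the reference count sits at the very start of every PyObject; the exact layout of that field has shifted between CPython versions, so the sketch only trusts the before/after difference, not the absolute number. read_refcnt is a hypothetical helper name, not part of any library.

```python
import ctypes


def read_refcnt(address: int) -> int:
    """Read the ob_refcnt field of the PyObject living at `address`.

    Assumes a regular CPython build where ob_refcnt is the first field of
    PyObject. Taking an int (not the object itself) means this call does
    not add a temporary reference of its own.
    """
    return ctypes.c_ssize_t.from_address(address).value


a = object()
addr = id(a)      # CPython implementation detail: the object's address
print(hex(addr))  # the value you'd cast to (PyObject *) in gdb

before = read_refcnt(addr)
b = a             # one more reference to the same object
after = read_refcnt(addr)
print(before, after)  # after should be exactly before + 1
```

Don’t use this outside of tinkering: reading an address that no longer holds a live object is undefined behaviour, exactly as it would be from gdb.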
import dis
>>> import dis imports the disassembler from Python’s standard library, an invaluable tool for researching CPython’s inner workings. dis.dis is a function that can disassemble classes, functions, code objects, etc. At times, it’s a bit unwieldy to define a function only to disassemble it, so I created a helper function called diss (disassemble string), which tries to disassemble anything by transforming several known types into a code object (a function has a code object in it, a string can be compiled, etc.). I later added ssc (string show code), which does something similar with dis.show_code.
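The real diss and ssc live in the series’ repository; what follows is only a minimal sketch of the same idea, not that code. It assumes Python 3.4+ (for the file= arguments of dis.disassemble and dis.show_code), handles source strings, functions, bound methods and code objects, and recurses into any code objects nested in co_consts. The names _to_code and _walk are illustrative helpers of this sketch.

```python
import dis
import io
import types


def _to_code(obj):
    """Coerce a source string, function, bound method or code object into a code object."""
    if isinstance(obj, types.CodeType):
        return obj
    if isinstance(obj, str):
        # "exec" mode so that statements (def, class, assignments) compile too.
        return compile(obj, "<diss>", "exec")
    if isinstance(obj, types.MethodType):
        obj = obj.__func__
    if isinstance(obj, types.FunctionType):
        return obj.__code__
    raise TypeError("can't get a code object from %r" % (obj,))


def _walk(code):
    """Yield `code` and, depth-first, every code object nested in its constants."""
    yield code
    for const in code.co_consts:
        if isinstance(const, types.CodeType):
            yield from _walk(const)


def diss(obj, file=None):
    """Disassemble obj and everything defined inside it (file=None means stdout)."""
    for code in _walk(_to_code(obj)):
        print("Disassembly of %s:" % code.co_name, file=file)
        dis.disassemble(code, file=file)
        print(file=file)


def ssc(obj, file=None):
    """Like diss, but print dis.show_code's summary for each code object."""
    for code in _walk(_to_code(obj)):
        dis.show_code(code, file=file)
        print(file=file)


# Usage: capture the listing instead of printing it straight away.
listing = io.StringIO()
diss("def f(x):\n    return x - 1\n", file=listing)
print(listing.getvalue())
```

Compiling strings in "exec" mode rather than "eval" keeps the helper forgiving: a def or class statement works, and the module-level code object simply becomes the outermost entry in the walk.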
These helpers are useful enough that I have this alias set up: $ alias P3B='PYTHONPATH=~/Projects/python-internals/pythonrc python3.1 -ic "from blog import *"' This gives me a clean Python prompt (no copyright message, etc.) with blogging-specific utility functions imported (at the time, the only one I had was diss). I later found out about issue #6507, which aims to add this functionality to Python 3.2. Later still, I added recursion to diss and ssc, so if you operate on a function that has another function defined within it, both are inspected in one go. You can find diss and its friends in the repository of my “Python’s Innards” series; it’s really not all that exciting.
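For what it’s worth, that issue did land: since Python 3.2, dis.dis accepts a source string directly (and dis.code_info/dis.show_code were added alongside), and since Python 3.7 dis.dis also recurses into nested code objects on its own. On a modern interpreter, a plain dis.dis therefore covers much of what diss does. A quick check, assuming Python 3.7+ (file= itself needs 3.4+):

```python
import dis
import io

source = "def f(x):\n    return x - 1\n"

out = io.StringIO()
dis.dis(source, file=out)  # a plain source string is accepted since 3.2
listing = out.getvalue()
print(listing)             # includes a nested "Disassembly of <code object f ...>" section
```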
Ubuntu and apt-get = winnage
I was a long-time Debian user, then switched to OSX for about four years and then switched to Ubuntu. I don’t know the state of other distributions, and maybe it’s just modern times and not at all Ubuntu’s doing, but it feels to me like everything is configured just a bit better on Ubuntu. If I install vim, it comes with the plugins I’d usually install manually. If I install python3.1-dbg, it comes with all sorts of goodies to tinker with Python. YMMV, and I know distribution is a touchy subject, but even if I return to OSX at some point (I’m travelling for a long while at the time of this writing, and I don’t feel comfortable carrying an expensive Mac with me), I know all my future research work will be done on Ubuntu. As far as development and research are concerned, everything is so bloody… easy.