2010/04/23 § 11 Comments
Oddly enough, just a week or two after I wrote the post “Contributing to Python”, Jesse Noller (of multiprocessing fame) wrote a post called “Why aren’t you contributing (to Python)?”. As a somewhat wannabe contributor, I think it’s a terrific question which deserves wordy and open answers (hey, just like this one). I realize the question has been asked specifically in the context of python.org’s development, but I think some of the answer applies to the whole Python eco-system, alternative implementations and libraries/frameworks included. Before we begin, I think that to answer the question, some distinction should be made about two rather different kinds of contributions.
The first is driven by whatever it is that you’re contributing. Suppose you happen to hack about a better GIL arbitration algorithm, or a sprawling and insanely rich asynchronous networking framework. You want that stuff to get out. I’d go so far as to say the source code itself wants to be out. It’s your baby, more often than not you think it’s great and unless something bad happens, you’ll push it. These are the kinds of things where you’re likely to find yourself obsessing over the stats of the website describing your latest gizmo or taking it personally when some loser on python-dev (like, say, that van-what’s-his-name-guy) says your implementation of goto for Python is not quite the next best thing since sliced lists.
The other, rather different kind, is that you run into something that is rather obviously a bug and wish to open-a-ticket-that-has-a-good-chance-to-be-committed for it. First of all, this is usually a far smaller patch. I doubt many people import threading, discover The Beazley Effect, rework the GIL and open a ticket with a patch. The use-case here is more like “I have a reproducible SIGSEGV” or “I wish import zipfile would support ZIP64”. Two interesting observations about this case: first, people are far less committed to their contribution, and second, more importantly, the realities of life dictate that the J. Random Hacker who ran into this either found a workaround or patched their own Python, so they sidestepped the bug. This is important. In practically all interesting cases, the reporter has already sidestepped the bug before or shortly after posting (sidestepped is a loose term, maybe they even moved to Perl…). I doubt anyone’s schedule is loose enough to allow them to wait without sidestepping a painful thorn even for the next bugfix release. This is a hundred times more true for documentation touchups – if you realized it’s wrong, you probably don’t have to fix it to keep working, you just use whatever knowledge you now know is right.
A rather pathological, tertiary case is the “I am not-Python-core, I have some free time, I wanna contribute to Python and I went bug-hunting in the tracker” one. I think it’s a pathological case of the second kind of contribution, and one that I suspect happens rather rarely. I’ll lump these two together.
If you agree so far, we have a commit-driven-contribution (“damn this is so awesome I want this to be part of Python/twisted/Django/etc”) and a contribution-driven-commit (“damn Python is so awesome, it’s a shame to leave this wart unfixed, I’ll help”). As I said, I think very different reasons prevent people from doing either. I’ll start talking about the latter kind, both because it seemed to be the focus of Jesse’s original post and because it’s easiest to answer.
First, almost everything Jesse listed as reasons is true. Don’t know how, don’t know where, etc, almost all true. The best remedy here is to get as many people as possible to have, ugh, “broken their contribution cherry”, so to speak. The easier it is to submit minor fixes for the first time, the more people will do it. The first time is important, psychologically and setup-ly. I think after a patch of yours has been committed, the fear of the technical part of the process is gone and the feeling of “gee, I can actually put stuff in Python!” kicks in, and you’re far more likely to start submitting more small patches. So if you want many more people to help with mundane issues, documentation touchups, etc, the community at large should make every effort to make this first time sweet.
How do we make it sweet? I don’t know for sure, but here is a short flurry of ideas which I’d be happy to discuss (and aid implementing!):
- Easy step-by-step instructions for opening a bug report and submitting a patch on Ubuntu, OSX and Windows, all concentrated in one place, from setup to bug tracker attachment. The “contributing to Python” post I mentioned earlier is a (small) step in what I think is the right direction. We can flesh it out a lot, but make sure it keeps the step-by-step cookbook get-it-done approach, rather than what exists today, which is good, but isn’t aimed at getting-things-done. Compare signing up to Facebook with applying for a Tourist Visa in some foreign country.
- Small-time-Python-contribution-talks material to be made available. This could be consumed online in web-talks, but it mainly aims to reach out and encourage such talks in LUGs and highschools/colleges (hmm, I love this idea, I should do this sometime…).
- A bit out on a limb here, but maybe even doing what’s possible to optimize the review process in favour of first-time contributors. This is quite debatable, and (intentionally) vague, but I cautiously think it will pay off rather quickly.
These means (and probably others I didn’t think of) could probably alleviate the problem of a “contribution-driven-commit”, as I called it. Which leaves us with your fabulous implementation of goto, or “commit-driven-contribution”. I think two factors come into play here, both of them nearly irrelevant for the previous type of contribution. The first is the feeling that whatever it is you’ve done, it’s not good enough (this usually breaks my balls). “Me? Send this? To python-dev? Get outta here.”. And the second, I think, is indeed the feeling of an ‘uphill battle’ against grizzled python-dev grey beards and sharp tongued lurkers that are, I suspect, more likely than not to shred your idea to bits. Let’s face it, hacker communities at large are pretty harsh, and generally for understandable reasons. However, I think at times this tough skin and high barrier for contributing anything significant hurts us.
I used to have a co-worker, a strong hacker, who made two significant open-source packages for Python. I humbly think both are exceptionally elegant and at least one of them could have been a strong addition to stdlib. Every time I hear/read the code of some poor soul who recreated the efforts of this guy with these two packages, I cringe. I wouldn’t like to disclose his name before talking to him, but when I asked him why aren’t these packages part of stdlib, he said something like: “blah, unless you’re part of the python-dev cognoscenti you’ve no chance of putting anything anywhere”. I think he might not have pushed these packages hard enough, I should raid python-dev’s archives to know, but looking at the finesse of these packages on the one hand, and the number of questions on #python at freenode which I can answer by uttering these packages’ names, I think maybe we’re missing out on something. His perception, even if downright wrong (and I suspect it isn’t accurate, but not so wrong) is the bad thing that can happen to make you not contribute that big masterpiece you’ve made in your back yard, and that’s a damn shame. Most people will not survive the School of Hard Knocks, and that’s not necessarily always a good thing.
The issue of contributing big stuff is far more delicate and complex, and I’d be the first to admit I’m probably not rad enough to even discuss it. But it feels wrong to write a post under a heading like the one this one bears without at least mentioning this hacker-subculture-centric and sensitive issue, which affects many OSS projects, and Python as well. Micro-contributions are important, they’re the water and wind which slowly erode the landscape of a project into something cleaner, stabler and more elegant, but let’s not forget them big commits from which the mountains are born, too.
So Jesse, or any other curious soul, this is my answer to you. Should the gauntlet be picked up (by you, me or both of us) regarding the list of items I suggested earlier about making micro-contributions more accessible? How about taking this to python-dev?
2010/04/08 § 6 Comments
So you have some free time to donate to Python, and don’t know where to start. Good. If you follow these cookbook instructions, you can also be a Contributor to Python™ 60 minutes from now. Most of what I’ll cover here is covered elsewhere (I’ll link), but this concentrates everything together so you really don’t have any excuses for not contributing anymore. I assume you have at least passing knowledge in Python or C, any decent editor, any version control system (like svn, git, etc), any bug-tracking system and a typical build process (./configure, make, etc; or your platform’s equivalent). You don’t have to be an expert, but if this is your first day in any of these, maybe this isn’t the right document for you.
Let’s get our gear in order. In my experience, it’s easiest to get set up on Ubuntu, and Ubuntu examples are what I’m going to give. If you use something else you should do your platform’s equivalent (or, uh, ‘upgrade’…). I assume you have a toolchain (compiler, linker, etc) installed, so start by making sure you have svn (sudo apt-get install subversion), then check out one of Python’s latest development trees. Checking out is covered here, but I found that explanation a bit tiring. So, briefly: for the foreseeable future, you’re going to want to work either on trunk (Python 2.x development branch, currently 2.7) or on py3k (Python 3.x development branch, currently 3.2). The commands you need are these two:
svn co http://svn.python.org/projects/python/branches/py3k
svn co http://svn.python.org/projects/python/trunk
Read below for stuff to do during the checkout, but as soon as Subversion is done, you’d best configure and compile the source, to see that you can build Python. I assume you won’t be installing this copy of Python at all, so you can simply run ./configure && make without fiddling much with --prefix et al. By default, Python will not build some of the things you may be used to from whichever build you’ve been using so far. Fixing that isn’t mandatory for many bugs, but if you run into ‘missing features’, note two things. First, to include some modules you’ll have to edit ./Modules/Setup.dist and enable whichever modules you want. Second, Python may finish the build process successfully, but issue a notice like this towards the end of the build:
Python build finished, but the necessary bits to build these modules were not found:
_curses            _curses_panel      _dbm
_gdbm              _sqlite3           _ssl
_tkinter           bz2                readline
To find the necessary bits, look in setup.py in detect_modules() for the module's name.
The most likely reason is that you don’t have the development version of the listed packages installed. I made almost everything listed above disappear on my system with sudo apt-get install libreadline-dev libbz2-dev libssl-dev libsqlite3-dev libncurses-dev libgdbm-dev. Re-read the error message and the apt-get I issued, and you’ll see it’s rather easy to guess the development package name from Python’s error message. Much like you’ve done during the checkout, read below for stuff to do during the configuration / build. If your internet connection is blazing and your CPU screams, or you otherwise took your time and the build finished before you chose a bug, you can also run the full test suite: ./python -m test.regrtest -uall and see that nothing beyond the really esoteric breaks.
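To make the “guess the package from the error message” step concrete, here is a small sketch that pulls the module names out of the notice and suggests an apt-get line. The mapping in `DEV_PACKAGES` is an assumption, guessed by name from the apt-get command above; it is not taken from setup.py, and the helper names are made up for illustration. Modules with no obvious match (`_dbm`, `_tkinter`) are deliberately left unmapped.

```python
import re

# Assumed mapping, guessed by name from the apt-get line in the post.
# It is NOT derived from setup.py; adjust it for your distribution.
DEV_PACKAGES = {
    "_curses": "libncurses-dev",
    "_curses_panel": "libncurses-dev",
    "_gdbm": "libgdbm-dev",
    "_sqlite3": "libsqlite3-dev",
    "_ssl": "libssl-dev",
    "bz2": "libbz2-dev",
    "readline": "libreadline-dev",
}

NOTICE = """\
Python build finished, but the necessary bits to build these modules were not found:
_curses            _curses_panel      _dbm
_gdbm              _sqlite3           _ssl
_tkinter           bz2                readline
To find the necessary bits, look in setup.py in detect_modules() for the module's name.
"""


def missing_modules(notice):
    """Return the module names listed between the two sentences of the notice."""
    match = re.search(r"were not found:(.*?)To find the necessary bits",
                      notice, re.S)
    return match.group(1).split() if match else []


def suggest_packages(modules):
    """Split modules into (sorted package guesses, modules with no guess)."""
    known = sorted({DEV_PACKAGES[m] for m in modules if m in DEV_PACKAGES})
    unknown = [m for m in modules if m not in DEV_PACKAGES]
    return known, unknown


packages, unmapped = suggest_packages(missing_modules(NOTICE))
print("sudo apt-get install " + " ".join(packages))
print("no guess for: " + ", ".join(unmapped))
```

Treat the suggested command as a starting point only; package names vary between releases and distributions.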
Finding a bug to work on is not hard, but it’s not trivial. Register with the Python bug-tracking site and go to the search page. For a start, I recommend using this search: Stage: needs patch; Keyword: easy. It’s easiest to start afresh (no patch submitted with the bug), and the bugs marked ‘easy’ are easy. You will notice that not many bugs return from the search – that’s because so many easy bugs already have a patch issued to them; we’ll get back to that later. Of the returned bugs, I chose issues #6610 and #7834. #6610 reports a potential flaw in subprocess; turns out that when forking out a coprocess from a process with closed standard streams, there’s a subtlety in dup2()-ing the child’s standard streams from the pipes the parent passed down to it. #7834 reports an issue with socketmodule.c’s creation of AF_BLUETOOTH sockets; the implementation of getsockaddrarg() is fragile as it doesn’t zero out a returned sockaddr_l2 struct which might be left with garbage.
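If the “struct left with garbage” part of #7834 sounds abstract, the following sketch imitates it in Python with ctypes. Everything here is an assumption made for illustration: `FakeSockaddrL2` is a simplified stand-in and not the real BlueZ definition, the family value is a placeholder, and the 0xAA-filled buffer merely plays the role of uninitialized memory. The real fix lives in C inside socketmodule.c; this only shows why zeroing first matters.

```python
import ctypes


class FakeSockaddrL2(ctypes.Structure):
    # Hypothetical, simplified layout for illustration only.
    _fields_ = [
        ("l2_family", ctypes.c_uint16),
        ("l2_psm", ctypes.c_uint16),
        ("l2_bdaddr", ctypes.c_uint8 * 6),
        ("l2_cid", ctypes.c_uint16),
    ]


def fill(addr, psm):
    """Set only the fields the caller supplied and leave the rest untouched."""
    addr.l2_family = 31  # placeholder standing in for AF_BLUETOOTH
    addr.l2_psm = psm


size = ctypes.sizeof(FakeSockaddrL2)

# Fragile version: the struct sits on top of "uninitialized" memory.
fragile = FakeSockaddrL2.from_buffer(bytearray(b"\xaa" * size))
fill(fragile, psm=0x1001)

# Robust version: zero the whole struct before filling in any field.
robust = FakeSockaddrL2.from_buffer(bytearray(b"\xaa" * size))
ctypes.memset(ctypes.addressof(robust), 0, size)
fill(robust, psm=0x1001)

print("fragile l2_cid:", hex(fragile.l2_cid))
print("robust  l2_cid:", hex(robust.l2_cid))
```

The fields that `fill` never touches keep whatever was in memory in the fragile case, which is exactly the kind of leftover a zeroing call removes.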
First, make sure you understand the bug. For #6610 I read an article that was linked to the bug about the subtlety of coprocess forking for daemons, as well as the relevant code in Python (here and here, the relevant bits are near the dup2 calls). Once you understand what happens, and sometimes even if you're not sure, the best thing is to write test code that will trigger the bug. I didn’t want to touch Python’s test suite quite yet, so I jotted a small script in /tmp that creates a parent with closed stdio streams which then forks a /bin/cat coprocess. Writing the test forced me to think harder about the bug, and to realize that… it isn’t. Bug #6610 reports a potential bug, but as it turns out, the author(s) of subprocess already protected themselves against the bug’s manifestation. My test code, assuming it was correct, proved my hypothesis – the test worked fine.
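For the curious, a script along those lines might look roughly like the sketch below. It is not the script from the post: it is written for a current Python 3 on a POSIX system, and `CHILD_CODE` and `run_closed_stdio_check` are names invented here. The helper process closes its own standard streams, spawns cat over pipes, and reports through its exit status, since it no longer has a stdout to print to.

```python
import shutil
import subprocess
import sys

# Code for the helper process: it closes its own stdin/stdout/stderr and only
# then spawns cat, so the new pipes reuse the freed low-numbered descriptors.
CHILD_CODE = r"""
import os, subprocess, sys

for fd in (0, 1, 2):
    try:
        os.close(fd)
    except OSError:
        pass  # already closed, which is fine for this experiment

proc = subprocess.Popen([sys.argv[1]], stdin=subprocess.PIPE,
                        stdout=subprocess.PIPE)
out, _ = proc.communicate(b"hello")

# stdout is gone, so report through the exit status and skip normal shutdown.
os._exit(0 if (out == b"hello" and proc.returncode == 0) else 1)
"""


def run_closed_stdio_check():
    """Return 0 if cat echoed the data back despite the closed streams."""
    cat = shutil.which("cat") or "/bin/cat"
    result = subprocess.run([sys.executable, "-c", CHILD_CODE, cat],
                            timeout=60)
    return result.returncode


if __name__ == "__main__":
    status = run_closed_stdio_check()
    print("coprocess worked" if status == 0 else "coprocess failed")
```

A zero status means the data made the round trip through cat, which is the “not actually a bug” outcome described above; a non-zero status would be the interesting case worth digging into.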
Case closed, no? No. While it’s possible to comment “sorry, not a bug, move along”, I wanted to nail the bug. From what I’ve seen on other bugs, nothing will convince a trusted developer to close the bug more than working coverage of the alleged bug in Python’s suite of unit tests. Before touching a test, make sure it runs on your system, otherwise you might waste lots of time hunting issues. After running the subprocess test, I took my jotted script from earlier, cleaned it up and fit it into Lib/test/test_subprocess.py. Important! Python, rightfully, is rather anal about the style of code they accept into the repository (tests are code just like any other code). Save yourself lots of grief, and make certain you abide by PEP8 and PEP7. You don’t have to read them to the letter to do small fixes, but make sure you caught the spirit of what’s in there and especially that your new code fits with its old surroundings as far as style is concerned. Once I was happy with how the new test looked, I re-ran this particular test (./python -m test.regrtest test_subprocess), created an svn patch (running svn diff | tee ~/test_subprocess_with_standard_fds_closed.diff after I modified subprocess’ test was all it took) and posted the patch along with the description of what I did and found.
#7834 is simpler in most regards but much harder in one important aspect. It’s rather obvious the person who submitted the bug really ran into it in real life and probably knows more about Bluetooth sockets than I do. Fixing the bug is trivial, but writing a test for it that will run even without using actual Bluetooth hardware is not. I looked into bluez’ sources and #bluez on Freenode for help, didn’t see a manageable way to do it, prepared just a patch of the fix and submitted it. I did however mention my attempt at finding a way to write a test, so whoever reviews my bug will see the complexities involved and decide for themselves whether or not this testless fix is worthy of a commit.
Ha! Two bugs clos… no. Not closed, just updated in the bug tracker. Truth is, most easy/medium bugs have some patch waiting for them, and are pending review. It is my meager understanding that if you really want to help out, reviewing some of these patches, cleaning them up, adding tests and updating documentation are probably the best things you can do for Python. Python could use an alleviation of the GIL, but if you’re an early and casual contributor, I doubt you’ll be the one to do it. What you can do, especially if you’re a student or otherwise have time and are keen to learn, is to improve as many bugs as you can into pristine condition. The fewer small flaws there are in a bug, the more mature it looks and the easier it is for a trusted/core developer to just commit it and be done with it. You’d like someone to do it with your bugs, so why not do it for someone else? Besides, soon enough you’ll be tangled enough being nosy with so many bugs that you’ll get deeper into the scene, find more interesting aspects of seemingly simple bugs, and so on and so forth, until one day you can overthrow, er, kneel before the BDFL and be knighted to say Ni!
Happy bug hunting!