Switching to Mercurial: taming zsh

2010/05/14

A quick one, so you’ll know not all my posts have to be this wordy. PEP 385 (migrating Python's repository from Subversion to Mercurial) is materializing, and it’s time to learn Mercurial. I can’t say I’m a Mercurial expert, but I thought migrating all my git-oriented zsh gizmos would help me along the way. The conversion is almost done and turned up just one somewhat noteworthy tidbit.

A while back I copied a rather elaborate zsh prompt from a friend (not as elaborate as some people’s…). It shows my current git branch, if any. The code behind it looked like this:

parse_git_branch() {
        git branch --no-color 2> /dev/null | sed -e '/^[^*]/d' -e 's/* \(.*\)/(\1)/'
}

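function _rprompt_git {
    # $fg comes from zsh's colors module (autoload -U colors && colors);
    # $rst is assumed to hold the colour-reset sequence, defined elsewhere.
    local result=""    # reset each time so a stale branch never lingers
    local git_branch="$(parse_git_branch)"
    if [ -n "$git_branch" ]; then
        result=":%{$fg[blue]%}$git_branch$rst"
    fi
    echo -n "$result"
}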

I naïvely added a Mercurial equivalent to parse_git_branch using hg bookmarks or hg branch, and retrofitted/redubbed _rprompt_git into _rprompt_dvcs. It worked well, but it was slow. I’m not the kind of person to give hg grief over reasonable speed differences compared to git, that speed monster, but you can’t wait 0.15 seconds for your friggin’ prompt, now can you? (This is not an invitation for a git/hg/bzr performance holy war in the comments, people.) I tried cutting down the calls to hg by running hg root just once, storing its output and using cat to read the actual branch/bookmark, but that didn’t speed things up enough. #mercurial on Freenode was kind but didn’t know how to help, other than suggesting I buy a really fast computer… Blah, I’ll have to jot something up in C.
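For context, the lookup that hg root performs for the prompt boils down to a walk up from the current directory until a directory containing .hg turns up. The sketch below does that walk in plain shell. It assumes the nearest ancestor with a .hg directory is the repository root. find_hg_root is a hypothetical name, and this is not the C program described next.

```shell
# Hypothetical sketch: print the nearest directory, starting at $PWD and
# walking upwards, that contains a .hg directory. Returns 1 if none does.
find_hg_root() {
    local dir=$PWD
    while :; do
        if [ -d "$dir/.hg" ]; then
            printf '%s\n' "$dir"
            return 0
        fi
        # Give up once the filesystem root itself has been checked.
        [ "$dir" = / ] && return 1
        # Strip the last path component; "/foo" becomes "", so map it to "/".
        dir=${dir%/*}
        [ -z "$dir" ] && dir=/
    done
}
```

Used as HG_ROOT=$(find_hg_root) in place of hg root, it avoids starting Mercurial at all. Whether that is fast enough for a prompt is something to measure on your own machine.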

One Bitbucket account and a bit of tinkering later, I had fast_hg_root, a dumb C program that acts as a (fast) replacement for hg root. So now my DVCS-related prompt code includes:

parse_git_branch() {
        git branch --no-color 2> /dev/null | sed -e '/^[^*]/d' -e 's/* \(.*\)/(\1)/'
}

parse_hg_branch() {
        local HG_ROOT BOOKMARK
        # keep the redirection inside $(...) so it clearly applies to fast_hg_root
        if ! HG_ROOT=$(fast_hg_root 2> /dev/null); then
            # not an HG repository, quit
            return
        fi
        BOOKMARK=$(cat "$HG_ROOT"/.hg/bookmarks.current 2> /dev/null)
        if [ -n "$BOOKMARK" ]; then
            # have a current bookmark, display that
            echo "$BOOKMARK"
            return
        fi
        # display the current branch or 'default'
        cat "$HG_ROOT"/.hg/branch 2> /dev/null || echo 'default'
}

function _rprompt_dvcs {
    local result=""    # reset each time so a stale branch never lingers
    local git_branch="$(parse_git_branch)"
    if [ -n "$git_branch" ]; then
        result=":%{$fg[blue]%}$git_branch$rst"
    else
        local hg_branch="$(parse_hg_branch)"
        if [ -n "$hg_branch" ]; then
            result=":%{$fg[green]%}$hg_branch$rst"
        fi
    fi
    echo -n "$result"
}

This makes my terminal sessions on my netbook, monica, look like so:
[Screenshot: DVCS-aware zsh prompts]

C’est tout. If anyone is interested, I’ll open a repository with more of my .zsh.d stuff. By the way, does anyone know of a good way to capture a terminal session into HTML while preserving ANSI colors? I know of ttyrec and found HTML::FromANSI, but was hoping for a finished program or, at least, uhm, a library not written in Perl. :p


Searching mailman archives offline (python-dev, anyone?)

2010/05/07

Since I’m a newcomer to python-dev, I often need to search the python-dev Mailman archives. I did find a way to do it (Google with site:), but it’s no good for offline searches, and at best it’s a kludge for online searches too, IMHO. I’m offline quite a lot these days, since cellular 3G isn’t what it never used to be, and as long as I’m travelling the world, pre-paid cellular 3G is even worse. So I set out to find a proper way to search python-dev offline.

Initially I just downloaded the whole mailing list archive with the shell concoction listed below, and used grep to fish out what I needed. Obviously, if you want to search something other than python-dev (but why would you?!), set MAILMAN_URL to wherever your mailing list is archived.

MAILMAN_URL=http://mail.python.org/pipermail/python-dev/
# pull every *.txt.gz link out of the archive index, then fetch and unpack it
for FILENAME in $(wget -O - -q "$MAILMAN_URL" |
                         grep -Eo 'href="[^"]+\.txt\.gz"' |
                         cut -f2 -d\" )
do
    wget "$MAILMAN_URL$FILENAME"
    gunzip "$FILENAME"
done

Naturally, after a (short) while I realized I needed a proper mailbox search utility. I’ve been using Debian for ages, but the sheer richness of Ubuntu’s repositories has only recently rewired my brain to translate ‘task: find new software’ into apt-cache search. So I did, and indeed apt-cache search mbox search found mairix, “a program for indexing and searching email messages stored in maildir, MH or mbox folders”. Sweet.

mairix has a slightly odd usage pattern and is geared towards people fluent in mutt (which I’m not), so it, ugh, took me a while to realize it’s the tool I need and to work out how to use it (gory details below). To sum things up, with mairix you (a) index all the mail you’d like to search in one invocation and (b) run mairix with a search query, which creates a new mailbox (mbox/Maildir/MH) containing only the results. You can later view that mailbox with your favorite reader, but the only one I know of that makes sense in this context is probably, indeed, mutt.

Initially I set mairix up to index the mboxes as they were. Then I realized that, due to the limitations of the mbox format, mairix has to copy every matching message into the results mailbox. With Maildir, where every message is a file, it would generate a search-result Maildir made of symlinks, which sounds better. So how do you convert all these mboxes to Maildir? apt-cache search convert mbox maildir found mb2md for me. I placed all the mboxes in a directory called mbox, created a directory called maildir, and ran: mb2md -s $(pwd)/mbox -d $(pwd)/maildir. It chugged along unhappily, spewing a ton of error messages, but seemed to work (it later occurred to me that some emails might have been lost; I’m not sure). A few minutes later I had all of python-dev’s archives in Maildir format.
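Since it’s unclear whether mb2md dropped anything, one rough check is to compare message counts before and after the conversion. The sketch below rests on two assumptions. First, the downloaded archives are ordinary mbox files, where each message starts with a line beginning "From ". Second, mb2md wrote standard Maildirs, where each message is one file under cur/ or new/. Both function names are hypothetical.

```shell
# Hypothetical check for an mbox -> Maildir conversion.
# In mbox, each message starts with a line beginning "From ", so counting
# those lines approximates the number of messages in the given files.
count_mbox_messages() {
    cat "$@" | grep -c '^From '
}

# In Maildir, every message is its own file under cur/ or new/ (tmp/ only
# holds deliveries in progress). Search recursively to cover subfolders.
count_maildir_messages() {
    find "$1" -type f \( -path '*/cur/*' -o -path '*/new/*' \) | wc -l | tr -d ' '
}
```

With the layout used here, compare count_mbox_messages mbox/* with count_maildir_messages maildir. Body lines that begin with "From " are normally escaped as ">From " but not always, so treat the numbers as approximate.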

Now I can use mairix! I set up my .mairixrc like so:

base=/home/teolicy/Projects/python-internals/mail/maildir
maildir=...
database=/home/teolicy/.mairixdb
mfolder=/tmp/mairix-results

The maildir=... line means “recurse under base and index the maildirs within”. The mfolder line says where to put the mailbox produced by your last search. I reckon the other parameters are rather obvious, but see mairixrc(5) for details if you need something else. Warning! Obviously, if you’re going to index something private, don’t place the results in /tmp!

Now I just had to run mairix with no arguments, and a few (rather short) moments later all the emails in the archive were indexed. To search, you type something like mairix d:3m- s:gil f:antoine, which means “find all messages from the last three months whose subject contains ‘gil’ and whose sender contains ‘antoine’”. The results will be stored in /tmp/mairix-results, which you can read using mutt -f /tmp/mairix-results. I encourage you to read mairix(1), but if you don’t, be aware that the useful -t switch pulls whole threads into the results, not just the matching messages. I use it more often than not.

Two small things remained. First, for some reason I didn’t care enough to research, mutt kept complaining on startup that I don’t have a ~/Mail folder. Placing set folder=/tmp/mairix-results in my .muttrc made it go away. <sheepish>I didn’t really read what that means</sheepish>, so if that setting eats your homework, well, you deserve it. Second, I wrote a simple function for my zshrc file that reads something like:

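mairix() {
    # /usr/bin/env runs the real mairix binary instead of recursing into this function;
    # "$@" passes search terms through intact, unlike $*
    /usr/bin/env mairix -o /tmp/mairix-results "$@" &&
    mutt -Rf /tmp/mairix-results
}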

It makes the whole thing easier.

That’s it. I’d feel pretty happy with myself for having an itch scratched so nicely, if I hadn’t been so dumbheaded as to miss that mairix was the tool I was looking for in the first place. After about three minutes in its manpage, I decided it was “unwieldy crap” and started writing my own mailbox search engine in Python, based on whoosh. Fortunately, after a couple of days of mellow hacking, and having learned of the horrors that are email algorithms (email just sucks, you know?), it dawned on me that I was slowly changing my design to the point where I was bloody rewriting mairix. So I ditched my effort and spent a few more minutes reading mairix’s manpage, this time without unconsciously dismissing it as crap. That’s when I realized it’s exactly what I needed. I learned a bit about free text searching in Python, RFC 2822 and stuff from the experience, but honestly, I wish I hadn’t been such an arse in the first place. There, I confessed.

Below you can find all the stuff written here in easily copy-pastable form, you lazy bastard. Note this isn’t a script, as it doesn’t check for any kind of error, so it’s up to you to make sure this doesn’t botch your computer or whatever.

ARCHIVE_LOCATION=$HOME/python-dev
MAILMAN_URL=http://mail.python.org/pipermail/python-dev/

echo installing mutt, mairix, mb2md
sudo apt-get install mutt mairix mb2md

echo creating directories
mkdir -p "$ARCHIVE_LOCATION/mbox"
mkdir -p "$ARCHIVE_LOCATION/maildir"
cd "$ARCHIVE_LOCATION/mbox"

echo downloading $MAILMAN_URL
# pull every *.txt.gz link out of the archive index, then fetch and unpack it
for FILENAME in $(wget -O - -q "$MAILMAN_URL" |
                         grep -Eo 'href="[^"]+\.txt\.gz"' |
                         cut -f2 -d\")
do
    echo downloading $FILENAME
    wget -q "$MAILMAN_URL$FILENAME"
    gunzip "$FILENAME"
done

echo converting to maildir
cd "$ARCHIVE_LOCATION"
mb2md -s "$(pwd)/mbox" -d "$(pwd)/maildir" > /dev/null 2>&1

echo removing converted mailboxes
rm -fr "$ARCHIVE_LOCATION/mbox"
# move the converted maildirs (including dot-folders) up into $ARCHIVE_LOCATION
mv "$ARCHIVE_LOCATION"/maildir/* "$ARCHIVE_LOCATION"/maildir/.??* "$ARCHIVE_LOCATION"
rmdir "$ARCHIVE_LOCATION/maildir"

echo setting up mairixrc and muttrc
cat << EOF > ~/.mairixrc
base=$ARCHIVE_LOCATION
maildir=...
database=$HOME/.mairixdb
mfolder=/tmp/mairix-results
EOF

# append rather than overwrite, in case you already have a ~/.muttrc
cat << EOF >> ~/.muttrc
set folder=/tmp/mairix-results
EOF

echo indexing archive
mairix

echo 'mairix is all set-up; maybe you want to use this function:'
echo 'mairix() {'
echo '  /usr/bin/env mairix -o /tmp/mairix-results "$@" &&'
echo '   mutt -Rf /tmp/mairix-results'
echo '}'

The question of updates remains; a simple script should be able to do the trick, and maybe I’ll write it sometime. Or not.
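Here is a possible starting point for updates, sketched under two assumptions. First, you keep the unpacked .txt archives in the mbox directory, whereas the copy-paste block above deletes them after conversion. Second, the index page links to the archives as *.txt.gz, as the download loop expects. missing_archives is a hypothetical helper that lists the archives you don’t have yet.

```shell
# Hypothetical helper: read a pipermail index page on stdin and print each
# *.txt.gz archive whose unpacked .txt file is not already in directory $1.
# Usage (assuming the .txt files are kept in $ARCHIVE_LOCATION/mbox):
#   wget -O - -q "$MAILMAN_URL" | missing_archives "$ARCHIVE_LOCATION/mbox"
missing_archives() {
    grep -Eo 'href="[^"]+\.txt\.gz"' | cut -f2 -d'"' |
    while read -r name; do
        # ${name%.gz} is the file name after gunzip
        [ -e "$1/${name%.gz}" ] || printf '%s\n' "$name"
    done
}
```

From there, fetch and gunzip whatever it prints, convert just those files with mb2md, and run mairix again. The current month’s archive keeps growing, so it needs re-fetching even when an older copy exists.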

Why don’t I contribute to Python (often)

2010/04/23

Oddly enough, just a week or two after I wrote the post “Contributing to Python”, Jesse Noller (of multiprocessing fame) wrote a post called “Why aren’t you contributing (to Python)?”. As a somewhat-wannabe contributor, I think it’s a terrific question that deserves wordy and open answers (hey, just like this one). I realize the question was asked specifically in the context of python.org’s development, but I think some of the answer applies to the whole Python ecosystem, alternative implementations and libraries/frameworks included. Before we begin, I think that to answer the question, we should distinguish between two rather different kinds of contributions.

The first is driven by whatever it is that you’re contributing. Suppose you happen to hack up a better GIL arbitration algorithm, or a sprawling and insanely rich asynchronous networking framework. You want that stuff to get out. I’d go so far as to say the source code itself wants to be out. It’s your baby, more often than not you think it’s great, and unless something bad happens, you’ll push it. These are the kinds of things where you’re likely to find yourself obsessing over the stats of the website describing your latest gizmo. You might also take it personally when some loser on python-dev (like, say, that van-what’s-his-name-guy) says your implementation of goto for Python is not quite the next best thing since sliced lists.

The other, rather different kind is when you run into something that is rather obviously a bug and wish to open-a-ticket-that-has-a-good-chance-to-be-committed for it. First of all, this is usually a far smaller patch. I doubt many people import threading, discover The Beazley Effect, rework the GIL and open a ticket with a patch. The use-case here is more like “I have a reproducible SIGSEGV” or “I wish import zipfile would support ZIP64”. There are two interesting observations about this case. First, people are far less committed to their contribution. Second, and more importantly, the realities of life dictate that J. Random Hacker, who ran into this, either found a workaround or patched their own Python, so they sidestepped the bug. This is important. In practically all interesting cases, the reporter has already sidestepped the bug before or shortly after posting (sidestepped is a loose term; maybe they even moved to Perl…). I doubt anyone’s schedule is loose enough to let them wait, without sidestepping a painful thorn, even for the next bugfix release. This is a hundred times more true for documentation touchups. If you realized it’s wrong, you probably don’t have to fix it to keep working; you just go on using what you now know is right.

A third, rather pathological case is the “I am not Python-core, I have some free time, I wanna contribute to Python and I went bug-hunting in the tracker” one. I think it’s an odd variant of the second kind of contribution, and one that I suspect happens rather rarely, so I’ll lump these two together.

If you agree so far, we have a commit-driven contribution (“damn this is so awesome I want this to be part of Python/twisted/Django/etc”) and a contribution-driven commit (“damn Python is so awesome, it’s a shame to leave this wart unfixed, I’ll help”). As I said, I think very different reasons keep people from doing each. I’ll start with the latter kind, both because it seemed to be the focus of Jesse’s original post and because it’s easier to answer.

First, almost everything Jesse listed as reasons is true. Don’t know how, don’t know where, etc., almost all true. The best remedy here is to get as many people as possible to have, ugh, “broken their contribution cherry”, so to speak. The easier it is to submit minor fixes for the first time, the more people will do it. The first time is important, psychologically and setup-ly. I think that after a patch of yours has been committed, the fear of the technical part of the process is gone and the feeling of “gee, I can actually put stuff in Python!” kicks in, and you’re far more likely to start submitting more small patches. So if you want many more people to help with mundane issues, documentation touchups, etc., the community at large should make every effort to make this first time sweet.

How do we make it sweet? I don’t know for sure, but here is a short flurry of ideas which I’d be happy to discuss (and help implement!):

  • Easy step-by-step instructions for opening a bug report, submitting a patch, for Ubuntu, OSX and Windows, all concentrated in one place, from setup to bug tracker attachment. The “contributing to Python” post I mentioned earlier is a (small) step in what I think is the right direction. We can flesh it out a lot, but make sure it keeps the step-by-step cookbook get-it-done approach, rather than what exists today, which is good, but isn’t aimed at getting-things-done. Compare signing up to Facebook with applying for a Tourist Visa in some foreign country.
  • Small-time-Python-contribution talk material to be made available. This is partly to be consumed online as web talks, but it mainly aims to reach out and encourage such talks in LUGs and high schools/colleges (hmm, I love this idea, I should do this sometime…).
  • Going a bit out on a limb here, but maybe even doing what’s possible to optimize the review process in favour of first-time contributors. This is quite debatable, and (intentionally) vague, but I cautiously think it will pay off rather quickly.

These measures (and probably others I didn’t think of) could alleviate the problem of the “contribution-driven commit”, as I called it. That leaves us with your fabulous implementation of goto, the “commit-driven contribution”. I think two factors come into play here, both of them nearly irrelevant for the previous type of contribution. The first is the feeling that whatever it is you’ve done, it’s not good enough (this usually breaks my balls). “Me? Send this? To python-dev? Get outta here.” The second, I think, is indeed the feeling of an ‘uphill battle’ against grizzled python-dev greybeards and sharp-tongued lurkers who are, I suspect, more likely than not to shred your idea to bits. Let’s face it, hacker communities at large are pretty harsh, generally for understandable reasons. However, I think that at times this tough skin and high barrier to contributing anything significant hurts us.

I used to have a co-worker, a strong hacker, who made two significant open-source packages for Python. I humbly think both are exceptionally elegant, and at least one of them could have been a strong addition to the stdlib. Every time I hear of (or read the code of) some poor soul who re-created what this guy did with these two packages, I cringe. I’d rather not disclose his name before talking to him. When I asked him why these packages aren’t part of the stdlib, he said something like: “blah, unless you’re part of the python-dev cognoscenti you’ve no chance of putting anything anywhere”. Maybe he didn’t push these packages hard enough; I should raid python-dev’s archives to know. Still, consider the finesse of these packages on the one hand, and the number of questions on #python at Freenode that I can answer by uttering their names on the other. I think maybe we’re missing out on something. His perception may be wrong (I suspect it isn’t accurate, but not that far off either). Even so, it is exactly the kind of thing that keeps you from contributing that big masterpiece you’ve made in your back yard, and that’s a damn shame. Most people will not survive the School of Hard Knocks, and that’s not always a good thing.

The issue of contributing big stuff is far more delicate and complex, and I’d be the first to admit I’m probably not rad enough to even discuss it. But it feels wrong to write a post under a heading like the one this one bears without at least mentioning this hacker-subculture-centric and sensitive issue, which affects many OSS projects, and Python as well. Micro-contributions are important, they’re the water and wind which slowly erode the landscape of a project into something cleaner, stabler and more elegant, but let’s not forget them big commits from which the mountains are born, too.

So Jesse, or any other curious soul, this is my answer to you. Should the gauntlet be picked up (by you, me or both of us) regarding the list of items I suggested earlier about making micro-contributions more accessible? How about taking this to python-dev?

Contributing to Python

2010/04/08

So you have some free time to donate to Python, and don’t know where to start. Good. If you follow these cookbook instructions, you too can be a Contributor to Python™ 60 minutes from now. Most of what I’ll cover here is covered elsewhere (I’ll link). This post concentrates everything in one place, so you really don’t have any excuses for not contributing anymore. I assume you have at least passing knowledge of Python or C, any decent editor, any version control system (like svn, git, etc.), any bug-tracking system and a typical build process (./configure, make, etc.; or your platform’s equivalent). You don’t have to be an expert, but if this is your first day with any of these, maybe this isn’t the right document for you.

Let’s get our gear in order. In my experience, Ubuntu is the easiest platform to get set up on, so Ubuntu examples are what I’m going to give. If you use something else you should do your platform’s equivalent (or, uh, ‘upgrade’…). I assume you have a toolchain (compiler, linker, etc.) installed, so start by making sure you have svn (sudo apt-get install subversion), then check out one of Python’s latest development trees. Checking out is covered here, but I found that explanation a bit tiring. So, briefly: for the foreseeable future, you’re going to want to work either on trunk (Python 2.x development branch, currently 2.7) or on py3k (Python 3.x development branch, currently 3.2). The checkout commands for each are:

svn co http://svn.python.org/projects/python/branches/py3k
svn co http://svn.python.org/projects/python/trunk

Read below for stuff to do during the checkout. As soon as Subversion is done, though, you’d best configure and compile the source to see that you can build Python. I assume you won’t be installing this copy of Python at all, so you can simply run ./configure && make without fiddling much with --prefix et al. By default, Python will not build some of the things you may be used to from whichever build you’ve been using so far. Fixing that isn’t mandatory for many bugs, but if you run into ‘missing features’, note two things. First, to include some modules you’ll have to edit ./Modules/Setup.dist and enable whichever modules you want. Second, Python may finish the build process successfully, but issue a notice like this towards the end of the build:

Python build finished, but the necessary bits to build these modules were not found:
_curses            _curses_panel      _dbm            
_gdbm              _sqlite3           _ssl            
_tkinter           bz2                readline        
To find the necessary bits, look in setup.py in detect_modules() for the module's name.

The most likely reason is that you don’t have the development versions of the listed packages installed. I made almost everything listed above disappear on my system with sudo apt-get install libreadline-dev libbz2-dev libssl-dev libsqlite3-dev libncurses-dev libgdbm-dev. Re-read the error message and the apt-get I issued; you’ll see it’s rather easy to guess the development package name from Python’s error message. Much like during the checkout, read below for stuff to do during the configuration and build. You may have time to spare: maybe your internet connection is blazing and your CPU screams, or you took your time choosing a bug and the build finished first. If so, you can also run the full test suite, ./python -m test.regrtest -uall, and see that nothing beyond the really esoteric breaks.

Finding a bug to work on is not hard, but it’s not trivial either. Register with the Python bug-tracking site and go to the search page. For a start, I recommend this search: Stage: needs patch; Keyword: easy. It’s easiest to start afresh (no patch submitted with the bug), and the bugs marked ‘easy’ are easy. You will notice that not many bugs come back from the search. That’s because so many easy bugs already have a patch attached to them; we’ll get back to that later. Of the returned bugs, I chose issues #6610 and #7834. #6610 reports a potential flaw in subprocess. It turns out that when forking a coprocess from a process with closed standard streams, there’s a subtlety in dup2()-ing the child’s standard streams from the pipes the parent passed down to it. #7834 reports an issue with socketmodule.c’s creation of AF_BLUETOOTH sockets. The implementation of getsockaddrarg() is fragile because it doesn’t zero out a returned sockaddr_l2 struct, which might be left with garbage.

First, make sure you understand the bug. For #6610 I read an article linked from the bug about the subtlety of coprocess forking for daemons, as well as the relevant code in Python (here and here; the relevant bits are near the dup2 calls). Once you understand what happens, and sometimes even if you’re not sure, the best thing is to write test code that triggers the bug. I didn’t want to touch Python’s test suite quite yet, so I jotted a small script in /tmp that creates a parent with closed stdio streams, which then forks a /bin/cat coprocess. Writing the test forced me to think harder about the bug, and to realize that… it isn’t one. Bug #6610 reports a potential bug, but as it turns out, the author(s) of subprocess had already protected themselves against its manifestation. My test code, assuming it was correct, confirmed my hypothesis: the test worked fine.
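The /tmp script itself isn’t shown here. What follows is a minimal sketch of that kind of check. It assumes a Python with the subprocess module is on PATH as python3; set PYTHON=./python to aim it at a freshly built tree instead. spawn_with_closed_stdio is a hypothetical name. The shell closes the parent’s stdin, stdout and stderr before Python starts, so the verdict comes back through the exit status rather than printed output.

```shell
# Hypothetical sketch of the #6610 scenario: start a Python parent with
# fds 0, 1 and 2 closed, spawn /bin/cat over pipes, and check that the
# data makes the round trip. PYTHON defaults to python3 (an assumption).
spawn_with_closed_stdio() {
    "${PYTHON:-python3}" -c '
import os, subprocess
p = subprocess.Popen(["/bin/cat"], stdin=subprocess.PIPE, stdout=subprocess.PIPE)
out, _ = p.communicate(b"ping")
# stdout and stderr are closed, so report the verdict via the exit status.
os._exit(0 if out == b"ping" else 1)
' <&- >&- 2>&-
}
```

Run it as spawn_with_closed_stdio && echo ok. An exit status of 0 means the bytes made it through /bin/cat and back.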

Case closed, no? No. While it’s possible to comment “sorry, not a bug, move along”, I wanted to nail the bug. From what I’ve seen on other bugs, nothing convinces a trusted developer to close a bug more than working coverage of the alleged bug in Python’s unit test suite. Before touching a test, make sure it runs on your system; otherwise you might waste lots of time hunting issues. After running the subprocess test, I took my jotted script from earlier, cleaned it up and fit it into Lib/test/test_subprocess.py. Important! Python is, rightfully, rather anal about the style of code accepted into the repository (tests are code just like any other code). Save yourself lots of grief, and make certain you abide by PEP 8 and PEP 7. You don’t have to read them to the letter to do small fixes. Do make sure you catch the spirit of what’s in there, and especially that your new code matches the style of its surroundings. Once I was happy with how the new test looked, I re-ran this particular test (./python -m test.regrtest test_subprocess) and created an svn patch. Running svn diff | tee ~/test_subprocess_with_standard_fds_closed.diff after I modified subprocess’ test was all it took. Then I posted the patch along with a description of what I did and found.

#7834 is simpler in most regards but much harder in one important aspect. It’s rather obvious that the person who submitted the bug really ran into it in real life and probably knows more about Bluetooth sockets than I do. Fixing the bug is trivial, but writing a test for it that runs without actual Bluetooth hardware is not. I looked into bluez’s sources and asked on #bluez on Freenode for help, but didn’t see a manageable way to do it, so I prepared just a patch with the fix and submitted it. I did, however, mention my attempt at finding a way to write a test. That way, whoever reviews my patch will see the complexities involved and decide for themselves whether this testless fix is worthy of a commit.

Ha! Two bugs clos… no. Not closed, just updated in the bug tracker. Truth is, most easy/medium bugs have some patch waiting for them and are pending review. It is my meager understanding that if you really want to help out, the best things you can do for Python are probably reviewing some of these patches, cleaning them up, adding tests and updating documentation. Python could use an alleviation of the GIL, but if you’re an early and casual contributor, I doubt you’ll be the one to do it. What you can do, especially if you’re a student or otherwise have time and are keen to learn, is bring as many bugs as you can into pristine condition. The fewer small flaws a bug has, the more mature it looks and the easier it is for a trusted/core developer to just commit the patch and be done with it. You’d like someone to do that for your bugs, so why not do it for someone else’s? Besides, soon enough you’ll be so busy being nosy with so many bugs that you’ll get deeper into the scene and find more interesting aspects of seemingly simple bugs. And so on and so forth, until one day you can overthrow, er, kneel before the BDFL and be knighted to say Ni!

Happy bug hunting!
