RESTfully atomically incrementing a counter using HTTP PATCH
2012/02/08 § Leave a comment
So today I ran into the question of incrementing a counter in a RESTful manner, and wasn’t sure how to go about doing it. Googling around a bit didn’t find me a satisfactory answer, though I did find that @idangazit asked the same question on Stack Overflow; alas, it was answered by what I humbly felt was an inadequate answer.
Idan had “PUT vs. POST” in his question, but quoting the answer I just added to that question (#selfplagiarism!), I believe PATCH is the answer, as RFC 2068 says very well:
The PATCH method is similar to PUT except that the entity contains a list of differences between the original version of the resource identified by the Request-URI and the desired content of the resource after the PATCH action has been applied. The list of differences is in a format defined by the media type of the entity (e.g., “application/diff”) and MUST include sufficient information to allow the server to recreate the changes necessary to convert the original version of the resource to the desired version.
So, for example, to update profile 123’s view count, I would do (using requests, what else?):
import requests

requests.patch(
    'http://localhost:8000/profiles/123',
    'views + 1\n',
    headers={"Content-Type": "application/x-counters"}
)
Which would emit something like:
PATCH /profiles/123 HTTP/1.1
Host: localhost:8000
Content-Length: 10
Content-Type: application/x-counters
Accept-Encoding: identity, deflate, compress, gzip
Accept: */*
User-Agent: python-requests/0.10.0

views + 1
Where the x-counters media type (which I just made up) is made of multiple lines of field operator scalar tuples. views + 1, views = 500, views - 1 or views + 3 are all syntactically valid (though some may be forbidden semantically). I can understand some frowning upon making up yet another media type, but I think this approach matches the intention of the RFC quite well, it’s extremely simple, and if the backend is implemented correctly, it’s atomically correct.
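To make the idea concrete, here’s a minimal sketch of what such a backend could look like; the grammar, the apply_counters name and the profiles table are my assumptions, not an established scheme. The point is that the relative update is executed by the database itself inside a single transaction, so concurrent PATCHes don’t clobber each other:

import re
import sqlite3

# 'field operator scalar', e.g. 'views + 1' -- a hypothetical parser for
# the made-up application/x-counters media type described above
COUNTER_LINE = re.compile(r'^(\w+)\s*([-+=])\s*(\d+)$')
ALLOWED_FIELDS = {'views'}  # whitelist; column names can't be bound as parameters

def apply_counters(conn, profile_id, body):
    # conn is assumed to be an sqlite3.Connection; 'with conn' wraps the
    # whole body in one transaction, so it applies atomically
    with conn:
        for line in filter(None, (l.strip() for l in body.splitlines())):
            match = COUNTER_LINE.match(line)
            if match is None:
                raise ValueError('malformed counter line: %r' % line)
            field, op, scalar = match.group(1), match.group(2), int(match.group(3))
            if field not in ALLOWED_FIELDS:
                raise ValueError('forbidden field: %r' % field)
            if op == '=':
                conn.execute('UPDATE profiles SET views = ? WHERE id = ?',
                             (scalar, profile_id))
            else:
                # the read-modify-write happens inside the database, not here
                delta = scalar if op == '+' else -scalar
                conn.execute('UPDATE profiles SET views = views + ? WHERE id = ?',
                             (delta, profile_id))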
Suggestions for another approach?
EDIT: I’ve had a long discussion with a friend who disliked the use of a non-standard media type. Perhaps something like this is better, though I’m still not entirely convinced:
import requests

requests.patch(
    'http://localhost:8000/profiles/123',
    '[{"field": "views", "operator": "+", "operand": 1}]',
    headers={"Content-Type": "application/json"}
)
I’m not sure which is the bigger crime – using a non-standard media type, which, in the words of the RFC, is “discouraged”, or using a standard generic serialization format as the media type, which doesn’t say much about the scheme you’d like to use within it. Both are better than anything else I can think of.
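For what it’s worth, the JSON flavour can ride on the very same backend sketched above with a thin adapter (again, apply_counters and both wire formats are my assumptions rather than any standard):

import json

def apply_json_counters(conn, profile_id, body):
    # translate the JSON operations list into the x-counters form and
    # hand it to apply_counters in one call, so it stays one transaction
    lines = ['%s %s %s' % (op['field'], op['operator'], op['operand'])
             for op in json.loads(body)]
    apply_counters(conn, profile_id, '\n'.join(lines))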
Walking Python objects recursively
2011/12/11 § 6 Comments
Here’s a small function that walks over any* Python object and yields the objects contained within (if any) along with the path to reach them. I wrote it and am using it to validate a deserialized data structure, but you can probably use it for many things. In fact, I’m rather surprised I didn’t find something like this on the web already, and perhaps it should go in itertools.
Edit: Since the original post I added infinite recursion protection following Eli and Greg’s good advice, added Python 3 compatibility and did some refactoring (which meant I had to add proper unit tests). You will always be able to get the latest version here, on ActiveState’s Python Cookbook (at least until it makes its way into stdlib, fingers crossed…).
from collections import Mapping, Set, Sequence

# dual python 2/3 compatibility, inspired by the "six" library
string_types = (str, unicode) if str is bytes else (str, bytes)
iteritems = lambda mapping: getattr(mapping, 'iteritems', mapping.items)()

def objwalk(obj, path=(), memo=None):
    if memo is None:
        memo = set()
    iterator = None
    if isinstance(obj, Mapping):
        iterator = iteritems
    elif isinstance(obj, (Sequence, Set)) and not isinstance(obj, string_types):
        iterator = enumerate
    if iterator:
        if id(obj) not in memo:
            memo.add(id(obj))
            for path_component, value in iterator(obj):
                for result in objwalk(value, path + (path_component,), memo):
                    yield result
            memo.remove(id(obj))
    else:
        yield path, obj
And here’s a little bit of sample usage:
>>> tuple(objwalk(True))
(((), True),)
>>> tuple(objwalk({}))
()
>>> tuple(objwalk([1,2,3]))
(((0,), 1), ((1,), 2), ((2,), 3))
>>> tuple(objwalk({"http": {"port": 80, "interface": "0.0.0.0"}}))
((('http', 'interface'), '0.0.0.0'), (('http', 'port'), 80))
>>>
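As for the validation use-case mentioned above, here’s a hedged sketch (the config and the rule are made up for illustration) showing how the yielded (path, value) pairs let you point straight at the offending leaf:

config = {"http": {"port": "eighty", "interface": "0.0.0.0"}}

for path, value in objwalk(config):
    # hypothetical rule: anything named 'port' must be an integer
    if path and path[-1] == "port" and not isinstance(value, int):
        raise ValueError("%s: expected an integer, got %r" %
                         ("/".join(map(str, path)), value))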
pv: the pipe swiss army knife
2011/11/05 § Leave a comment
When using UNIX, every now and then you run into a relatively unknown command line application which, once you master it, becomes part of your “first class” commands along with cut or tr. You wince every time you work on a computer that doesn’t have it (and promptly wget-configure-make-install it) and you’re amazed your colleagues never heard of it. I often feel pv is such a command for me. Really, this command, much like netcat, should have been written in Berkeley sometime circa 1985 and be in every /usr/bin today. Alas, somehow Hobbit only wrote netcat in 1996, and it took a long while for it to reach /usr/bin ubiquity. Similarly, Andrew Wood only wrote pv in 2002, and I hope this post will convince you to place it in all your /usr/local/bins today and convince distribution makers to promote it to the status of a standard package as soon as possible.
The basic premise of pv is simple – it’s a program that copies stdin to stdout, while richly displaying progress using terminal graphics on stderr. If you use UNIX a lot and you never heard of pv before, I’m pretty sure the lightbulb is already lit above your head (if not, maybe pv isn’t for you after all, or maybe it would help to take a look at this review of pv to see why it’s so great). pv has evolved rather nicely over the years, it’s been available from Ubuntu universe for a while now (why only universe? why??), and it has a slew of useful features, like rate limiting, ETA prediction for an expected amount of data, on-the-fly parameter change (increase/decrease rate limit without breaking the pipe!), multiple invocation support (measure speed in two different points of the pipe!) and so on.
If you’re already using pv, I hope you’ll like some of the recipes I use it in; if you aren’t, maybe they’ll whet your appetite (I’m using long options for pv and short options for everything else):
- The basics: copy /tmp/src/ to /tmp/dst/, with progress:

$ src=/tmp/src ; tar -cC "$src" . | pv --size $(du -hsk "$src" | cut -f1)k | tar -xC /tmp/dst
 142MB 0:00:02 [43.4MB/s] [======>        ] 58% ETA 0:00:01
$

By the way, this works great if you add nc and compression; pv can even help you decide what level of compression to use to achieve the best throughput before the CPU becomes the bottleneck.

- Scale a bunch of images to a specific size, using multiple cores and with progress:

$ cd /tmp/src ; ls *.jpg | xargs -P 4 -I % -t convert -resize 1024 % /tmp/dst/% 2>&1 | pv --line-mode --size $(ls *.jpg | wc -l) > /dev/null
  96 0:00:16 [7.85/s] [===>          ] 36% ETA 0:00:28
$

- Get a quick assessment of the traffic rate going through some interface:

$ sudo tcpdump -c 10000 -i eth1 -w - 2>/dev/null | pv > /dev/null
35.4MB 0:00:07 [4.56MB/s] [          <=>          ]
$
Nifty, eh? I find myself inserting pv in any pipe I expect to exist for more than a few moments. How do you use pv?
zsh and virtualenv
2010/10/14 § 8 Comments
A week ago or so I finally got off my arse and did the pragmatic programmer thing, setting aside those measly ten minutes to check out virtualenv (well, I also checked out buildout, but I won’t discuss it in this post). I knew pretty much what to expect, but I wanted to get my hands dirty so I could see what I assumed I’d been missing out on for so long (and indeed I promptly kicked myself for not doing it sooner, yada yada, you probably know the drill about well-known-must-know-techniques-and-tools-that-somehow-you-don’t-know). Much as I liked virtualenv, there were two things I didn’t like about environment activation. First, I found typing ‘source bin/activate’ (or similar) cumbersome; I wanted something short and snazzy that worked regardless of where inside the virtualenv I am, so long as I’m somewhere in it (it makes sense to me to say that I’m ‘in’ a virtualenv when my current working directory is somewhere under the virtualenv’s directory). Note that being “in” a virtualenv isn’t the same as activating it; you can change directory from virtualenv foo to virtualenv bar, and virtualenv foo will remain active. Indeed, this was the second problem I had: I kept forgetting to activate my virtualenv as I started using it, or to deactivate the old one as I switched from one to another.
zsh to the rescue. You may recall that I already mentioned the tinkering I’ve done to make it easier to remember my current DVCS branch. Briefly, I have a function called _rprompt_dvcs which is evaluated whenever zsh displays my prompt and if I’m in a git/Mercurial repository it sets my right prompt to the name of the current branch in blue/green. You may also recall that while I use git itself to tell me if I’m in a git repository at all and which branch I’m at (using git branch --no-color 2> /dev/null | sed -e '/^[^*]/d' -e 's/* \(.*\)/(\1)/'), I had to resort to a small C program (fast_hg_root) in order to decide whether I’m in a Mercurial repository or not and then I manually parse the branch with cat. As I said in the previous post about hg and prompt, I’m not into giving hg grief about speed vs. git, but when it comes to the prompt things are different.
With this background in mind, I was perfectly armed to solve my woes with virtualenv. First, I changed fast_hg_root to be slightly more generic and search for a user-specified “magic” filename upwards from the current working directory (I called the outcome walkup, it’s really simple and nothing to write home about…). For example, to mimic fast_hg_root with walkup, you’d run it like so: $ walkup .hg. Using $ walkup bin/activate to find my current virtualenv (if any at all), I could easily add the following function to my zsh environment:
act () {
    if [ -n "$1" ]
    then
        if [ ! -d "$1" ]
        then
            echo "act: $1 no such directory"
            return 1
        fi
        if [ ! -e "$1/bin/activate" ]
        then
            echo "act: $1 is not a virtualenv"
            return 1
        fi
        if which deactivate > /dev/null
        then
            deactivate
        fi
        cd "$1"
        source bin/activate
    else
        virtualenv="$(walkup bin/activate)"
        if [ $? -eq 1 ]
        then
            echo "act: not in a virtualenv"
            return 1
        fi
        source "$virtualenv"/bin/activate
    fi
}
Now I can type $ act anywhere I want in a virtualenv, and that virtualenv will become active; this saves figuring out the path to bin/activate and ending up typing something ugly like $ source ../../bin/activate. If you want something that can work for you without a special binary on your host, there’s also a pure-shell version of the same function in the collapsed snippet below.
function act() {
    if [ -n "$1" ]; then
        if [ ! -d "$1" ]; then
            echo "act: $1 no such directory"
            return 1
        fi
        if [ ! -e "$1/bin/activate" ]; then
            echo "act: $1 is not a virtualenv"
            return 1
        fi
        if which deactivate > /dev/null; then
            deactivate
        fi
        cd "$1"
        source bin/activate
    else
        stored_dir="$(pwd)"
        while [ ! -f bin/activate ]; do
            if [ $(pwd) = / ]; then
                echo "act: not in a virtualenv"
                cd "$stored_dir"
                return 1
            fi
            cd ..
        done
        source bin/activate
        cd "$stored_dir"
    fi
}
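By the way, walkup itself is nothing fancy; here’s a rough Python re-implementation of the idea (my sketch, not the actual source), which exits non-zero when no enclosing directory matches, just like act expects:

import os
import sys

def walkup(magic):
    # search for `magic` upwards from the cwd; return the closest ancestor
    # directory (the cwd included) containing it, or None if we hit the root
    directory = os.getcwd()
    while True:
        if os.path.exists(os.path.join(directory, magic)):
            return directory
        parent = os.path.dirname(directory)
        if parent == directory:  # reached the filesystem root, give up
            return None
        directory = parent

if __name__ == '__main__':
    found = walkup(sys.argv[1])
    if found is None:
        sys.exit(1)  # act's "if [ $? -eq 1 ]" check relies on this
    print(found)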
This was nice, but only solved half the problem: I still kept forgetting to activate the virtualenv, or moving out of a virtualenv and forgetting that I left it activated (this can cause lots of confusion; for example, if you’re simultaneously trying out this, this, this or that django-facebook integration module, more than one of them thinks that facebook is a good idea for a namespace to take!). To remind me, I wanted my left prompt to reflect my virtualenv in the following manner (much like my right prompt reflects my current git/hg branch, if any):
- If I’m not in a virtualenv and no virtualenv is active, do nothing.
- If I’m in a virtualenv and it is not active, display its name as part of the prompt in white.
- If I’m in a virtualenv and it is active, display its name as part of the prompt in green.
- If I’m not in a virtualenv but some virtualenv is active, display its name in yellow.
- Finally, if I’m in one virtualenv but another virtualenv is active, display both their names in red.
So, using walkup, I wrote the virtualenv parsing functions:
function active_virtualenv() {
    if [ -z "$VIRTUAL_ENV" ]; then
        # not in a virtualenv
        return
    fi
    basename "$VIRTUAL_ENV"
}

function enclosing_virtualenv() {
    if ! which walkup > /dev/null; then
        return
    fi
    virtualenv="$(walkup bin/activate)"
    if [ -z "$virtualenv" ]; then
        # not in a virtualenv
        return
    fi
    basename $(grep VIRTUAL_ENV= "$virtualenv"/bin/activate |
               sed -E 's/VIRTUAL_ENV="(.*)"$/\1/')
}
All that remained was to change my lprompt function to look like so (remember I have setopt prompt_subst on):
function _lprompt_env {
    local active="$(active_virtualenv)"
    local enclosing="$(enclosing_virtualenv)"
    if [ -z "$active" -a -z "$enclosing" ]; then
        # no active virtualenv, no enclosing virtualenv, just leave
        return
    fi
    if [ -z "$active" ]; then
        local color=white
        local text="$enclosing"
    else
        if [ -z "$enclosing" ]; then
            local color=yellow
            local text="$active"
        elif [ "$enclosing" = "$active" ]; then
            local color=green
            local text="$active"
        else
            local color=red
            local text="$active":"$enclosing"
        fi
    fi
    local result="%{$fg[$color]%}${text}$rst "
    echo -n $result
}

function lprompt {
    local col1 col2 ch1 ch2
    col1="%{%b$fg[$2]%}"
    col2="%{$4$fg[$3]%}"
    ch1=$col1${1[1]}
    ch2=$col1${1[2]}

    local _env='$(_lprompt_env)'

    local col_b col_s
    col_b="%{$fg[green]%}"
    col_s="%{$fg[red]%}"

    PROMPT="\
$bgc$ch1\
$_env\
%{$fg_bold[white]%}%m:\
$bgc$col2%B%1~%b\
$ch2$rst \
$col2%#$rst "
}
A bit lengthy, but not very difficult. I suffered a bit until I figured out that I should escape the result of _lprompt_env using a percent sign (like so: "%{$fg[$color]%}${text}$rst "), or else the ANSI color escapes are counted for cursor positioning purposes and screw up the prompt’s alignment. Meh. Also, remember to set VIRTUAL_ENV_DISABLE_PROMPT=True somewhere, so virtualenv’s simple/default prompt manipulation functionality won’t kick in and screw things up for you, and we’re good to go.
The result looks like so (I still don’t know how to do a terminal-“screenshot”-to-html, here’s a crummy png):
Voila! Feel free to use these snippets, and happy zshelling!
Eulogy to a server
2010/10/01 § 2 Comments
You don’t know it, but I’ve started writing this blog several times before it actually went live, and every time I scrapped whatever post I started with (the initial run was on blogger.com). I just didn’t think these posts were all too interesting, they were about my monstrous home server, donny. Maybe this is still not interesting, but I’m metaphorically on the verge of tears and I just have to tell someone of what happened, to repent me of my horrible sin. You may not read if you don’t want to. I bought donny about 2.5-3 years ago, to replace my aging home storage server (I had about 3x250GB at the time, no RAID). There’s not much to say about donny’s hardware (Core 2 Duo, 2GB of RAM, Asus P5K-WS motherboard), other than the gargantuan CoolerMaster Stacker 810 chassis with room for 14 (!) SATA disks. Initially I bought 8×0.5TB SATA Hitachi disks for it, and added more as I had the chance. I guess I bought it because at the time I’d hang around disks all day long; I must’ve felt the need to compensate for something (my job at the time was mostly around software, but still, you can’t ignore the shipping crates of SATA disks in the lab).
Anyway, most of its life donny ran OpenSolaris. One of our customers had a big ZFS deployment, I’ve always liked Solaris most of all the big Unices (I never thought it really better than Linux, it just Sucked Less™ than the other big iron Unices), I totally drank the Kool-Aid about “ZFS: the last word in File System” (notice how the first Google hit for this search term is “Bad Request” :) and dtrace really blew me away. So I chose OpenSolaris. Some of those started-but-never-finished posts were about whether I was happy with OpenSolaris and ZFS or not; I never found them interesting enough to even finish them. So even if I don’t wanna discuss that particularly, it should be noted that if we look at how I voted with my feet, I ended up migrating donny to Ubuntu 10.04.1/mdadm RAID5/ext4 when my wife and I got back from our long trip abroad.
Migration was a breeze, and the actual migration process convinced me I’d made the right choice in this case. Over my time with ZFS (both at work and at home) I realized it’s probably good, but certainly not magical and not the end of human suffering with regard to storage. In exchange for giving up zfs and dtrace I received the joys of Ubuntu, most notably a working package management system and sensible defaults that make life so much easier, along with the most vibrant eco-system there is. I bought donny 4×2.0TB SATA WD Caviar Green disks, made a rolling upgrade for the data while relying on zfs-fuse (it went well, despite a small and old bug) and overall the downtime was less than an hour for the installation of the disks. At the time of the disaster, donny held one RAID5 array made of 4×2TB, one RAID5 array made of 4×0.5TB, one soon-to-be-made RAID5 array made of 3×1TB+1×1.5TB (I bought a larger drive after one of the 1TB disks failed a while ago), and its two boot disks. I was happy. donny, my wife and me, one happy family. Until last night.
I was always eyeing donny’s small boot disks (what a waste of room… and with all these useless SATA disks I’ve accumulated over the years and have lying about…), so last night I wanted to break the 2x80GB mirror and roll-upgrade to a 2x1TB boot configuration, planning on using the extra space for… well, I’ll be honest, I don’t know for what. I’ll admit it – I got a bit addicted to seeing the TB suffix near the free space column of df -h at home (at work you can see better suffixes :). I just have hardware lying around, and I love never deleting ANYTHING, and I love keeping all the .isos of everything ever (Hmm… RHEL3 for Itanium… that must come in handy some day…) and keeping an image of every friend and relative’s Windows computer I ever fixed (it’s amazing how much time this saves), and never keeping any media I buy in plastic… and, well, the fetish of just having a big server. Heck, it sure beats farmville.
So, indeed, last night I broke that mirror, and installed that 1TB drive, and this morning I started re-mirroring the boot, and while I was at it I noticed some of the directory structure was wrong so I redistributed stuff inside the RAIDs, and all the disks were whirring merrily at the corner of the room, and I was getting cold so I turned off the AC, and suddenly donny starts beeping (I didn’t even remember I installed that pcspkr script for mdadm) and I get a flurry of emails regarding disk failures in the md devices. WTF? Quickly I realized that donny was practically boiling hot (SMART read one of the disks at 70 degrees Celsius), at which point I did an emergency shutdown and realized… that last night I disconnected the power cable running from the PSU to several fans, forgot to reconnect it, and now I’ve effectively cooked my server. Damn.
I’m not sure what to do now. I still have some friends who know stuff about hard disks (like, know the stuff you have to sign NDAs with the disk manufacturers in order to know), and I’m trying to pick my network’s brains about what to do next. Basically, from what I hear so far, I should keep donny off, let the disks cool down, be ready with lots of room on a separate host to copy stuff out of it, boot it up in a cool room, take the most critical stuff out and then do whatever, it doesn’t matter, cuz the disks are dead even if they seem alive. I’m told never to trust any of the disks that were inside during the malfunction (that’s >$1,000USD worth of disks…); once a disk has reached 70 degrees, or even far less, don’t get near it, even if it’s new. Admittedly, these guys are used to handling enterprise disk faults, where $1,000USD in hardware costs (and even many many times that amount) is nothing compared to data loss, but this is the gist of what I’m hearing so far. If you have other observations, let me know. It’s frustratingly difficult to get reliable data about disk failures on the Internet; I know just what to do in case of logical corruption of any sort, but I don’t know precisely what to do in a case like this, or in case of a controller failure, or a head crash, and so on, and so forth. I know it’s a lot about luck, but what’s the best way to give donny the highest chance of survival?
On a parting note, I’ll add that I’m a very sceptical kind of guy, but when it comes to these things I’m rather mystical. It all stems from my roots as a System Administrator; what else can comfort a lonely 19-year-old sysadmin trying to salvage something from nothing in a cold server room at 03:27AM on a Saturday? So now I blame all of this on the name I gave donny. I named it so because I name all my hosts at home after characters from The Big Lebowski (I’m typing this on Dude, my laptop), and I called the server donny. The email address I gave it (so it could send me those FzA#$%! S.M.A.R.T. reports it never did!) was named Theodore Donald Kerabatsos. The backup server, which is tiny compared to donny and doesn’t hold nearly as much stuff as I’d like to have back now, is called Francis Donnelly. The storage pools (and then RAID volumes) were called folgers, receptacle, modest and cookies (if you don’t understand, you should have paid more attention to Donny’s funeral in The Big Lebowski). And, indeed, as I predicted without knowing it, it ended up being friggin’ cremated. May Stallman bless its soul.
I guess I’m a moron for not thinking about this exact scenario; I was kinda assuming smartmontools would be SMART (ha!) enough to shut the machine down when the disks reached 50 degrees, and maybe there is such a setting and I just didn’t enable it… I guess by now it doesn’t matter. I’m one sad hacker. I can’t believe I did this to myself.
The Curious Case of HID Malfunction
2010/08/21 § Leave a comment
A quick tidbit for any interested hardware wizards out there (I know no one is likely to care, this is really more of an excuse for why the next Python’s Innards post is progressing slowly). As some of you know, I’m currently on a long trip with my wife (a trip which is already nearing its end…). This means I’m rather poor in hardware, and that sometimes the environment is harsh – hot, cold, humid, occasionally vibrating (flights, boats), rich with small particles (sand, dust), etc. As I’d expected for a long time, the elements finally took their toll on my small but until now trusty Asus EeePC 1005HA. The thing is, the toll was taken in a rather odd way.
For about three days now the builtin trackpad has stopped working – most of the time. Usually I get no cursor movement or clicks, and there are no particular messages in dmesg or /var/log/messages. On rare occasions the trackpad resumes working for a bit; I wasn’t able to find a pattern in what makes it work for these short periods of time (heating, cooling, sleeping, booting… nothing seems to make the short ‘work-periods’ predictable). On one occasion the trackpad worked but behaved erratically (jumping around, random clicks, etc); on others it worked fine, but only for a few seconds and up to a few minutes. I’m running Ubuntu 10.04, kernel 2.6.32-24, keeping it reasonably apt-get updated. I didn’t change anything significant in the software configuration of the computer before this happened, and booting a vanilla 10.04 from a USB stick I have around doesn’t help, so I’m pretty sure it’s not a vanilla software issue (despite the oddity listed below).
This is patently unpleasant but not entirely odd, and I would chalk it up to some undefined hardware damage and let it be. I could buy an external mouse for the remaining few weeks of the trip and otherwise ignore the issue, except that the builtin keyboard has started showing similar behaviour. It works far more often than the mouse, but has spells of brokenness. An external USB keyboard works fine when plugged in. I don’t even know if my internal keyboard interfaces via some kind of internal USB controller or not; seems not, as even when it’s working it’s not listed in lsusb -v. /proc/bus/input/devices lists an “AT Translated Set 2 keyboard”, but I have no idea if this is really my keyboard or not. Anyway, the really weird thing is that the keyboard’s broken behaviour has a few extra odd quirks:
- It works perfectly prior to loading the kernel: in the BIOS configuration screen, or GRUB’s menu, or the USB stick’s boot menu. It seems that as soon as the kernel is loaded, no more keyboard (X11 or console).
- The “special” EeePC keys, like toggling wifi or changing screen brightness, work perfectly. They aren’t special keys, but rather a key combination, and the keys used in the combination don’t reach the OS discretely.
- When I open the laptop’s lid while it’s asleep, I need to hit a key to bring it out of sleep. Any key works well enough for the computer to wake up, and the very same key (or any other key) will promptly stop working once the OS is awake enough to ask for a password.
So what gives? My keyboard isn’t broken, but some kind of interface between the keyboard and the system which is circumvented by the BIOS but is used by the kernel is broken? Huh, WTF?
The bit of Googling I did yielded nothing; Internet access here isn’t really scarce, but it isn’t abundant and sure isn’t fast or pleasant (I’m on a beach in Thailand at the moment). I’m left with a big WTF and apt-get install keynav. Any tips will be greatly appreciated (and will speed up the next post in the Python’s Innards series, too!).
Update: I’ve decided to disassemble and reassemble the keyboard, following these instructions, using a swiss-army knife, my wife’s fingernail file and a camping torch. Following the work, both keyboard and touchpad have been working for about 10 minutes now, one of the longer durations in the past few days. I can only hope I fixed the problem. Either way, I’m curious why the keyboard consistently didn’t work with a loaded kernel, yet seemed to work fine under the BIOS (in the BIOS’ setup, GRUB’s boot menu, etc). Any explanations?
Switching to mercurial: taming zsh
2010/05/14 § 12 Comments
A quick one, so you’ll know not all my posts must have so many words. PEP 385 is materializing, and it’s time to learn Mercurial. I can’t say I’m a Mercurial expert, but I thought migrating all my git-oriented-zsh-gizmos would help me along the way. The conversion is almost done, and yielded just one somewhat noteworthy tidbit.
A while back I copied from a friend a rather elaborate zsh prompt (not as elaborate as some people’s…), which includes my current git branch (if any) in it. The code to make it looked like this:
parse_git_branch() {
    git branch --no-color 2> /dev/null |
        sed -e '/^[^*]/d' -e 's/* \(.*\)/(\1)/'
}

function _rprompt_git {
    local git_branch="$(parse_git_branch)"
    if [ -n "$git_branch" ]; then
        result=":%{$fg[blue]%}$git_branch$rst"
    fi
    echo -n $result
}
I naïvely added a Mercurial equivalent to parse_git_branch using hg bookmarks or hg branch, and retrofitted/redubbed _rprompt_git into _rprompt_dvcs. It worked well, but was slow. I’m not the kind of person to give hg grief over reasonable speed differences with the speed monster that is git, but you can’t wait 0.15 seconds for your friggin’ prompt, now can you (this is not an invitation for a git/hg/bzr performance holy war in the comments, people). Removing a call to hg by running $ hg root just once, storing that value and using cat to get the actual branch/bookmark didn’t speed things up enough. #mercurial on freenode was kind but didn’t know how to help, other than suggest I buy a really fast computer… Blah, I’ll have to jot something out in C.
One bitbucket account later and a bit of tinkering led to fast_hg_root, which is a dumb C program which acts as a (fast) replacement to hg root. So now my dvcs-prompt-related code includes:
parse_git_branch() {
    git branch --no-color 2> /dev/null |
        sed -e '/^[^*]/d' -e 's/* \(.*\)/(\1)/'
}

parse_hg_branch() {
    if ! HG_ROOT=$(fast_hg_root) 2> /dev/null; then
        # not an HG repository, quit
        return
    fi
    BOOKMARK=$(cat "$HG_ROOT"/.hg/bookmarks.current 2> /dev/null)
    if [ -n "$BOOKMARK" ]; then
        # have a current bookmark, display that
        echo $BOOKMARK
        return
    fi
    # display the current branch or 'default'
    cat "$HG_ROOT"/.hg/branch 2> /dev/null || echo 'default'
}

function _rprompt_dvcs {
    local git_branch="$(parse_git_branch)"
    if [ -n "$git_branch" ]; then
        result=":%{$fg[blue]%}$git_branch$rst"
    else
        local hg_branch="$(parse_hg_branch)"
        if [ -n "$hg_branch" ]; then
            result=":%{$fg[green]%}$hg_branch$rst"
        fi
    fi
    echo -n $result
}
Which leads to my terminal sessions on my netbook monica looking like so:
C’est tout. If someone is interested, I’ll open a repository with more of my .zsh.d stuff. By the way, anyone knows of a good way to capture a terminal into HTML, preserving ANSI color? I know of ttyrec and found HTML::FromANSI, but was hoping for a finished program or at least, uhm, a library less in Perl. :p
Searching mailman archives offline (python-dev, anyone?)
2010/05/07 § 8 Comments
Since I’m a newcomer to python-dev, I often need to search the python-dev mailman archives. While I did find this way to do it (using Google with site:), it’s no good for offline searches (and at best it’s a kludge for online searches, too, IMHO). I’m offline quite a lot these days, since cellular 3G isn’t what it never used to be, and as long as I’m travelling the world, pre-paid cellular 3G is even worse. So, I set out looking for a proper solution to search python-dev in an offline manner.
Initially I just downloaded the whole mailing list archive with the shell-concoction listed below, and used grep to fish out what I needed. Obviously, if you’re reading this to look into something which isn’t python-dev (but why would you?!), replace MAILMAN_URL with wherever the mailing list you care about is archived.
MAILMAN_URL=http://mail.python.org/pipermail/python-dev/
for FILENAME in $(wget -O - -q $MAILMAN_URL |
                  egrep -o 'href="[^"]+.txt.gz"' | cut -f2 -d\")
do
    wget $MAILMAN_URL/$FILENAME
    gunzip $FILENAME
done
Naturally, after a (short) while I realized I needed a proper mailbox search utility. I’ve been using Debian for ages, but the pure richness of Ubuntu’s repositories has only recently made my brain rewire ‘task: find new software’ to apt-cache search. So I did, and indeed apt-cache search mbox search found mairix, “a program for indexing and searching email messages stored in maildir, MH or mbox folders”. Sweet.
mairix has a slightly odd usage pattern and is geared towards people fluent in mutt (which I’m not), so it, ugh, took me a while to realize it’s the tool I need and how to use it (gory details below). To sum things up, with mairix you (a) index all the mail you’d like to search in one invocation, and (b) run mairix with a search query, which creates a new mailbox (mbox/Maildir/MH) containing only the results. You can later view that mailbox with your favorite reader, though the only one I know of that would make sense in this context is probably, indeed, mutt.
Initially I set mairix up to index the mboxes as they were, but then I realized that due to the limitations of the mbox format, mairix has to copy every matching message to the results mailbox. If I were to use Maildir, for example, where every message is a file, it would generate a search-result-Maildir made of symlinks, which sounds better. So how do you convert all these mbox’s to Maildir? apt-cache search convert mbox maildir found mb2md for me. I placed all the mbox’s in a directory called mbox, and created a directory called maildir, and ran: mb2md -s $(pwd)/mbox -d $(pwd)/maildir. It chugged along unhappily (spewed a ton of error messages), but seemed to have worked (it later occurred to me that some emails might have been lost, I’m not sure), and a few minutes later I had all of python-dev’s archives in Maildir format.
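Incidentally, if mb2md’s error spew (and the chance of losing mail) worries you, the conversion can also be done with Python’s stdlib mailbox module; here’s a minimal sketch, assuming the mboxes sit flat in one directory (the function name and layout are mine, not any established tool):

import mailbox
import os

def mbox_dir_to_maildirs(mbox_dir, maildir_dir):
    # convert every mbox file under mbox_dir into a same-named Maildir
    # under maildir_dir, using only the stdlib mailbox module
    for name in os.listdir(mbox_dir):
        src = mailbox.mbox(os.path.join(mbox_dir, name))
        dst = mailbox.Maildir(os.path.join(maildir_dir, name), create=True)
        try:
            for message in src:
                dst.add(message)
        finally:
            src.close()
            dst.close()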
Now I can use mairix! I set up my .mairixrc like so:
base=/home/teolicy/Projects/python-internals/mail/maildir
maildir=...
database=/home/teolicy/.mairixdb
mfolder=/tmp/mairix-results
The maildir=... bit means “recurse under base and index the maildirs within”. The mfolder line says where to put the resulting mailbox from your last search. I reckon the other parameters are rather obvious, but see mairixrc(5) for details if you need something else. Warning! Obviously, if you’re going to index something that’s private, don’t place the results in /tmp!
Now I just had to run mairix with no arguments, and a few (rather short) moments later all the emails in the archive were indexed. To use mairix, you type something like: mairix d:3m- s:gil f:antoine which means, “find all messages in the last three months where the subject has ‘gil’ in it and the sender has ‘antoine’ in it”. The results will be stored in /tmp/mairix-results, which you can read using mutt -f /tmp/mairix-results. I encourage you to read mairix(1), but if you don’t, be aware that the useful -t switch will pull in whole threads into the results, not just matched messages. I use it more often than not.
Two small things remained. The first, for some reason which I didn’t care enough to research, mutt kept complaining I don’t have a ~/Mail folder on startup. Placing set folder=/tmp/mairix-results in my .muttrc made it go away. <sheepish>I didn’t really read what that means</sheepish>, so if that setting eats your homework, well, you deserve it. Also, I wrote a simple function for my zshrc file that reads something like:
mairix() {
    /usr/bin/env mairix -o /tmp/mairix-results $* &&
        mutt -Rf /tmp/mairix-results
}
It makes the whole thing easier.
That’s it. I’d feel pretty happy with myself, having an itch scratched so nicely, had I not been so dumbheaded as to fail to see that mairix is essentially the tool I was looking for in the first place. After about three minutes in its manpage, I figured it was “unwieldy crap”, and started writing my own mailbox search engine in Python, based on whoosh. Fortunately, after a couple of days of mellow hacking, and having learned of the horrors that are email algorithms (email just sucks, you know?), it dawned on me that I was slowly changing my design such that I was bloody rewriting mairix, so I ditched my effort, spent a few more minutes reading mairix’s manpage without unconsciously dismissing it as crap, and realized it’s exactly what I needed. I learned some from the experience about free text searching in Python and RFC 2822 and stuff, but honestly, I wish I hadn’t been such an arse in the first place. There, I confessed.
Below you can find all the stuff written here in easily copy-pastable form, you lazy bastard. Note this isn’t a script, as it doesn’t check for any kind of error, so it’s up to you to make sure this doesn’t botch your computer or whatever.
ARCHIVE_LOCATION=$HOME/python-dev
MAILMAN_URL=http://mail.python.org/pipermail/python-dev/
echo installing mutt, mairix, mb2md
sudo apt-get install mutt mairix mb2md
echo creating directories
mkdir -p $ARCHIVE_LOCATION/mbox
mkdir -p $ARCHIVE_LOCATION/maildir
cd $ARCHIVE_LOCATION/mbox
echo downloading $MAILMAN_URL
for FILENAME in $(wget -O - -q $MAILMAN_URL |
                  egrep -o 'href="[^"]+.txt.gz"' | cut -f2 -d\")
do
    echo downloading $FILENAME
    wget -q $MAILMAN_URL/$FILENAME
    gunzip $FILENAME
done
echo converting to maildir
cd $ARCHIVE_LOCATION
mb2md -s $(pwd)/mbox -d $(pwd)/maildir 2>/dev/null 1>/dev/null
echo removing converted mailboxes
rm -fr $ARCHIVE_LOCATION/mbox
mv $ARCHIVE_LOCATION/maildir/* $ARCHIVE_LOCATION/maildir/.??* $ARCHIVE_LOCATION
rmdir $ARCHIVE_LOCATION/maildir
echo setting up mairixrc and muttrc
cat << EOF > ~/.mairixrc
base=$ARCHIVE_LOCATION
maildir=...
database=$HOME/.mairixdb
mfolder=/tmp/mairix-results
EOF
cat << EOF > ~/.muttrc
set folder=/tmp/mairix-results
EOF
echo indexing archive
mairix
echo 'mairix is all set-up; maybe you want to use this function:'
echo 'mairix() {'
echo '    /usr/bin/env mairix -o /tmp/mairix-results $* &&'
echo '        mutt -Rf /tmp/mairix-results'
echo '}'
The question of updates remains; a simple script should be able to do the trick, and maybe I’ll write it sometime. Or not.
Why don’t I contribute to Python (often)
2010/04/23 § 11 Comments
Oddly enough, just a week or two after I wrote the post “Contributing to Python“, Jesse Noller (of multiprocessing fame) wrote a post called “Why aren’t you contributing (to Python)?“. As a somewhat wannabe contributor, I think it’s a terrific question which deserves wordy and open answers (hey, just like this one). I realize the question has been asked specifically in the context of python.org’s development, but I think some of the answer applies to the whole Python eco-system, alternative implementations and libraries/frameworks included. Before we begin, I think that to answer the question, some distinction should be made about two rather different kinds of contributions.
The first is driven by whatever it is that you’re contributing. Suppose you happen to hack up a better GIL arbitration algorithm, or a sprawling and insanely rich asynchronous networking framework. You want that stuff to get out. I’d go so far as to say the source code itself wants to be out. It’s your baby, more often than not you think it’s great and unless something bad happens, you’ll push it. These are the kinds of things where you’re likely to find yourself obsessing over the stats of the website describing your latest gizmo or taking it personally when some loser on python-dev (like, say, that van-what’s-his-name-guy) says your implementation of goto for Python is not quite the next best thing since sliced lists.
The other, rather different kind, is that you run into something that is rather obviously a bug and wish to open-a-ticket-that-has-a-good-chance-to-be-committed for it. First of all, this is usually a far smaller patch. I doubt many people import threading, discover The Beazley Effect, rework the GIL and open a ticket with a patch. The use-case here is more like “I have a reproducible SIGSEGV” or “I wish import zipfile would support ZIP64”. Two interesting observations about this case: first, people are far less committed to their contribution, and second, more importantly, the realities of life dictate that the J. Random Hacker who ran into this either found a workaround or patched their own Python, so they sidestepped the bug. This is important. In practically all interesting cases, the reporter has already sidestepped the bug before or shortly after posting (sidestepped is a loose term, maybe they even moved to Perl…). I doubt anyone’s schedule is loose enough to allow them to wait without sidestepping a painful thorn even for the next bugfix release. This is a hundred times more true for documentation touchups – if you realized it’s wrong, you probably don’t have to fix it to keep working, you just use whatever knowledge you now know is right.
A rather pathological, tertiary case is the “I am not-Python-core, I have some free time, I wanna contribute to Python and I went bug-hunting in the tracker” one. I think it’s a pathological case of the second kind of contribution, and one that I suspect happens rather rarely. I’ll lump these two together.
So if you agree so far, we have a commit-driven-contribution (“damn this is so awesome I want this to be part of Python/twisted/Django/etc”) and a contribution-driven-commit (“damn Python is so awesome, it’s a shame to leave this wart unfixed, I’ll help”). As I said, I think very different reasons prevent people from doing either. I’ll start talking about the latter kind, both because it seemed to be the focus of Jesse’s original post and because it’s easiest to answer.
First, almost everything Jesse listed as reasons is true. Don’t know how, don’t know where, etc, almost all true. The best remedy here is to get as many people as possible to have, ugh, “broken their contribution cherry”, so to speak. The easier it is to submit minor fixes for the first time, the more people will do it. The first time is important, psychologically and setup-ly. I think after a patch of yours has been committed, the fear of the technical part of the process is gone and the feeling of “gee, I can actually put stuff in Python!” kicks in, and you’re far more likely to start submitting more small patches. So if you want many more people to help with mundane issues, documentation touchups, etc, the community at large should make every effort to make this first time sweet.
How do we make it sweet? I don’t know for sure, but here is a short flurry of ideas which I’d be happy to discuss (and aid implementing!):
- Easy step-by-step instructions for opening a bug report, submitting a patch, for Ubuntu, OSX and Windows, all concentrated in one place, from setup to bug tracker attachment. The “contributing to Python” post I mentioned earlier is a (small) step in what I think is the right direction. We can flesh it out a lot, but make sure it keeps the step-by-step cookbook get-it-done approach, rather than what exists today, which is good, but isn’t aimed at getting-things-done. Compare signing up to Facebook with applying for a Tourist Visa in some foreign country.
- Small-time-Python-contribution-talks material to be made available. This is partly to be consumed online as web-talks, but mainly aims to reach out and encourage such talks in LUGs and highschools/colleges (hmm, I love this idea, I should do this sometime…).
- A bit on a limb here, but maybe even doing what’s possible to optimize the review process in favour of first-time contributors. This is quite debatable, and (intentionally) vague, but I cautiously think it will pay off rather quickly.
These means (and probably others I didn’t think of) could probably alleviate the problem of a “contribution-driven-commit”, as I called it. Which leaves us with your fabulous implementation of goto, or “commit-driven-contribution”. I think two factors come into play here, both of them nearly irrelevant for the previous type of contribution. The first is the feeling that whatever it is you’ve done, it’s not good enough (this usually breaks my balls). “Me? Send this? To python-dev? Get outta here.” And the second, I think, is indeed the feeling of an ‘uphill battle’ against grizzled python-dev grey beards and sharp-tongued lurkers who are, I suspect, more likely than not to shred your idea to bits. Let’s face it, hacker communities at large are pretty harsh, and generally for understandable reasons. However, I think at times this tough skin and high barrier for contributing anything significant hurts us.
I used to have a co-worker, a strong hacker, who made two significant open-source packages for Python. I humbly think both are exceptionally elegant and at least one of them could have been a strong addition to stdlib. Every time I read the code of some poor soul who recreated this guy’s efforts with these two packages, I cringe. I wouldn’t like to disclose his name before talking to him, but when I asked him why these packages aren’t part of stdlib, he said something like: “blah, unless you’re part of the python-dev cognoscenti you’ve no chance of putting anything anywhere”. I think he might not have pushed these packages hard enough (I should raid python-dev’s archives to know), but looking at the finesse of these packages on the one hand, and at the number of questions on #python at freenode which I can answer by uttering these packages’ names on the other, I think maybe we’re missing out on something. His perception, even if downright wrong (and I suspect it isn’t accurate, but it isn’t so wrong either), is exactly the kind of thing that can make you not contribute that big masterpiece you’ve made in your back yard, and that’s a damn shame. Most people will not survive the School of Hard Knocks, and that’s not necessarily always a good thing.
The issue of contributing big stuff is far more delicate and complex, and I’d be the first to admit I’m probably not rad enough to even discuss it. But it feels wrong to write a post under a heading like the one this one bears without at least mentioning this hacker-subculture-centric and sensitive issue, which affects many OSS projects, and Python as well. Micro-contributions are important, they’re the water and wind which slowly erode the landscape of a project into something cleaner, stabler and more elegant, but let’s not forget them big commits from which the mountains are born, too.
So Jesse, or any other curious soul, this is my answer to you. Should the gauntlet be picked up (by you, me or both of us) regarding the list of items I suggested earlier about making micro-contributions more accessible? How about taking this to python-dev?