pv: the pipe Swiss Army knife

When using UNIX, every now and then you run into a relatively unknown command line application which, once you master it, becomes one of your “first class” commands, right along with cut or tr. You wince every time you work on a computer that doesn’t have it (and promptly wget-configure-make-install it), and you’re amazed your colleagues have never heard of it. For me, pv is such a command. Really, this command, much like netcat, should have been written at Berkeley sometime circa 1985 and be in every /usr/bin today. Alas, Hobbit only wrote netcat in 1996, and it took a long while for it to reach /usr/bin ubiquity. Similarly, Andrew Wood only wrote pv in 2002, and I hope this post will convince you to place it in all your /usr/local/bins today, and convince distribution makers to promote it to the status of a standard package as soon as possible.

The basic premise of pv is simple: it’s a program that copies stdin to stdout while richly displaying progress on stderr using terminal graphics. If you use UNIX a lot and have never heard of pv before, I’m pretty sure the lightbulb is already lit above your head (if not, maybe pv isn’t for you after all, or maybe it would help to take a look at this review of pv to see why it’s so great). pv has evolved rather nicely over the years. It has been available from Ubuntu universe for a while now (why only universe? why??), and it has a slew of useful features: rate limiting, ETA prediction for an expected amount of data, on-the-fly parameter changes (increase or decrease the rate limit without breaking the pipe!), multiple invocation support (measure speed at two different points of the pipe!) and so on.
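
To make a couple of those features concrete, here is a minimal sketch rather than a definitive recipe. It assumes a pv that accepts the long options --rate-limit, --name and --cursor; `progress` and `gzip_with_meters` are hypothetical helper names invented for this example, and the fallback to cat is only there so the pipe keeps working on a machine without pv.

```shell
# Hypothetical helper: run pv when it is installed, otherwise fall back to
# cat so the pipeline still moves data (just without a progress display).
progress() {
    if command -v pv >/dev/null 2>&1; then
        pv "$@"
    else
        cat
    fi
}

# Hypothetical helper: compress stdin to stdout with one meter on each side
# of gzip. --name labels each display, --cursor lets both share the terminal,
# and the first meter is also capped at 1 MiB/s.
gzip_with_meters() {
    progress --cursor --name raw --rate-limit 1M |
        gzip -c |
        progress --cursor --name gzipped
}

# Usage: feed 256 KiB of zeros through the pipe and print the compressed size.
head -c 262144 /dev/zero | gzip_with_meters | wc -c
```

pv stays silent when stderr is not a terminal, so the same pipeline behaves like plain cat inside scripts. For the on-the-fly change, the pv manual describes a --remote PID option that hands new settings (such as a different --rate-limit) to an already running pv; check that your version supports it before relying on it.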

If you already use pv, you may want to see some of the recipes I use it in; if you don’t, maybe they’ll whet your appetite (I’m using long options for pv and short options for everything else):

  1. The basics: copy /tmp/src/ to /tmp/dst/, with progress
    [sourcecode language="bash" light="true"]
    # du -sk reports the size of $src in KiB; the "k" suffix tells pv the total, which gives it an ETA
    $ src=/tmp/src ; tar -cC "$src" . |
    pv --size $(du -sk "$src" | cut -f1)k |
    tar -xC /tmp/dst
    142MB 0:00:02 [43.4MB/s] [======> ] 58% ETA 0:00:01
    $
    [/sourcecode]
    By the way, this works great if you add nc and compression; pv can even help you decide which compression level gives the best throughput before the CPU becomes the bottleneck.

  2. Scale a bunch of images to a specific size, using multiple cores and with progress
    [sourcecode language="bash" light="true"]
    # xargs -t echoes each command to stderr; 2>&1 feeds those lines to pv, which counts them
    $ cd /tmp/src ; ls *.jpg |
    xargs -P 4 -I % -t convert -resize 1024 % /tmp/dst/% 2>&1 |
    pv --line-mode --size $(ls *.jpg | wc -l) > /dev/null
    96 0:00:16 [7.85/s] [===> ] 36% ETA 0:00:28
    $
    [/sourcecode]

  3. Get a quick assessment of the traffic rate going through some interface
    [sourcecode language="bash" light="true"]
    # -w - writes the captured packets to stdout; without --size, pv shows throughput but no ETA
    $ sudo tcpdump -c 10000 -i eth1 -w - 2>/dev/null | pv > /dev/null
    35.4MB 0:00:07 [4.56MB/s] [ <=> ]
    $
    [/sourcecode]
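
The compression remark in the first recipe can be sketched like this. It is an illustration under stated assumptions, not a benchmark: `progress` and `compare_gzip_levels` are hypothetical helper names, the gzip levels 1, 6 and 9 are an arbitrary pick, and the sample file is generated with seq just to have something to compress. With pv placed in front of gzip, the rate it shows is how fast gzip consumes the uncompressed input at that level.

```shell
# Hypothetical helper: run pv when it is installed, otherwise fall back to cat.
progress() {
    if command -v pv >/dev/null 2>&1; then
        pv "$@"
    else
        cat
    fi
}

# Hypothetical helper: push one file through gzip at several levels.
# pv (on stderr) shows the input rate for each level; stdout gets the
# compressed size, so speed and ratio can be weighed against each other.
compare_gzip_levels() {
    file=$1
    for level in 1 6 9; do
        size=$(progress --name "gzip -$level" < "$file" | gzip "-$level" -c | wc -c)
        echo "gzip -$level: $((size)) bytes"
    done
}

# Usage: build a throwaway sample file, compare the levels, clean up.
sample=$(mktemp)
seq 1 200000 > "$sample"
compare_gzip_levels "$sample"
rm -f "$sample"
```

Over a real link you would put nc at the end of the pipe instead of wc and pick the highest level whose input rate still keeps the network busy.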

Nifty, eh? I find myself inserting pv in any pipe I expect to exist for more than a few moments. How do you use pv?