Fazal Majid's low-intensity blog

Sporadic pontification


Trimming the fat from JPEGs

I use Adobe Photoshop CS2 on my Mac as my primary photo editor. Adobe recently announced that the Intel-native port of Photoshop would have to wait for the next release, CS3, tentatively scheduled for Spring 2007. This ridiculously long delay is a serious sticking point for Photoshop users, especially those who jumped on the MacBook Pro to finally get an Apple laptop with decent performance, as Photoshop under Rosetta emulation will run at G4 speeds or lower on the new machines.

This nonchalance is not a very smart move on Adobe’s part, as it will certainly drive many to explore Apple’s Aperture as an alternative, or make them more receptive to newcomers like LightZone. I know Aperture and Photoshop are not fully equivalent, but Aperture does take care of a significant proportion of a digital photographer’s needs. Combined with Apple’s recent $200 price reduction for release 1.1 and its liberal license terms (you can install it on multiple machines as long as you are the only user of those copies, so you only need to buy a single license even if, like me, you have both a desktop and a laptop), it makes a compelling alternative.

Of late, there has been growing disaffection with Adobe among artists. Its anti-competitive merger with Macromedia is leading to complacency. Adobe’s CEO, Bruce Chizen, is also making corporate customers for the bloatware that is Acrobat the focus of the company, and the demotion of the graphics apps shows. Recent releases of Photoshop have been rather ho-hum, and it is starting to accrete the same kind of cruft as Acrobat (to paraphrase Borges, each release of it makes you regret the previous one). Hopefully Thomas Knoll can staunch this worrisome trend.

Adobe is touting its XMP metadata platform. XMP is derived from the obnoxious RDF format, a solution in search of a problem if there ever was one. RDF files are as far from human-readable as an XML-based format can get, and they introduce considerable bloat. If the Atom designers had not kept the RDF cruft out of their syndication format, I would refuse to use it.

I always scan slides and negatives at maximum bit depth and resolution, back up the raw scans to a 1TB external disk array, then apply tonal corrections and spot dust. One bizarre side-effect of XMP: if I take a 16-bit TIFF straight from the slide scanner, apply curves and reduce it to 8 bits, the bit depth recorded in the XMP metadata that Photoshop “helpfully” embedded in the TIFF is not updated, and Bridge incorrectly shows the file as being 16-bit. The only way to find out is to open the file (Photoshop shows the correct bit depth in the title bar) or to look at its size.

This bug is incredibly annoying, and the only workaround I have found so far is to run ImageMagick’s convert utility with the -strip option to remove the offending XMP metadata. I did not pay the princely price of the full version of Photoshop only to be forced to use open-source software as a stop-gap in my workflow.
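To check whether a file carries a stale bit depth without opening it in Photoshop, you can read the XMP packet directly. The sketch below is a minimal Python illustration, not an Adobe tool: it assumes the packet is stored as plain UTF-8 text inside the file (as XMP normally is), that it uses the conventional `x:xmpmeta` wrapper, and that `tiff:BitsPerSample` is written in element form as an `rdf:Seq`. It does a raw byte search rather than parsing the TIFF directory, and the script name `xmpbits.py` is made up.

```python
import sys
import xml.etree.ElementTree as ET
from typing import List, Optional

# Namespace URIs for the XMP wrapper (x:), RDF (rdf:) and the TIFF schema (tiff:).
NS = {
    "x": "adobe:ns:meta/",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "tiff": "http://ns.adobe.com/tiff/1.0/",
}
OPEN, CLOSE = b"<x:xmpmeta", b"</x:xmpmeta>"


def find_xmp(data: bytes) -> Optional[bytes]:
    """Return the first <x:xmpmeta> element embedded in the file, if any."""
    start = data.find(OPEN)
    if start < 0:
        return None
    end = data.find(CLOSE, start)
    if end < 0:
        return None
    return data[start:end + len(CLOSE)]


def xmp_bits_per_sample(data: bytes) -> List[int]:
    """Return the tiff:BitsPerSample values the XMP metadata claims."""
    packet = find_xmp(data)
    if packet is None:
        return []
    root = ET.fromstring(packet)
    values = []
    for desc in root.iter("{%s}Description" % NS["rdf"]):
        # Element form: <tiff:BitsPerSample><rdf:Seq><rdf:li>16</rdf:li>...
        for li in desc.findall("tiff:BitsPerSample/rdf:Seq/rdf:li", NS):
            values.append(int(li.text))
    return values


if __name__ == "__main__":
    # Usage: python xmpbits.py scan1.tif scan2.tif ...
    for path in sys.argv[1:]:
        with open(path, "rb") as f:
            print(path, xmp_bits_per_sample(f.read()))
```

For a file affected by the bug described above, the values printed would still say 16 after the conversion to 8 bits, while the pixel data itself is 8-bit.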

Photoshop embeds XMP metadata and other cruft in JPEG files if you use the “Save As…” command. In Photoshop 7, all that extra baggage actually triggered a bug in IE that broke its ability to display the images. You have to use the “Save for Web…” command (actually part of ImageReady) to save files in a usable form. Another example of poor fit-and-finish in Adobe’s software: “Save for Web” will not automatically convert images in AdobeRGB or other color profiles to the Web’s implied sRGB, so if you forget to do that beforehand, the colors in the resulting image will be off.

“Save for Web” also strips EXIF tags, which are unnecessary baggage for web graphics (and can actually be a privacy threat). While researching the Fotonotes image annotation scheme, I opened one of my “Save for Web” JPEGs in a hex editor, and I was surprised to see literal strings like “Ducky” and “Adobe” (apparently the ImageReady developers have an obsession with rubber duckies). Photoshop is clearly still embedding some useless metadata in these files, even though it is not supposed to. The overhead is about 1-2%. In most cases this does not consume more disk space, because files occupy whole disk blocks whether or not they fill them, but it does increase network bandwidth utilization, because packets (which do not have the block-size constraints of disks) will have to be bigger than necessary.
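To see what is hiding in a file without a hex editor, you can walk the JPEG marker segments yourself. Below is a minimal Python sketch, assuming a conventional file in which every marker before the start-of-scan (SOS) carries a length field; it stops at SOS, where the compressed image data begins. Application-specific (APPn) segments conventionally open with an ASCII identifier such as “JFIF”, “Exif” or “Adobe”, so printing that prefix is usually enough to tell who put a segment there. The script name `jpegsegs.py` is made up.

```python
import re
import struct
import sys
from typing import Iterator, Tuple

SOS = 0xFFDA  # start of scan: entropy-coded image data follows


def segments(data: bytes) -> Iterator[Tuple[int, int, bytes]]:
    """Yield (marker, total_size, payload) for each JPEG header segment."""
    if data[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG file (missing SOI marker)")
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            raise ValueError("expected a marker at offset %d" % pos)
        while data[pos + 1] == 0xFF:  # optional fill bytes before a marker
            pos += 1
        marker = 0xFF00 | data[pos + 1]
        # The big-endian length field counts itself but not the marker.
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        yield marker, 2 + length, data[pos + 4:pos + 2 + length]
        if marker == SOS:
            return
        pos += 2 + length


def label(marker: int, payload: bytes) -> str:
    """Name APPn/COM segments and show any leading ASCII identifier."""
    if 0xFFE0 <= marker <= 0xFFEF:
        name = "APP%d" % (marker - 0xFFE0)
    elif marker == 0xFFFE:
        name = "COM"
    else:
        return "0x%04x" % marker
    ident = re.match(rb"[\x20-\x7e]*", payload).group().decode("ascii")
    return "%s %s" % (name, ident[:20])


if __name__ == "__main__":
    # Usage: python jpegsegs.py photo.jpg ...
    for path in sys.argv[1:]:
        with open(path, "rb") as f:
            data = f.read()
        for marker, size, payload in segments(data):
            print("%s: %-26s %6d bytes" % (path, label(marker, payload), size))
```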

I wrote jpegstrip.c, a short C program to strip Photoshop’s unnecessary tags and other optional JPEG “markers” from JPEG files, such as the optional “restart” markers that allow a JPEG decoder to recover if the data was corrupted (mitigating corruption is not really a file format’s job, but rather TCP’s or the filesystem’s). The Independent JPEG Group’s jpegtran -copy none actually increased the size of the test file I gave it, so it wasn’t going to cut it. jpegstrip is crude and probably breaks in a number of situations (it is the result of a couple of hours’ hacking and reading the bare minimum of the JPEG specification required to get it working). The user interface is also pretty crude: it reads the input file from standard input, writes the stripped JPEG to standard output and prints diagnostics on standard error (configurable at compile time).

ormag ~/Projects/jpegstrip>gcc -O3 -Wall -o jpegstrip jpegstrip.c
ormag ~/Projects/jpegstrip>./jpegstrip < test.jpg > test_strip.jpg
in=2822 bytes, skipped=35 bytes, out=2787 bytes, saved 1.24%
ormag ~/Projects/jpegstrip>jpegtran -copy none test.jpg > test_jpegtran.jpg
ormag ~/Projects/jpegstrip>jpegtran -restart 1 test.jpg > test_restart.jpg
ormag ~/Projects/jpegstrip>gcc -O3 -Wall -DDEBUG=2 -o jpegstrip jpegstrip.c
ormag ~/Projects/jpegstrip>./jpegstrip < test_restart.jpg > test_restrip.jpg
skipped marker 0xffdd (4 bytes)
skipped restart marker 0xffd0 (2 bytes)
skipped restart marker 0xffd1 (2 bytes)
skipped restart marker 0xffd2 (2 bytes)
skipped restart marker 0xffd3 (2 bytes)
skipped restart marker 0xffd4 (2 bytes)
skipped restart marker 0xffd5 (2 bytes)
skipped restart marker 0xffd6 (2 bytes)
skipped restart marker 0xffd7 (2 bytes)
skipped restart marker 0xffd0 (2 bytes)
in=3168 bytes, skipped=24 bytes, out=3144 bytes, saved 0.76%
ormag ~/Projects/jpegstrip>ls -l *.jpg
-rw-r--r--   1 majid  majid  2822 Apr 22 23:17 test.jpg
-rw-r--r--   1 majid  majid  3131 Apr 22 23:26 test_jpegtran.jpg
-rw-r--r--   1 majid  majid  3168 Apr 22 23:26 test_restart.jpg
-rw-r--r--   1 majid  majid  3144 Apr 22 23:27 test_restrip.jpg
-rw-r--r--   1 majid  majid  2787 Apr 22 23:26 test_strip.jpg
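For readers who would rather not compile C, here is a rough Python sketch of the same idea; it is not a port of jpegstrip.c. It assumes an ordinary sRGB YCbCr JPEG such as “Save for Web” produces: it keeps the APP0 (JFIF) segment and drops COM and APP1 through APP15 segments, which covers EXIF, XMP, ICC profiles and Photoshop’s own APPn blocks (dropping APP14 could matter for CMYK files, hence the assumption). Unlike jpegstrip, it copies everything from the start-of-scan onward untouched: an encoder resets the DC predictors and pads to a byte boundary at each restart interval, so removing restart markers safely means re-encoding the scan data rather than deleting bytes. It also takes file names on the command line and writes a `_strip` copy next to each file; the name `jpegstrip.py` is hypothetical.

```python
import os
import struct
import sys

APP1, APP15, COM = 0xE1, 0xEF, 0xFE  # second byte of droppable markers
SOS, EOI = 0xDA, 0xD9


def strip_jpeg(data: bytes) -> bytes:
    """Drop COM and APP1..APP15 header segments; keep everything else."""
    if data[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG file (missing SOI marker)")
    out = bytearray(data[:2])
    pos = 2
    while pos + 1 < len(data):
        if data[pos] != 0xFF:
            raise ValueError("expected a marker at offset %d" % pos)
        while data[pos + 1] == 0xFF:  # skip optional fill bytes
            pos += 1
        code = data[pos + 1]
        if code in (SOS, EOI):
            # Image data (or the end) follows: copy it verbatim, restart
            # markers included, since removing them would need re-encoding.
            out += data[pos:]
            break
        # The big-endian length field counts itself but not the marker.
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        end = pos + 2 + length
        if not (APP1 <= code <= APP15 or code == COM):
            out += data[pos:end]
        pos = end
    return bytes(out)


if __name__ == "__main__":
    # Usage: python jpegstrip.py photo.jpg ...  (writes photo_strip.jpg)
    for path in sys.argv[1:]:
        with open(path, "rb") as f:
            original = f.read()
        stripped = strip_jpeg(original)
        root, ext = os.path.splitext(path)
        with open(root + "_strip" + ext, "wb") as f:
            f.write(stripped)
        skipped = len(original) - len(stripped)
        print("%s: in=%d bytes, skipped=%d bytes, out=%d bytes, saved %.2f%%"
              % (path, len(original), skipped, len(stripped),
                 100.0 * skipped / len(original)), file=sys.stderr)
```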

Update (2006-04-24):

Reader “Kam” reports that jhead offers JPEG stripping with the -purejpg option, and much more. Jhead can also strip the mostly useless preview thumbnails, but it does not strip out restart markers.

How to show respect for your readers

Blogging is often seen as a narcissistic pursuit. It can be, but the best bloggers (not necessarily the most popular ones) put their audience first. To do that, you first need to know your audience. Most blogs have three very distinct types of readers:

  1. Regular visitors who use web browsers and bookmarks. If the page doesn’t change often enough, they will get discouraged and eventually stop coming. You need to post often to keep this population engaged.
  2. People who come from a search engine looking for very specific information. If they do not find what they are looking for, they will move on to the next site in their list; if they do, they may linger for other articles and eventually graduate to repeat-visitor status. Closely related are people who follow links to your site from other sites.
  3. Those who let feed readers do the polling for them, and thus do not necessarily care how often a feed is updated. Feed readers allow for much more scalable browsing: I currently subscribe to 188 feeds (not all of them are listed in my blogroll), and I certainly couldn’t afford to visit 188 sites each day. Feed reader users are still a minority, but, especially for commercial publications, a very attractive one made up of tech-savvy early adopters. The flip side is a more demanding audience. Many people go overboard with the number of feeds, burn out, then unsubscribe en masse. If you are a little careful, you can avoid this pendulum effect by pruning feeds that no longer offer a sufficient signal-to-noise ratio.

The following sections, in no particular order, are rough guidelines on how best to cater to the needs of the last two types of readers.

Maintain a high signal to noise ratio

Posting consistently good information on a daily or even weekly basis is no trivial amount of work. I certainly cannot manage more than a couple of posts per month, and I’d rather not clutter my website with “filler material” if I can help it. For this reason, I have essentially given up on the first constituency, and can only hope they will graduate to feed readers as the technology becomes more mainstream.

Needless to say, test posts are amateurish and you should not waste your readers’ time with them. Do the right thing and use a separate staging environment for your blog. If your blogging provider doesn’t provide one, switch to a supplier that has a clue.

Posting to say that one is not going to post for a few days due to travel, a vacation or any other reason is the height of idiocy and the sure sign of a narcissist. A one-way trip to the unsubscribe button as far as I am concerned.

Distinguish between browsers and feed readers

In November of last year, I had an interesting conversation with Om Malik. My feedback to him was that he was posting too often and needed to pay more attention to the quality rather than the quantity of his postings.

The issue is not quite as simple as that. To some extent the needs of browser users and feed subscribers are contradictory, but a good compromise is to omit the inevitable filler or site-status articles from the Atom or RSS feeds. Few blog tools offer this feature, however.
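If your blog tool lets you script feed generation, the rule itself is simple to express. A minimal Python sketch, assuming each post carries a hypothetical `kind` field set by the author; the names `Post`, `feed_posts`, `home_page_posts` and the kind values are illustrative, not any blog engine’s API.

```python
from dataclasses import dataclass
from typing import Iterable, List

# Hypothetical post kinds that belong on the home page but not in the feed.
HOME_PAGE_ONLY = {"filler", "status"}


@dataclass
class Post:
    title: str
    permalink: str
    kind: str = "article"  # hypothetical field set by the author


def feed_posts(posts: Iterable[Post], limit: int = 15) -> List[Post]:
    """Newest-first posts for the Atom/RSS feed, minus filler and status notes."""
    return [p for p in posts if p.kind not in HOME_PAGE_ONLY][:limit]


def home_page_posts(posts: Iterable[Post], limit: int = 10) -> List[Post]:
    """The home page still shows everything, for browser-based visitors."""
    return list(posts)[:limit]
```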

Search engines will index your home page, which is normally just a summary of the last N articles you wrote. Indeed, it will often have the highest PageRank (or whatever metric is used). An older article may be pushed off the home page but still be listed in the (now out-of-date) search engine index. The latency is often considerable, and the end result is that people searching for something see a tantalizing preview in the search engine results listing, but cannot find it once they land on the home page, or, in the best of cases, have to wade through dozens of irrelevant articles to get to it. Ideally, you want them to reach the relevant permalink page directly without stopping by the home page.

There is a simple way to eliminate this frustration for search engine users: make the home page (and other summary pages like category-level summaries or archive index pages) non-indexable. This can be done by adding robots meta tags to the head of the summary pages, but not to the permalink pages, telling spiders not to index the page but still to follow its links. The search engine spiders will crawl through the summary pages to the permalinks, but only store the permalink pages in their index. Thus, all searches will lead to relevant and specific content free from extraneous material (which is still available, just one click away).
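In a template system that knows which kind of page it is rendering, the rule boils down to emitting one tag on summary pages only. The sketch below assumes the standard robots meta directive `noindex,follow`; the page-type names are hypothetical and would map to whatever your blog software calls its templates. Read the 2008 update at the end of this article before deploying it.

```python
from html import escape

# Page types that only summarize other pages (hypothetical names).
SUMMARY_PAGES = {"home", "category", "archive_index"}

# Standard robots directive: keep this page out of the index, follow its links.
NOINDEX_FOLLOW = '<meta name="robots" content="noindex,follow">'


def robots_meta(page_type: str) -> str:
    """Return the robots tag for summary pages, or "" for permalink pages."""
    return NOINDEX_FOLLOW if page_type in SUMMARY_PAGES else ""


def render_head(title: str, page_type: str) -> str:
    """Build a minimal <head> section for the given page type."""
    lines = ["<head>", "<title>%s</title>" % escape(title)]
    tag = robots_meta(page_type)
    if tag:
        lines.append(tag)
    lines.append("</head>")
    return "\n".join(lines)
```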

Here again, not all weblog software supports having different templates for permalink pages and summary pages.

There is an unfortunate side-effect of this: as your home page is no longer indexed, you may experience a drop in search engine rankings. My weblog is no longer the first hit in a Google search for “Fazal Majid”. In my opinion, the improved relevance for search engine users far outweighs the bruising to my ego, which needs regular deflating anyway.

Support feed autodiscovery

Supporting autodiscovery of RSS or Atom feeds makes it much easier for novice users to detect that feeds are available (Firefox and Safari already support it, and IE will soon). Adding the autodiscovery links to a page is a no-brainer.
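The markup involved is a `<link rel="alternate">` element in the page head, with `type` set to `application/atom+xml` or `application/rss+xml` and `href` pointing at the feed. To illustrate what the browsers look for, here is a small Python sketch that finds those elements the way an autodiscovery client might; it uses only the standard library, and the example URLs in the test are made up.

```python
from html.parser import HTMLParser
from typing import List, Tuple
from urllib.parse import urljoin

FEED_TYPES = {"application/atom+xml", "application/rss+xml"}


class FeedFinder(HTMLParser):
    """Collect (type, title, absolute URL) for each autodiscovery <link>."""

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.feeds: List[Tuple[str, str, str]] = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        a = dict(attrs)
        rels = (a.get("rel") or "").lower().split()
        ftype = (a.get("type") or "").lower()
        if "alternate" in rels and ftype in FEED_TYPES and a.get("href"):
            self.feeds.append((ftype, a.get("title") or "",
                               urljoin(self.base_url, a["href"])))


def find_feeds(html: str, base_url: str) -> List[Tuple[str, str, str]]:
    """Return the feeds a page advertises, with hrefs made absolute."""
    finder = FeedFinder(base_url)
    finder.feed(html)
    return finder.feeds
```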

Categorize your articles

In all likelihood, your postings cover a variety of topics. Categorizing them means users can subscribe only to those of interest to them, and thus increases your feed’s signal to noise ratio.

Keep a stable feed URL under your control

If your feed location changes, set up a redirection. If this is not possible, at least post an article in the old feed to let subscribers know where to get the new feed.
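On most web servers a redirect is a one-line configuration change; the sketch below expresses the same thing as a tiny Python WSGI application, purely as an illustration. The paths and URLs in `MOVED` are hypothetical. A 301 status signals that the move is permanent, so a well-behaved feed reader can update the stored subscription URL; the same mapping can serve old article URLs too.

```python
from typing import Dict, Iterable

# Hypothetical mapping from retired feed paths to their new absolute URLs.
MOVED: Dict[str, str] = {
    "/index.rdf": "http://example.com/blog/atom.xml",
    "/rss.xml": "http://example.com/blog/atom.xml",
}


def redirect_app(environ: dict, start_response) -> Iterable[bytes]:
    """WSGI app answering moved feed URLs with a permanent (301) redirect."""
    target = MOVED.get(environ.get("PATH_INFO", ""))
    if target is None:
        start_response("404 Not Found", [("Content-Type", "text/plain")])
        return [b"not found\n"]
    start_response("301 Moved Permanently",
                   [("Location", target), ("Content-Type", "text/plain")])
    return [("moved to %s\n" % target).encode("ascii")]

# To try it locally (this blocks, serving requests until interrupted):
#   from wsgiref.simple_server import make_server
#   make_server("", 8000, redirect_app).serve_forever()
```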

Depending on a third-party feed provider like Feedburner is risky: if they ever go out of business, your subscribers are stranded. Even worse, if a link farm operator buys the lapsed domain, they can easily start spamming your subscribers and make it look as if the spam is coming from you. Your feeds are just as mission-critical as your email and hosting, so don’t enter into an outsourcing arrangement casually, especially not one without a clear exit strategy.

Maintain old posts

Most photographers, writers and musicians depend on residuals (recurring revenue from older work) for their income and to support them in retirement. Unless your site is pure fluff (and you would not be reading this if that were the case), your old articles are still valuable. Indeed, Zipf’s law is often at work, and you may find that a few specific archived articles account for the bulk of your traffic (in my case, my article on lossy Nikon NEF compression is a perennial favorite).
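To put a rough number on that: under an idealized Zipf distribution with exponent s, the article ranked r receives traffic proportional to 1/r^s, so the share captured by the top k of n articles is H(k, s)/H(n, s), where H is the generalized harmonic number. The Python sketch below computes it; with s = 1 and a hypothetical archive of 500 articles, the ten most popular would draw roughly 43% of the traffic. Real traffic only approximates this law, so treat the numbers as illustrative.

```python
from math import fsum


def harmonic(n: int, s: float = 1.0) -> float:
    """Generalized harmonic number H(n, s) = sum of 1/k**s for k = 1..n."""
    return fsum(1.0 / k ** s for k in range(1, n + 1))


def top_share(k: int, n: int, s: float = 1.0) -> float:
    """Fraction of traffic going to the k most popular of n articles,
    assuming the article of rank r gets traffic proportional to 1/r**s."""
    return harmonic(k, s) / harmonic(n, s)


if __name__ == "__main__":
    for k in (1, 10, 50):
        print("top %d of 500 articles: %.0f%%" % (k, 100 * top_share(k, 500)))
```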

It is worth dusting these old articles off every now and then:

  • You should fix or replace the inevitable broken links (there are many programs available to locate broken links on a site; I have my own, but linkchecker is a pretty good free one).
  • The content of the article may have gone stale and need refreshing. Don’t rewrite history, however, by changing it in a way that alters the original meaning; better to append an update to the article. If there was a factual error, don’t leave it in the main text of the article, but mention the correction at the end.
  • There is no statute of limitations on typos or spelling mistakes. Sloppy writing is a sign of disrespect towards your readers. Rewriting text to clarify the meaning is also worthwhile on heavily visited “backlist” pages. The spirit of the English language lies in straightforwardness, one thing all the good style guides agree on.
  • If you have comments enabled on your site, pay special attention to your archives: comment spammers often target those pages, as it is easier for them to avoid detection there. You may want to disable comments on older articles.
  • Provide redirection for old URLs so old links do not break. Simple courtesy, really.
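linkchecker is the better tool, but the core of a broken-link check fits in a few lines. This is a minimal Python sketch using only the standard library: it collects link targets and image sources from a page and flags anything that does not answer with a 2xx or 3xx status. Some servers reject HEAD requests, so a real checker would fall back to GET; the `status` parameter is there so you can swap in your own probe (the test uses a fake one, so no network is needed).

```python
from html.parser import HTMLParser
from typing import Callable, List
from urllib.error import HTTPError
from urllib.parse import urldefrag, urljoin
from urllib.request import Request, urlopen


class LinkCollector(HTMLParser):
    """Collect unique absolute URLs from <a href> and <img src> attributes."""

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.urls: List[str] = []

    def handle_starttag(self, tag, attrs):
        attr = {"a": "href", "img": "src"}.get(tag)
        value = dict(attrs).get(attr) if attr else None
        if value and not value.startswith(("#", "mailto:", "javascript:")):
            url = urldefrag(urljoin(self.base_url, value))[0]
            if url not in self.urls:
                self.urls.append(url)


def http_status(url: str) -> int:
    """Return the HTTP status of a HEAD request, or 0 if unreachable."""
    try:
        with urlopen(Request(url, method="HEAD"), timeout=15) as resp:
            return resp.status
    except HTTPError as err:
        return err.code
    except OSError:  # DNS failures, refused connections, timeouts
        return 0


def broken_links(html: str, base_url: str,
                 status: Callable[[str], int] = http_status) -> List[str]:
    """Return the URLs on a page that do not answer with 2xx or 3xx."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    return [u for u in collector.urls if not 200 <= status(u) < 400]
```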

Make your feeds friendly for aggregators

Having written my own feed reader, I have all too much experience with broken or dysfunctional feeds. There is only so much feed reader programmers can do to work around brain-dead feeds.

  • Steer clear of the bleeding edge in feed syndication formats. Atom offers a number of fancy features, but you have to assume many feed readers will break if you use too many of them. For instance, even though Atom supports relative URLs, it is best if your feed files use fully qualified absolute URLs. Unicode is also a double-edged sword: prefer entity-encoding non-ASCII characters over relying on a feed reader to deal with content encoding correctly.
  • Understand GUIDs. Too many feeds generated by brain-dead blogging software issue a new GUID when an article is edited or corrected, or when its title is changed. Weblogs Inc. sites are egregious offenders, as is Reuters. The end result is that an article appears several times in the user’s aggregator, which is incredibly annoying. Temboz has a feature to automatically suppress duplicate titles, but that won’t cope with rewritten titles.
  • Full contents vs. abstracts is a point of contention. Very long posts are disruptive in web-based feed readers, but on the other hand most people dislike the underhanded teaser tactics of commercial sites that try to draw you to their website to drive ad revenue, and providing only abstracts may turn them off your feed altogether. Remember, the unsubscribe button is a mere click away…
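Two of these points translate directly into code. The Python sketch below, with made-up example URLs, resolves relative links to absolute ones, turns non-ASCII characters into numeric character references (valid in both XML and HTML), and derives the GUID from the permalink’s host and path plus the first publication date, using a tag: URI as defined in RFC 4151. It assumes the permalink itself never changes; if it might, store the GUID once at publication time instead of recomputing it.

```python
from datetime import date
from urllib.parse import urljoin, urlsplit


def absolute_url(base: str, href: str) -> str:
    """Resolve a possibly relative link against the article's permalink."""
    return urljoin(base, href)


def ascii_safe(text: str) -> str:
    """Replace every non-ASCII character with a numeric character reference.

    Markup characters such as < and & are left as they are.
    """
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")


def stable_guid(permalink: str, first_published: date) -> str:
    """Build an RFC 4151 tag: URI from data that never changes after publication.

    The title is deliberately left out, so retitling or correcting an
    article keeps its GUID and aggregators do not show it twice.
    """
    parts = urlsplit(permalink)
    return "tag:%s,%s:%s" % (parts.hostname, first_published.isoformat(), parts.path)
```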

Blogging ethics

The golden rule of blogging is that it’s all about the readers. Everything follows from this simple principle. You should strive to be relevant and considerate of their time. Take the time to spell-check your text. It is very difficult to edit one’s own text, but any article can benefit from a little time spent maturing, and from tighter and more lucid prose.

Don’t be narcissistic, unless friends and family are the primary audience. Most people couldn’t care less about your pets, your garden or for the most part your personal life (announcing major life events like a wedding or the birth of your children is perfectly normal, however).

Respect for your readers requires absolute intellectual honesty. Laziness or expediency are no excuse for poor fact-checking or revisionist edits. Enough said…

Update (2008-05-21):

Unfortunately, setting the meta tags described above seems to throw Google off so that it stops indexing pages altogether (Yahoo and MSN Search have no problems). So much for the myth of Google’s technical omnipotence… As a result, I have removed them and would advise you to do the same.

Update (2015-11-20):

If you use JavaScript and cookie-based web analytics like Piwik or Mint, make sure those script tags are disabled if the browser sends the Do-Not-Track header. As for third-party services like Google Analytics, just don’t. Using them means you are giving away your readers’ privacy to some of the most rapacious infringers in the world.
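If your pages are generated server-side, the check can happen before the script tag is ever sent. A minimal sketch, assuming a Python WSGI application, where the `DNT` request header appears as `HTTP_DNT` in the environment; the script path is hypothetical. One caveat: if a page cache sits in front of the application, it must vary on the DNT header, or one visitor’s choice will be served to everyone.

```python
# Hypothetical self-hosted analytics script; substitute your own tag.
ANALYTICS_TAG = '<script src="/analytics/tracker.js" async></script>'


def wants_tracking(environ: dict) -> bool:
    """False if the browser sent "DNT: 1" (exposed by WSGI as HTTP_DNT)."""
    return environ.get("HTTP_DNT", "").strip() != "1"


def analytics_snippet(environ: dict) -> str:
    """Return the analytics script tag, or nothing for Do-Not-Track visitors."""
    return ANALYTICS_TAG if wants_tracking(environ) else ""
```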

MacBook Pro first impressions

I am writing this on a brand-spanking-new Apple MacBook Pro (yes, I know, clumsy name). One of the reasons for my purchase is that I have been spending quite a bit of time on trains lately. Trains are one of the most civilized ways to travel: Caltrain certainly beats being stuck behind the wheel in the gridlock that is U.S. Highway 101. A laptop is a good way to get things done during the 3-hour round-trip to Santa Clara.

My last few laptops were company-issued Windows models. I had only ever purchased two laptops before, both Macs: a PowerBook 180c in college (it sported a 68K chip, proof that Apple could have kept the PowerBook moniker on an Intel-powered machine) and one of the original white iBooks in 2001, when they first came out around the same time as Mac OS X. For the last ten years or so, I have always managed to have ultra-thin and light models (less than 2kg / 4lb) assigned to me, and the MacBook Pro is certainly heavier than I would like. That said, it has a gorgeous screen and a decent keyboard.

Subjectively, so far it does not seem appreciably slower than my dual-2GHz PowerMac G5. I ran Xbench for a more objective comparison; see the benchmark results for more information. Unsurprisingly, disk I/O favors the desktop, but the Core Duo processor holds its own, and even beats the G5 handily on the integer performance benchmarks.

I prefer desktops to laptops for their superior capacity and peripherals. With its relatively puny 80GB of storage, the laptop (it doesn’t really qualify as a notebook given its physical size) is not going to usurp the G5 any time soon. It doesn’t even have enough capacity to store my complete music library, for instance. I am not looking forward to the usual hassles of synchronizing two computers. Apple’s synchronization solution requires buying a $499 Mac OS X Server license, and third-party solutions are a bit thin on the ground.

Now, Apple is a designer PC company: you want to protect the casework with a decent amount of padding, but the protective case itself must look sharp. I have always had good experiences with Waterfield Designs bags, made right here in San Francisco, so I naturally got one of their sleevecases. It is made of high-grade neoprene rubber rather than the foam used by other manufacturers, but in exploring my options, I couldn’t help but notice the dizzying array of choices for design-conscious Mac users. For some reason, Australian companies are over-represented; I counted no fewer than four manufacturers:

  • Crumpler
  • STM

As for the MacBook Pro itself, it is too soon to tell. One thing you immediately notice is how hot it gets, even though the entire aluminum case should act like one big heat sink. I haven’t played with the built-in iSight yet, so I can’t compare its quality with that of the stand-alone iSight mounted on my desktop.

The 512MB of RAM installed is woefully inadequate for a supposedly professional machine, but I would rather not pay Apple’s grossly inflated margins on RAM compared to Crucial’s prices. I bumped it up to the full 2GB. This upper limit is kind of disappointing when you come from a 64-bit platform (my desktop has 5.5GB of RAM). Laptops benefit even more than desktops from RAM, as free RAM is automatically used as a disk cache, reducing the need to fetch data from slow and power-hungry 2.5″ hard drives.

Update (2006-04-05):

Don’t try to use Monolingual to strip non-Intel architectures to save some space. You will end up rendering Rosetta unusable… I used to disable Classic, but I am not sure I would go so far as to allow only Intel binaries to run on my machine.

Update (2007-08-02):

More Australian laptop bag manufacturers:

Another one bites the dust

After a brief period of 100% digital shooting in 1999–2001, I went back to primarily shooting film, both black & white and color slides. I process my B&W film at home, but my apartment is too small for a darkroom to make prints, nor do I have a room dark enough, so I rent time at a shared darkroom. I used to go to the Focus Gallery in Russian Hill, but when I called to book a slot about a month ago, the owner informed me he was shutting down his darkroom rental business and relocating. He did recommend a suitable replacement, which actually has nicer, brand-new facilities, albeit in a less pleasant neighborhood. Learning new equipment and procedures was still an annoyance.

Color is much harder than B&W, and requires toxic chemicals. I shoot slides, which use the E-6 process, rather than the C-41 process used for the more common color negative film. For the last five years, I have been going to ChromeWorks, a Mom-and-Pop lab on Bryant Street, San Francisco’s closest equivalent to New York’s photo district. The only thing they did was E-6 film processing, and they did it exceedingly well, with superlative customer service and quite reasonable rates. When I went there today to hand them a roll for processing, I discovered they had closed down two months ago, apparently a mere week after my last visit.

I ended up giving my roll to the NewLab, another pro lab a few blocks away, which is apparently the last E-6 lab in San Francisco (I had used their services before for color negative film, which I almost never use apart from the excellent Fuji Natura 1600).

Needless to say, these developments are not encouraging for a film enthusiast.

Update (2007-12-14):

There is at least one other E-6 lab in San Francisco, Fotodepo (1063 Market @ 7th). They cater mostly to Academy of Art students and are not a pro lab by any means (I have never seen a more cluttered and untidy lab). In any case, they are more expensive than the New Lab, if more conveniently located.

Update (2009-08-27):

The Newlab itself also closed a few months ago. I now use Light Waves instead.

A Python driver for the Symbol CS 1504 bar code scanner

One of my cousins works for Symbol, the world’s largest bar code reader manufacturer. The fashionable action today is in RFID, but the humble bar code is relatively untapped at the consumer level. The unexpected success of Delicious Library shows people want to manage their collections of books, CDs and DVDs, and, as with businesses, scanning bar codes is the fastest and least error-prone way to do so. Delicious Library supports scanning bar codes with an Apple iSight camera, but you have to wonder how reliable that is.

If you want something more reliable, you need a dedicated bar code scanner. They come in a bewildering array of sizes and shapes, from thin wands to pistol-like models or flat ones like those used at your supermarket checkout counter. For some reason, the bar code scanner world seems stuck in the era of serial ports (or worse, PS/2 keyboard wedges), but USB models are available, starting at $70 or so. They emulate a keyboard: when you scan a bar code, they type in the code (as printed on the label) character by character, so as not to overwhelm the application, followed by a carriage return, which means they can work with almost anything from terminal-based applications to web pages. Ingeniously, most allow you to program the reader’s settings using a booklet of special bar codes that perform changes like enabling or disabling ISBN decoding, and so on.
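Because a keyboard-emulating scanner simply types, any program that reads lines of text can consume its output. As a small illustration (not Symbol’s software), this Python sketch verifies the standard UPC-A/EAN-13 check digit, which catches any single misread digit. It assumes you scan into a text editor, one code per line, and save the file; the script name `checkscans.py` is made up.

```python
import sys


def check_digit_ok(code: str) -> bool:
    """Validate the check digit of a UPC-A (12-digit) or EAN-13 code.

    Working leftward from the digit just before the check digit, the
    weights alternate 3, 1, 3, 1, ...; the check digit brings the
    weighted total up to a multiple of 10.
    """
    if not code.isdigit() or len(code) not in (12, 13):
        return False
    digits = [int(c) for c in code]
    total = sum(d * (3 if i % 2 == 0 else 1)
                for i, d in enumerate(reversed(digits[:-1])))
    return (10 - total % 10) % 10 == digits[-1]


if __name__ == "__main__":
    # Usage: python checkscans.py scans.txt   (one scanned code per line)
    for path in sys.argv[1:]:
        with open(path) as f:
            for line in f:
                code = line.strip()
                if code:
                    print(code, "ok" if check_digit_ok(code) else "BAD CHECK DIGIT")
```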

The problem with tethered bar code readers is that they are not very convenient if you are trying to catalog items on a bookshelf or read in UPC codes in a supermarket. Symbol has a unit buried deep inside its product catalog, the CS 1504 consumer scanner. This tiny unit (shown below with a canister of 35mm film for size comparison) can be worn on a key chain, although I would worry about damaging the plastic window. Most bar code readers are hulking beasts in comparison. It has a laser bar code scanner: just align the line it projects with the bar code and it will chirp once it has read and memorized the code. The memory capacity is up to 150 bar code scans with timestamps, or 300 without. The 4 silver button batteries (included) are rated for 5000 scans. AAA batteries would have been preferable, although the unit would then not be so compact; in any case, it is clear this scanner was not intended for heavy-duty commercial inventory tracking.

I bought one to simplify the process of listing books with BookCrossing (even though their site is not optimized for bar code readers), but there are other interesting uses, like finding out more about your daily purchases, such as their nutritional information or whether the company behind them engages in objectionable business practices. I can also imagine sticking preprinted bar-coded asset tracking tags on inventory (e.g. computers in the case of an IT department), and keeping track of them with this gizmo. People who sell a lot of books or used records through Amazon.com can also benefit, as Amazon has a bulk listing service to which you can upload a file of bar codes. An interesting related service is the free UPC database.

Symbol CS 1504
You can order the scanner in either serial ($100) or USB ($110) versions, significantly cheaper than the competition like Intelliscanner (and much smaller to boot). I highly recommend the USB version, even if you have a serial port today: serial ports seem to be going the way of the dodo, and your next computer may not have one. The USB version costs slightly more, but that’s because it includes a USB-serial adapter, and you would be hard pressed to find one retailing for a mere $10. The one shipped with my unit is the newer PN50 cable, which uses a Prolific 2303 chipset rather than the older Digi adapter. Wonder of wonders, they even have a Mac OS X driver available.

The scanner ships without any software. Symbol mostly sells through integrators to corporations that buy hundreds or thousands of bar code scanners for inventory or point-of-sale purposes, and is not really geared to be a direct-to-consumer business with all the customer support hassles that entails. There are a number of programs available, mostly for Windows, but they don’t seem to offer enough functionality to justify their high prices, often as expensive as the scanner itself.

Symbol does make available an SDK to access the scanner, including complete documentation of the protocol used by the device. While you do have to register, they do not make you jump through the ridiculous hoops required to access the Photoshop plug-in SDK or the Canon RAW decoding SDK. The supplied libraries are Windows-only, however, so I wrote a Python script that works on both Windows and Mac OS X (and probably most UNIX implementations as well, although you will have to use a serial port). The only dependency is the pySerial module.

By default, it will set the clock on the scanner, retrieve the recorded bar codes, correct the timestamps for any drift between the CS 1504’s internal clock and that of the host computer, and, if successful, clear the unit’s memory and dump the acquired bar codes in CSV format to standard output. The script will also decode ISBN codes (the CS 1504 does not appear to do this by itself in its default configuration). As it is written in Python, it can easily be extended, although it is probably easier to work off the CSV file.

The only configuration you have to do is set the serial port to use at the top of the script (it should do the right thing on a Mac using the Prolific driver; the Windows driver seems to always use COM8, but I have no way of knowing whether this is by design or coincidence). The program is still very rough, especially as regards error recovery, and I would appreciate any feedback.

A sample session follows:

ormag ~>python cs1504.py > barcodes.csv
Using device /dev/cu.usbserial...  connected
serial# 000100000003be95
SW version NBRIKAAE
reading clock for drift
clock drift 0:00:01.309451
resetting scanner clock... done
reading barcodes... done (2 read)
clearing barcodes... done
powering down... done

ormag ~>cat barcodes.csv
UPCA,034571575179,2006-03-27 01:08:48
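The two transformations behind rows like these, drift correction and ISBN decoding, are simple enough to sketch. This is not the code from cs1504.py; it is an independent Python illustration, assuming drift is measured as the scanner clock minus the host clock at download time and stays constant between scans (in reality it accumulates gradually since the last clock reset). The `EAN13` symbology name is hypothetical. ISBN-10 numbers are recovered from “Bookland” EAN-13 codes by dropping the 978 prefix and the EAN check digit, then recomputing a mod-11 check digit.

```python
import csv
import sys
from datetime import datetime, timedelta
from typing import List, Optional


def ean13_to_isbn10(ean: str) -> Optional[str]:
    """Convert a Bookland EAN-13 (978 prefix) to ISBN-10, else return None."""
    if len(ean) != 13 or not ean.isdigit() or not ean.startswith("978"):
        return None
    body = ean[3:12]  # drop the 978 prefix and the EAN check digit
    total = sum((10 - i) * int(d) for i, d in enumerate(body))
    check = (11 - total % 11) % 11  # ISBN-10 is mod 11, with X standing for 10
    return body + ("X" if check == 10 else str(check))


def correct(scanned_at: datetime, drift: timedelta) -> datetime:
    """Shift a scanner timestamp onto host time.

    drift = scanner clock minus host clock, measured at download time.
    """
    return scanned_at - drift


def to_row(symbology: str, code: str, scanned_at: datetime,
           drift: timedelta) -> List[str]:
    """Build one CSV row, turning Bookland EAN-13 codes into ISBNs."""
    isbn = ean13_to_isbn10(code) if symbology == "EAN13" else None
    if isbn is not None:
        symbology, code = "ISBN", isbn
    return [symbology, code,
            correct(scanned_at, drift).strftime("%Y-%m-%d %H:%M:%S")]


if __name__ == "__main__":
    # Made-up scan standing in for data read from the device.
    drift = timedelta(seconds=1, microseconds=309451)
    scan = ("EAN13", "9781892391193", datetime(2006, 3, 27, 1, 8, 53, 309451))
    csv.writer(sys.stdout).writerow(to_row(*scan, drift))
```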
ISBN,1892391198,2006-03-27 01:08:52

Update (2006-07-21):

At the prompting of some Windows users, I made a slightly modified version, win_cs1504.py, that will copy the barcodes to the clipboard, and also insert the symbology, barcode and timestamp starting on the first free line in the active Excel spreadsheet (creating one if necessary).

Update (2007-01-20):

Just to make it clear: I hereby place this code in the public domain.

Update (2009-11-06):

For Windows users, I have put up videos describing how to install the Prolific USB to serial driver, Python and requisite extensions, and how to use the program itself.

Update (2012-07-05):

I moved the script over to GitHub. Please file bug reports and enhancement requests there. Fatherhood and a startup don’t leave me much time to maintain this, so I make no promises, but this should allow people who make fixes to contribute them back (or fork).