Trimming the fat from JPEGs
I use Adobe Photoshop CS2 on my Mac as my primary photo editor. Adobe recently announced that the Intel native port of Photoshop would have to wait for the next release CS3, tentatively scheduled for Spring 2007. This ridiculously long delay is a serious sticking point for Photoshop users, specially those who jumped on the MacBook Pro to finally get an Apple laptop with decent performance, as Photoshop under Rosetta emulation will run at G4 speeds or lower on the new machines.
This nonchalance is not a very smart move on Adobe’s part, as it will certainly drive many to explore Apple’s Aperture as an alternative, or be more receptive to newcomers like LightZone. I know Aperture and Photoshop are not fully equivalent, but Aperture does take care of a significant proportion of a digital photographer’s needs, and combined with Apple’s recent $200 price reduction for release 1.1, and their liberal license terms (you can install it on multiple machines as long as you are the only user of those copies, so you only need to buy a single license even if like me you have both a desktop and a laptop).
There is a disaffection for Adobe among artists of late. Their anti-competitive merger with Macromedia is leading to complacency. Adobe’s CEO, Bruce Chizen, is also emphasizing corporate customers for the bloatware that is Acrobat as the focus for Adobe, and the demotion of graphics apps shows. Recent releases of Photoshop have been rather ho-hum, and it is starting to accrete the same kind of cruft as Acrobat (to paraphrase Borges, each release of it makes you regret the previous one). Hopefully Thomas Knoll can staunch this worrisome trend.
Adobe is touting its XMP metadata platform. XMP is derived from the obnoxious RDF format, a solution in search of a problem if there ever was one. RDF files are as far from human-readable as a XML-based format can get, and introduce considerable bloat. If Atom people had not taken the RDF cruft out of their syndication format, I would refuse to use it.
I always scan slides and negatives at maximal bit depth and resolution, back up the raw scans to a 1TB external disk array, then apply tonal corrections and spot dust. One bizarre side-effect of XMP is that if I take a 16-bit TIFF straight from the slide scanner, then apply curves and reduce it to 8 bits, somewhere in the XMP metadata that Photoshop “helpfully” embedded in the TIFF the bit depth is not updated and Bridge incorrectly shows the file as being 16-bit. The only way to find out is to open it (Photoshop will show the correct bit depth in the title bar) or look at the file size.
This bug is incredibly annoying, and the only work-around I have found so far is to run ImageMagick‘s convert utility with the -strip option to remove the offending XMP metadata. I did not pay the princely price for the full version of Photoshop to be required to use open-source software as a stop-gap in my workflow.
Photoshop will embed XMP metadata and other cruft in JPEG files if you use the “Save As…” command. In Photoshop 7, all that extra baggage actually triggered a bug in IE that would break its ability to display images. You have to use the “Save for Web…” command (actually a part of ImageReady) to save files in a usable form. Another example of poor fit-and-finish in Adobe’s software: “Save for Web” will not automatically convert images in AdobeRGB or other color profiles to the Web’s implied sRGB, so if you forget to do that as a previous step, the colors in the resulting image will be off.
“Save for Web” will also strip EXIF tags that are unnecessary baggage for web graphics (and can actually be a privacy threat). While researching the Fotonotes image annotation scheme, I opened one of my “Save for Web” JPEGs under a hex editor, and I was surprised to see literal strings like “Ducky” and “Adobe” (apparently the ImageReady developers have an obsession with rubber duckies). Photoshop is clearly still embedding some useless metadata in these files, even though it is not supposed to. The overhead corresponds to about 1-2%, which in most cases doesn’t require more disk space because files use entire disk blocks, whether they are fully filled or not, but this will lead to increased network bandwidth utilization because packets (which do not have the block size constraints of disks) will have to be bigger than necessary.
I wrote jpegstrip.c, a short C program to strip out Photoshop’s unnecessary tags, and other optional JPEG “markers” from JPEG files, like the optional “restart” markers that allow a JPEG decoder to recover if the data was corrupted — it’s not really a file format’s job to mitigate corruption, more TCP’s or the filesystem’s. The Independent JPEG Group’s jpegtran -copy none actually increased the size of the test file I gave it, so it wasn’t going to cut it. jpegstrip is crude and probably breaks in a number of situations (it is the result of a couple of hours’ hacking and reading the bare minimum of the JPEG specification required to get it working). The user interface is also pretty crude: it takes an input file over standard input, spits out the stripped JPEG over standard output and diagnostics on standard error (configurable at compile time).
ormag ~/Projects/jpegstrip>gcc -O3 -Wall -o jpegstrip jpegstrip.c ormag ~/Projects/jpegstrip>./jpegstrip < test.jpg > test_strip.jpg in=2822 bytes, skipped=35 bytes, out=2787 bytes, saved 1.24% ormag ~/Projects/jpegstrip>jpegtran -copy none test.jpg > test_jpegtran.jpg ormag ~/Projects/jpegstrip>jpegtran -restart 1 test.jpg > test_restart.jpg ormag ~/Projects/jpegstrip>gcc -O3 -Wall -DDEBUG=2 -o jpegstrip jpegstrip.c ormag ~/Projects/jpegstrip>./jpegstrip < test_restart.jpg > test_restrip.jpg skipped marker 0xffdd (4 bytes) skipped restart marker 0xffd0 (2 bytes) skipped restart marker 0xffd1 (2 bytes) skipped restart marker 0xffd2 (2 bytes) skipped restart marker 0xffd3 (2 bytes) skipped restart marker 0xffd4 (2 bytes) skipped restart marker 0xffd5 (2 bytes) skipped restart marker 0xffd6 (2 bytes) skipped restart marker 0xffd7 (2 bytes) skipped restart marker 0xffd0 (2 bytes) in=3168 bytes, skipped=24 bytes, out=3144 bytes, saved 0.76% ormag ~/Projects/jpegstrip>ls -l *.jpg -rw-r--r-- 1 majid majid 2822 Apr 22 23:17 test.jpg -rw-r--r-- 1 majid majid 3131 Apr 22 23:26 test_jpegtran.jpg -rw-r--r-- 1 majid majid 3168 Apr 22 23:26 test_restart.jpg -rw-r--r-- 1 majid majid 3144 Apr 22 23:27 test_restrip.jpg -rw-r--r-- 1 majid majid 2787 Apr 22 23:26 test_strip.jpg
Update (2006-04-24):
Reader “Kam” reports jhead offers JPEG stripping with the -purejpg option, and much much more. Jhead offers an option to strip mostly useless preview thumbnails, but it does not strip out restart markers.