Fazal Majid's low-intensity blog

Sporadic pontification


Is the Nikon D70 NEF (RAW) format truly lossless?

Many digital photographers (myself included) prefer shooting in so-called RAW mode. In theory, the camera saves the data exactly as it is read off the sensor, in a proprietary format that can later be processed on a PC or Mac to extract every last drop of performance, dynamic range and detail from the captured image, something the camera's embedded processor is hard-pressed to do while cooking the raw data into a JPEG file in real time.

The debate rages between proponents of JPEG and RAW workflows. What it really reflects is two different approaches to photography, both equally valid.

For people who favor JPEG, the creative moment is when you press the shutter release, and they would rather be out shooting more images than slaving in a darkroom or in front of a computer doing post-processing. This was Henri Cartier-Bresson’s philosophy — he was notoriously ignorant of the details of photographic printing, preferring to rely on a trusted master printmaker. This group also includes professionals like wedding photographers or photojournalists for whom the productivity of a streamlined workflow is an economic necessity (even though the overhead of a RAW workflow diminishes with the right software, it is still there).

Advocates of RAW tend to be perfectionists, almost to the point of becoming image control freaks. In the age of film, they would spend long hours in the darkroom getting their prints just right. This was the approach of Ansel Adams, who used every trick in the book (and invented quite a few of them, like the Zone System) to obtain the creative results he wanted. In his later years, he reprinted many of his most famous photographs in ways that made them darker and filled with foreboding. For RAW aficionados, the RAW file is the negative, and the finished output file, which could well be a JPEG, is the equivalent of a print.

Implicit is the assumption that RAW files are pristine: unlike JPEGs, they have had no post-processing such as white balance or Bayer interpolation applied to them, and certainly no lossy compression. This is why the debate can get emotional when a controversy erupts, such as whether a specific camera's RAW format is truly lossless.

The new Nikon D70's predecessor, the D100, had the option of using uncompressed or compressed NEFs. Uncompressed NEFs were about 10MB in size, compressed NEFs between 4.5MB and 6MB. In comparison, the Canon 10D's lossless CRW images are around 6MB to 6.5MB. In practice, compressed NEFs were not an option, as they were simply too slow (the camera would lock up for 20 seconds or so while compressing).

The D70 only offers compressed NEFs, but mercifully Nikon has improved the compression performance. Ken Rockwell asserts that D70 compressed NEFs are lossless, while Thom Hogan claims:

Leaving off Uncompressed NEF is potentially significant–we’ve been limited in our ability to post process highlight detail, since some of it is destroyed in compression.

To find out which one is correct, I read the C source code for Dave Coffin's excellent reverse-engineered, open-source RAW converter, dcraw, which supports the D70. The camera has a 12-bit analog-to-digital converter (ADC) that digitizes the analog signal coming out of the Sony ICX413AQ CCD sensor. In theory a 12-bit sensor should yield up to 2^12 = 4096 possible values, but the RAW conversion reduces these 4096 values to 683 by applying a quantization curve. These 683 values are then encoded using a variable number of bits (1 to 10) with a tree structure similar to those in the lossless Huffman or Lempel-Ziv compression schemes used by programs like ZIP.
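As best I can tell from the dcraw source, the decode is a two-stage affair. Here is a minimal C sketch of it (the names are mine, not Dave Coffin's, and the two extern declarations stand in for dcraw's actual bit reader and for the table read from the file):

    #include <stdint.h>

    #define NEF_LEVELS 683   /* quantized codes actually used by the D70 */

    /* curve[] is read from the NEF file's metadata; it maps each of
       the 683 codes back to a 12-bit (0..4095) sensor value. */
    extern uint16_t curve[NEF_LEVELS];

    /* Stand-in for dcraw's bit reader walking the Huffman-style tree;
       it returns the next quantized level (0..682). */
    extern int get_vlc_code(void);

    uint16_t next_pixel(void)
    {
        int level = get_vlc_code();   /* stage 1: entropy decode (lossless) */
        return curve[level];          /* stage 2: expand via the curve */
    }

The point to note is that stage 2 is a pure table lookup: whatever information the quantization discarded was gone before the entropy coder ever saw the data, so the lossless appearance of the variable-length coding proves nothing about the pipeline as a whole.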

The decoding curve is embedded in the NEF file itself (and could thus be changed by a firmware upgrade without requiring new NEF converters). I used a D70 NEF file made available by Uwe Steinmuller of Digital Outback Photo.

The quantization discards information by converting 12 bits' worth of data into log2(683) ≈ 9.4 bits' worth of resolution; the dynamic range itself is unchanged. This is a fairly common technique – digital telephony encodes 13 or 14 bits' worth of dynamic range in 8 bits using the so-called A-law and mu-law codecs. I modified the program to output the data for the decoding curve (in Excel-compatible CSV format), and plotted the curve (PDF) on linear and log-log scales, along with a quadratic regression fit (courtesy of R). The curve resembles a gamma correction curve: linear for values up to 215, then quadratic.
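For those who want to play with the numbers, the short C program below reproduces the qualitative shape of the curve and dumps it in the same CSV form I used for plotting. The coefficient k is illustrative only, chosen so that the quadratic joins the linear segment smoothly at 215 and reaches 4095 at the last code; it is not the fitted value from my regression:

    #include <math.h>
    #include <stdio.h>

    int main(void)
    {
        /* 683 codes carry log2(683) ~ 9.4 bits of resolution */
        printf("# effective resolution: %.1f bits\n", log2(683.0));

        /* pick k so the curve has value and slope continuity at the
           knee (code 215) and reaches 4095 at the last code (682) */
        const double knee = 215.0;
        const double k = (4095.0 - 682.0) / ((682.0 - knee) * (682.0 - knee));

        for (int code = 0; code < 683; code++) {
            double d = code - knee;
            double value = (code <= knee) ? code : code + k * d * d;
            printf("%d,%.0f\n", code, value);   /* CSV: code,12-bit value */
        }
        return 0;
    }

Plotting its output shows the same gamma-like shape: step sizes of 1 in the shadows, growing to roughly 15 twelve-bit levels per code in the brightest highlights, which is exactly where the resolution is lost.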

In conclusion, Thom is right – there is some loss of data, mostly in the form of lowered resolution in the highlights.

Does it really matter? You could argue it does not, as most color spaces have gamma correction anyway, but highlights are precisely where digital sensors are weakest, and losing resolution there means less headroom for dynamic range compression in high-contrast scenes. Thom’s argument is that RAW mode may not be able to salvage clipped highlights, but truly lossless RAW could allow recovering detail from marginal highlights. I am not sure how practicable this would be as increasing contrast in the highlights will almost certainly yield noise and posterization. But then again, there are also emotional aspects to the lossless vs. lossy debate…

In any case, simply waving the problem away as “curve shaping” as Rockwell does is not a satisfactory answer. His argument that cNEF must be lossless because its compression gain is modest, just as with lossless ZIP compression, is risibly fallacious, and his patronizing tone out of place. Lossless compression entails modest compression ratios, but the converse is definitely not true: if I replace a file with one half the size but all zeroes, I have a 2:1 compression ratio and 100% data loss. Canon does manage to get close to the same compression level using lossless compression; Nikon's compressed NEF format has the worst of both worlds – loss of data, without the high compression ratios of JPEG.

Update (2004-05-12):

Franck Bugnet mentioned this technical article by the noted astrophotographer Christian Buil. In addition to the quantization I found, it seems the D70 runs some kind of low-pass filter or median algorithm on the raw sensor data, at least for long exposures, and applies it even in the (not so) RAW format. Apparently this was done to hide the higher dark-current noise and hot pixels of the Nikon's Sony-sourced CCD sensor relative to the Canon CMOS sensors in the 10D and Digital Rebel/300D, a questionable practice if true. It is not clear whether this also applies to normal exposures. The article shows a work-around, but it is too cumbersome for normal use.

Update (2005-02-15):

Some readers asked whether the loss of data reflected a flaw in dcraw rather than actual loss of data in the NEF itself. I had anticipated that question but never gotten around to publishing the conclusions of my research. Somebody has to vindicate the excellence of Dave Coffin’s software, so here goes.

Dcraw reads the raw bits sequentially, and every bit read is processed; there is no wastage there. It is conceivable, if highly unlikely, that Nikon would keep the low-order bits elsewhere in the file. If that were the case, however, those bits would still take up space somewhere in the file, even with lossless compression.

In the NEF file I used as a test case, dcraw starts processing the raw data sequentially at an offset of 963,776 bytes from the beginning of the file, and reads 5.15MB of RAW data, i.e. all the way to the end of the 6.07MB NEF file. The 941K before the offset correspond to the EXIF headers and other metadata, the processing curve parameters and the embedded JPEG (which is usually around 700K on a D70). That leaves no room elsewhere in the file for the missing low-order sensor data: 2.5 bits times 6 million pixels, or roughly 2MB. Even if those bits were compressed using LZW or an equivalent algorithm, the way the raw data is, and assuming a typical 50% compression ratio for nontrivial image data, something like 1MB of data would still be unaccounted for.
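The arithmetic is simple enough to check mechanically. The sketch below merely restates the accounting above in C; the 50% compression ratio and the 700K embedded JPEG are the same assumptions as in the text:

    #include <stdio.h>

    int main(void)
    {
        const double MB          = 1024.0 * 1024.0;
        const double file_size   = 6.07 * MB;     /* whole NEF */
        const double data_offset = 963776.0;      /* where dcraw starts reading */
        const double jpeg_size   = 700.0 * 1024;  /* typical embedded JPEG */

        double raw_read  = file_size - data_offset; /* raw data dcraw consumes */
        double missing   = 2.5 * 6.0e6 / 8.0;       /* 2.5 bits x 6M pixels, in bytes */
        double after_lzw = missing * 0.5;           /* if compressed at 50% */
        double slack     = data_offset - jpeg_size; /* header space free of JPEG */

        printf("raw data read by dcraw   : %.2f MB\n", raw_read / MB);
        printf("missing low-order data   : %.2f MB\n", missing / MB);
        printf("same, after 50%% LZW     : %.2f MB\n", after_lzw / MB);
        printf("header space left for it : %.2f MB\n", slack / MB);
        return 0;
    }

It prints about 5.15MB of raw data read, 1.79MB of missing low-order data (0.89MB if compressed), and only about 0.24MB of header space not already occupied by the embedded JPEG – nowhere near enough to hide the missing bits.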

Nikon simply could not have tucked the missing data away anywhere else in the file. The only possible conclusion is that dcraw does indeed extract whatever image data is available in the file.

Update (2005-04-17):

In another disturbing development in the Nikon RAW saga, it seems they are encrypting the white balance information in the D2X and D50 NEF formats. This is clearly designed to shut out third-party decoders like Adobe Camera RAW or Phase One's Capture One, and a decision that is completely unjustifiable on either technical or quality grounds. Needless to say, these shenanigans on Nikon's part do not inspire respect.

Nikon's software is usually somewhat crude and inefficient (just for the record, Canon's is far worse). For starters, it does not leverage multi-threading or the AltiVec/SSE3 optimizations in modern CPUs. Nikon Scan displays scanned previews at a glacial pace on my dual 2GHz PowerMac G5, and on a modern multi-tasking operating system there is no reason for the scanning hardware to pause interminably while the previous frame's data is written to disk.

While Adobe's promotion of the DNG format is partly self-serving, they do know a thing or two about image processing algorithms. Nikon's software development kit (SDK) precludes Adobe from substituting its own algorithms for Nikon's, and thus rules out Adobe Camera RAW's advanced features like chromatic aberration or vignetting correction. Attempting to lock out alternative image-processing algorithms is more an admission of (justified) insecurity than anything else.

Another important consideration is the long-term accessibility of RAW image data. Nikon will not support the D70 forever – Canon has already dropped SDK support for the RAW files produced by the 2001-vintage D30. I have thousands of photos taken with a D30, and the existence of third-party decoders like Adobe Camera RAW, or better yet open-source ones like Dave Coffin's, is vital for the long-term viability of those images.

Update (2005-06-23):

The quantization applied to NEF files could conceivably be an artifact of the ADC. Paradoxically, most ADCs digitize a signal by using their opposite circuit, a digital to analog converter (DAC). DACs are much easier to build, so many ADCs combine a precision voltage comparator, a DAC and a counter. The counter increases steadily until the corresponding analog voltage matches the signal to digitize.

The quantization curve on the D70 NEF is simple enough that it could be implemented in hardware: increment by 1 up to 215, then increment by the value of a counter afterwards. The resulting non-linear voltage ramp would iterate over at most 683 levels instead of the full 4096 before matching the input signal. A speed-up of roughly a factor of six means faster data capture times, and the D70 was clearly designed for speed. If the D70's ADC (quite possibly one custom-designed for Nikon) is not linear, the quantization of the signal levels would not in itself be lossy, as that would indeed be the exact data returned by the sensor + ADC combination, but the effect observed by Christian Buil would still mean the D70 NEF format is lossy.
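A C sketch of the hypothesized converter follows. The comparator() and curve[] declarations stand in for hardware (and for the same table the NEF decoder uses); nothing here comes from dcraw or from any Nikon documentation:

    #include <stdint.h>

    #define NEF_LEVELS 683

    /* the non-linear ramp: the same 683 levels as the quantization curve */
    extern uint16_t curve[NEF_LEVELS];

    /* stand-in for the precision voltage comparator: returns nonzero
       once the DAC output reaches the analog input signal */
    extern int comparator(uint16_t dac_level);

    int digitize(void)
    {
        /* sweep the ramp over at most 683 levels instead of 4096 */
        for (int code = 0; code < NEF_LEVELS; code++)
            if (comparator(curve[code]))
                return code;      /* first level >= input signal */
        return NEF_LEVELS - 1;    /* clipped highlight */
    }

Under this hypothesis the camera never measures the skipped levels in the first place, which is why the quantization itself would not count as compression loss: the file would faithfully record everything the ADC produced.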

Attack of the London taxis

London-style taxis (also known as “Hackney carriages”) are becoming a common sight in San Francisco, which is apparently one of the first cities in the US to get them. It is amusing, really, when most observers in London expected them to disappear a few years ago. The antiquated look of the London taxi endears it to Londoners, but more importantly, these cabs are very roomy for passengers and easy to get in and out of, even when you are carrying an umbrella…

One (regular) taxi driver complained to me that the London taxis are under-powered and do not go fast enough for him to zip to the other side of the city to pick up a fare. Anyone who has seen taxicabs drive in this city knows this is a feature, not a bug, in the interest of public safety. Not that taxi drivers are worse than others – I have never been in another city where drivers run red lights as casually as in San Francisco, even though I have lived in Paris and Amsterdam.

Taxis, along with docks, are one of the few domains of everyday life where byzantine nineteenth-century work arrangements still prevail in defiance of the free market. Most cities arbitrarily limit the number of taxis that can ply the streets, a system that usually benefits taxi companies more than taxi drivers, who often end up in a position similar to sharecroppers. The quotas are seldom updated to reflect demand, due to lobbying by entrenched taxi companies, and cities like Paris or San Francisco often face severe taxi shortages. The French demographer Alfred Sauvy (PDF) related how ministers would fear the wrath of striking taxi drivers and chicken out of raising their numbers.

In San Francisco, proposition K, passed in 1978, limits the number of taxi medallions to 1300. The measure was designed to let genuine taxi drivers, not companies, own the medallions, by requiring a nominal number of driving hours to retain the medallion. The lucky few who hold medallions lease them for $20,000-30,000 a year to taxi companies for when they are not driving themselves. Most actual taxi drivers do not have medallions and lease them for $100 a day or so from taxi companies (sharecroppers on plantations were not required to pay for the privilege of employment).

Of course, the people profiting from this cozy arrangement are never content – the permit holders want to drive less so they can enjoy the rent they collect from the coveted medallions. One attempted ploy was to reduce the driving-hours requirement for disabled drivers; needless to say, had that measure passed, many permit holders would overnight have found themselves mysteriously incapacitated. Taxi companies, for their part, would like to grab the medallions for themselves and cut the permit holders off from the trough.

The right solution would be to abolish the medallion system altogether, or to grant one to every working driver, as opposed to the rent-collecting ones. But of course that is the one solution all vested interests are adamantly opposed to, as it would upset their apple cart. Given the abysmally dysfunctional state of San Francisco municipal politics, the situation is unlikely to improve, and no amount of window-dressing with London-style cabs is going to change that.

9 Beet Stretch

Amedei Porcelana

I recently purchased a bar of Amedei Porcelana chocolate; Fog City News sells them for $11 here in San Francisco. When a bar of chocolate is individually numbered in a limited edition, you know it is going to be expensive… There are two reasons why boutique chocolate bars made in small quantities are better than mass-produced ones.

The first is that they don't adulterate the cocoa butter with vegetable fats (a.k.a. margarine). The European Union yielded to British lobbying efforts and allowed this indefensible practice. Not that chocolate is the only product that legendarily taste-impaired nation tampers with – I lived in London in 1982, and remember my horror at finding out that vanilla “ice cream” included such fine ingredients as fish oil…

The second is that big manufacturers like Nestlé, Kraft Jacobs Suchard, Cadbury or Lindt produce such large volumes that they can only use cocoa varietals grown in large quantities on industrial-scale plantations, just as McDonald's uses standardized potatoes grown to order. Furthermore, several varieties are usually blended for homogeneity, at the expense of character (echoes of the debate between proponents of blended vs. single-malt Scotch whisky). Smaller companies, or smaller production runs, do not have these constraints and can purchase high-quality cocoa beans that are grown in small quantities.

Venezuelan Criollo cocoa is widely considered the finest variety. It is not as strong (some may say harsh) as Forastero varieties, but has a much more refined and complex flavor. It also has poor yields, making it unsuitable for the mass market. Porcelana is the most genetically pure variety of Criollo and, like the others, has mild but incredibly subtle aromas, without the aggressive acidity of some.

I find self-proclaimed connoisseur reviews that speak breathlessly of “fantastic tangy flavor, that evolves through wine and blue cheese to almost too sharp citrus” faintly ridiculous at best, and more than a little unappealing in how they are obviously patterned on wine snobs. That said, Porcelana is definitely a superlative chocolate. I don’t think I will be feasting regularly on it, due to the price, but it is certainly worth trying on special occasions.

Keyspan USB Server review

I saw the Keyspan USB Server at MacWorld SF a few months ago, but it has only recently started to ship (I received mine yesterday). This device allows you to connect a Mac or PC to up to 4 USB 1.1 peripherals remotely over Ethernet, much as a print server allows you to access remote printers. It also allows sharing of USB devices between multiple computers.

I use it to reduce clutter in my apartment by moving bulky items like my HP 7660 printer and Epson 3170 scanner away from the iMac in my living room, which has progressively become my main computer even though it is probably the slowest machine I own.

You install the driver software (Windows 2000/XP or Mac OS X; there are no Linux drivers so far), and it creates a simulated USB hub device that takes care of bridging the USB requests over Ethernet. A management program lets you configure the USB Server's settings, such as its IP address (zeroconf, a.k.a. Rendezvous, is supported, a nice touch), password and access mode. The user interface is functional, if not perfectly polished. To use a USB peripheral hooked to the USB Server, you fire up the admin client, select one of the USB devices and take a “lease” on it. I have links to some screen shots of the GUI below:

The process is as smooth as it can possibly be, given that USB devices are not designed to be shared between multiple hosts, and thus some form of locking had to be provided. I tried my scanner over Ethernet and have not noticed any perceptible degradation in performance. The software also copes correctly with sleep mode. The only nit I would pick is that the power adapter's “wall wart” DC connector slips off the device too easily (there is not enough friction to hold it in place), disconnecting it.

Many families are becoming multi-computer households. The Keyspan USB Server is a surprisingly effective way to share peripherals or to move bulky and seldom used peripherals out of the way. At a street price of around $100, it is not inexpensive, but I found it a very worthwhile accessory for my home network.