Fazal Majid's low-intensity blog

Sporadic pontification


Data mining Outlook for fun and profit

For a few years now, I have owned the domain name majid.fm. Dot-fm is the country-code domain of the Federated States of Micronesia, a micro-state in the Pacific Ocean that markets its domain names to FM radio stations. FM also happens to be my initials. Unfortunately, the registration fees are quite expensive ($200 every two years), and the domain is redundant now that I have acquired majid.info and majid.org (majid.com is held by a Malaysian cybersquatter demanding a couple of thousand dollars for it – I may be vain, but not that vain). I have decided to let the domain lapse when it expires on April 1st.

I used the majid.fm domain for my email, with a catch-all alias so that anything@majid.fm would be delivered to my primary mailbox, fazal@majid.fm. For instance, if I registered with Dell, I would give them the email address dell@majid.fm. This made it easy to trace how a sender got my address, and to blacklist companies that started spamming me (they shall remain nameless to protect the guilty yet litigious).

Unfortunately, spammers and some worms attempt dictionary attacks, trying all sorts of combinations like jim@majid.fm, smith@majid.fm, and so on. My spam filter would catch some of these, but not all, and dealing with the rest was a terrible hassle. I do not want an auto-responder sending emails back to people who write to me at the old address, as this would at best flood innocent people whose addresses spammers are impersonating, and at worst hand my new address to the spammers themselves.

My solution to this dilemma was to write a Python script that scans through all the emails in my Outlook personal folder (PST) files of archived emails, flag everyone who has sent me an email, and then manually send them a change of address notification (or, in the case of websites and online stores, update my contact info online).

Simply using Outlook’s advanced search function will not work: in many cases the To: header is set to something other than the address the email was actually delivered to, such as undisclosed-recipients, or the sender’s own address when the message went out to multiple Bcc: recipients (the proper way to proceed when you want to email many people without revealing everyone’s addresses to the rest of the list). I actually have to sift through the raw message headers to find the envelope destination address.
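To make this concrete, here is a tiny illustration with made-up headers (the addresses and hosts are invented): the To: line says undisclosed-recipients, but the address the message was actually delivered to shows up in the Delivered-To: header and in the “for” clause of a Received: line, which is what the script below scans for.

import re

# Made-up headers for illustration only: the To: header is useless, but the
# envelope destination appears in Delivered-To: and the Received: "for" clause.
sample_headers = """Received: from smtp.example.com by mx.majid.fm
        for <dell@majid.fm>; Tue, 16 Mar 2004 10:00:00 -0800
Delivered-To: dell@majid.fm
From: newsletter@example.com
To: undisclosed-recipients:;
Subject: March specials
"""

# same idea as the full script: scan the raw headers for @majid.fm addresses
print(re.findall(r'[-A-Za-z0-9._]+@majid\.fm', sample_headers))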

Here is a simplified version of olmine.py, the script I used. It requires Python 2.x with the win32all extensions, and Outlook 2000 with the Collaboration Data Objects (CDO) option installed (this is not the default). CDO is required to access the full headers. Of course, this script can be useful for all sorts of social network analysis fun on your own Outlook files, or more prosaically to generate a whitelist of email addresses for your spam filter.

import re, win32com.client

srcs = {}
dsts = {}
pairs = {}

# regular expression that scans for valid email addresses in the headers
m_re = re.compile(r'[-A-Za-z0-9.,_]*@majid\.fm')
# regular expression that strips out headers that can cause false positives
strip_re = re.compile(r'(Message-Id:.*$|In-Reply-To:.*$|References:.*$)',
                      re.IGNORECASE | re.MULTILINE)

def dump_folder(folder):
  """Iterate recursively over the given folder and its subfolders"""
  print '-' * 72
  print folder.Name
  print '-' * 72
  for i in range(1, folder.Messages.Count + 1):
    try:
      # PR_SENDER_EMAIL_ADDRESS
      _from = folder.Messages[i].Fields[0x0C1F001F].Value
      # PR_TRANSPORT_MESSAGE_HEADERS
      headers = folder.Messages[i].Fields[0x7d001e].Value
    except:
      # ignore non-email objects like contacts or calendar entries
      continue
    stripped_headers = strip_re.sub('', headers)
    for _to in m_re.findall(stripped_headers):
      srcs[_from] = srcs.get(_from, 0) + 1
      dsts[_to] = dsts.get(_to, 0) + 1
      if (_from, _to) not in pairs:
        print _from, '->', _to
      pairs[_from, _to] = pairs.get((_from, _to), 0) + 1
  # recurse
  for i in range(1, folder.Folders.Count + 1):
    dump_folder(folder.Folders[i])

# connect to Outlook via CDO
cdo = win32com.client.Dispatch('MAPI.Session')
cdo.Logon()
# iterate over all the open PST files
for i in range(1, cdo.InfoStores.Count + 1):
  store = cdo.InfoStores[i]
  root = store.RootFolder
  print '#' * 72
  print store.Name
  print '#' * 72
  dump_folder(root)
cdo.Logoff()
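As a possible follow-on (not part of the original script), the srcs dictionary accumulated above can be dumped as a whitelist; SpamAssassin, for instance, accepts whitelist_from entries:

# sketch: write every sender seen in the archive as a SpamAssassin whitelist rule
whitelist = open('whitelist.cf', 'w')
for sender in sorted(srcs):
    whitelist.write('whitelist_from %s\n' % sender)
whitelist.close()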

Stationery pattern

I found myself buying quite a bit of stationery recently. The nice thing is that even premium stationery is cheap compared to computers or photography, my other capital-intensive hobbies. A top-of-the-line solid sterling silver pen like the Waterman Edson LE will not break the $1000 barrier, when you can’t even buy a handbag for that price from most luxury brands.

I spent four years in a Catholic school in Versailles, a very conservative city, where we were not allowed to use ball point pens because they deform handwriting. I used Sheaffer pens back then, and all the way to college. When I started working, I splurged on an Edson Blue, but what with computers and email, seldom got to use it.

There is a kind of fashion phenomenon brewing around Moleskine notebooks and journals, including an aficionado website (to which I was a charter contributor). Their Italian manufacturer concocted a clever marketing campaign to give them a cosmopolitan aura of travel mystique, branding them as “The Legendary Notebook as used by Van Gogh, Chatwin, Hemingway, Matisse and Céline” (to which my obvious reaction was, Chatwin who?). These notebooks have thin yet rigid covers, a pocket for clippings in the back, and an elastic band to keep them closed. They are decently made, but not exceptionally so. Still, just holding one in your hands makes you want to write in it, in a purely emotional way, as described in a recent book by Don Norman.

After the bug bit me, I bought in quick order:

  • A pair of Faber-Castell mechanical pencils that use thick 1.4mm leads and just glide on paper
  • Some sets of Crane’s stationery. It is made from 100% cotton rag, and Crane is the official paper supplier for US currency.
  • A pad of G. Lalo “Vergé de France” laid paper (paper that has a horizontally striped watermark, for a striking yet classy finish). This paper responds exceptionally well to fountain pens.
  • A Pelikan Souverän 800 fountain pen in classic green Stresemann stripes, with a broad nib for bold modulated strokes. My grandfather used to own a pen like this one, and they are the equivalents of Montblanc pens in quality, at a much more reasonable price.
  • Inks by J. Herbin, a company that has been operating since 1670. I am particularly enamored of their black (Perle des encres) and meadow-green (Vert pré) inks; the latter is the color I am now using for the navigation and date headings on this site.
  • A handsome journal by local company Oberon Design, with a richly detailed cover hand-tooled from leather in a Celtic knotwork pattern (St Patrick’s day is just around the corner…) and matching pewter accents. Many of their designs are breathtaking, like the oak tree on their home page.

Now, I just have to muster the inspiration to make use of all this…

Update (2004-03-21):

I may have to revise my judgement about “inexpensive”. Last Friday, I stopped by at a local Montblanc store. The saleswoman tried to interest me in a J. P. Morgan Limited Edition fountain pen, for a mere $1850… Her closing argument? “It will sell out soon”. Given the slightly hysterical nature of Montblanc collectors (and the fact it is the default brand for rich people who are not all that knowledgeable about pens, much as Rolex is for watches), she may well be right.

The megapixel myth – a pixel too far?

Revised introduction

This article remains popular thanks to Google and the like, but it was written 7 years ago and the models described are ancient history. The general principle remains: you are often better off with a camera that has fewer but better-quality pixels, though the sweet spot shifts with each successive generation. The more reputable camera makers have started to step back from the counterproductive megapixel race and the buying public is starting to wise up, but this article remains largely valid.

My current recommendations are:

  • Dispense with entry-level point-and-shoot cameras. They are barely better than your cameraphone.
  • If you must have a pocketable camera with a zoom lens, get the Canon S95, Panasonic LX5, Samsung TL500 or Olympus XZ-1. Be prepared to pay about $400 for the privilege.
  • Upping the budget to about $650 and accepting non-zoom lenses gives you significantly better optical and image quality, in cameras that are still pocketable like the Panasonic GF2, Olympus E-PL2, Samsung NX100, Ricoh GXR and Sony NEX-5.
  • The Sigma DP1x and DP2x offer stunning optics and image quality in a compact package, but are excruciatingly slow to autofocus. If you can deal with that, they are very rewarding.
  • The fixed-lens Fuji X100 (pretty much impossible to get for love or money these days, no thanks to the Sendai earthquake) and Leica X1 offer superlative optics, image and build quality in a still pocketable format. The X1 is my everyday-carry camera, and I have an X100 on order.
  • If size and weight are not an issue, DSLRs are the way to go in terms of flexibility and image quality, and they are available for every budget. Models I recommend, in increasing order of price, are the Pentax K-x, Canon Rebel T3i, Nikon D7000, Canon 5DmkII, Nikon D700 and Nikon D3S.
  • A special mention for the Leica M9. It is priced out of most people’s reach and has poor low-light performance, but it delivers outstanding results thanks to Leica lenses and a sensor devoid of an anti-aliasing filter.

Introduction

As my family’s resident photo geek, I often get asked what camera to buy, especially now that most people are upgrading to digital. Almost invariably, the first question is “how many megapixels should I get?”. Unfortunately, it is not as simple as that: megapixels have become the photo industry’s equivalent of the personal computer industry’s megahertz myth, and in some cases the fixation leads to counterproductive design decisions.

A digital photo is the output of a complex chain involving the lens, the various filters and microlenses in front of the sensor, and the electronics and software that post-process the signals from the sensor to produce the image. The image quality is only as good as the weakest link in the chain. High-quality lenses, for instance, are expensive to manufacture, and manufacturers often skimp on them.

The problem with megapixels as a measure of camera performance is that not all pixels are born equal. No amount of pixels will compensate for a fuzzy lens, but even with a perfect lens, there are two factors that make the difference: noise and interpolation.

Noise

All electronic sensors introduce some measure of electronic noise, due among other causes to the random thermal motion of electrons. This shows up as little colored flecks that give a grainy appearance to images (although the effect is quite different from film grain). The less noise, the better, obviously, and there are only so many ways to improve the signal-to-noise ratio:

  • Reduce noise by improving the process technology. Improvements in this area occur slowly, typically each process generation takes 12 to 18 months to appear.
  • Increase the signal by increasing the amount of light that strikes each sensor photosite. This can be done by using faster lenses or larger sensors with larger photosites. Or by only shooting photos in broad daylight where there are plenty of photons to go around.

Fast lenses are expensive to manufacture, especially fast zoom lenses (a Canon or Nikon 28-70mm f/2.8 zoom lens costs over $1000). Large sensors are more expensive to manufacture than small ones: fewer of them fit on a silicon wafer, and since the likelihood of a die being ruined by an errant grain of dust grows with its area, large sensors also have lower yields. A sensor with twice the die area might cost four times as much. A “full-frame” 36mm x 24mm sensor (the same size as 35mm film) stresses the limits of current technology (it has nearly 8 times the die size of the latest-generation “Prescott” Intel Pentium IV), which is why the full-frame Canon EOS 1Ds costs $8,000, and professional medium-format digital backs can easily reach $25,000 and higher.
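To see why cost grows much faster than area, here is a back-of-the-envelope sketch using a simple Poisson yield model; the wafer size and defect density are assumptions picked purely for illustration, not actual fab figures.

import math

WAFER_DIAMETER_MM = 200.0   # assumed wafer size
DEFECTS_PER_SQ_CM = 0.3     # assumed average defect density, illustrative only

def wafers_per_good_die(width_mm, height_mm):
    """Relative cost indicator: how much wafer area one good die consumes."""
    die_area_cm2 = (width_mm * height_mm) / 100.0
    wafer_area_cm2 = math.pi * (WAFER_DIAMETER_MM / 20.0) ** 2
    dies_per_wafer = wafer_area_cm2 / die_area_cm2                 # ignores edge losses
    yield_fraction = math.exp(-DEFECTS_PER_SQ_CM * die_area_cm2)   # Poisson yield model
    return 1.0 / (dies_per_wafer * yield_fraction)

small = wafers_per_good_die(8.8, 6.6)    # roughly a 2/3" sensor
full = wafers_per_good_die(36.0, 24.0)   # "full-frame" 35mm-sized sensor
print('area ratio: %.0fx, cost ratio: %.0fx' % ((36.0 * 24.0) / (8.8 * 6.6), full / small))

The exact numbers are meaningless, but the point stands: the cost per good die grows much faster than the die area, because both the number of candidate dies per wafer and the fraction that survive defect-free work against the large sensor.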

This page illustrates the difference in size between the sensors of various consumer digital cameras and those of some high-end digital SLRs. Most compact digital cameras have tiny 1/1.8″ or 2/3″ sensors at best (these designations are a legacy of TV camera tube ratings and do not correspond directly to actual sensor dimensions; see DPReview’s glossary entry on sensor sizes for an explanation).

For any given generation of cameras, the conclusion is clear: bigger pixels are better, yielding sharper, smoother images with more latitude for creative manipulation of depth of field. This does not necessarily hold across generations, however: Canon’s EOS-10D has twice as many pixels as the two-generations-older EOS-D30 on a sensor of the same size, yet still manages lower noise thanks to improvements in Canon’s CMOS process.

The problem is, as most consumers fixate on megapixels, many camera manufacturers are deliberately cramming too many pixels into too little silicon real estate just to have megapixel ratings that look good on paper. Sony has introduced an 8 megapixel camera, the DSC-F828, with a tiny 2/3″ sensor. The resulting photosites are about 1/8 the size of those on the similarly priced 6 megapixel Canon Digital Rebel (EOS-300D), and 1/10 the size of those on the more expensive 8 megapixel DSLR Canon EOS-1D Mark II.
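For the curious, here is the rough arithmetic behind those ratios; the sensor dimensions below are approximate published figures and the pixel counts are nominal.

# approximate sensor dimensions (mm) and nominal pixel counts
SENSORS = [
    ('Sony DSC-F828 (2/3")', 8.8, 6.6, 8.0e6),
    ('Canon Digital Rebel (APS-C)', 22.7, 15.1, 6.3e6),
    ('Canon EOS-1D Mark II', 28.7, 19.1, 8.2e6),
]
for name, w_mm, h_mm, pixels in SENSORS:
    area_um2 = (w_mm * 1000.0) * (h_mm * 1000.0) / pixels
    print('%-30s %5.1f square microns per photosite' % (name, area_um2))

With these figures, the Digital Rebel’s photosites come out roughly 7-8 times larger than the F828’s, and the 1D Mark II’s roughly 9 times larger.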

Predictably, the noise levels of the 828 are abysmal in anything but bright sunlight, just as a “150 Watts” ghetto blaster is incapable of reproducing the fine nuances of classical music. The lens also has its issues, for more details see the review. The Digital Rebel will yield far superior images in most circumstances, but naive purchasers could easily be swayed by the 2 extra megapixels into buying the inferior yet overpriced Sony product. Unfortunately, there is a Gresham’s law at work and manufacturers are racing to the bottom: Nikon and Canon have also introduced 8 megapixel cameras with tiny sensors pushed too far. You will notice that for some reason camera makers seldom show sample images taken in low available light…

Interpolation

Interpolation (along with its cousin, “digital zoom”) is the other way unscrupulous marketers lie about their cameras’ real performance. Fuji is the most egregious example with its “SuperCCD” sensor, which is arranged in diagonal lines of octagons rather than horizontal rows of rectangles. Fuji apparently feels this somehow gives it the right to double the pixel rating (i.e. a sensor with 6 million individual photosites is marketed as yielding 12 megapixel images). You can’t get something for nothing: the values of the missing pixels are guessed using a mathematical technique named interpolation. This makes the image look larger, but does not add any real detail; you are just wasting disk space storing redundant information. My first digital camera was from Fuji, but I refuse to have anything to do with their current line due to shenanigans like these.

Most cameras use so-called Bayer interpolation, where each sensor pixel sits behind a red, green or blue filter (the proportions are 25%, 50% and 25% respectively, as the human eye is more sensitive to green). A demosaicing algorithm then reconstructs the two missing color values at each pixel from adjoining pixels, invariably leading to some loss of sharpness and sometimes to color artifacts like moiré patterns. Thus, a “6 megapixel sensor” has in reality only 1.5-2 million true color pixels.
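For the curious, here is a minimal sketch (using NumPy and SciPy, and nothing like a real camera’s pipeline) of what Bayer demosaicing does: throw away two of the three color values at each pixel, then estimate them back from neighboring samples. The reconstruction is never perfect, which is the source of the softness and moiré mentioned above.

import numpy as np
from scipy.signal import convolve2d

def bayer_mosaic(rgb):
    """Simulate a Bayer sensor: keep one color sample per pixel (RGGB layout)."""
    h, w, _ = rgb.shape
    mosaic = np.zeros((h, w))
    mosaic[0::2, 0::2] = rgb[0::2, 0::2, 0]   # red sites
    mosaic[0::2, 1::2] = rgb[0::2, 1::2, 1]   # green sites
    mosaic[1::2, 0::2] = rgb[1::2, 0::2, 1]   # green sites
    mosaic[1::2, 1::2] = rgb[1::2, 1::2, 2]   # blue sites
    return mosaic

def demosaic(mosaic):
    """Bilinear demosaic: estimate missing colors by averaging nearby samples."""
    h, w = mosaic.shape
    masks = np.zeros((h, w, 3), dtype=bool)
    masks[0::2, 0::2, 0] = True
    masks[0::2, 1::2, 1] = True
    masks[1::2, 0::2, 1] = True
    masks[1::2, 1::2, 2] = True
    kernel = np.array([[1., 2., 1.], [2., 4., 2.], [1., 2., 1.]])
    out = np.zeros((h, w, 3))
    for c in range(3):
        samples = np.where(masks[:, :, c], mosaic, 0.0)
        weights = masks[:, :, c].astype(float)
        num = convolve2d(samples, kernel, mode='same')
        den = convolve2d(weights, kernel, mode='same')
        out[:, :, c] = num / np.maximum(den, 1e-9)
    out[masks] = mosaic[np.any(masks, axis=2)]   # keep the measured samples untouched
    return out

rgb = np.random.rand(8, 8, 3)                    # stand-in for a real image
error = np.abs(demosaic(bayer_mosaic(rgb)) - rgb).mean()
print('mean absolute reconstruction error: %.3f' % error)

With a random test image the reconstruction error is large, because fine detail simply cannot be guessed back from neighbors; real photographs fare better, but the information discarded at capture time is never truly recovered.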

A company called Foveon makes a distinctive sensor that has three photosites stacked vertically in the same location, yielding more accurate colors and sharper images. Foveon originally took the high road and called their sensor with 3×3 million photosites a 3MP sensor, but unfortunately they were forced to align themselves with the misleading megapixel ratings used by Bayer sensors.

Zooms

A final factor to consider is the zoom range of the camera. Many midrange cameras come with a 10x zoom, which seems mighty attractive in terms of versatility, until you pause to consider the compromises inherent in a superzoom design. The wider the zoom range, the more the lens suffers from image-degrading aberrations and distortion: chromatic aberration (a.k.a. purple fringing), barrel or pincushion distortion, and generally lower resolution and sharpness, especially in the corners of the frame.

In addition, most superzooms have smaller apertures (two exceptions being the remarkable constant f/2.8 aperture 12x Leica zoom on the Panasonic DMC-FZ10 and the 28-200mm equivalent f/2.0-f/2.8 Carl Zeiss zoom on the Sony DSC-F828), which means less light hitting the sensor and a lower signal-to-noise ratio.

A reader asked me about the Canon G2 and the Minolta A1. The G2 is 2 years older than the A1, and has 4 million 9 square micron pixels as opposed to the A1’s 5 million 11 square micron photosites, so it should in principle yield lower image quality. But the G2’s 3x zoom lens is fully one stop faster than the A1’s 7x zoom (i.e. it lets twice as much light in), and that more than compensates for the smaller pixels and the older sensor generation.
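The back-of-the-envelope version, keeping only geometry (photosite area and f-number) and ignoring the difference in sensor generation:

# light gathered per photosite ~ photosite area / f-number squared
def light_per_photosite(area_um2, f_number):
    return area_um2 / (f_number ** 2)

g2 = light_per_photosite(9.0, 2.0)    # Canon G2: 9 square micron pixels, f/2.0 lens
a1 = light_per_photosite(11.0, 2.8)   # Minolta A1: 11 square micron pixels, f/2.8 lens
print('G2 gathers about %.1f times as much light per photosite as the A1' % (g2 / a1))

That works out to roughly 1.6 times as much light per photosite in the G2’s favor, despite its smaller pixels.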

Recommendations

If there is a lesson in all this, it’s that unscrupulous marketers will always find a way to twist any simple metric of performance in misleading and sometimes even counterproductive ways.

My recommendation? As of this writing, get either:

  • An inexpensive (under $400, everything is relative) small sensor camera rated at 2 or 3 megapixels (any more will just increase noise levels to yield extra resolution that cannot in any case be exploited by the cheap lenses usually found on such cameras). Preferably, get one with a 2/3″ sensor (although it is becoming harder to find 3 megapixel cameras nowadays, most will be leftover stock using an older, noisier sensor manufacturing process).
  • Or save up for the $1000 or so that entry-level large-sensor DSLRs like the Canon EOS-300D or Nikon D70 will cost. The DSLRs will yield much better pictures, including in low-light situations at ISO 800.
  • Film is your only option today for decent low-light performance in a compact camera. Fuji Neopan 1600 in an Olympus Stylus Epic or a Contax T3 will allow you to take shots in available light without a flash, and spare you the “red-eyed deer caught in headlights” look most on-camera flashes yield.

Conclusion

Hopefully, as the technology matures, large sensors will migrate into the midrange and make it worthwhile. I for one would love to see a digital Contax T3 with a fast prime lens and a low-noise APS-size sensor. Until then, there is no point in getting anything in between: midrange digicams do not offer better image quality than the cheaper models, while being significantly costlier, bulkier and more complex to use. In fact, the megapixel rat race and the wide-ranging but slow zoom lenses that find their way onto these cameras actually degrade their image quality compared to their cheaper brethren. Sometimes, more is less.

Updates

Update (2005-09-08):

It seems Sony has finally seen the light and is including a large sensor in the DSC-R1, the successor to the DSC-F828. Hopefully, this is the beginning of a trend.

Update (2006-07-25):

Large-sensor pocket digicams haven’t arrived yet, but if you want a compact camera that can take acceptable photos in relatively low-light situations, there is currently only one game in town, the Fuji F30, which actually has decent performance up to ISO 800. That is in large part because Fuji uses a 1/1.7″ sensor, instead of the nasty 1/2.5″ sensors that are now the rule.

Update (2007-03-22):

The Fuji F30 has since been superseded by the mostly identical F31fd and now the F40fd. I doubt the F40fd will match the F30/F31fd in high-ISO performance because it has two million unnecessary extra pixels crammed into the sensor, and indeed its maximum ISO rating was lowered, so the F31fd is probably the way to go, even though the F40fd uses standard SD cards instead of the incredibly annoying proprietary Olympus-Fuji xD format.

Sigma has announced the DP-1, a compact camera with an APS-C size sensor and a fixed 28mm (equivalent) f/4 lens (wider and slower than I would like, but since it is a fixed focal-length lens, it should be sharper and exhibit less distortion than a zoom). This is the first (relatively) compact digital camera with a decent sensor, and a true three-color Foveon sensor at that, as the cherry on top. I lost my Fuji F30 in a taxi, and this will be its replacement.

Update (2010-01-12):

We are now facing an embarrassment of riches.

  • Sigma built on the DP1 with the excellent DP2, a camera with superlative optics and sensor (albeit limited in high-ISO situations, though no worse than film) but hamstrung by excruciatingly slow autofocus and generally sluggish responsiveness. In other words, it is best used for static subjects.
  • Panasonic and Olympus were unable to make a significant dent in the Canon-Nikon duopoly in digital SLRs with their Four-Thirds system (with one third less surface area than an APS-C sensor, it really should be called “Two-Thirds”). After that false start, they redesigned the system to eliminate the clearance required for an SLR mirror, leading to the Micro Four Thirds system. Olympus launched the retro-styled E-P1, followed by the E-P2, and Panasonic struck gold with its GF1, accompanied by a stellar 20mm f/1.7 lens (equivalent to 40mm f/1.7 in 35mm terms).
  • A resurgent Leica introduced the X1, the first pocket digicam with an APS-C sized sensor, essentially the same Sony sensor used in the Nikon D300. Extremely pricey, as usual with Leica. Its relatively slow f/2.8 aperture means its sensor advantage over the Panasonic GF1 is largely negated by the GF1’s faster lens; the GF1 also has faster AF.
  • Ricoh introduced its curious modular GXR, where lens and sensor are swapped together as a unit; one option is the A12 APS-C module with a 50mm-equivalent f/2.5 lens. Unfortunately, it is not pocketable.

According to Thom Hogan, Micro Four Thirds grabbed 11.5% of the Japanese market for interchangeable-lens cameras within a few months, something Pentax, Samsung and Sony have not managed despite years of trying. It’s probably just a matter of time before Canon and Nikon join the fray, after too long turning a deaf ear to the chorus of photographers like myself demanding a high-quality compact camera. As for myself, I have already voted with my feet, successively buying a Sigma DP1, a Sigma DP2 and now a Panasonic GF1 with the 20mm f/1.7 pancake lens.

Update (2010-08-21):

I managed to score a Leica X1 last week from Camera West in Walnut Creek. Supplies are scarce and they usually cannot be found for love or money—many unscrupulous merchants are selling their limited stock on Amazon or eBay, at ridiculous (25%) markups over MSRP.

So far, I like it. It may not appear much smaller than the GF1 on paper, but in practice those few millimeters make a world of difference. The GF1 is a briefcase camera, not really a pocketable one, and I was subconsciously leaving it at home most of the time. The X1 fits easily in any jacket pocket. It is also significantly lighter.

High ISO performance is significantly better than the GF1’s, by 1 to 1.5 stops. The lens is better than reported in technical reviews like DPReview’s: it exhibits curvature of field, which penalizes it in MTF tests.

The weak point of the X1 is its relatively mediocre AF performance. The GF1 uses a special sensor that reads out at 60fps, vs. 30fps for most conventional sensors (and probably even less for the Sony APS-C sensor used in the X1, possibly the same as in the Nikon D300), which doubles the speed of its contrast-detection AF algorithm over its competitors’. Fuji recently introduced a sensor that features on-chip phase-detection AF (the same kind used in DSLRs); let’s hope the technology spreads to other manufacturers.

Aspirin and history

There are instances of chemical discoveries having a major impact on world history. Dr. Chaim Weizmann helped the British war effort in World War I by inventing a process to produce acetone, a fundamental ingredient of explosives manufacture in those days. Germany had a near stranglehold on chemistry at the time, thanks to its pioneering chemists and the large industrial firms that would later form the IG Farben cartel. The grateful British rewarded him with the Balfour declaration, the foundation for the later establishment of the state of Israel, of which Weizmann became the first president.

Sometimes, the link is more indirect. Aspirin was first synthesized in pure form in 1897 by Felix Hoffmann, a chemist working for Bayer who was looking for a drug to relieve his arthritic father’s pains. He took salicylic acid, the active principle of willow bark tea (an ancient remedy mentioned as far back as Hippocrates), and found a way to produce sufficiently pure acetylsalicylic acid, which is much less irritating to the stomach lining.

In those days, medicine was just entering the scientific age (the conservative medical profession had long defended its turf, trying to shut down interlopers like surgeons or the chemist Pasteur), and modern drugs were few and far between. A potent medicine like aspirin was a godsend, and too often used as a panacea.

Generations of inbreeding had left most of the royal families of Europe affected by various genetic diseases, the most prominent being hemophilia, a lack of clotting factors in the blood that can cause victims to literally bleed to death from the slightest cut. The tsarevich, heir to the throne of Russia, was one of those affected. His physicians prescribed aspirin, the wonder drug from the West. As aspirin is a blood thinner, it actually worsened the hapless boy’s condition.

Enter Rasputin, a charismatic mystic, who advised the royal family to shun the impious potions of the western heretics and adopt his brand of faith healing. Stopping the aspirin treatment led to an improvement in the tsarevich’s condition, thus sealing Rasputin’s influence over the tsarina.

Many historians believe Rasputin’s influence was one of the factors leading to the weakening of the Russian monarchy, leading to its eventual overthrow in 1917, followed by the rise of communism there. Thus did an act of filial piety lead to the fall of an empire.

Sessions must die

Many e-commerce sites have session timeouts. Dawdle too long between the moment you enter the site and the moment you actually want to buy something, and you will be presented with an unpleasant message. The words “session timeout” will be there, drowned in a sea of technobabble, and you will have to restart from scratch. Using a bookmark will often have the same effect.

At this point, you may well be tempted to go shop elsewhere; indeed, it is the only principled response to such blatant contempt for customers. You will notice that successful sites like Amazon.com do not make their customers suffer such hassles – once you’re in, you’re in, whether you take a lunch break or not. I don’t buy the security argument either: there is nothing sensitive about the contents of a shopping cart; security belongs at checkout time, not browse time.

The reason why such crimes against usability are perpetrated is that business requirements too often take a back seat to technical expediency, paradoxically most often due to lack of technical competence. Many web development environments keep track of what you do on a website, the contents of your cart, and so on, in “sessions”, portions of memory that are set aside for this book-keeping purpose. They cannot be set aside forever, and must be purged to make room for new customers.

The tyro programmer will leave the default policy in place, which is to dump the session altogether and place the burden of recovering state on the customer. More experienced programmers will implement the session mechanism in a database so it can be kept almost indefinitely. In an era where disk space costs a dollar or two per gigabyte, and a desktop computer has enough processing power to crunch tens of thousands of transactions per minute, there is no justification for not doing so.
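As an illustration, here is a minimal sketch of a database-backed session store; SQLite and the table layout are chosen purely for brevity and are not taken from any particular web framework.

import sqlite3, pickle, time, uuid

db = sqlite3.connect('sessions.db')
db.execute("""CREATE TABLE IF NOT EXISTS sessions (
                id TEXT PRIMARY KEY,
                data BLOB,
                last_seen REAL)""")

def save_session(session_id, state):
    """Persist the session state (e.g. cart contents) instead of keeping it in RAM."""
    db.execute('INSERT OR REPLACE INTO sessions VALUES (?, ?, ?)',
               (session_id, sqlite3.Binary(pickle.dumps(state)), time.time()))
    db.commit()

def load_session(session_id):
    """Return the stored session state, or an empty one if none exists."""
    row = db.execute('SELECT data FROM sessions WHERE id = ?',
                     (session_id,)).fetchone()
    return pickle.loads(bytes(row[0])) if row else {}

# the session now survives server restarts and can be kept for months
sid = str(uuid.uuid4())
save_session(sid, {'cart': ['fountain pen', 'bottle of ink']})
print(load_session(sid))

With the state in a database, expiry becomes a business decision (purge abandoned carts after 90 days, say) rather than a side effect of how much RAM the web server happens to have.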