Fazal Majid's low-intensity blog

Sporadic pontification


Threadframe: multithreaded stack frame extraction for Python

Note: threadframe is obsolete. Python 2.5 and later include a function sys._current_frames() that does the same thing. Threadframe is only useful for Python 2.2 through 2.4.

Rationale

I was encountering deadlocks in a multi-threaded CORBA server (implemented using omniORB). Debugging with GDB gave me information that was too low-level; what I needed was a Python equivalent of the GDB command “info threads”. There was no such facility in Python’s standard library, so I rolled my own.

David Beazley added advanced debugging functions to the Python interpreter, and they have been folded into the 2.2 release.

I used these hooks to build a debugging module that is useful when you are looking for deadlocks in a multithreaded application. It basically has a single function that will return a list of the stack frames for all Python interpreter threads in the process.

In Python 2.3, Guido van Rossum added the thread ID to the interpreter state structure, which allows us to produce a dictionary mapping thread IDs to frames.

This functionality is now integrated in Python 2.5’s batteries-included sys._current_frames() function.
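In modern Python 3, the same “info threads”-style dump can be produced with sys._current_frames() alone. The following is a minimal sketch, not the original threadframe test.py: it pairs each thread’s topmost frame with the thread name reported by threading.enumerate() and formats the stack with traceback.format_stack(). The function name dump_all_stacks is hypothetical, and sys._current_frames() remains a CPython-specific, underscore-prefixed function.

```python
import sys
import threading
import traceback


def dump_all_stacks():
    """Return a text dump of the current stack of every Python thread.

    Hypothetical helper: roughly the Python-level counterpart of GDB's
    "info threads" plus a backtrace of each thread. Relies on the
    CPython-specific sys._current_frames().
    """
    # Map thread identifiers to names for threads the threading module knows.
    names = {t.ident: t.name for t in threading.enumerate()}
    lines = []
    # sys._current_frames() maps each thread ID to that thread's topmost frame.
    for thread_id, frame in sys._current_frames().items():
        name = names.get(thread_id, "<unknown>")
        lines.append(f"Thread {name} (id {thread_id}):")
        # format_stack() follows the f_back chain, listing the outermost call first.
        lines.extend(entry.rstrip() for entry in traceback.format_stack(frame))
        lines.append("")
    return "\n".join(lines)


if __name__ == "__main__":
    # Park a worker thread on an Event so there is a second stack to show.
    stop = threading.Event()
    worker = threading.Thread(target=stop.wait, name="worker")
    worker.start()
    print(dump_all_stacks())
    stop.set()
    worker.join()
```

Calling dump_all_stacks() from a separate watchdog thread while the program appears hung shows where each thread is blocked, which is usually enough to spot the locks involved in a deadlock.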

Of course, I disclaim any liability if this code should crash your system, erase your homework, eat your dog (who also ate your homework) or otherwise have any undesirable effect.

Building and installing

Python 2.2 or later is required. Extracting a dictionary mapping thread IDs to frames is only possible in Python 2.3 and later; on 2.2 it raises NotImplementedError.

Download the source tarball threadframe-0.2.tar.gz. You can build it with the Makefile or directly with the setup.py script. I have built and tested it only on Solaris 8/x86 and Windows 2000, but the code should be fairly portable. A small test program, test.py, illustrates how to use this module to dump the stack frames of all the Python interpreter threads. A sample run is available for your perusal.

For Windows users, I have pre-compiled binaries available, built using Mingw32 and GCC 2.95.2. Just copy the file threadframe.pyd to any directory on your Python path and you should be able to run the test script test.py.

Windows binaries
Python version   Download
2.2.1            threadframe.pyd
2.3.4            threadframe.pyd
2.4.x            threadframe.pyd

License

This code is licensed under the same terms as Python itself.

Change history

Release 0.2 (2004-06-10)

Distutils based setup.py contributed by Bob Ippolito. Bob also noticed that thread_id was added to the Python interpreter state, and contributed a patch to get a dictionary mapping thread_ids to frames instead of a list.

Release 0.1 (2002-10-11)

Initial release for Python 2.2: threadframe-0.1.tar.gz

What’s missing in the Airport Express?

Apple introduced the Airport Express today, surprising observers who expected product announcements to be on hold until WWDC in San Francisco later this month. Apple-watching is a surprise-fraught art not unlike Kremlinology used to be, with the added risk of cease-and-desist letters from the notoriously secretive and litigious company.

The Airport Express is a compact little wireless network in a box, offering an IEEE 802.11g WiFi access point cum router, an Ethernet port, an audio port to stream audio (interestingly, it supports both conventional electrical line-level output and Toslink optical output in the same jack), and a USB port to allow printer sharing (no word on whether this also allows scanner sharing the way Keyspan’s USB server does).

This unit replaces two or three boxes (and their associated wall warts), is relatively inexpensive at $129, and will no doubt become as popular and as widely (yet poorly) imitated as the iPod, especially since it can be used with Windows PCs. If I did not already have a Slimdevices Squeezebox (with beta support for the Apple Lossless Encoder), I might have been tempted, in spite of its lack of a display or remote control.

I am not all that fond of the wall-wart concept, but the plug can be removed and replaced with a standard IEC-320-C7 cable (which can certainly be found far cheaper than the ridiculous $39 Apple charges for one), and the unit can even be powered over Ethernet using the new power-over-Ethernet standard 802.3af (the USB port is disabled in that case), a nice touch that exemplifies Apple’s attention to detail. As a side note, for those of you who have a hard time coping with wall warts, I highly recommend the Power Strip Liberator Plus, a simple but highly effective solution to the problem of clogged power strips.

That said, one port is missing, one that would have turned the Airport Express from a well-designed piece of electronics into a visionary product: a phone jack. An RJ-11 jack that can be plugged into a phone line (FXO) or into which a phone can be plugged (FXS) would bridge one of the few remaining domains not covered by Apple’s digital hub (the other being TV). With iChat AV, Apple has a very capable Voice over IP (VoIP) client, but no way to interface it to legacy POTS (Plain Old Telephone Service) networks. I am not sure whether this is deliberate, perhaps because they want to introduce it as a value-added feature of their .Mac Internet services suite, but Apple has lacked a decent telephony product since the introduction of the Geoport ten years ago.

It should be straightforward to add telephony software to a Mac and have it act as an intelligent voice-mail or IVR system (forwarding voice mails via email the way Panther’s Fax feature can with faxes). Computer-Telephony Integration, widespread in the PC world, is also an essential feature for many enterprise applications (think call centers or CRM). Many small businesses use Macs because they cannot afford full-time IT staff to babysit Windows machines. Offering them an integrated telephony solution would be a very attractive proposition.

Etienne Guittard Soleil d’Or

Ghirardelli is the best-known chocolate maker from San Francisco, but by no means the only one. The Bay Area is very serious about food, and boasts many fine chocolatiers such as Guittard, Scharffen-Berger, Joseph Schmidt, and Michael Recchiuti, all of which uphold a much higher standard of quality than Ghirardelli (while not inedible dreck like Hershey’s, Ghirardelli is over-sweet and fairly lackluster).

Guittard is not as well known, since until recently they did not sell retail (their chocolate is used by See’s Candies and Boudin Bakery, among others, and I once had a wonderful cherry and Guittard chocolate cake at Eno in Atlanta). This changed when they introduced a line of premium chocolates named after the firm’s French founder, Etienne Guittard.

They probably don’t have an extensive distribution network yet, but their products are starting to trickle into finer San Francisco groceries like my neighborhood one, Lebeau Nob Hill Market (“People in the Know / Shop at Lebeau”).

I bought a 500g box of their “Soleil d’Or” milk chocolate, packaged as “wafers” (little quarter-sized pieces reminiscent of Droste Pastilles). In this form it is intended for cooking, but the bite-sized wafers are also perfect for snacking. It has a relatively high cocoa content for milk chocolate (38%, where 32% is more typical), which gives it a satisfying taste that lingers in the mouth. It is also well balanced: it has neither the malty harshness of Scharffen-Berger milk chocolate nor the milky aftertaste of Valrhona “Le Lacté”. In fact, it comes close to my personal favorite, Michel Cluizel “Grand Lait Java”, no small achievement, especially considering the difference in cocoa content (38% vs. 50%) and in price ($9 for a 500g box vs. $5 for a 100g tablet).

Update (2004-12-30):

Guittard updated their packaging (shown right). The new design is classier and eschews the pretentious “Soleil d’Or” and “Collection Etienne” labels, but the chocolate itself is unchanged. The box is also slightly lighter (1lb or 454g vs. 500g for the older one, i.e. a 10% price increase…), but at $9.99/lb you are still paying Lindt prices for near-Cluizel quality.

Networked storage on the cheap

As hard drives get denser, raw storage is getting ridiculously cheap – well under a dollar per gigabyte as I write. The cost of managed storage, however, is an entirely different story.

Managed storage is the kind required for “enterprise applications”, i.e. when money is involved. It builds on raw storage by adding redundancy and the ability to hot-swap drives and to add capacity without disruption. At the higher end of the market, additional features include fault tolerance, the ability to take “snapshots” of data for backup purposes, and remote mirroring for disaster recovery.

Traditionally, managed storage has been more expensive than raw disk by a factor of at least two, sometimes by an order of magnitude or more. When I started my company in 2000, for instance, we paid $300,000, almost half of our initial capital investment, for a pair of clustered Network Appliance F760 filers with a total disk capacity of 600GB or so ($500/GB, at a time when disk drives cost $10/GB). The investment was well worth it: these machines have proven remarkably reliable, and the Netapps’ snapshot capability is vital for us. It lets us take instantaneous snapshots of our Oracle databases, which we can then back up during a leisurely backup window, without having to keep Oracle in the performance-sapping backup mode during that time.

Web serving workloads and the like can easily be distributed across farms of inexpensive rackmount x86 servers, an architecture pioneered by ISPs. Midrange servers (up to 4 processors), pretty much commodities nowadays, are adequate for all but the highest-volume transactional databases. Storage and databases are the backbone of any information system, however, and a CIO cannot afford to take any risks with them. That is why storage represents such a high proportion of hardware costs for most IT departments, and why specialists like EMC have the highest profit margins in the industry.

Most managed storage is networked, i.e. does not consist of hard drives directly attached to a server, but instead of disks attached to a specialized storage appliance connected to the server with a fast interconnect. There are two schools:

  • Network-Attached Storage (NAS), like our Netapps, basically acts as a network file server using common protocols like NFS (for UNIX) and SMB (for Windows). NAS is more often used for midrange applications and unstructured data, and connects using inexpensive Ethernet networks (Gigabit Ethernet, in our case) that every network administrator is familiar with. NAS devices are available for home or small office use, at prices of $500 and up.
  • Storage Area Networks (SAN) offer a block-level interface: they behave like virtual hard drives that serve fixed-size blocks of data, without any understanding of what is in them. They currently use Fibre Channel, a fast, low-latency interconnect that is unfortunately also terribly expensive (FC switches are over ten times more expensive than equivalent Gigabit Ethernet gear). The cost of setting up a SAN usually limits SANs to high-end, mainframe-class data centers. Exotic cluster filesystems or databases like Oracle RAC are needed if multiple servers are to access the same data.

One logical way to lower the cost of SANs is to use inexpensive Ethernet connectivity. This was recently standardized as iSCSI, which is essentially SCSI running on top of TCP/IP. I recently became aware of Ximeta, a company that makes external drives that apparently implement iSCSI, at a price that is very close to that of raw disks (since iSCSI does not have to manage state for clients the way a more featured NAS does, Ximeta can shun expensive CPUs and RAM, and use a dedicated ASIC instead).

The Ximeta hardware is not a complete solution on its own: the driver software manages the metadata for the cluster of networked drives, such as the information that allows multiple drives to be concatenated to add capacity while keeping the illusion of a single virtual disk. The driver is also responsible for RAID, although Windows, Mac OS X and Linux all have volume managers capable of this. There are apparently some Windows-only provisions to allow multiple computers to share a drive, but I doubt they constitute a full-blown clustered filesystem. In the target market there are very few real-world cases where anything more than a cold standby is required, and it makes a lot more sense to designate one machine to share a drive with the others on the network.
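The concatenation bookkeeping mentioned above boils down to address translation. The Python sketch below is a hypothetical model, assuming the driver records only each drive’s size in fixed-size blocks; it does not describe Ximeta’s actual metadata format, and the names ConcatenatedVolume, add_member and locate are illustrative.

```python
from bisect import bisect_right


class ConcatenatedVolume:
    """Hypothetical model of a spanned volume built from several drives.

    Each member drive is described only by its size in fixed-size blocks.
    The volume maps a logical block address (LBA) on the virtual disk to
    (member index, block offset within that member). This is plain
    concatenation ("spanning"): no striping and no redundancy.
    """

    def __init__(self, member_sizes):
        # starts[i] is the first logical block stored on member i.
        self.starts = []
        self.total_blocks = 0
        for size in member_sizes:
            self.add_member(size)
        if not self.starts:
            raise ValueError("a volume needs at least one member drive")

    def add_member(self, size):
        """Append a drive at the end; existing LBAs keep their mapping."""
        if size <= 0:
            raise ValueError("member size must be a positive number of blocks")
        self.starts.append(self.total_blocks)
        self.total_blocks += size

    def locate(self, lba):
        """Translate a logical block address into (member, offset)."""
        if not 0 <= lba < self.total_blocks:
            raise IndexError(f"LBA {lba} is outside a {self.total_blocks}-block volume")
        # The owning member is the last one whose first block is <= lba.
        member = bisect_right(self.starts, lba) - 1
        return member, lba - self.starts[member]


if __name__ == "__main__":
    vol = ConcatenatedVolume([1000, 500, 2000])
    print(vol.locate(999))   # (0, 999): last block of the first drive
    print(vol.locate(1000))  # (1, 0): first block of the second drive
    vol.add_member(250)      # grow the volume; old addresses are unaffected
    print(vol.locate(3600))  # (3, 100)
```

Appending a drive never changes the mapping of existing addresses, which is why concatenation can add capacity without disturbing data already written; the trade-off, compared with striping, is that I/O is not spread across the drives.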

I think this technology is very interesting and has the potential to finally make SANs affordable for small businesses as well as individuals (imagine extending the capacity of a TiVo simply by stacking networked drives). Disk-to-disk backups are replacing sluggish, relatively low-capacity tape drives, and these devices are interesting for that purpose as well.

The classical music lover’s iPod

Sony’s Norio Ohga is a classically trained musician and conductor. In contrast, Steve Jobs is clearly not a classical music lover (and indeed is reportedly partially deaf). If he were a classical aficionado, the iPod would not be as poorly designed for classical music.

I have started backing up my extensive CD collection (99% classical) using the new Apple Lossless Encoder, and switched from my original 5GB iPod (which does not support ALE) to a new 15GB model, with half the upgrade paid for by my universal upgrade plan. I had actually started with straight uncompressed PCM audio, but while the old iPod nominally supported it, its hard drive or buffering algorithm had a hard time keeping up with the roughly 1.4 Mbps data rate required, and playback often skipped. I only used my old iPod on flights, where the ambient noise would drown out the low quality of MP3s, but the new one is a better device, especially when coupled with high-quality earphones from Etymotic Research, and I may use it more regularly.
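For reference, the data rate of uncompressed CD-quality PCM follows directly from the format’s parameters (44.1 kHz sampling, 16 bits per sample, two channels):

```latex
\begin{aligned}
44\,100\ \tfrac{\text{samples}}{\text{s}} \times 16\ \tfrac{\text{bits}}{\text{sample}} \times 2\ \text{channels}
  &= 1\,411\,200\ \text{bit/s} \approx 1.4\ \text{Mbit/s} \\
1\,411\,200\ \text{bit/s} \div 8 &= 176\,400\ \text{bytes/s} \approx 10.6\ \text{MB per minute}
\end{aligned}
```

That is a sustained stream the player’s disk and buffer must keep up with for the entire duration of a piece.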

The simplistic Artist/Album/Song schema is completely inadequate for classical music, where you need Composer/Performer/Album/Opus/Movement. This can be kludged by repurposing the Album field for the Opus, and fortunately recent versions of iTunes and the iPod software have a Composer field (which wasn’t there when the iPod was first released). The Gracenote online database CDDB is not normalized in any way, and rekeying the metadata is actually the most time-consuming part of the whole process (we have Sony and Philips to thank for this monumental oversight in the CD format).

Even then, there are still flaws. If you have two interpretations of the same piece by different performers, the iPod will interleave the tracks from both, so you have to add numbers to the Album field (used for the Opus) to tell them apart. If you use the “keep organized” option, folders are named after the artist rather than the composer, which is inconvenient and illogical. It could be worse: earlier versions of the iPod would actually force pauses between tracks, which sabotaged dramatic transitions like the one in J. S. Bach’s Magnificat between the aria “Quia respexit” and the thundering chorus of “Omnes Generationes”.
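The interleaving problem is essentially a sort-key issue. The Python sketch below models it with a hypothetical ClassicalTrack record carrying the fields listed above; the names and the placeholder performers are illustrative only, and this is not how the iPod software is actually implemented. Sorting by composer, work and movement interleaves two recordings of the same work, while adding the performer to the key keeps them apart, which is what numbering the Album field achieves by hand.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClassicalTrack:
    """Hypothetical classical-music schema, not the iPod's actual data model."""
    composer: str
    work: str        # the "Opus", e.g. "Magnificat, BWV 243"
    performer: str
    movement: int    # position of the movement within the work
    title: str


def order_by_work(tracks):
    # Key without the performer: movements of two recordings of the same
    # work end up interleaved (3, 3, 4, 4, ...).
    return sorted(tracks, key=lambda t: (t.composer, t.work, t.movement))


def order_by_recording(tracks):
    # Including the performer keeps each recording contiguous and in order.
    return sorted(tracks, key=lambda t: (t.composer, t.work, t.performer, t.movement))


if __name__ == "__main__":
    work = "Magnificat, BWV 243"
    tracks = [
        ClassicalTrack("J. S. Bach", work, "Performer A", 3, "Quia respexit"),
        ClassicalTrack("J. S. Bach", work, "Performer B", 3, "Quia respexit"),
        ClassicalTrack("J. S. Bach", work, "Performer A", 4, "Omnes generationes"),
        ClassicalTrack("J. S. Bach", work, "Performer B", 4, "Omnes generationes"),
    ]
    for t in order_by_recording(tracks):
        print(t.performer, t.movement, t.title)
```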

In passing, I have to tip my hat to Apple for pulling off one of the greatest scams in consumer history since the Bell Telephone company made people accept time-based billing for telephone use. Compare the 99 cents you pay for an individual track on the iTunes Music Store with the $18.99 list price for a Compact Disc (in most classical albums, you want the whole album, not just some stray bits of goodness in a sea of force-bundled filler material).

Instead of a high-quality 16-bit 44.1kHz PCM audio stream (or even better, if you use one of the competing multichannel high-resolution formats SACD or DVD-Audio, although the difference is very subtle), you are paying for low-quality AAC files (I am the first to admit that for most pop music, adding noise or distortion actually improves the signal-to-noise ratio). You also receive the dubious benefits of Digital Rights Management (i.e. they infringe on your fair use rights to protect the record industry cartel’s interests and gain its acquiescence). No booklet, no durable storage medium, no possibility of resale.

Update (2004-06-06):

The iPod interface also seems to be poorly internationalized, unlike iTunes. It mangles the names of Antonín Dvořák or Bohuslav Martinů, but oddly enough not those of Camille Saint-Saëns or Béla Bartók.