Fazal Majid's low-intensity blog

Sporadic pontification

Fazal Fazal

Are Americans becoming second-class consumers?

I keep noticing with dismay that many of the gadgets I consider for purchase are deliberately crippled in their US versions. It used to be only European consumers had to suffer from inflated prices and reduced functionality, usually self-inflicted due to bureaucratic EU mandates like the DV-In fiasco (most DV camcorders in Europe have digital IEEE1394/Firewire/iLink video out but not digital video in, as otherwise they would be classified as VCRs and be subject to various protectionist customs duties).

  • Sony’s PEG-TH55 PDA has integrated WiFi and Bluetooth worldwide, except in the US where Bluetooth is omitted. This is incredibly annoying and rules the device out for me (unless I import one from the UK or Germany), as I have discovered from practical experience with my PEG-UX50 that WiFi access points are seldom available when you need them, and I often have to fall back to GPRS via Bluetooth. We are already saddled with the industrialized world’s worst mobile telephone operators and clunkiest phones, why add injury to insult?
  • Canon’s new Digital Rebel DSLR is available in a kit with an 18-55mm lens. The lens has the smooth and fast USM ultrasonic motor in Japan, but uses the inferior AFD micro-motor in the US. Perhaps they believe US customers are too clueless to notice the difference.
  • Many ultra-slim laptops available in Japan are never introduced in the US (this has created a market opportunity for parallel importers like Dynamism). Once again, the gaijin must lack the refined aesthetic sensibility to appreciate models like the Sony Vaio X505 and are probably content to lug their boat anchor laptops in their gas-guzzling SUVs. Nor is this attitude limited to Japanese companies – until recently IBM had an entire line of ultra-compact notebook computers available only in Japan.
  • Epson’s Stylus Photo 2200, probably the favorite printer of professional photographers, does not include in the US the gray balancer, special software and calibration sheets used to improve the neutrality of black and white prints. Michael Reichmann puts it best when he calls this “The software that Epson North America thinks its customers are too dumb to use”.

The US is the world’s single largest market for consumer goods. Why is it treated with such disregard?

Update (2004-05-12):

Sony is relenting and will officially release the Vaio X505 in the US, albeit for the princely sum of $3000.

Information Lifecycle Management and the cost of forgetfulness

Maxwell’s demon is a classic thought experiment that illustrates the second law of thermodynamics. The conundrum drove Ludwig Boltzmann to suicide. Leo Szilard, a contemporary and friend of Einstein, and one of the first proponents of the atomic bomb, provided the first refutation in 1929 – Maxwell’s demon appears to create energy from scratch, but what it is really doing is transferring entropy to the outside world.

In his analysis, Szilard considered alternative demons that would overcome his objection, and for one of them, now known as the Szilard Engine, his interesting conclusion is that it cannot work because forgetting information from memory in itself incurs thermodynamic costs. To make a real-world analogy – you may pay to get information in the form of your daily newspaper, but disposing of all that paper also incurs real costs in the form of garbage hauling taxes, even if you are not aware of them. In the cosmic order, getting rid of data is as important as acquiring it in the first place.
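
The cost of forgetting is usually quantified today by Landauer's bound, a later result than Szilard's 1929 analysis: erasing one bit of memory at temperature T dissipates at least k_B T ln 2 of energy. As a rough worked figure, assuming room temperature (T = 300 K):

```latex
E_{\min} = k_B T \ln 2
\approx (1.38 \times 10^{-23}\,\mathrm{J/K}) \times (300\,\mathrm{K}) \times 0.693
\approx 2.9 \times 10^{-21}\,\mathrm{J} \text{ per bit erased}
```

The amount is minuscule next to what real hardware dissipates, but it is strictly greater than zero, which is the point of the argument.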

One of the buzzwords of the day in IT is Information Lifecycle Management. This basically means using a fancy database to track information assets and how they are stored, backed up and disposed of in accordance with retention policies and various legislative mandates like the Sarbanes-Oxley law. Companies like Microsoft discovered to their dismay the consequences of having incriminating information dragged into court under subpoena.

It seems the price of forgetfulness is eternal vigilance…

A side note – one of the things that seems consistently forgotten whenever designing a database is archiving and deleting old historical data – the data just keeps accumulating, usually until the database becomes obsolete and is decommissioned or the original designers have moved on to other jobs. In large-scale databases, the efficient archiving of data requires partitioning, and is several orders of magnitude harder if the partitioning was poorly designed in the original data model. For instance, if some classes of historical data have to be held for a longer retention period than others, make sure they are stored in different partitions as well, otherwise separating them will require lengthy batch jobs. If you are specifying a database today, for your successors’ sake, plan for the orderly disposal of data once it is no longer relevant.
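
To make the partitioning advice concrete, here is a minimal sketch in Python. It uses SQLite, which has no native partitioning, so one table per (data class, month) stands in for a partition. The class names, retention periods and table naming scheme are all hypothetical. The point it illustrates is that when each retention class lives in its own partitions, disposal is a cheap DROP rather than a row-by-row batch delete.

```python
import sqlite3
from datetime import date

# Hypothetical retention classes and their retention periods, in months.
RETENTION_MONTHS = {"clickstream": 3, "invoices": 84}

def partition_name(data_class, month):
    """Table holding one class of data for one month, e.g. invoices_2004_03."""
    return f"{data_class}_{month.year:04d}_{month.month:02d}"

def months_between(older, newer):
    """Whole calendar months elapsed from older to newer."""
    return (newer.year - older.year) * 12 + (newer.month - older.month)

def create_partition(conn, data_class, month):
    # Table names come from the fixed dictionary above, never from user input.
    conn.execute(
        f"CREATE TABLE IF NOT EXISTS {partition_name(data_class, month)} "
        "(id INTEGER PRIMARY KEY, payload TEXT)"
    )

def expire_partitions(conn, today):
    """Drop every partition older than its class's retention period."""
    dropped = []
    names = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'")]
    for name in names:
        parts = name.rsplit("_", 2)
        if len(parts) != 3 or parts[0] not in RETENTION_MONTHS:
            continue  # not one of our partitions
        data_class, year, month = parts
        age = months_between(date(int(year), int(month), 1), today)
        if age > RETENTION_MONTHS[data_class]:
            conn.execute(f"DROP TABLE {name}")  # whole partition goes at once
            dropped.append(name)
    return dropped
```

Had the short-lived and long-lived rows shared a table, the same cleanup would need a DELETE with a per-row test on the data class, which is exactly the lengthy batch the paragraph above warns about.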

Misremembering the Alamo

Starting tomorrow, the silver screens will be afflicted with a Disney mega-production on the Alamo. Presumably, jingoism will be slathered in the tasteful way one can expect from Michael Eisner’s firm. In an apparent bow to political correctness, however, the Tejanos (the original Mexican settlers of Texas) will be shown supporting the rebels (an act they still rue to this day, as they were later driven out of their lands by the Anglo settlers).

The Alamo is an illustration of the starkly diverging memories of Anglo and Hispanic Texans. The Senegalese poet (and later president) Leopold Sédar Senghor’s ironic poem “Nos ancêtres les Gaulois” relates how French colonial schools in his country tried to teach African schoolchildren they were descended from Celtic Gauls, and beyond that the intrinsic absurdity of the colonial project. It seems Texas is not far behind, and I would be interested in knowing how many people cheer for Santa Anna’s army when the movie screens.

In what may be a coincidence, the supposedly respectable academic and media darling Samuel Huntington penned a viciously anti-Hispanic screed that just drips with the smug contempt of the self-described Anglo-Protestant. In his opinion, Mexican immigrants are not assimilating and are a future fifth column that threatens the integrity of the country. Other know-nothing demagogues said much the same thing about earlier waves of German or Irish immigration. I am not sure which regrettable trait of Mexican-Americans Professor Huntington finds most loathsome, the fact they are not Anglo or that they are not Protestants…

For all the brouhaha, one thing is seldom mentioned. According to one of my Texan cousins, the Texas War of Rebellion (1835–1836) was primarily waged to defend slavery, as Santa Anna had just extended, in a dictatorial act of oppression, the Mexican ban on slavery to Texas. One can only conclude the mythology surrounding the Alamo is merely a successful version of what Southern revisionists are trying to achieve, i.e. transmogrifying slavers into noble defenders of freedom.

Tracing telephone number prefixes

I recently had a project where I needed to find out what telco served users based on their phone number (US only). Area code tables are a dime a dozen, but only give you the state, and I needed finer granularity than that, including the ability to drill down to the first three digits of the local phone number, for a total of 6 digits known in industry parlance as the NPA-NXX.

The solution I found is to go straight to the source: the website of the company that administers the North American numbering plan on behalf of the FCC (the NANP actually covers more than the US, including Canada and some Caribbean countries, but the registrar I am referring to only covers the US).

They have a very convenient page with downloadable tables of NPA-NXX to carrier assignments. As an example, here is the entry for my home phone number:

Entry for 415-359-0918

State | NPA-NXX | OCN  | Company name | Rate center | Switch      | Use
CA    | 415-359 | 9740 | Pacific Bell | SFNC CNTRL  | SNFCCA12DS0 | Assigned

OCN is the operating company number, a numeric code they use for carriers. The “rate center” (usually the city or town name) is unfortunately encoded using the proprietary Telcordia Common Language standards rather than in plain English, and you need to pay a license fee to get that database. The carrier name also varies wildly. Pac Bell has been fully subsumed under the SBC brand name, but the old identity still lingers in these tables (Verizon, in contrast, has been much more diligent about having these tables updated, even when the pre-merger name is still mentioned).
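
For illustration, here is a minimal Python sketch of the lookup itself: reduce a phone number to its NPA-NXX and use that as a key into the assignment table. The dictionary holds only the sample entry from this post; how a real table is loaded from the downloaded files is left out, since the file layout is not described here, and the function names are my own.

```python
import re

# Single sample row, copied from the entry shown in this post. A real table
# would be built from the downloaded assignment files.
ASSIGNMENTS = {
    "415-359": {
        "state": "CA",
        "ocn": "9740",
        "company": "Pacific Bell",
        "rate_center": "SFNC CNTRL",
        "switch": "SNFCCA12DS0",
        "use": "Assigned",
    },
}

def npa_nxx(phone):
    """Return the NPA-NXX prefix (e.g. '415-359') of a US phone number."""
    digits = re.sub(r"\D", "", phone)       # keep digits only
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]                 # drop the leading country code
    if len(digits) != 10:
        raise ValueError(f"not a 10-digit US number: {phone!r}")
    return f"{digits[:3]}-{digits[3:6]}"

def lookup_carrier(phone, assignments=ASSIGNMENTS):
    """Return the assignment record for the number's prefix, or None."""
    return assignments.get(npa_nxx(phone))
```

With the sample table, lookup_carrier("415-359-0918") returns the Pacific Bell record, and a number whose prefix is not in the table returns None.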

With number portability, especially the forthcoming wired to wireless number portability required by the FCC, this information will be less useful: you will be able to have, say, a New York phone number but actually be in Tokyo using a Vonage VoIP box (possibly even one with a ported number). It is still a useful resource that is not widely known.

Update (2004-10-25):

This page is the most popular Google search on my website. If all you want is to look up a phone number and don’t want to go through the hassle of importing all the NANPA tables, there are a number of NPA-NXX search pages available on the web.

The Temboz RSS aggregator

2013-03-14: Google’s announcement that their Reader service will be discontinued has spurred interest in Temboz. This software is not dead; in fact, I use it daily, but I have not made an official release in a long time. You should use the version from Github instead. There are currently a number of bugs which can lead to Temboz locking up and requiring a restart. I am planning on completing my long overdue overhaul before Google’s July deadline.


Introduction

Temboz is an RSS aggregator. It is inspired by FeedOnFeeds (web-based personal aggregator), Google News (two column layout) and TiVo (thumbs up and down). I have been using FeedOnFeeds for some time now, but that software seems to have stopped evolving, and I had a number of optimizations to the user experience I wanted to make.

Features

Already implemented:

  • Multithreaded: downloads feeds in parallel.
  • Built-in web server.
  • Two-column user interface for better readability and information density. Automatic reflow using CSS.
  • Ratings system for articles
  • Real-time hunter-gatherer user interface: items flagged with a “Thumbs down” disappear immediately off the screen (using Dynamic HTML), making room for new articles. No laborious flagging of items as in FeedOnFeeds.
  • Filtering entries (using Python syntax, e.g. 'Salon' in feed_title and title == "King Kaufman's Sports Daily", or simply by selecting keywords/phrases and hitting “Thumbs down”).
  • Ability to generate an RSS feed from “Thumbs Up” articles, which is why Temboz is a true aggregator, not just a reader.
  • Ad filtering
  • Automatic garbage collection: every day between 3AM and 4AM, uninteresting articles (by default those older than 7 days) are purged of their contents (but not metadata such as titles, permalinks or timestamps) to keep the database size manageable. After 6 months (by default), they are deleted altogether.
  • Automatic database backups daily (immediately after garbage collection)
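
The two-stage garbage collection in the list above can be sketched as follows. This is not Temboz's actual code: the table layout, the column names, the rule that "uninteresting" means anything without a thumbs up (rating below 1), and the use of 180 days for "6 months" are all assumptions made for the example.

```python
import sqlite3
from datetime import datetime, timedelta

# Hypothetical schema, for illustration only.
SCHEMA = """
CREATE TABLE articles (
    id      INTEGER PRIMARY KEY,
    title   TEXT,
    link    TEXT,
    content TEXT,
    rating  INTEGER DEFAULT 0,   -- assumed: 1 = thumbs up
    created TEXT                 -- ISO 8601 timestamp
)
"""

def collect_garbage(conn, now, purge_after_days=7, delete_after_days=180):
    """Blank the content of old uninteresting articles, delete very old ones.

    Returns (number of articles purged, number of articles deleted).
    """
    purge_cutoff = (now - timedelta(days=purge_after_days)).isoformat(timespec="seconds")
    delete_cutoff = (now - timedelta(days=delete_after_days)).isoformat(timespec="seconds")
    with conn:  # both stages in a single transaction
        deleted = conn.execute(
            "DELETE FROM articles WHERE rating < 1 AND created < ?",
            (delete_cutoff,),
        ).rowcount
        # Keep title, link and timestamp; drop only the bulky content.
        purged = conn.execute(
            "UPDATE articles SET content = NULL "
            "WHERE rating < 1 AND created < ? AND content IS NOT NULL",
            (purge_cutoff,),
        ).rowcount
    return purged, deleted
```

Running the delete before the update avoids blanking rows that are about to disappear anyway. Scheduling the call in the 3AM to 4AM window and taking the backup afterwards are left out of the sketch.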

On the to do list:

  • Write better documentation
  • Handle permanent HTTP redirects for feed XML URLs
  • Automatic pacing of feed polling intervals using the average and standard deviation of observed feed item inter-arrival times, to reduce bandwidth usage and load for both client and server. Most feeds should be polled on a daily rather than hourly interval (e.g. my own, since I update once a week on average), but the mechanisms for a feed to indicate its polling rate preferences are quite inconsistent from one flavor of RSS/Atom to another.
  • “Survivor mode” – vote feeds that no longer perform off the aggregator based on relevance statistics.
  • Ability to cluster together articles (I tried a heuristic of looking for common URLs they are all pointing to, but this didn’t work well in practice).
  • Portability to Windows, distribution as a standalone package.

History

I have been using it successfully for well over a year. It still has rough edges, with some administration functions only doable using the SQLite command-line utility. Here is a screen shot showing the reader user interface. The article highlighted in yellow was given a “Thumbs Up”. You can also see the user interface at work in a view of the last 50 articles I flagged as “thumbs up” among the feeds I read.

Screen shots

Click on a screen shot thumbnail for a full-sized version

The first screen shot shows the article reading interface, using a two-column layout. Clicking on the “Thumbs down” icon makes the article disappear, bringing a new one in its place (if available). Clicking on the “Thumbs up” icon highlights it in yellow and flags it as interesting in the database.

[Screen shot: view items]

The feed summary page shows statistics on feeds, starting with feeds with unread articles, then by alphabetical order. Feeds can be sorted based on other metrics. You have the option of “catching up” with a feed (marking all the articles as read). Feeds with errors are highlighted in red (not shown).

[Screen shot: view feeds]

Clicking on the “details” link for a feed brings up this page, which allows you to change the title or feed URL, and shows the RSS or Atom fields accessible for filtering.

[Screen shot: feed details]

Feeds can be filtered using Python expressions.

[Screen shot: filtering rules]
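
As a rough illustration of filtering with Python expressions, a rule can be evaluated with the article's fields exposed as variables. This is a sketch of the idea rather than Temboz's implementation. The function name is made up, and eval is not a security sandbox, so the approach only makes sense when the rules are written by the aggregator's own user.

```python
def matches(rule, article):
    """Return True if the filtering rule (a Python expression) holds.

    rule:    expression text, e.g. "'Salon' in feed_title"
    article: dictionary of field names to values, e.g. feed_title, title
    """
    code = compile(rule, "<filter rule>", "eval")
    # Article fields become local variables; builtins are hidden to limit
    # accidents, which does not make eval safe against hostile input.
    return bool(eval(code, {"__builtins__": {}}, dict(article)))

article = {
    "feed_title": "Salon",
    "title": "King Kaufman's Sports Daily",
}
rule = "'Salon' in feed_title and title == \"King Kaufman's Sports Daily\""
```

Here matches(rule, article) is true, so the article would be filtered out; the same rule applied to any other Salon article is false.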

Known bugs

You can check outstanding bug reports, change requests and more at the public CVStrac site.

Credits

Temboz is written in Python, and leverages Mark Pilgrim’s Ultra-liberal feed parser, SQLite 2.x and Cheetah.

Download

You can download the current version: temboz-0.8.tar.gz. I welcome any feedback you may have, especially concerning improvements to installation.

The CVS version is far ahead of 0.8 in features. I have not yet had the time to test and document the migration procedure from 0.8 to 1.0, but if you are a new Temboz user I strongly advise you to get a nightly CVS snapshot instead (they are what I run on my own server): temboz-CVS.tar.gz or temboz-CVS.zip.

Updates

For news on Temboz, please subscribe to the RSS feed.

Temboz has a CVStrac where you can submit bug reports or change requests, and a Wiki, where all future documentation will ultimately reside.

Post scriptum

The name “Temboz” is a reference to Malima Temboz, “The mountain that walks”, an elephant whose tormented spirit is the object of Mike Resnick’s excellent SF novel, Ivory.