Mylos

Misremembering the Alamo

2004-04-04 Soapbox Mylos

Starting tomorrow, the silver screens will be afflicted with a Disney mega-production about the Alamo. Presumably, jingoism will be slathered on in the tasteful way one can expect from Michael Eisner’s firm. In an apparent bow to political correctness, however, the Tejanos (the original Mexican settlers of Texas) will be shown supporting the rebels (an act they still rue to this day, as they were later driven off their lands by the Anglo settlers).

The Alamo illustrates the starkly diverging memories of Anglo and Hispanic Texans. The Senegalese poet (and later president) Léopold Sédar Senghor’s ironic poem “Nos ancêtres les Gaulois” relates how French colonial schools in his country tried to teach African schoolchildren that they were descended from the Celtic Gauls, and, beyond that, exposes the intrinsic absurdity of the colonial project. It seems Texas is not far behind, and I would be interested to know how many people cheer for Santa Anna’s army when the movie is screened.

In what may be a coincidence, the supposedly respectable academic and media darling Samuel Huntington has penned a viciously anti-Hispanic screed that just drips with the smug contempt of the self-described Anglo-Protestant. In his opinion, Mexican immigrants are not assimilating and are a future fifth column that threatens the integrity of the country. Other know-nothing demagogues said much the same thing about earlier waves of German or Irish immigration. I am not sure which regrettable trait of Mexican-Americans Professor Huntington finds most loathsome: the fact that they are not Anglo, or that they are not Protestant…

For all the brouhaha, one thing is seldom mentioned. According to one of my Texan cousins, the Texas War of Rebellion (1835–1836) was primarily waged to defend slavery, as Santa Anna had just extended, in a dictatorial act of oppression, the Mexican ban on slavery to Texas. One can only conclude the mythology surrounding the Alamo is merely a successful version of what Southern revisionists are trying to achieve, i.e. transmogrifying slavers into noble defenders of freedom.

Tracing telephone number prefixes

2004-04-04 IT Mylos

I recently had a project where I needed to find out which telco serves a user, based on their phone number (US only). Area code tables are a dime a dozen, but they only give you the state, and I needed finer granularity than that: the ability to drill down to the first three digits of the local phone number, for a total of six digits, known in industry parlance as the NPA-NXX.

The solution I found is to go straight to the source: the website of the company that administers the North American Numbering Plan on behalf of the FCC (the NANP actually covers more than the US, including Canada and some Caribbean countries, but the registrar I am referring to only covers the US).

They have a very convenient page with downloadable tables of NPA-NXX to carrier assignments. As an example, here is the entry for my home phone number:

Entry for 415-359-0918
State	NPA-NXX	OCN	Company name	Rate center	Switch	Use
CA	415-359	9740	Pacific Bell	SFNC CNTRL	SNFCCA12DS0	Assigned

OCN is the operating company number, a numeric code identifying the carrier. The “rate center” (usually the city or town name) is unfortunately encoded using Telcordia’s proprietary Common Language standards rather than in plain English, and you need to pay a license fee to get that database. Carrier names are also inconsistent: Pac Bell has been fully subsumed under the SBC brand name, but the old identity still lingers in these tables (Verizon, in contrast, has been much more diligent about keeping these tables updated, even when the pre-merger name is still mentioned).
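
Once a table has been downloaded, the lookup itself is simple. The following is a minimal sketch in Python 3, not the code I used. It assumes the table was saved as a tab-separated file whose header row matches the columns shown above (State, NPA-NXX, OCN, Company name, Rate center, Switch, Use); the actual NANPA download layout may differ, and the function names load_npa_nxx and lookup are hypothetical. The number is reduced to its digits, a leading country code 1 is dropped, and the first six digits (the NPA-NXX) serve as the key.

```python
import csv
import io
import re

def load_npa_nxx(lines):
    """Map 'NPANXX' (6 digits, no dash) to the table row as a dict.

    Assumes tab-separated input with a header row naming the columns
    State, NPA-NXX, OCN, Company name, Rate center, Switch, Use.
    """
    table = {}
    for row in csv.DictReader(lines, delimiter="\t"):
        table[row["NPA-NXX"].replace("-", "")] = row
    return table

def lookup(table, phone):
    """Return the table row for a US phone number, or None if not listed."""
    digits = re.sub(r"\D", "", phone)          # keep digits only
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]                    # drop the +1 country code
    if len(digits) != 10:
        raise ValueError("not a 10-digit NANP number: %r" % phone)
    return table.get(digits[:6])               # NPA (3) + NXX (3)

# Sample input built from the entry shown above
sample = io.StringIO(
    "State\tNPA-NXX\tOCN\tCompany name\tRate center\tSwitch\tUse\n"
    "CA\t415-359\t9740\tPacific Bell\tSFNC CNTRL\tSNFCCA12DS0\tAssigned\n"
)
table = load_npa_nxx(sample)
row = lookup(table, "(415) 359-0918")
print(row["Company name"], row["OCN"])  # Pacific Bell 9740
```

Keying on the six-digit string keeps the lookup a single dictionary access, which matters when the full set of US tables is loaded.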

With number portability, especially the forthcoming wireline-to-wireless number portability mandated by the FCC, this information will become less useful, as you will be able to have, say, a New York phone number while actually being in Tokyo using a Vonage VoIP box (possibly even one with a ported number). Still, it is a useful resource that is not widely known.

Update (2004-10-25):

This page is the most popular Google search destination on my website. If all you want is to look up a phone number and you don’t want to go through the hassle of importing all the NANPA tables, there are a number of NPA-NXX search pages available on the web.

The Temboz RSS aggregator

2004-03-29 Python Temboz Web Mylos Mylos Longer Articles

Update (2013-03-14): Google’s announcement that its Reader service will be discontinued has spurred interest in Temboz. This software is not dead (in fact, I use it daily), but I have not made an official release in a long time. You should use the version from GitHub instead. There are currently a number of bugs that can cause Temboz to lock up and require a restart. I am planning to complete my long-overdue overhaul before Google’s July deadline.

Introduction
Features
History
Screen shots
Known bugs
Credits
Download
Updates
Post scriptum

Introduction

Temboz is an RSS aggregator. It is inspired by FeedOnFeeds (a web-based personal aggregator), Google News (two-column layout) and TiVo (thumbs up and down). I have been using FeedOnFeeds for some time now, but that software seems to have stopped evolving, and there were a number of improvements to the user experience I wanted to make.

Features

Already implemented:

Multithreaded: downloads feeds in parallel.
Built-in web server.
Two-column user interface for better readability and information density. Automatic reflow using CSS.
Ratings system for articles
Real-time hunter-gatherer user interface: items flagged with a “Thumbs down” disappear immediately off the screen (using Dynamic HTML), making room for new articles. No laborious flagging of items as in FeedOnFeeds.
Filtering entries (using Python syntax, e.g. 'Salon' in feed_title and title == "King Kaufman's Sports Daily", or simply by selecting keywords/phrases and hitting “Thumbs down”).
Ability to generate an RSS feed from “Thumbs up” articles, which makes Temboz a true aggregator, not just a reader.
Ad filtering
Automatic garbage collection: every day between 3AM and 4AM, uninteresting articles (by default, those older than 7 days) are purged of their contents (but not of metadata such as titles, permalinks or timestamps) to keep the database size manageable. After 6 months (by default), they are deleted altogether.
Automatic database backups daily (immediately after garbage collection)
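
The two-stage garbage collection described above can be sketched with Python 3’s sqlite3 module. This is an illustration, not Temboz’s code: the items table and its rating, created and content columns are a hypothetical schema, a rating of 1 is assumed to mean “Thumbs up” (anything else, including NULL, counts as uninteresting), and 6 months is approximated as 180 days.

```python
import sqlite3

DAY = 86400  # seconds

def garbage_collect(db, now, purge_days=7, delete_days=180):
    """Purge, then delete, uninteresting items (hypothetical schema).

    `now` and the `created` column are Unix timestamps in seconds.
    Items rated "Thumbs up" (rating = 1) are never touched.
    """
    with db:  # both steps run in one transaction, committed on success
        # stage 1: drop the article bodies but keep title/permalink/timestamp
        db.execute(
            "UPDATE items SET content = NULL "
            "WHERE rating IS NOT 1 AND created < ? AND content IS NOT NULL",
            (now - purge_days * DAY,))
        # stage 2: remove very old uninteresting articles altogether
        db.execute(
            "DELETE FROM items WHERE rating IS NOT 1 AND created < ?",
            (now - delete_days * DAY,))

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, title TEXT, "
           "permalink TEXT, created INTEGER, rating INTEGER, content TEXT)")
now = 1000000000
db.executemany(
    "INSERT INTO items (title, permalink, created, rating, content) "
    "VALUES (?, ?, ?, ?, ?)",
    [("fresh", "http://example.com/1", now - 1 * DAY, 0, "body"),
     ("stale", "http://example.com/2", now - 10 * DAY, 0, "body"),
     ("ancient", "http://example.com/3", now - 200 * DAY, 0, "body"),
     ("liked", "http://example.com/4", now - 200 * DAY, 1, "body")])
garbage_collect(db, now)
for row in db.execute("SELECT title, content FROM items ORDER BY id"):
    print(row)
# prints ('fresh', 'body'), then ('stale', None), then ('liked', 'body')
```

Running the purge before the delete means a single nightly pass handles both thresholds. SQLite’s IS NOT operator is used so that unrated rows with a NULL rating are still treated as uninteresting.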

On the to do list:

Write better documentation
Handle permanent HTTP redirects for feed XML URLs
Automatic pacing of feed polling intervals, using the average and standard deviation of observed inter-arrival times between feed items, to reduce bandwidth usage and load for both client and server. Most feeds should be polled daily rather than hourly (e.g. my own, since I update once a week on average), but the mechanisms for a feed to indicate its preferred polling rate are quite inconsistent from one flavor of RSS/Atom to another.
“Survivor mode” – vote feeds that no longer perform off the aggregator based on relevance statistics.
Ability to cluster together articles (I tried a heuristic of looking for common URLs they are all pointing to, but this didn’t work well in practice).
Portability to Windows, distribution as a standalone package.
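
The polling-pacing item on this list could look something like the following Python 3 sketch. It is only one possible policy, not something Temboz implements. It computes the mean and standard deviation of the gaps between successive item timestamps, polls one standard deviation sooner than the mean gap, and clamps the result between one hour and one day. The function name polling_interval, the mean-minus-one-deviation rule and the bounds are all assumptions chosen for illustration.

```python
import statistics

HOUR = 3600
DAY = 24 * HOUR

def polling_interval(timestamps, min_interval=HOUR, max_interval=DAY):
    """Suggest a polling interval (seconds) from item timestamps (seconds).

    Hypothetical policy: mean gap minus one standard deviation,
    clamped to [min_interval, max_interval].
    """
    ts = sorted(timestamps)
    gaps = [b - a for a, b in zip(ts, ts[1:])]
    if len(gaps) < 2:
        # stdev needs at least two gaps: fall back to the hourly default
        return min_interval
    mean = statistics.mean(gaps)
    stdev = statistics.stdev(gaps)
    # poll somewhat more often than the typical gap so bursts are not missed
    interval = mean - stdev
    return int(max(min_interval, min(max_interval, interval)))

weekly = [0, 7 * DAY, 14 * DAY, 22 * DAY]       # a blog updated about weekly
busy = [i * 1800 for i in range(10)]            # a new item every 30 minutes
print(polling_interval(weekly))  # 86400
print(polling_interval(busy))    # 3600
```

Clamping keeps a very regular, slow feed from being polled less than daily and a very busy one from being polled more than hourly. A real implementation would also need a minimum amount of history before trusting the statistics.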

History

I have been using it successfully for well over a year. It still has rough edges, with some administration functions only possible through the SQLite command-line utility. Here is a screen shot showing the reader user interface. The article highlighted in yellow was given a “Thumbs Up”. You can also see the user interface at work in a view of the last 50 articles I flagged as “thumbs up” among the feeds I read.

Screen shots

The first screen shot shows the article reading interface, using a two-column layout. Clicking on the “Thumbs down” icon makes the article disappear, bringing a new one in its place (if available). Clicking on the “Thumbs up” icon highlights it in yellow and flags it as interesting in the database.

The feed summary page shows statistics on feeds, listing feeds with unread articles first, then the rest in alphabetical order. Feeds can also be sorted by other metrics. You have the option of “catching up” with a feed (marking all its articles as read). Feeds with errors are highlighted in red (not shown).

Clicking on the “details” link for a feed brings up this page, which allows you to change the title or feed URL, and shows the RSS or Atom fields accessible for filtering.

Feeds can be filtered using Python expressions.
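
Here is a rough Python 3 sketch of how such a filter expression can be evaluated. It is not Temboz’s actual code. The item’s RSS/Atom fields are exposed as variables, and the expression is passed to eval() with builtins disabled. The field names feed_title and title come from the example in the feature list, while the helper matches_filter is hypothetical. Disabling builtins only prevents accidents and is not a security sandbox, which is acceptable here only because the filters are written by the aggregator’s own user.

```python
def matches_filter(expression, item):
    """Return True if the item matches the filter (i.e. should be hidden).

    `item` maps RSS/Atom field names such as 'title' or 'feed_title'
    to their values; they become the variables of the expression.
    """
    code = compile(expression, "<filter>", "eval")
    # empty builtins: the expression only sees the item's own fields
    return bool(eval(code, {"__builtins__": {}}, dict(item)))

rule = "'Salon' in feed_title and title == \"King Kaufman's Sports Daily\""
item = {"feed_title": "Salon.com", "title": "King Kaufman's Sports Daily"}
print(matches_filter(rule, item))  # True
```

Compiling the expression once with compile() would also let an aggregator reuse the code object for every incoming item instead of re-parsing the filter each time.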

Known bugs

You can check outstanding bug reports, change requests and more at the public CVStrac site.

Credits

Temboz is written in Python, and leverages Mark Pilgrim’s Ultra-liberal feed parser, SQLite 2.x and the Cheetah template engine.

Download

You can download the current version: temboz-0.8.tar.gz. I welcome any feedback you may have, especially on how to improve installation.

The CVS version is far ahead of 0.8 in features. I have not yet had the time to test and document the migration procedure from 0.8 to 1.0, but if you are a new Temboz user I strongly advise you to get a nightly CVS snapshot instead (they are what I run on my own server): temboz-CVS.tar.gz or temboz-CVS.zip.

Updates

For news on Temboz, please subscribe to the RSS feed.

Temboz has a CVStrac where you can submit bug reports or change requests, and a Wiki, where all future documentation will ultimately reside.

Post scriptum

The name “Temboz” is a reference to Malima Temboz, “The mountain that walks”, an elephant whose tormented spirit is the object of Mike Resnick’s excellent SF novel, Ivory.

Data mining Outlook for fun and profit

2004-03-16 Python Mylos

For a few years now, I have owned the domain name majid.fm. Dot-fm stands for the Federated States of Micronesia, a micro-state in the Pacific Ocean, which markets its domain names to FM radio stations. Those are also my initials. Unfortunately, the registration fees are quite high ($200 every two years), and the domain is redundant now that I have acquired majid.info and majid.org (majid.com is held by a Malaysian cybersquatter who is demanding a couple thousand dollars for it; I may be vain, but not that vain). I have decided to let the domain lapse when it expires on April 1st.

I used the majid-dot-FM domain for my email, and set it up so that mail sent to any address @majid.fm would be delivered to my primary mailbox, fazal@majid.fm. For instance, when I registered with Dell, I gave them the email address dell@majid.fm. This made it easy to trace how senders had obtained my address, and to blacklist companies that started spamming me (they shall remain nameless to protect the guilty yet litigious).

Unfortunately, spammers and some worms attempt dictionary attacks, trying all possible combinations like jim@majid.fm, smith@majid.fm, and so on. My spam filter would catch some, but not all of them, and dealing with the rest would be a terrible hassle. I do not want to have an auto-responder send emails back to people who email me at the old address, as this would at best flood innocent people whose addresses spammers are impersonating, and at worst actually give my new address to the spammers.

My solution to this dilemma is a Python script that scans through all the emails in my Outlook personal folder (PST) files of archived email and flags everyone who sent me an email at a majid.fm address. I then manually send them a change-of-address notification (or, in the case of websites and online stores, update my contact info online).

Simply using Outlook’s advanced search function will not work, as in many cases the To: header is set to something other than the address the email is delivered to, such as undisclosed-recipients, or the sender’s address when they send the email to multiple Bcc: recipients (the proper way to proceed when you want to send an email to multiple recipients without giving everyone in the list the email addresses of the other recipients). I actually have to sift through the raw message headers to see the envelope destination address.

Here is a simplified version of olmine.py, the script I used. It requires Python 2.x with the win32all extensions, and Outlook 2000 with the Collaboration Data Objects (CDO) option installed (this is not the default). CDO is required to access the full headers. Of course, this script can be useful for all sorts of social network analysis fun on your own Outlook files, or more prosaically to generate a whitelist of email addresses for your spam filter.

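import re, win32com.client

srcs = {}   # sender address -> number of messages to a majid.fm address
dsts = {}   # majid.fm address -> number of messages received
pairs = {}  # (sender, majid.fm address) -> number of messages

# regular expression that scans the headers for addresses in the majid.fm
# domain (no comma in the character class, so that comma-separated address
# lists are split correctly)
m_re = re.compile(r'[-A-Za-z0-9._+]+@majid\.fm', re.IGNORECASE)
# regular expression that strips out headers that can cause false positives
strip_re = re.compile(r'(Message-Id:.*$|In-Reply-To:.*$|References:.*$)',
                      re.IGNORECASE | re.MULTILINE)

# MAPI property tags, read through the CDO Fields collection
PR_SENDER_EMAIL_ADDRESS = 0x0C1F001F
PR_TRANSPORT_MESSAGE_HEADERS = 0x007D001E

def dump_folder(folder):
  """Iterate recursively over the given folder and its subfolders"""
  print '-' * 72
  print folder.Name
  print '-' * 72
  # CDO collections are 1-based: Item(1) through Item(Count)
  for i in range(1, folder.Messages.Count + 1):
    try:
      msg = folder.Messages.Item(i)
      _from = msg.Fields.Item(PR_SENDER_EMAIL_ADDRESS).Value
      headers = msg.Fields.Item(PR_TRANSPORT_MESSAGE_HEADERS).Value
    except Exception:
      # ignore non-email objects like contacts or calendar entries
      continue
    stripped_headers = strip_re.sub('', headers)
    # count each majid.fm address once per message, even if it appears
    # in several headers (To:, Received:, Delivered-To:...)
    found = {}
    for addr in m_re.findall(stripped_headers):
      found[addr.lower()] = 1
    for _to in found.keys():
      srcs[_from] = srcs.get(_from, 0) + 1
      dsts[_to] = dsts.get(_to, 0) + 1
      if (_from, _to) not in pairs:
        print _from, '->', _to
      pairs[_from, _to] = pairs.get((_from, _to), 0) + 1
  # recurse into subfolders
  for i in range(1, folder.Folders.Count + 1):
    dump_folder(folder.Folders.Item(i))

# connect to Outlook via CDO
cdo = win32com.client.Dispatch('MAPI.Session')
cdo.Logon()
# iterate over all the open PST files (CDO information stores)
for i in range(1, cdo.InfoStores.Count + 1):
  store = cdo.InfoStores.Item(i)
  print '#' * 72
  print store.Name
  print '#' * 72
  dump_folder(store.RootFolder)
cdo.Logoff()

# summary: everyone who wrote to a majid.fm address, most frequent first
summary = [(count, sender) for sender, count in srcs.items()]
summary.sort()
summary.reverse()
for count, sender in summary:
  print count, sender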
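
Since the interesting logic lives in the two regular expressions, it can be exercised without Outlook or CDO. The sketch below uses Python 3 rather than the script’s Python 2, with a made-up sample header, and the majid_addresses helper is hypothetical. It applies a variant of the same idea: strip the Message-Id:, In-Reply-To: and References: lines, then collect the distinct @majid.fm addresses left in the raw headers, such as the envelope recipient in a Received: line when To: only says undisclosed-recipients.

```python
import re

# addresses in the majid.fm domain (a variant of the script's m_re)
m_re = re.compile(r"[-A-Za-z0-9._+]+@majid\.fm", re.IGNORECASE)
# headers whose message IDs may contain "@majid.fm" without being recipients
strip_re = re.compile(r"^(Message-Id|In-Reply-To|References):.*$",
                      re.IGNORECASE | re.MULTILINE)

def majid_addresses(headers):
    """Return the sorted, lower-cased @majid.fm addresses in raw headers."""
    stripped = strip_re.sub("", headers)
    return sorted({a.lower() for a in m_re.findall(stripped)})

sample = (
    "Received: from mx.example.com by mail.majid.fm\n"
    "\tfor <dell@majid.fm>; Tue, 16 Mar 2004 10:00:00 -0800\n"
    "From: sales@example.com\n"
    "To: undisclosed-recipients:;\n"
    "Message-Id: <12345@majid.fm>\n"
)
print(majid_addresses(sample))  # ['dell@majid.fm']
```

Like the original script, this does not handle folded continuation lines of a long References: header, which could still leak a message ID into the results.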

Stationery pattern

2004-03-13 Stuff Mylos

I found myself buying quite a bit of stationery recently. The nice thing is that even premium stationery lines are cheap compared to computers or photography, my other capital-intensive hobbies. A top-of-the-line solid sterling silver pen like the Waterman Edson LE will not break the $1000 barrier, whereas you can’t even buy a handbag for that price from most luxury brands.

I spent four years in a Catholic school in Versailles, a very conservative city, where we were not allowed to use ballpoint pens because they deform handwriting. I used Sheaffer pens back then, and all the way through college. When I started working, I splurged on an Edson Blue, but with computers and email, I seldom got to use it.

There is a kind of fashion phenomenon brewing around Moleskine notebooks and journals, including an aficionado website (of which I was a charter contributor). Their Italian manufacturer concocted a clever marketing campaign to give them a cosmopolitan aura of travel mystique, branding them as “The Legendary Notebook as used by Van Gogh, Chatwin, Hemingway, Matisse and Céline” (to which my obvious reaction was, Chatwin who?). These notebooks have thin yet rigid covers, a pocket for clippings in the back, and an elastic band to keep them closed. They are decently made, but not exceptionally so. Just holding one in your hands makes you want to write in it, in a purely emotional way, as described in a recent book by Don Norman.

After the bug bit me, I bought in quick order:

A pair of Faber-Castell mechanical pencils that use thick 1.4mm leads and just glide on paper
Some sets of Crane’s paper, made from 100% cotton rag; Crane is also the official supplier of the paper for US currency.
A pad of G. Lalo “Vergé de France” laid paper (paper that has a horizontally striped watermark, for a striking yet classy finish). This paper responds exceptionally well to fountain pens.
A Pelikan Souverän 800 fountain pen in classic green Stresemann stripes, with a broad nib for bold, modulated strokes. My grandfather used to own a pen like this one. Pelikans are the equal of Montblanc pens in quality, at a much more reasonable price.
Inks by J. Herbin, a company that has been operating since 1670. I am particularly enamored of their black (Perle des encres) and meadow-green (Vert pré) inks; the latter is the color I now use for the navigation and date headings on this site.
A handsome journal by local company Oberon Design, with a richly detailed cover hand-tooled from leather in a Celtic knotwork pattern (St. Patrick’s Day is just around the corner…) and matching pewter accents. Many of their designs are breathtaking, like the oak tree on their home page.

Now, I just have to muster the inspiration to make use of all this…

Update (2004-03-21):

I may have to revise my judgement about “cheap”. Last Friday, I stopped by a local Montblanc store. The saleswoman tried to interest me in a J. P. Morgan Limited Edition fountain pen, for a mere $1850… Her closing argument? “It will sell out soon”. Given the slightly hysterical nature of Montblanc collectors (and the fact that it is the default brand for rich people who are not all that knowledgeable about pens, much as Rolex is for watches), she may well be right.