Is this a sign RSS is finally going mainstream?

Dialogue from The Librarian, a TV movie on TNT. Sheltered “professional student” Flynn Carsen (played by Noah Wyle of “E.R.” fame) is kicked out of school and is interviewing for a job at a vaguely supernatural library:

Charlene (Jane Curtin, coldly): What makes you think you could be the Librarian?

Flynn Carsen (Noah Wyle): I know the Dewey decimal system, Library of Congress, research paper orthodoxy, web searching, I can set up an RSS feed…

Charlene (stifling a sigh): Everybody can do that. They’re librarians.

Pointless referrer spamming

Q: What happens when you cross a mobster with a cell phone company?
A: Someone who makes you an offer you can’t understand.

The HTTP protocol used by web browsers specifies an optional Referer: (sic) header that allows them to tell the server where the link to a page came from. This was originally intended as a courtesy, so webmasters could ask people with obsolete links to update their pages, but it is also a valuable source of information for webmasters who can find out which sites link to them, and in most cases what keywords were used on a search engine. Unfortunately, spammers have found another well to poison on the Internet.
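Webmasters usually see the Referer header in their server logs rather than on the wire. As an illustration, here is a minimal sketch of pulling the referrer out of a log line, assuming the common Apache "combined" log format (the sample line and URL are invented for the example):

```python
import re

# The Apache "combined" log format puts the Referer header in the first of
# two trailing quoted fields (the second is the User-Agent).
_TRAILING = re.compile(r'"([^"]*)" "([^"]*)"\s*$')

def extract_referrer(log_line):
    """Return the Referer field from a combined-format log line, or None."""
    m = _TRAILING.search(log_line)
    return m.group(1) if m else None

line = ('203.0.113.7 - - [15/Jul/2004:10:22:04 -0700] '
        '"GET /weblog/ HTTP/1.1" 200 5120 '
        '"http://www.google.com/search?q=referrer+spam" "Mozilla/5.0"')
print(extract_referrer(line))
```

Since the field is supplied entirely by the client, nothing stops a spammer's robot from filling it with whatever URL it wants to promote, which is exactly the abuse described below.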

Over the past month, referrer spam on my site has graduated from nuisance to menace, and I am writing scripts that attempt to filter that dross out of my web server log reports automatically. In recent days, most of the URLs spammers are pushing on me point to servers whose names aren’t even registered in the DNS. This seems completely asinine, even for spammers: why bother spamming someone without a profit motive? I was beginning to wonder whether this was just a form of vandalism, like graffiti, but the situation is more devious than it appears at first glance.

Referrer spam is very hard to fight (although not quite as difficult as email spam). I am trying to combine a number of heuristics, including behavioral analysis (e.g. whether the purported browser is downloading my CSS files or not), WHOIS lookups, reverse lookups for the client IP address, and so on. Unfortunately, if any of these filtering methods become widespread, the spammers can easily apply countermeasures to make their requests look more legitimate. This looks like another long-haul arms race…
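One of the cheaper heuristics mentioned above, checking whether the referrer's hostname even exists in the DNS, can be sketched in a few lines (a toy illustration in modern Python, not my actual filtering scripts):

```python
import socket
from urllib.parse import urlparse

def referrer_resolves(referrer_url):
    """One cheap heuristic: does the referrer's hostname exist in the DNS?

    Many spammed referrer URLs point at hostnames that are not registered
    at all, so a failed lookup is a strong (though not conclusive) signal
    that the log entry is spam.
    """
    host = urlparse(referrer_url).hostname
    if not host:
        return False
    try:
        socket.gethostbyname(host)
        return True
    except socket.gaierror:
        return False
```

Like any single heuristic, this one is easy to defeat (spammers can simply register throwaway domains), which is why it only makes sense as one vote among several.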

IM developments

Telcos look at instant messaging providers with deep suspicion. Transporting voice is just a special case of transporting bits, and even the global Internet is now good enough for telephony (indeed, many telcos are already using IP to transport voice for their phone networks, albeit on private IP backbones). The main remaining barriers to VoIP adoption are interoperability with the legacy network during the transition, and signaling (i.e. finding the destination’s IP address). IM providers offer a solution for the latter, and could thus become VoIP providers. AOL actually is, indirectly, through Apple’s iChat AV. This competitive threat explains why, for instance, France Télécom made a defensive investment in open-source IM provider Jabber.

Two recent developments promise to change dramatically the economic underpinnings of the IM industry:

  1. Yahoo announced a few weeks ago it would drop its enterprise IM product. Within a week, AOL followed suit.
  2. AOL and Yahoo agreed to interoperate with LCS, Microsoft’s forthcoming Enterprise IM server. Microsoft will pay AOL and Yahoo a royalty for access to their respective IM networks.

These announcements make it clear that neither Yahoo nor AOL feels it can sell successfully into enterprise accounts, and certainly cannot match Microsoft’s marketing muscle in that segment.

The second part, in effect Microsoft agreeing to pay termination fees to AOL and Yahoo, means that Microsoft’s business IM users will subsidize consumers. This is very similar to the situation in telephony, where businesses cross-subsidize local telephony for residential customers by paying higher fees. For most telcos, interconnect billing is either the first or second largest source of revenue, and this development may finally make IM profitable for Yahoo and AOL, rather than the loss-leader it is today.

Apparently Microsoft has concluded it cannot bury its IM competitors, and would rather make money now serving its business customers’ demand for an interoperable IM solution than wait to have the entire market to itself using its familiar Windows bundling tactics. Left out in the cold is IBM’s Lotus Sametime IM software.

Businesses will now be able to reach customers on all three major networks, but this does not change the situation for consumers. The big three IM providers have long played cat-and-mouse games with companies like Trillian that tried to provide reverse-engineered clients that work with all three networks. Ostensibly, this is for security reasons, but obviously the real explanation is to protect their respective walled gardens, just as in the early days the Bell Telephone company would refuse to interconnect with its competitors, and many businesses had to maintain multiple telephones, one for each network. It is not impossible, however, that interoperability will be offered to consumers as a paid, value-added option. Whether consumers are ready to pay is an entirely different question.

Effective anti-spam enforcement

The European Union E-Privacy directive of 2002, the US CAN-SPAM act of 2003 and other anti-spam laws allow legal action against spammers. Only official authorities can initiate action (although there are proposals to set up a bounty system in the US), but enforceability of these statutes is a problem, as investigations and prosecutions are prohibitively expensive, and both law enforcement and prosecutors have other pressing priorities contending for finite resources. Financial investigative techniques (following the money trail) that can be deployed against terrorists, drug dealers and money launderers are overkill for spammers, and would probably raise civil liberties issues.

There is an option that could dramatically streamline anti-spam enforcement, however. Spammers have to find a way to get paid, and payment is usually tendered using a credit card. Visa and Mastercard both have systems by which a temporary, one-time use credit card number can be generated. This service is used mostly to assuage the fears of online shoppers, but also provides a solution.

Visa and Mastercard could offer an interface that would allow FTC investigators and their European counterparts to generate “poisoned” credit card numbers. Any merchant account that attempts a transaction using such a number would be immediately frozen and its balance forfeited. Visa and Mastercard’s costs could be defrayed by giving them a portion of the confiscated proceeds.

Of course, proper judicial oversight would have to be provided, but this is a relatively simple way to nip the spam problem in the bud, by hitting spammers where it hurts most – in the pocketbook.

Why IPv6 will not loosen IP address allocation

The current version of the Internet Protocol (IP), the communications protocol underlying the Internet, is version 4. In IPv4, the address of any machine on the Internet, whether a client or a server, is encoded in 4 bytes. Due to various overheads, the total number of addresses available for use is much less than the theoretical 4 billion possible. This is leading to a worldwide crunch in the availability of addresses, and rationing is in effect, especially in Asia, which came late to the Internet party and received a short allocation (Stanford University has more IPv4 addresses allocated to it than the whole of China).

Internet Protocol version 6, IPv6, quadrupled the size of the address field to 16 bytes, i.e. unlimited for all practical purposes, and made various other improvements. Unfortunately, its authors severely underestimated the complexity of migrating from IPv4 to IPv6, which is why it hasn’t caught on as quickly as it should have, even though the new protocol is almost a decade old now. Asian countries are leading in IPv6 adoption, simply because they have no choice. Many people make do today with Network Address Translation (NAT), where a box (like a DSL router) allows several machines to share a single global IP address, but this is not an ideal solution, and one that only postpones the inevitable (but not imminent) reckoning.
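At its core, a NAT box maintains a translation table mapping each internal (address, port) pair to a port on the single shared public address. The toy sketch below shows the bookkeeping involved (purely illustrative; real NAT devices do this per-packet in the kernel, with timeouts, protocol tracking, and much more):

```python
# Toy model of a NAT translation table: many private hosts share one
# public IP address by being assigned distinct public-side ports.
class Nat:
    def __init__(self, public_ip):
        self.public_ip = public_ip
        self.next_port = 40000   # pool of public-side ports to hand out
        self.table = {}          # (private_ip, private_port) -> public_port
        self.reverse = {}        # public_port -> (private_ip, private_port)

    def outbound(self, private_ip, private_port):
        """Rewrite an outgoing connection's source to the shared public address."""
        key = (private_ip, private_port)
        if key not in self.table:
            self.table[key] = self.next_port
            self.reverse[self.next_port] = key
            self.next_port += 1
        return (self.public_ip, self.table[key])

    def inbound(self, public_port):
        """Map a reply arriving on a public port back to the internal host."""
        return self.reverse.get(public_port)

nat = Nat("198.51.100.1")
print(nat.outbound("192.168.0.10", 1025))  # two LAN hosts, one public address
print(nat.outbound("192.168.0.11", 1025))
```

The table also makes NAT's main limitation visible: an outside host cannot initiate a connection inward, because no reverse mapping exists until an internal host has sent traffic out first.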

One misconception, however, is that the slow pace of the migration is somehow related to the fact that you get your IP addresses from your ISP, and don’t “own” them or have the option to port them the way you now can with your fixed or mobile phone numbers. While IPv6 greatly increases the number of addresses available for assignment, this will not change the way addresses are allocated, for reasons unrelated to the address space crunch.

First of all, nothing precludes anyone from requesting an IPv4 address directly from the registry in charge of their continent:

  • ARIN in North America and Africa south of the Equator
  • LACNIC for Latin America and the Caribbean
  • RIPE (my former neighbors in Amsterdam) for Europe, Africa north of the Equator, and Central Asia
  • APNIC for the rest of Asia and the Pacific.

That said, these registries take the IP address shortage seriously and will require justification before granting a request. Apart from ISPs, the other main recipients of direct allocations are large organizations that require significant numbers of IP addresses (e.g. for a corporate intranet) and that will use multiple ISPs for their Internet connectivity.

The reason why IP addresses are allocated mostly through ISPs is the stability of the routing protocols used by ISPs to provide global IP connectivity. The Internet is a federation of independent networks that agree to exchange traffic, either for free (peering) or for a fee (transit). Each of these networks is called an “Autonomous System” (AS) and has an AS number (ASN) assigned to it. ASNs are coded in 16 bits, so there are only 65536 available to begin with.

When your IP packets go from your machine to their destination, they will first go through your ISP’s routers to your ISP’s border gateway that connects to other transit or final destination ISPs leading to your destination. There usually are an order of magnitude or two fewer border routers than interior routers. The interior routers do not need much intelligence, all they need to know is how to get their packets to the border. The border routers, on the other hand, need to have a map of the entire Internet. For each block of possible destination IP addresses, they need to know which next-hop ISP to forward the packet on to. Border routers exchange routing information using the Border Gateway Protocol, version 4 (BGP4).
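The forwarding decision a border router makes for each packet is a longest-prefix match: among all the advertised blocks that contain the destination address, it picks the most specific one. A minimal sketch with a hypothetical three-entry table (real routers hold on the order of 200,000 entries in specialized hardware):

```python
import ipaddress

# Toy forwarding table: prefix -> next-hop ISP. The /25 is a more
# specific carve-out of the /24 and wins for addresses inside it.
routes = {
    ipaddress.ip_network("0.0.0.0/0"):        "default upstream",
    ipaddress.ip_network("203.0.113.0/24"):   "peer A",
    ipaddress.ip_network("203.0.113.128/25"): "peer B",
}

def next_hop(dest):
    """Longest-prefix match: the most specific route containing dest wins."""
    addr = ipaddress.ip_address(dest)
    matches = [net for net in routes if addr in net]
    return routes[max(matches, key=lambda n: n.prefixlen)]

print(next_hop("203.0.113.200"))  # inside both the /24 and the /25
print(next_hop("192.0.2.1"))      # only the default route matches
```

Every extra prefix in this table is another entry every border router on the planet must store and search, which is why table size matters so much in what follows.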

BGP4 is in many ways black magic. Any mistake in BGP configuration can break connectivity or otherwise impair the stability of vast swathes of the Internet. Very few vendors know how to make reliable and stable implementations of BGP4 (Cisco and Juniper are the only two really trusted to get it right), and very few network engineers have real-world experience with BGP4, learned mostly through apprenticeship. BGP4 in the real scary world of the Internet is very different from the safe and stable confines of a Cisco certification lab. The BGP administrators worldwide are a very tightly knit cadre of professionals, who gather in organizations like NANOG and shepherd the Net.

The state of the art in exterior routing protocols like BGP4 has not markedly improved in recent years, and the current state of the art in core router technology just barely keeps up with the fluctuations in BGP. One of the controlling factors is the total size of BGP routing tables, which has been steadily increasing as the Internet expands (though no longer exponentially, as was the case in the early days). The bigger the routing tables, the more memory has to be added to each and every border router on the planet, and the slower route lookups will be. For this reason, network engineers are rightly paranoid about keeping routing tables small. Their main weapon consists of aggregating blocks of IP addresses that should be forwarded the same way, so they take up only one slot.

Now assume every Internet user on the planet has his own IP address that is completely portable. The size of the routing tables would explode from 200,000 or so today to hundreds of millions. Every time someone logged on to a dialup connection, every core router on the planet would have to be informed, and they would simply collapse under the sheer volume of routing information overhead, and not have the time to forward actual data packets.

This is the reason why IP addresses will continue to be assigned by your ISP: doing it this way allows your ISP to aggregate all its IP addresses in a single block, and send a single route to all its partners. Upstream transit ISPs do even more aggregation, and keep the routing tables to a manageable size. The discipline introduced by the regional registries and ISPs is precisely what changed the exponential trend in routing table growth (one which even Moore’s law would not be able to keep up with) to a linear one.
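Aggregation is easy to see concretely. Assuming a hypothetical ISP that has handed four contiguous /24 blocks to its customers, it can advertise them to the rest of the world as a single covering prefix, one routing-table slot instead of four:

```python
import ipaddress

# Four contiguous /24 blocks a (hypothetical) ISP has assigned to customers.
blocks = [ipaddress.ip_network("198.51.%d.0/24" % i) for i in range(100, 104)]

# Because the blocks are contiguous and aligned, they collapse into a
# single /22 supernet: one route advertisement instead of four.
aggregated = list(ipaddress.collapse_addresses(blocks))
print(aggregated)
```

If those same four blocks were scattered across unrelated customers who each "owned" them portably, no such aggregation would be possible and each block would consume its own slot in every border router's table.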

It’s not as if this requirement is anti-competitive, unlike telcos dragging their feet on number portability – the DNS was created precisely so users would not have to deal with IP addresses, and DNS records can easily be updated to point to new addresses if your IP addresses change.