Fazal Majid's low-intensity blog

Sporadic pontification

Fazal

Backing up is hard to do (right)

You can never overstate the importance of backups. Over the last year I have put quite a bit of effort into making sure my data is backed up properly. The purpose of this article is not to describe backup best practices (that is a vast subject, there are other, better resources available on the web, and in any case there is no one-size-fits-all solution). I am just documenting my setup and the requirements that drove it, and hopefully giving readers some ideas.

The first step in planning backups is to take an inventory of the assets you are trying to protect. In my case, in order of priority:

  • 1.5GB of scans of important documents: birth certificates, diplomas, invoices, legal documents, bank statements, and so on. This data is very sensitive, and should be encrypted.
  • 150GB of digital photos and scans
  • My address book, which lives on my laptop
  • My source code repositories
  • My personal email, approximately 0.75GB
  • The contents of this website, about 5GB
  • 190GB of music (lossless rips of my CD collection)
  • My Temboz article database

Thus the total storage capacity required for a full backup is reaching the 400GB mark. This in itself precludes DVD-R or even tape backup (short of buying an expensive LTO-4 tape drive or an autoloader, that is).
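
As a quick check on that figure, the items with stated sizes can simply be summed. This is a minimal sketch: it assumes the unsized items (address book, source code, Temboz database) are negligible, and it uses 4.7GB single-layer DVD-Rs for comparison.

```python
import math

# Sizes in GB from the inventory list; unsized items are assumed negligible.
INVENTORY_GB = {
    "document scans": 1.5,
    "photos and scans": 150,
    "personal email": 0.75,
    "website": 5,
    "music (lossless)": 190,
}

DVD_R_GB = 4.7  # single-layer DVD-R capacity, decimal GB


def total_size_gb(inventory: dict[str, float]) -> float:
    """Sum the sizes of every item in the backup set."""
    return sum(inventory.values())


def discs_needed(total_gb: float, disc_gb: float = DVD_R_GB) -> int:
    """Discs required for one full backup, ignoring filesystem overhead."""
    return math.ceil(total_gb / disc_gb)


if __name__ == "__main__":
    total = total_size_gb(INVENTORY_GB)
    print(f"sized items: {total:.2f} GB")
    print(f"single-layer DVD-Rs per full backup: {discs_needed(total)}")
```

With these inputs the sized items come to 347.25GB, or 74 discs for a single full backup, before counting any growth.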

The second step is to devise your threat model. In my case, in decreasing order of likelihood:

  1. Human error
  2. Hard drive failure
  3. Software failure (e.g. filesystem corruption)
  4. Silent data loss or corruption, e.g. a defective disk
  5. Theft
  6. Fire, earthquake, natural disaster, etc.

Third, some general principles I believe in:

  • Do not use proprietary backup formats. The best format is plain files on a filesystem identical in structure to the original.
  • Do not rely on offline media for backups. The watched pot does not boil over: online data is much less likely to go bad without my noticing until it is too late.
  • A backup plan needs to be effortless to be successful. Plugging in external drives when backups are needed, or rotating drives between home and office, are things I have tried but not stuck with.
  • Backups should be verified: they should generate positive feedback, so that the absence of feedback alerts me to a problem.
  • For all types of data, there should be one and only one reference machine that holds the authoritative copy. Multi-master synchronization and replication is possible using tools like Unison, but is much harder to manage and increases the risk of human error.
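
Since the backups are plain files laid out like the originals, verification can be as simple as hashing both copies and always reporting the result. The sketch below is illustrative only: the script and its function names (`compare_trees`, `summarize`) are hypothetical, and it reads every file in full on both sides, which is slow for hundreds of gigabytes.

```python
import hashlib
import os
import sys
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """SHA-256 of a file, read in 1MB chunks so large files never sit in memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def compare_trees(source: Path, backup: Path) -> dict[str, list[str]]:
    """Check every file under `source` against the same relative path under `backup`."""
    report: dict[str, list[str]] = {"ok": [], "missing": [], "mismatch": []}
    for root, _dirs, files in os.walk(source):
        for name in sorted(files):
            src = Path(root) / name
            rel = src.relative_to(source)
            dst = backup / rel
            if not dst.is_file():
                report["missing"].append(rel.as_posix())
            elif file_digest(src) != file_digest(dst):
                report["mismatch"].append(rel.as_posix())
            else:
                report["ok"].append(rel.as_posix())
    return report


def summarize(report: dict[str, list[str]]) -> str:
    """One-line summary, printed even when everything is fine."""
    return (f"{len(report['ok'])} verified, {len(report['missing'])} missing, "
            f"{len(report['mismatch'])} mismatched")


def main(argv: list[str]) -> int:
    report = compare_trees(Path(argv[0]), Path(argv[1]))
    print(summarize(report))
    for rel in report["missing"] + report["mismatch"]:
        print("PROBLEM:", rel)
    # A non-zero exit status lets cron-driven callers notice failures too.
    return 1 if report["missing"] or report["mismatch"] else 0


if __name__ == "__main__" and len(sys.argv) == 3:
    sys.exit(main(sys.argv[1:]))
```

Run from cron as, say, `python verify.py /source /backup` (file name hypothetical), the emailed cron output carries the summary every night, so either a missing email or a non-zero count is the signal to investigate.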

With these preliminaries out of the way, here is my system:

  • My primary backups reside on my home server, a Sun Ultra 40 M2 workstation, running Solaris 10. This machine is very quiet, so I can keep it running in the room next to my bedroom without disturbing my sleep. It is also relatively power-efficient at 160W with seven hard drives.
  • One of the seven drives is the 160GB boot drive, and the other six are 750GB Seagate drives configured in a 3TB ZFS RAID-Z2 storage pool.
  • With large SATA drives, reconstruction after a drive failure is long and the risk of another drive failing due to the stress of rebuilding is not negligible. RAID-Z2 can tolerate two drives failing, unlike RAID 5 which can only tolerate a single drive failure. This level of data protection is higher than RAID 1 since RAID 1 won’t protect you if two drives that are the mirror of one another fail. You can get the same level of protection in RAID 6 or RAID-DP.
  • I have scripts to take ZFS snapshots daily, equivalent to the auto-snapshot service. The daily snapshots are kept for the current month, then I keep only monthly snapshots. Snapshots are the primary line of defense against human error.
  • Snapshot technology consumes only as much disk space as required to store the differences between the snapshot and current versions of a file, and is much more efficient than schemes like Apple’s Time Machine, where a single-byte change to a multi-gigabyte file like a Parallels virtual disk image will cause the entire file to be duplicated, wasting storage. Because snapshots are taken nearly instantly and cost almost nothing, they are an extremely powerful feature of a storage subsystem.
  • I back up from my various machines to the Sun via rsync over ssh. An incremental backup of my PowerMac G5, which has most of the 400GB in my backup set, takes less than 5 minutes over Gigabit Ethernet, despite the ssh encryption.
  • ZFS is probably the best filesystem, bar none, but it is not perfect, as demonstrated by the Joyent outage, and you still need another copy for backup in case of ZFS corruption.
  • Every night at 2AM a cron job on my old home server (2x400GB, ZFS RAID 0), which I now keep at work, pulls updates from the Sun using rsync over ssh (the company firewall won’t let me push updates to it from the Sun). Another cron job at 8AM kills any leftover rsync processes, e.g. if there are more data changes than fit in the roughly 0.9-1.4GB that can be transferred in 6 hours over my relatively pokey 320-512kbps DSL uplink (no thanks to AT&T’s benighted refusal to upgrade its tired infrastructure).
  • My cron jobs use verbose output which generates an email sent back to me. I could suppress those messages, but then I would lose the ability to detect errors.
  • A last line of defense is to back up my server at work to a D-Link DNS-323 NAS box using rsync over NFS. This cute little unit holds two Western Digital Green Power 1TB drives in RAID 1, which slide right in, no tools required. It consumes next to no power or desk space. Since it runs Linux and is easy to extend using fun-plug, I could conceivably run the cron jobs and rsync from there. As a bonus, the built-in mt-daapd server streams my entire music collection to iTunes over the LAN so I can listen to any of my CDs at work.
  • It can take a few days for this data bucket brigade to catch up with a particularly intense photo shoot, but it always catches up eventually and is never too far behind. This provides me with near-continuous data protection and disaster recovery.
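
The 3TB figure and the comparison with RAID 5 and mirroring are simple arithmetic. A minimal sketch, assuming a single six-drive vdev and ignoring ZFS metadata overhead and the decimal vs binary gigabyte difference; the striped-mirror layout is a hypothetical alternative built from the same drives, for comparison only.

```python
from fractions import Fraction
from itertools import combinations


def usable_capacity_gb(drives: int, drive_gb: float, parity: int) -> float:
    """Usable space of one parity group (RAID 5 / RAID-Z: parity=1, RAID 6 / RAID-Z2: parity=2)."""
    if drives <= parity:
        raise ValueError("a parity group needs more drives than parity devices")
    return (drives - parity) * drive_gb


def mirror_survival_fraction(pairs: int) -> Fraction:
    """Share of all possible two-drive failures that a stripe of two-way mirrors survives.

    Data is lost only when both drives of the same mirror fail.
    """
    drives = [(pair, side) for pair in range(pairs) for side in (0, 1)]
    failures = list(combinations(drives, 2))
    fatal = sum(1 for a, b in failures if a[0] == b[0])
    return Fraction(len(failures) - fatal, len(failures))


if __name__ == "__main__":
    print("RAID-Z2, 6 x 750GB:", usable_capacity_gb(6, 750, 2), "GB, survives any 2 failures")
    print("RAID 5,  6 x 750GB:", usable_capacity_gb(6, 750, 1), "GB, survives 1 failure")
    print("3 mirrored pairs:  ", 3 * 750, "GB, survives",
          mirror_survival_fraction(3), "of double failures")
```

RAID-Z2 gives up one more drive's worth of space than RAID 5, while three mirrored pairs give up even more space and still lose data in 3 of the 15 possible double failures.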
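
The retention rule (every daily snapshot for the current month, one per month before that) can be expressed as a small pruning function. A sketch under stated assumptions: snapshots are hypothetically named after their ISO date, the dataset name `tank/home` is made up, and which snapshot survives for each past month (here the earliest) is an arbitrary choice. It only prints `zfs destroy dataset@snapshot` commands for review rather than running them.

```python
from datetime import date, timedelta


def snapshots_to_destroy(snapshots: list[date], today: date) -> list[date]:
    """Keep every snapshot from the current month and only the earliest one
    of each earlier month; return the rest, oldest first."""
    keep: set[date] = set()
    seen_months: set[tuple[int, int]] = set()
    for snap in sorted(snapshots):
        month = (snap.year, snap.month)
        if month == (today.year, today.month):
            keep.add(snap)
        elif month not in seen_months:
            seen_months.add(month)
            keep.add(snap)
    return [snap for snap in sorted(snapshots) if snap not in keep]


def destroy_commands(dataset: str, snapshots: list[date], today: date) -> list[str]:
    """Render `zfs destroy` commands instead of executing them."""
    return [f"zfs destroy {dataset}@{snap.isoformat()}"
            for snap in snapshots_to_destroy(snapshots, today)]


if __name__ == "__main__":
    today = date(2008, 3, 20)
    # Hypothetical: one snapshot per day since 1 February, named by ISO date.
    dailies = [date(2008, 2, 1) + timedelta(days=i) for i in range(49)]
    for cmd in destroy_commands("tank/home", dailies, today)[:3]:
        print(cmd)
```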
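
The overnight window over the DSL uplink can be estimated from the line rate alone. The sketch also shows a swapped-in alternative to the separate 8AM kill job: giving rsync a hard deadline through Python's `subprocess.run(timeout=...)`. The source and destination strings are placeholders, and protocol overhead is ignored, so real throughput will be lower.

```python
import subprocess
from datetime import datetime, time, timedelta


def transfer_capacity_gb(kbps: float, hours: float) -> float:
    """Best-case data volume over a link: raw line rate, decimal units, no overhead."""
    bits = kbps * 1000 * hours * 3600
    return bits / 8 / 1e9


def seconds_until(deadline: time, now: datetime) -> float:
    """Seconds from `now` until the next occurrence of `deadline` (today or tomorrow)."""
    target = datetime.combine(now.date(), deadline)
    if target <= now:
        target += timedelta(days=1)
    return (target - now).total_seconds()


def pull_until(source: str, dest: str, deadline: time = time(8, 0)) -> int:
    """Run an rsync pull over ssh, stopping it at the deadline.

    Returns rsync's exit status, or -1 if the deadline cut it short
    (a hypothetical convention for this sketch).
    """
    try:
        done = subprocess.run(
            ["rsync", "-az", "-e", "ssh", source, dest],
            timeout=seconds_until(deadline, datetime.now()),
        )
        return done.returncode
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before re-raising the exception.
        return -1


if __name__ == "__main__":
    for kbps in (320, 512):
        print(f"{kbps}kbps for 6h: {transfer_capacity_gb(kbps, 6):.2f} GB at best")
```

At 320 and 512kbps the ceiling works out to about 0.86GB and 1.38GB per 6-hour window. Because rsync skips files that already match, a run cut off at the deadline picks up where it left off the next night, apart from the file that was in flight, which restarts unless rsync's `--partial` option is used.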
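
The difference between whole-file versioning and copy-on-write snapshots shows up clearly with a single-byte change to a large disk image. This is a simplified model, assuming ZFS's default 128KB recordsize and ignoring metadata, compression and checksums; the 20GB image size is hypothetical.

```python
def file_level_cost(file_bytes: int, changed_bytes: int) -> int:
    """Whole-file versioning (as described for Time Machine): any change re-copies the file."""
    return file_bytes if changed_bytes > 0 else 0


def block_level_cost(changed_offsets: list[int], record_size: int = 128 * 1024) -> int:
    """Copy-on-write snapshot: only records touched since the snapshot exist twice."""
    touched_records = {offset // record_size for offset in changed_offsets}
    return len(touched_records) * record_size


if __name__ == "__main__":
    image = 20 * 10**9  # hypothetical 20GB virtual disk image
    print("whole-file copy:", file_level_cost(image, 1), "bytes")
    print("ZFS snapshot:   ", block_level_cost([12_345_678]), "bytes")
```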

Update (2009-10-07):

I made some changes. My office backup server is now an inexpensive Shuttle KPC 4500 running OpenSolaris 2009.06 with a 1TB drive. It in turn backs up to the DNS-323, although I need to qualify my recommendation: like many embedded Linux devices, the DNS-323 has a distressing tendency to get wedged every now and then, requiring a reboot, and is not reliable enough to serve as a primary offsite backup in my book. OpenSolaris, of course, is rock-stable, and the hardware is not much more expensive (I paid $400 for the KPC).

My backups are now much faster since I upgraded to 20Mbps symmetric Metro Ethernet service from Webpass a month ago.

Update (2014-01-09):

Since I moved to a semi-suburban house two years ago and had to revert to AT&T’s abysmally slow DSL service, remote backups over rsync are no longer a viable option and I have to use sneakernet. My current setup is:

  • A Time Machine backup onto a 4TB internal drive inside my Mac
  • Hourly rsync backups onto a 2TB WD My Passport Studio. I actually have two of these and rotate them between home and office. They have a metal case (which helps dissipate heat, increasing drive lifetime and reliability) as well as hardware AES encryption.

Push recruiting

As I was debugging why feedparser mangles the GigaOM feed titles, I found this easter egg on the WordPress-hosted site:

zephyr ~>telnet gigaom.com 80
Trying 72.232.101.40...
Connected to gigaom.com.
Escape character is '^]'.
GET /feed HTTP/1.0
Host: gigaom.com

HTTP/1.0 301 Moved Permanently
Vary: Cookie
X-hacker: If you're reading this, you should visit automattic.com/jobs and apply to join the fun, mention this header.
Location: http://feeds.feedburner.com/ommalik
Content-type: text/html; charset=utf-8
Content-Length: 0
Date: Thu, 20 Mar 2008 23:36:17 GMT
Server: LiteSpeed
Connection: close

Connection closed by foreign host.

Knowing how to issue HTTP requests by hand is one of my litmus tests for a web developer, but I had never thought of using it in this creative way as a recruiting tool…
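
The telnet exchange above translates directly into a few lines of Python using only the standard library: send the request line, a `Host` header and a blank line, then read until the server closes the connection, which HTTP/1.0 servers do. A minimal sketch: it ignores obsolete folded header lines, keeps only the last value of a repeated header, and a live server today will not necessarily answer the way gigaom.com did in 2008.

```python
import socket


def build_request(host: str, path: str = "/") -> bytes:
    """Request line, Host header and blank line, as typed into telnet."""
    return f"GET {path} HTTP/1.0\r\nHost: {host}\r\n\r\n".encode("ascii")


def parse_response_head(raw: bytes) -> tuple[str, dict[str, str]]:
    """Split a raw response into its status line and lower-cased headers."""
    head = raw.split(b"\r\n\r\n", 1)[0].decode("iso-8859-1")
    status, *lines = head.split("\r\n")
    headers: dict[str, str] = {}
    for line in lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return status, headers


def fetch_head(host: str, path: str = "/", port: int = 80,
               timeout: float = 10.0) -> tuple[str, dict[str, str]]:
    """Send the request over a plain TCP socket and read until the server closes."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(build_request(host, path))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    return parse_response_head(b"".join(chunks))
```

Calling `fetch_head("gigaom.com", "/feed")` returns the status line and a dict keyed by lower-cased header names, so an `x-hacker` header would show up as `headers["x-hacker"]`.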

US banks lag behind in secure email adoption

My banks send me monthly reminders when a statement is ready, but I have to log onto their site to actually get it. This is quite annoying: I would much rather they simply attached the statements to the notification emails, but I can understand their security concerns. The current system does, however, encourage bad habits that phishers can exploit.

One of my colleagues informed me that in Japan, banks actually send statements by email using S/MIME public key encryption. I have an S/MIME certificate courtesy of the Thawte web of trust (in fact I am also a Thawte WOT notary), but no US bank that I know of supports this. Secure email adoption is so low in no small part because of the NSA’s successful campaign to make encryption inconvenient to obtain. All major email clients support it (Outlook, Apple Mail.app, Thunderbird, and so on), but webmail users don’t even have the option. This is just another illustration of how the US is lagging behind Asia and Europe in Internet adoption.

Macworld 2008 round-up

The MacBook Air was what I was waiting for (I pre-ordered the SSD version just before the online Apple Store buckled under the load). I have a 15″ MacBook Pro, but because of its weight I end up leaving it at work rather than carrying it with me at all times (the MacBook is hardly any lighter). Sure, the Air has drastically limited connectivity (the lack of Gigabit Ethernet is probably what I will regret most, even though I clocked my AirPort Extreme at 90 true Mbps throughput). Other minuses include the glossy screen (instead of an anti-reflective one), the MacBook-like chiclet keyboard (rather than the much nicer MacBook Pro keyboard) and the sealed, non-user-replaceable battery.

I suspect the people deriding it are those whose main machine is a laptop. My main machine is a tower desktop, and no laptop is ever going to compete with it in terms of capacity and expandability. The drive on the laptop is merely a cache for the desktop, where the real data lives. The compromises the Air makes are acceptable ones in exchange for a machine light enough for me to carry all the time. I was considering getting an Asus Eee PC before the show, and the MacBook Air is a vastly more capable and versatile machine.

Apart from that, the show was a relatively quiet one with few truly noteworthy new products. Here are the main highlights:

  • Matias did not have the Tactilepro 2.0 keyboard on display. I love mine (a version 1 with the ALPS keyswitch) and would like to get a spare, but apparently they have parted ways with the manufacturer of the new Matias-designed keyswitches and are working on a 3.0 version for later this year.

  • Fujitsu were demonstrating an ultra-small, bus-powered document scanner, the S300M. Unfortunately, once again because of licensing restrictions on the bundled software, they could not release a single SKU that works with both PCs and Macs.

  • The German company Project Wizards was demonstrating Merlin, a project management program similar to Microsoft Project. The scheduling and load-leveling algorithms look at least as capable as those in Project 2000, and they told me the next version will let team members report task progress simply by connecting to a built-in web server. Looks like a promising product.

  • Samsung showed the CLP-300, which they bill as the world’s smallest color laser printer. Indeed, it looks roughly the same size as my monochrome HP LaserJet 1320 and much smaller than my bulky HP 2605dn, which is quite an achievement. I am wary of Samsung lasers since buying the CLP-500 for Kefta a few years back. The print quality was fine, but it was ludicrously slow, taking something like 5 minutes to print a color page. The CLP-300 seems reasonably fast, faster than the 2605dn at any rate.

  • Samsung was also showing off the gorgeous XL30 30″ LED-backlit LCD monitor. An LED backlight is more environmentally friendly, does not shift colors as it ages the way a CCFL (cold-cathode fluorescent) backlight does, and gives a wider color gamut. Unfortunately, its price is a princely “between $6000 and $7000”.

  • Microsoft was showing off Office 2008, emphasizing ease of use and productivity rather than features for features’ sake for a change.

    They even set up a bloggers-only salon to curry favor, complete with Internet cafe and snacks.

  • I tried Nikon’s humongous AF-S VR Nikkor 200mm f/2G IF-ED lens. A very heavy but impressive piece of gear.

  • Canon was showing off the new flash-based HD camcorders they introduced at CES. They are not that much smaller than the HDV ones. The HV30 replaces the excellent HV20, but the only real improvements are a 1080p30 mode and an articulating LCD.

A San Francisco local’s advice to Macworld attendees

I have been living and working in downtown San Francisco for almost eight years now. Until a month ago, my office window (right) overlooked Third Street and the Moscone Center. San Francisco is a popular convention destination (one wonders why proctologists seem to prefer it to, say, Detroit), but Macworld Expo is definitely the biggest show in town. Restaurants and hotels are taken by storm, taxis become scarce, traffic gets even snarlier and the lines at Metreon eateries cross the threshold of ludicrousness. So here are a few tips to help Macworld attendees have a better time and avoid getting caught in tourist traps.

Transportation

Driving in San Francisco is a non-starter. Traffic is horrendous, parking is scarce and you would lose far too much time just getting around. SF Muni is a pretty good public transport system (at least by admittedly paltry US standards), and its 1-, 3- and 7-day Passports are good value.

Cars are mostly useless inside the city, but nice if you want to make a Fry’s run or take a day trip to Marin across the Golden Gate. If you must drive, the friendly folks at Reliable Rent-a-Car will give you decent rates on Toyotas. Until I bought a car last month, they were my go-to place whenever I needed one.

Lunch

San Francisco has the best food in the United States, but you wouldn’t know it from the overpriced eateries within a three-block radius. The Firewood Cafe and Buckhorn Grill in the Metreon are actually reasonably decent, but the throngs of convention-goers mean long lines. Mo’s Grille has excellent burgers (I recommend the aptly named “Belly Buster”), and since access to it is a little tortuous, you have a fighting chance (it is literally just above Moscone South).

Ranging a little further, Nova has decent burgers and a lovely lobster quesadilla, and the new Westfield Mall three blocks to the west has a decent food court. Some good local chains are Bistro Burger, S.F. Soup Co. or Café Madeleine (official birthday cake purveyor to Kefta).

That said, the best lunch experience is to take the historic F line streetcar to the Ferry Building Marketplace with its wide variety of gourmet food stores and eateries. I heartily recommend the clam chowder at Ferry Plaza Seafood (it used to be my Friday lunch of choice) or the eclectic fare at Boulette’s Larder. Chocolates from Michael Recchiuti or fresh-pressed olive oil from Stonehouse make for great (and edible) souvenirs.

Staying hydrated is important when you expect to spend an entire day on the show floor. There is a Whole Foods store a mere block away where you can buy any required provisions.

Dining

Dining in San Francisco is an embarrassment of riches, and it would be a shame to settle for overpriced hotel food. A word to the wise: most of the better places are hooked into the OpenTable reservation system, which makes finding a good place with availability much less of a hit-and-miss affair. This year Macworld coincides with the annual Dine About Town event, where participating restaurants offer specially discounted menus.

Equipment

Murphy’s law will strike at the worst possible moment. If you need help with your Mac, the geniuses at the San Francisco Apple Store (or the smaller Chestnut Street and Stonestown locations) can help. It’s also worth remembering that all the Apple stores offer free WiFi.

If you need commodity spare parts like a USB hub in a hurry, Central Computers is a mere block away and carries a wide assortment, albeit PC-centric.

If you are an attendee and have questions I have not answered, please feel free to email me; my contact info is on the right.