On Joyent's recent storage mishaps

I have read with interest the comments on Joyent's blog from disgruntled users, some of whom appear to be professional system administrators. Beyond the technical merits of the recovery, it is quite clear that selling storage services without a backup scheme that can make data available again within a matter of hours, or a day at most, is a dangerous proposition for an internet startup boasting technical prowess. At best it casts a negative light on their understanding of what a backup is. Ultimately I store data remotely not because ZFS or Thumper are cool (they are), but because I hope to be able to retrieve it more elegantly and more simply than by going to the bank to retrieve the tape from the vault.

Interestingly enough, the brand leader in distributed storage, Amazon, has a Service Level Agreement that worries solely about the service being available, not about whether the data returned are free of corruption. I wonder whether the next step is going to be data insurance, against data loss and format obsolescence. Something to ponder.

In my context neither of these options is appealing, since data are the bread and butter of my company; hence the painstaking off-site backup process to mitigate risks and the oh-so-enjoyable-but-needed task of spelling out a comprehensive disaster recovery plan.

Personal back-up options

A quick personal note: yesterday I was looking for a good secondary backup solution for my personal files.  I currently have an external drive attached to my main machine, and every morning cron faithfully rsyncs some critical directories to that external drive.  This setup holds up only until:
  1. the filesystem layer on Mac OS X goes crazy and trashes my drive (unlikely).
  2. my house gets burglarized (likely).
  3. I make a mistake, delete files that I should not delete and don't realize it within at most 24 hours (likely).
Hence I was looking for yet another copy to have around, in case one of these three events occurs.  The three contenders were:
  1. bingodisk
  2. strongspace
  3. S3
bingodisk is cheap ($2 per GB per year), has fairly high bandwidth caps and a nice WebDAV interface, but rsync does not play well with it.  strongspace has rsync but is too expensive ($15 per month only gets you 5 GB).  Both suffer from the fact that they are offered by a small company, Joyent.  Sure, they use Solaris, ZFS and Thumpers, and granted the combination is hot, but how much of a guarantee is that that they will protect the data from harm or, for that matter, stay in business long enough?  xdrive, box.net and the like fall into the same category.

That left S3 as a storage option from a company that will stay around for a while and can obviously operate a large-scale computing environment (I'm not saying that others can't, simply that they have yet to demonstrate the same scale).  Now, I have about 100 GB of data to back up.  Based on S3's prices, that comes to about 200 dollars per year, which is great.  Granted, the interface is geared much more toward programmatic, non-file-based interaction, so you need some third-party tool to really use it for backups.  rsync does not work well with it (I tried it with JungleDisk), but there is a Ruby-based replacement called s3sync, which works quite well and minimizes the amount of data transferred.

Off I went with a little script based on that blog entry.  Then I realized it would take a couple of weeks just to upload all my files...  So I ended up buying yet another external hard drive that I'll refresh once a week and store in another location.  The prospect of storage as a utility is exciting nonetheless.
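
For anyone who wants to redo the back-of-the-envelope numbers, here is a rough Python sketch.  The prices are the ones quoted in this post; the helper names are made up for the occasion, the strongspace figure assumes the price scales linearly with size (which it may well not), and the 768 kbit/s uplink is purely a guess at a typical home connection, not something I measured.

```python
# Rough numbers only: prices are the ones quoted in the post,
# the uplink speed below is an assumed value, not a measurement.
DATA_GB = 100  # amount of data to back up, as stated in the post


def yearly_cost_per_gb(price, gb, periods_per_year=1):
    """Normalize a quoted price to dollars per GB per year."""
    return price * periods_per_year / gb


def upload_days(size_gb, uplink_kbit_per_s):
    """Days needed to push size_gb through a link running flat out."""
    bits = size_gb * 1e9 * 8                  # decimal gigabytes -> bits
    seconds = bits / (uplink_kbit_per_s * 1e3)
    return seconds / 86_400                   # seconds per day


rates = {
    "bingodisk": yearly_cost_per_gb(2, 1),          # $2 per GB per year
    "strongspace": yearly_cost_per_gb(15, 5, 12),   # $15 per month for 5 GB
    "S3": yearly_cost_per_gb(200, DATA_GB),         # ~$200 per year for 100 GB
}

for name, rate in rates.items():
    # Linear scaling to DATA_GB is an assumption, not a published tariff.
    print(f"{name}: ${rate:.2f}/GB/year, ${rate * DATA_GB:.0f}/year for {DATA_GB} GB")

ASSUMED_UPLINK_KBIT = 768  # hypothetical home uplink speed
print(f"upload: about {upload_days(DATA_GB, ASSUMED_UPLINK_KBIT):.1f} days")
```

Plug in your own uplink speed; at anything like home DSL rates the initial upload is measured in weeks, which is what pushed me back to a second external drive.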