Backing up my world - Online backups for work and play

For a good while I’ve been meaning to get around to backing up my data properly. I’ve always had an external hard disk on which I periodically backed up my important data, but that only covers the most basic of situations. If my house burnt down, or more likely somebody broke in, or there was even just a power surge, I could conceivably lose both my working copy and the backup. So I decided the only sensible approach was to explore an offsite backup solution to use in addition to local backups. I’ve also been spurred on by buying an SSD and being unconvinced about its reliability. Indeed, since I started writing this post my new SSD has been recalled by Corsair…

Daily backup drives

In this post I’ll describe what I back up and the online services and software I use.

My Data

I consider my requirements to be fairly standard for a web developer. I have a few types of data which need to be considered separately: code, photos and documents, and music.

Solutions

Code

All my code is stored under version control using either Git or Mercurial, which makes backing up very easy.

Closed Source

All my closed source code is work related and is backed up on a private Linode VPS using Mercurial. The VPS acts as the central repository and is itself backed up automatically to a local machine at work using rsync.

Open Source

All the code I write personally is open source, so I combine sharing with backup by using GitHub. This does leave me with one or two repositories containing odd selections of code, but somebody may benefit from them one day!

Photos & Documents

Traditionally I had always survived with my photos stored on my desktop computer and on a USB hard disk sitting next to it. I’ve now decided this isn’t sufficient, so I’ve started to use Amazon S3 as an extra level of security. I use the command line tool s3cmd to sync files in an rsync-like way, with no encryption, and a cron job to automate the synchronization periodically. The only negative point I found was the large amount of time taken to upload 30GB of photos on a domestic ADSL line.

For my documents containing confidential data I simply add the extra step of using the GPG encryption built into s3cmd. Because the encryption is done by GPG, I can still get to my files even without the s3cmd tool.

Music

As I have around 150GB of music stored in MP3 format, I feel this is an unrealistic amount to upload to Amazon S3, as it would take several weeks. At present I use Spotify for a large amount of my music listening and am becoming less and less dependent on MP3s, so for the moment there is no sensible option other than mirroring to a USB drive. This is often kept at my parents’ house, so it’s just as good as any online system.
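As a rough sanity check on the "several weeks" estimate, the arithmetic can be sketched as below. The 448 kbit/s upstream speed is purely an assumption for illustration (the post does not state the line speed), and the function name upload_days is made up for this example.

```shell
#!/bin/sh
# upload_days SIZE_GB UPSTREAM_KBIT
# Prints the days needed to upload SIZE_GB gigabytes (10^9 bytes each)
# at a sustained UPSTREAM_KBIT kilobits per second, ignoring any
# protocol overhead or interruptions.
upload_days() {
    awk -v gb="$1" -v kbit="$2" 'BEGIN {
        seconds = (gb * 1e9 * 8) / (kbit * 1000)
        printf "%.1f\n", seconds / 86400
    }'
}

# 448 kbit/s is an assumed upstream speed, not a figure from the post.
echo "30 GB of photos: $(upload_days 30 448) days"
echo "150 GB of music: $(upload_days 150 448) days"
```

Under that assumption the photos come out at roughly six days of continuous uploading and the music at about a month, which is in line with the estimate above.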

Services & Tools

When choosing tools and services I was looking for known names which are likely to stick around, not the latest great start-up which might be gone next week.

Github & Mercurial

These standard version control systems require little explanation. There is little to choose between them, and they work fast and reliably to record, version and push code around.

Amazon S3

There are several cloud storage providers available currently. I chose Amazon without looking around much, as to me it seems the least likely to go away: it’s used by many large websites and is based on the Amazon.com infrastructure. Out of the box it’s not a user friendly solution, with only a primitive web interface, but when its comprehensive API is combined with one of the many third party tools it becomes a very powerful storage platform. Cost wise it’s around $0.14 per GB per month for storage plus $0.10 per GB of data transferred in or out, which to me is a negligible cost.
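To put those prices in context, here is a small sketch of the arithmetic for the 30GB of photos mentioned earlier. The two rates are the ones quoted above and will certainly have changed since; the function names are invented for this example.

```shell
#!/bin/sh
# Prices as quoted in the post (USD); treat them as a snapshot.
STORAGE_PER_GB_MONTH=0.14
TRANSFER_PER_GB=0.10

# monthly_storage_cost SIZE_GB: cost of keeping SIZE_GB stored for one month.
monthly_storage_cost() {
    awk -v gb="$1" -v rate="$STORAGE_PER_GB_MONTH" \
        'BEGIN { printf "%.2f\n", gb * rate }'
}

# transfer_cost SIZE_GB: one-off cost of moving SIZE_GB in or out.
transfer_cost() {
    awk -v gb="$1" -v rate="$TRANSFER_PER_GB" \
        'BEGIN { printf "%.2f\n", gb * rate }'
}

echo "30 GB stored: \$$(monthly_storage_cost 30) per month"
echo "30 GB uploaded once: \$$(transfer_cost 30)"
```

At these rates the photo collection works out at $4.20 a month to store, plus $3.00 for the initial upload.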

There are many tools available, offering different functionality and supporting different operating systems.

S3cmd

S3cmd is a command line tool for interfacing with Amazon S3. Once you’ve given it your API details you can access your buckets on S3 much like any other filesystem, using s3cmd to put, get or sync files and folders between your local filesystem and S3. The interface is very similar to rsync. For example, to synchronise the current folder with a bucket called photos:

s3cmd sync ./ s3://photos

Familiar commands such as ls are also available:

s3cmd ls

would list all the buckets you have created. Similar commands are available to create and delete buckets.

On-the-fly GPG encryption is also available, but sadly only with the put and get commands, not sync.
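One way to work around the lack of encryption in sync is to loop over the files and put each one with encryption switched on. The following is only a sketch: the function name upload_encrypted and the bucket name in the usage note are placeholders, the S3CMD variable exists just so the command can be swapped out for testing, and file names containing newlines are not handled.

```shell
#!/bin/sh
# Command to run; override S3CMD to try the loop without touching S3.
S3CMD="${S3CMD:-s3cmd}"

# upload_encrypted DIR BUCKET
# Uploads every regular file under DIR using s3cmd's --encrypt option,
# keeping each file's path relative to DIR as the object key.
upload_encrypted() {
    dir=$1
    bucket=$2
    (
        cd "$dir" || exit 1
        find . -type f | sed 's|^\./||' | sort |
        while IFS= read -r file; do
            "$S3CMD" put --encrypt "$file" "s3://$bucket/$file"
        done
    )
}
```

Calling upload_encrypted "$HOME/Documents" example-documents would then upload the whole folder. Unlike sync, this uploads every file on every run, so it only really suits small sets of documents.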

One slight gotcha is that unless you specifically state the location in the config file, all buckets are created in the US Amazon datacenter. For speed I’d prefer the one in Ireland.
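The location can also be given when the bucket is created, using s3cmd's --bucket-location option. A minimal sketch, assuming EU is the value that selects the Irish datacenter and using a placeholder function name:

```shell
#!/bin/sh
# Command to run; override S3CMD to see the command line without running it.
S3CMD="${S3CMD:-s3cmd}"

# create_bucket NAME [LOCATION]
# Creates bucket NAME in LOCATION, defaulting to EU rather than s3cmd's
# own default of US.
create_bucket() {
    name=$1
    location=${2:-EU}
    "$S3CMD" mb --bucket-location="$location" "s3://$name"
}

# The equivalent setting in the config file (~/.s3cfg) would be:
#   bucket_location = EU
```

For example, create_bucket example-photos would create the bucket in the EU, where example-photos stands in for a bucket name of your own.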

I use this program for all my regular contact with S3 as it’s easily scriptable for use in a cron job.
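For a cron job it helps to keep a log of what ran and whether it succeeded. The wrapper below is one possible sketch rather than the script actually used here; run_logged, the log path and the crontab line in the comments are all illustrative.

```shell
#!/bin/sh
# run_logged LOGFILE COMMAND [ARGS...]
# Runs COMMAND, appends its output to LOGFILE between timestamped
# start and finish lines, and returns COMMAND's exit status.
run_logged() {
    logfile=$1
    shift
    echo "$(date '+%Y-%m-%d %H:%M:%S') start: $*" >> "$logfile"
    "$@" >> "$logfile" 2>&1
    status=$?
    echo "$(date '+%Y-%m-%d %H:%M:%S') finished with status $status" >> "$logfile"
    return "$status"
}

# Intended use inside a backup script (paths are placeholders):
#   run_logged "$HOME/s3-backup.log" s3cmd sync "$HOME/Photos/" s3://photos/
# with a crontab entry along the lines of:
#   30 2 * * * /home/me/bin/backup-photos.sh
```

Because the wrapper passes the command's exit status through, cron or a calling script can still tell when a sync has failed.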

S3cmd is also available in the repositories of common Linux distributions.

Duplicity

Duplicity is a general purpose command line backup tool which creates tar volumes of files or folders and moves them to external media. It then creates incremental backups as requested until another full backup is made. All files are compressed and optionally encrypted on demand, so it’s very simple for the user. I have used Duplicity in the past but I felt it was a little overly complex, and s3cmd is much easier to set up and use. For most of my files I also felt that I’d prefer just to keep a copy rather than incremental backups. If you need more sophistication than s3cmd, or indeed would like to back up to something that’s not S3, then Duplicity may be for you.
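The full-then-incremental cycle looks roughly like this on the command line. This is a sketch under assumptions: the wrapper function duplicity_backup and the paths in the usage note are placeholders, and the DUPLICITY variable is there only so the command can be replaced for a dry run.

```shell
#!/bin/sh
# Command to run; override DUPLICITY to print the command line instead.
DUPLICITY="${DUPLICITY:-duplicity}"

# duplicity_backup MODE SOURCE_DIR TARGET_URL
# MODE is "full" for a fresh backup or "incremental" to store only the
# changes since the previous backup in the same target.
duplicity_backup() {
    mode=$1
    source_dir=$2
    target_url=$3
    case "$mode" in
        full|incremental) ;;
        *)
            echo "usage: duplicity_backup full|incremental SOURCE TARGET" >&2
            return 2
            ;;
    esac
    "$DUPLICITY" "$mode" "$source_dir" "$target_url"
}
```

A typical pattern would be duplicity_backup full "$HOME/Documents" file:///media/usbdisk/documents once in a while, with the incremental mode run against the same target in between. For encrypted backups Duplicity reads the passphrase from the PASSPHRASE environment variable if it is set.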

Deja Dup

Deja Dup is a nice GUI for Duplicity which is increasingly being included with GNOME based Linux distributions. In short, you specify some folders, a destination and a backup frequency, and it takes care of the rest. For me, however, it was all too simple and vague: there is no way to set when the backups take place, and no way to decide whether an incremental backup or a full one is made. It may be nice for a non geeky user who just wants their data backed up regularly, but I want to specify exactly what happens, know when it happens and see a log.

...

All in all I’m very happy with a mix of encrypted and unencrypted files hosted on an external drive locally and on S3. As long as they are maintained, this should give me the ability to recover from most eventualities. …Famous last words…