Chris Holtz

Let's Have a Frank Talk About Your Backup Plan

You do back up your data, right? No, I’m not talking about the steps your IT staff takes to back up your work network. That’s all well and good, but it’s not enough for you. You need to be responsible for your data.

Edit: As some have pointed out, redundancy and backup are not the same thing. Redundancy protects you against a failing hard drive. Backup protects you from the time you accidentally delete something you shouldn’t have.


Let’s get this out of the way first: if you are not backing up your data at all, you are screwed. Hard drives fail – it’s not a matter of whether they will fail, it’s a matter of when they will fail. Do you have a solid state drive? Your chances of drive failure just went up (but you should get one anyway).

There are all sorts of measures people take to recover their data after a failure. You don’t want to be one of those people. You want to be proactive.

Source control is not a complete backup strategy

Actually, source control can be a pretty good backup strategy, but making it a complete solution is a lot of hassle. Source control will duplicate your files and provide metadata. If you push your changes to a remote repository, it gets your files onto another machine – that’s a very good thing. That alone takes care of 90% of your needs.

Do you have mp3 files? How about video files? Are you going to check them into source control? Probably not… I mean you could, but a) it’s time consuming and b) it’s manual. Those two factors combined mean you probably won’t do it. As an experiment, go to your music folder and create a local git repository out of it. Add all the files and then commit… then track how long the commit takes. For bonus points, create an empty repository on another computer on your local network and push to it. How long did it take?
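If you would rather script the experiment than run it by hand, here is a rough sketch. It assumes git is installed and already configured with a user name and email, and the helper names (timed_run, time_git_commit) are made up for this post.

```python
import subprocess
import time
from pathlib import Path


def timed_run(args, cwd):
    """Run a command in `cwd` and return how many seconds it took."""
    start = time.perf_counter()
    subprocess.run(args, cwd=cwd, check=True, capture_output=True)
    return time.perf_counter() - start


def time_git_commit(folder):
    """Turn `folder` into a git repository, commit everything, time each step."""
    folder = Path(folder).expanduser()
    steps = [
        ["git", "init"],
        ["git", "add", "--all"],
        ["git", "commit", "-m", "Backup experiment"],
    ]
    return {" ".join(step): timed_run(step, folder) for step in steps}


# Example (the path is an assumption; point it at your own music folder):
# for step, seconds in time_git_commit("~/Music").items():
#     print(f"{step}: {seconds:.1f}s")
```

Keep in mind this leaves a .git folder inside your music folder, roughly doubling the space it uses until you delete it.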

A copy of your data is not a complete backup strategy

A complete copy of your important data goes a long way towards maintaining a safe backup. In fact, if that is all you did, you’re light years beyond those with no backups. However… what happens if your primary drive starts to go bad and you don’t know it yet? It’s possible your files can become corrupted – then, when you make an exact copy of your data, your backup has corrupted files. Now you have a pristine copy of mangled data.
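One way to catch this before it reaches your backup is to record a checksum for every file and compare the list between runs. The sketch below is only an illustration using Python's standard hashlib module; the function names are my own, and a changed checksum only tells you the contents differ, not why. A file you never touched that suddenly has a new checksum deserves a look before you overwrite its backup copy.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to `root`) to its checksum."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def changed_files(old_manifest, new_manifest):
    """List files present in both manifests whose contents differ."""
    return sorted(
        name
        for name, checksum in new_manifest.items()
        if name in old_manifest and old_manifest[name] != checksum
    )
```

Build a manifest after each backup, keep it next to the backup, and run changed_files against a fresh manifest before the next one.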

Couple your backup strategy with multiple revisions. Keep old copies of your backup or use a backup solution that has versioning built in.
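If you want to roll your own revisions, the simplest approach I can think of is a timestamped folder per backup run, pruning the oldest once you have enough. This is a minimal sketch with made-up names (snapshot, keep), not a replacement for a real backup tool: it makes full copies every time, so it eats disk space quickly, and keep is assumed to be a positive number.

```python
import shutil
from datetime import datetime
from pathlib import Path


def snapshot(source, backup_root, keep=5, now=None):
    """Copy `source` into a new timestamped folder and prune the oldest ones."""
    backup_root = Path(backup_root)
    backup_root.mkdir(parents=True, exist_ok=True)
    stamp = (now or datetime.now()).strftime("%Y%m%d-%H%M%S")
    target = backup_root / stamp
    shutil.copytree(source, target)  # raises if this snapshot already exists

    # Timestamped names sort chronologically, so the oldest come first.
    snapshots = sorted(p for p in backup_root.iterdir() if p.is_dir())
    for old in snapshots[:-keep]:
        shutil.rmtree(old)
    return target
```

Run it on a schedule and you always have the last few revisions to fall back on if the newest copy turns out to be mangled.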

Rule of three

Here is the often-cited rule of three (google/duckduckgo ‘hard drive backup rule of three’ sometime):

  • Keep three copies of your data
  • Keep your data in two formats
  • Keep one off-site copy

I’d add one additional rule: back up your repositories. This is a little less critical if you use a distributed version control system, in which case your local copy likely reflects the remote repository. In the interest of paranoia, however, I’d back it all up anyway.

Solutions

There are probably thousands of ways one could configure a backup solution, so I won’t detail out how you should solve this problem. However, I’ll list a few common tools that you can use to create your backup strategy.

Local Copies


Remote Copies


Offsite security

If you have security concerns about using a service such as Dropbox or CrashPlan to back up your data off-site, consider encrypting your data before moving it off-site. TrueCrypt is an excellent tool that creates encrypted drive volumes – you can then copy the encrypted volume to a remote location. It provides several encryption algorithms and is pretty easy to use.

Review

  • Your hard drive will fail eventually
  • Rule of three: Three copies, two formats, one off-site
  • Don’t just copy, keep old revisions of your files

My hard drive failed last week – it worked hard for me and it will be missed. I bought a new drive and restored my backup and am up and running again. It sure beats the hell out of sticking my broken drive in the freezer.