Automating Backups to the Cloud
Friday, February 08, 2013
Backups are extremely important and I’ve been experimenting with a few different methods. My concerns are always focused
on maintaining data integrity, security, and availability. One of my current methods involves using asymmetric keys for
secure storage and object versioning to ensure backup data can’t be overwritten, whether accidentally or maliciously.
For encryption and decryption I’m using asymmetric keys via gpg. This way, any server can generate and encrypt
the data, but only administrators who hold the private key can actually decrypt it. Generating the
administrative key looks like:
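    # Generate a new key pair on an administrative machine. The prompts ask
    # for a key type, size, expiration, and a user ID -- the examples below
    # assume a user ID of "Backup Admin" (a placeholder).
    gpg --gen-key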
To actually use the public key on servers, it can be exported and copied…
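    # Export the public key in ASCII-armored form (the output filename is
    # arbitrary).
    gpg --armor --export "Backup Admin" > backup-admin.pub.asc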
Then pasted and imported on the machine(s) that will be encrypting data…
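    # Import the public key so this server can encrypt to it.
    gpg --import backup-admin.pub.asc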
And then marked as “ultimately trusted” with the trust command (otherwise gpg always wants to confirm before using the key):
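    gpg --edit-key "Backup Admin"
    # At the gpg> prompt: type "trust", choose "5 = I trust ultimately",
    # confirm with "y", then "quit".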
In my case, I wanted to regularly send the encrypted backups offsite and S3 seemed like a flexible, effective
storage place. This involved a few steps:
Create a new S3 bucket (e.g. backup.secret-project.example.com) - this will just hold all the different backup
types and files for the project.
Enable Object Versioning on the S3 bucket - whenever a new backup gets dropped off, previous backups will remain.
This provides additional security (e.g. a compromised server can’t overwrite the backups with empty files) and
allows for more flexible retention policies than Amazon’s Glacier lifecycle rules.
Create a new IAM user (e.g. com-example-secret-project-backup) - the user and its Access Key will be responsible
for uploading the backup files to the bucket.
Add a User Policy to the IAM user - the only permission it needs is s3:PutObject for the bucket. A minimal policy (scoped to the example bucket above) looks like:
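    {
      "Statement": [
        {
          "Effect": "Allow",
          "Action": "s3:PutObject",
          "Resource": "arn:aws:s3:::backup.secret-project.example.com/*"
        }
      ]
    }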
Upload Method - instead of depending on third-party libraries for uploading the backup files, I wanted to try simply
using curl with Amazon S3’s Browser-Based Upload functionality. This involved creating and signing the appropriate
policy via the sample policy builder for a particular backup type. My simple policy looked something like this (the expiration date and key are placeholders):
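    {
      "expiration": "2038-01-19T00:00:00Z",
      "conditions": [
        {"bucket": "backup.secret-project.example.com"},
        {"key": "database/backup.sql.gz.gpg"},
        {"acl": "private"}
      ]
    }
The sample policy builder takes care of the signing, but the same values can be produced locally: base64-encode the policy document, then sign that string with the IAM user’s secret key (HMAC-SHA1, base64-encoded again). Something like:
    POLICY=$(openssl base64 -A < policy.json)
    SIGNATURE=$(printf '%s' "$POLICY" \
      | openssl dgst -sha1 -hmac "$AWS_SECRET_ACCESS_KEY" -binary \
      | openssl base64 -A)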
Putting everything together, a single command could be used to back up the database, compress, encrypt, and upload (MySQL shown here; any dump command that writes to stdout will do):
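    # Assumes a MySQL database named "secret_project"; $POLICY and $SIGNATURE
    # are the encoded policy and signature from above. The "file" field must
    # come last, and "@-" tells curl to read it from stdin.
    mysqldump secret_project \
      | gzip \
      | gpg --encrypt --recipient "Backup Admin" \
      | curl "https://s3.amazonaws.com/backup.secret-project.example.com/" \
          -F "key=database/backup.sql.gz.gpg" \
          -F "AWSAccessKeyId=$AWS_ACCESS_KEY_ID" \
          -F "acl=private" \
          -F "policy=$POLICY" \
          -F "signature=$SIGNATURE" \
          -F "file=@-"
Since the bucket is versioned, re-using the same key simply stacks a new version on top of the previous backups.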
And then to download, decrypt, decompress, and reload the database from an administrative machine (with s3cmd handling the authenticated download):
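    # Fetch the object with the administrator's credentials (s3cmd shown as
    # one option), then decrypt with the private key, decompress, and reload.
    s3cmd get s3://backup.secret-project.example.com/database/backup.sql.gz.gpg
    gpg --decrypt backup.sql.gz.gpg | gunzip | mysql secret_project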
The only task remaining is creating a cleanup script using the S3 API to monitor the different backup versions and
delete them as they expire.
While there’s a bit of overhead in getting things set up properly, gpg makes secure backups trivial and S3 provides
a flexible storage strategy that keeps the data safe.