Backups: An introduction

Online data redundancy #

Drives fail more often than one would expect. Google has published a large-scale study of failure rates in ordinary hard disk drives. From the LUKS FAQ I learned that even SSDs are not guaranteed to last much longer. It therefore seems wise to keep two or more copies of the data in a production environment, with three drives being the sweet spot. Even then your data is not safe! The next thunderstorm will come, and terrible things are likely to happen if lightning strikes a power cable in your street. Surge-protected power strips and uninterruptible power supplies cannot prevent damage in every case, and professional-grade overvoltage protection has to cover the whole building, which gets really expensive. It is very likely more economical to turn your devices off and pause your work while a thunderstorm is close. But there is more: even a solar storm can flip bits on your hard disk or in your random access memory.
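To make the reasoning behind keeping several copies a bit more concrete, here is a minimal back-of-the-envelope sketch. It assumes that drives fail independently of each other and that failed drives are not replaced during the year. The failure rate used below is a made-up placeholder, not a figure from the Google study, and the function name is only illustrative.

```python
def all_copies_lost(annual_failure_rate: float, copies: int) -> float:
    """Chance that every copy fails within the same year.

    Assumes independent failures and no replacement of failed drives,
    which is a simplification.
    """
    if not 0.0 <= annual_failure_rate <= 1.0:
        raise ValueError("annual_failure_rate must be between 0 and 1")
    if copies < 1:
        raise ValueError("copies must be at least 1")
    return annual_failure_rate ** copies


# Hypothetical rate chosen only for illustration, not a measured value.
ASSUMED_RATE = 0.05

for n in (1, 2, 3):
    print(f"{n} drive(s): {all_copies_lost(ASSUMED_RATE, n):.6f}")
```

Each additional drive lowers the estimate considerably, but only as long as the independence assumption holds. A lightning strike hits all drives in the same machine at once, which is exactly the case this simple model does not cover.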

Explanation of incremental and off-site backups #

Backups, as opposed to online redundancy, can be made on a different partition or drive, and they do not necessarily represent the latest state of the data. The naive strategy is to copy everything from an internal drive to an external one. The next step is to identify the files that are already stored on the external drive and to transfer only those that have been added or changed since the last copy. This can speed up backups dramatically and is called “incremental”. Off-site backups, meaning copies kept at a different physical location, are important because a catastrophe such as a fire in your house could render all your hardware completely unusable. For that reason you should think twice before leaving your backups lying around in the same place as your data. Spreading your data across several locations can be a concern, however, especially if the drive holds sensitive data, so you might want to encrypt those copies.
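The incremental idea described above can be sketched in a few lines of Python. This is only an illustration under simple assumptions: a file counts as changed if it is missing on the backup side, differs in size, or has a newer modification time. The function names and the directory names `internal` and `external` are hypothetical. Dedicated backup tools handle many more cases, for example deleted files.

```python
import shutil
import tempfile
from pathlib import Path


def needs_copy(src: Path, dst: Path) -> bool:
    """Return True if src is new or looks changed compared to dst."""
    if not dst.exists():
        return True
    s, d = src.stat(), dst.stat()
    return s.st_size != d.st_size or s.st_mtime > d.st_mtime


def incremental_backup(source_dir, backup_dir):
    """Copy only new or changed files and return their relative paths."""
    source_dir, backup_dir = Path(source_dir), Path(backup_dir)
    copied = []
    for src in sorted(source_dir.rglob("*")):
        if not src.is_file():
            continue
        rel = src.relative_to(source_dir)
        dst = backup_dir / rel
        if needs_copy(src, dst):
            dst.parent.mkdir(parents=True, exist_ok=True)
            # copy2 also copies the modification time, so an unchanged
            # file is skipped on the next run.
            shutil.copy2(src, dst)
            copied.append(rel)
    return copied


# Small demonstration in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp) / "internal"
    backup = Path(tmp) / "external"
    source.mkdir()
    (source / "notes.txt").write_text("first version")
    first_run = incremental_backup(source, backup)   # copies the new file
    second_run = incremental_backup(source, backup)  # nothing has changed
    (source / "notes.txt").write_text("second, longer version")
    third_run = incremental_backup(source, backup)   # copies the edit

print(first_run, second_run, third_run)
```

Comparing size and modification time is fast but not bulletproof. File systems with coarse timestamps, such as FAT on many external drives, can cause files to be copied again unnecessarily, and a change that keeps both size and timestamp would go unnoticed.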

Disk encryption #

If you have sensitive data, you might be interested in encrypting your backup. I have a dedicated article on that.

Choosing the right file system #

ZFS checksums all data and, when it has redundancy to draw on, detects bit errors and corrects them automatically, a feature often described as “self-healing”. It can also span data over multiple drives or use multiple drives as mirrors of each other, so that each holds exactly the same data. ZFS datasets can be shared across a network, and incremental backups to external drives are natively possible if these also use ZFS.
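The automatic correction of bit errors mentioned above relies on two ingredients: a checksum that reveals a damaged copy, and a second copy to repair it from. The following toy model shows that principle in plain Python. It is not how ZFS is implemented. The class `ToyMirror`, its in-memory "drives", and the choice of SHA-256 are assumptions made purely for illustration.

```python
import hashlib


def checksum(data: bytes) -> str:
    """SHA-256 is used here for convenience only."""
    return hashlib.sha256(data).hexdigest()


class ToyMirror:
    """Two in-memory copies of every block plus a separately kept checksum."""

    def __init__(self):
        self.copies = [{}, {}]  # two pretend drives: block id -> bytes
        self.checksums = {}     # block id -> expected checksum

    def write(self, block_id: str, data: bytes) -> None:
        for drive in self.copies:
            drive[block_id] = data
        self.checksums[block_id] = checksum(data)

    def read(self, block_id: str) -> bytes:
        expected = self.checksums[block_id]
        # Find a copy whose checksum still matches.
        good = None
        for drive in self.copies:
            if checksum(drive[block_id]) == expected:
                good = drive[block_id]
                break
        if good is None:
            raise IOError(f"all copies of block {block_id!r} are corrupt")
        # Overwrite any damaged copy with the verified one.
        for drive in self.copies:
            if drive[block_id] != good:
                drive[block_id] = good
        return good


mirror = ToyMirror()
original = b"important data"
mirror.write("b", original)

# Simulate a single flipped bit on the first pretend drive.
damaged = bytearray(original)
damaged[0] ^= 0x01
mirror.copies[0]["b"] = bytes(damaged)

restored = mirror.read("b")
print(restored == original, mirror.copies[0]["b"] == original)
```

Without the second copy the error could still be detected, but not repaired. That is why checksums and mirroring are worth combining.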

Hardware considerations #

  • ECC RAM, because non-ECC memory could otherwise become a single point of failure
  • Disk drives certified for 24/7 operation
  • Something to speed up encryption, e.g. a fast CPU, if you like my idea of using uncommon ciphers