Backup Overview
There are two kinds of people in the world - those who have had a hard drive failure, and those who will. Make sure this common occurrence does not lead to the loss of your image collection.
What's a backup?
Primary vs. backups
The 3-2-1 Rule
Threats
Lifecycle
Disaster recovery backups vs. rolling backups
Propagating corruption
Backup software
Where does the backup media live?
Write-once backups
Backup vs. fault-tolerant storage
Encrypted backups
Data validation
Restoration
What's a backup?
The purpose of a backup is to make sure that your digital data can survive any of the hazards that await. In principle, this is a straightforward process. Copy all of your files to some other device(s), keep the backup somewhere safe, and use it to restore the data in the event of a problem. If you are a one-computer user and everything you want to preserve can fit on one hard drive, it can be nearly as simple as this.
For many readers, however, things are not so simple. The images and footage you want to back up may not be on one computer, much less on one hard drive. You probably have multiple versions of the images. For video projects you probably have many additional project files and assets (such as graphics and music). Which ones do you keep, and how do you keep that straight? How do you update backups as you work on files? How do you validate the backups so you can be certain that the archive can be properly restored in the event of a problem?
Let's outline the tools used in backups to see how we can put it all together safely and efficiently.
Primary vs. Backups
It may sound obvious, but you can't create a good backup strategy until you know what you're backing up. Therefore you need to designate a primary copy of the data before you create backups. If there is no primary copy, your backup system will always feel like a mess.
At each stage of an image's lifecycle, you need to know which is the primary copy of the data.
The 3-2-1 Rule
The simplest way to remember how to back up your images safely is to use the 3-2-1 rule.
- We recommend keeping 3 copies of any important file (a primary and two backups)
- We recommend having the files on 2 different media types (such as hard drive and optical media), to protect against different types of hazards.*
- 1 copy should be stored offsite (or at least offline).
*While 3-2-1 storage is the ideal arrangement, it's not always possible. A second media type, for instance, is impractical for many people in the ingestion or working file stage. In these cases, many people make do with hard-drive-only copies of their data. Best practices, however, still require 3 copies and some physical separation between the copies.
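As a rough illustration, the 3-2-1 rule can be written as a small checklist. The sketch below is only one way to express it; the `Copy` record, its field names, and the `check_321` function are hypothetical names made up for this example, and "offline" is accepted in place of "offsite" as the rule above allows.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Copy:
    """One copy of the archive (hypothetical record for this example)."""
    label: str             # a name for the copy, e.g. "primary"
    media_type: str        # e.g. "hard drive" or "optical"
    offsite: bool = False  # stored in a different building
    offline: bool = False  # unplugged from power and data cables


def check_321(copies):
    """Return a list of 3-2-1 problems; an empty list means the rule is met."""
    problems = []
    if len(copies) < 3:
        problems.append(f"only {len(copies)} copies; the rule asks for 3")
    if len({c.media_type for c in copies}) < 2:
        problems.append("all copies are on one media type; the rule asks for 2")
    if not any(c.offsite or c.offline for c in copies):
        problems.append("no copy is stored offsite or offline")
    return problems


archive = [
    Copy("primary", "hard drive"),
    Copy("backup drive", "hard drive", offsite=True),
    Copy("disc set", "optical"),
]
working_files = [
    Copy("primary", "hard drive"),
    Copy("attached backup", "hard drive"),
]

archive_problems = check_321(archive)
working_problems = check_321(working_files)
print(archive_problems)
print(working_problems)
```

In this example the archive passes, while the working-file setup is flagged on all three counts, which matches the note above that working files often fall short of full 3-2-1 protection.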
Threats
In order to design a backup system that works for you, it’s important to understand the kinds of problems that can lead to data loss. Let’s take a look at some of the dangers that threaten your data’s well being and examine some solutions.
Device failure
Any digital storage device can fail. Hard drives fail all the time, and even a multi-drive device can fall off a table and be destroyed. In order to provide real backup, a backup copy of the data needs to be on a separate device, such as an external drive or different media like optical disc.
Viruses
Viruses can propagate silently from one storage device to another, and then strike to destroy data. All rewritable data is potentially vulnerable to viruses (even on Macintosh), so any hard drive data is at risk. Write-once storage, such as optical disc, provides the best protection against viruses.
Malicious damage
Your archive can be exposed to other malicious damage, either from anonymous hackers or perhaps from people targeting you personally. Any computer that is online is theoretically vulnerable to hackers, although an enterprise-level firewall can offer lots of protection. The best protection is offline, and preferably offsite, storage of backups, as well as write-once media storage.
Volume and Directory glitches
The Volume and Directory information on your storage media are a map of where the files are stored, as well as a table of contents. If these get corrupted, then the computer may not be able to find the files on a drive. Aside from basic maintenance of your file system, the best protection is the use of write-once media.
Transfer corruption
Any time data is transferred from one device to another, there is some possibility of corruption. This can be because of problems with the RAM, drive, connectors, bridgeboard, network, or cables. The best protection against transfer corruption is to transfer files with a utility that performs a validated transfer. Use of write-once media can also help to prevent transfer corruption (after the initial creation of the disc).
Read more about validated transfers
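To make the idea of a validated transfer concrete, here is a minimal sketch that copies a file and then compares checksums of the source and the copy. It is not how any particular utility works; `sha256_of` and `verified_copy` are names invented for this example, and SHA-256 is simply an assumed choice of checksum. Note that reading the copy back immediately may be served from the operating system's cache rather than from the disk itself, so dedicated tools may be more thorough.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path, chunk_size=1024 * 1024):
    """Hash a file in chunks so large footage files need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verified_copy(source, destination):
    """Copy source to destination, then confirm both files hash the same."""
    source, destination = Path(source), Path(destination)
    destination.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(source, destination)  # copy2 also preserves timestamps
    expected = sha256_of(source)
    actual = sha256_of(destination)
    if expected != actual:
        raise OSError(f"checksum mismatch copying {source} to {destination}")
    return actual


# Small demonstration using a temporary folder in place of a card and a drive.
with tempfile.TemporaryDirectory() as tmp:
    card_file = Path(tmp, "card", "clip001.mov")
    card_file.parent.mkdir()
    card_file.write_bytes(b"example footage bytes")
    backup_file = Path(tmp, "backup", "clip001.mov")
    copy_digest = verified_copy(card_file, backup_file)
    copy_matches = backup_file.read_bytes() == card_file.read_bytes()
```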
Lightning strike/Voltage surge
Excess voltage from a lightning strike or a blown power company transformer can fry your computer in a heartbeat. A surge protector might protect your computer from damage caused by this excess voltage, but it provides no real guarantee of protection. The best protection is provided by the use of off-site, or at least off-line, backups.
Theft
While video and photo professionals have always been exposed to theft, that hazard rarely extended to our footage itself. Since our pictures are now stored on expensive devices, they are now at risk. Protection against theft includes security measures such as an alarm or a safe, but is best accomplished with offsite storage.
Fire or water damage
Like film archives, digital images can be destroyed by fire or water damage. But unlike a film archive, it’s possible to make a complete offsite duplicate of your digital archive for very little money, and thus to be fully protected.
Human error
One of the most common causes of data loss is simple human error. You can accidentally throw away or unintentionally modify files in some undesirable way (such as downsampling). Protection here is a little more complex, particularly for working files, since they generally can't be protected with write-once backups.
Off-line backups that don't get updated immediately are a valuable part of protection against human error.
Lifecycle
In the workflow section of this website, we describe 5 different phases of Lifecycle: Capture, Ingestion, Working, Publish, and Archive. One of the primary differences between these various stages is the handling of the backups. Let's take a quick look.
Capture
In many cases, there is no backup possible in the capture phase. Some professional level cameras will accept a second media card which can provide backup for the failure of the card itself. Tethered still photo shoots, where image files are automatically transferred to a computer, may or may not allow for a backup depending on the feature set of the capture software.
Read more in tethered capture
Ingestion
When the images or footage are first downloaded from the camera, backups should be created automatically. Data should not be erased from media cards until the copied files have been inspected for visual integrity. If you copy to the backup first, then to the primary location, a single visual inspection will confirm the integrity of both.
With video footage, this can be particularly time-consuming, since you’ll want to watch all the footage at normal speed to ensure that everything has transferred properly. A calculated risk is to perform a spot check of the footage to look for potential problems. You can also use a verified copy method like Carbon Copy Cloner, SuperDuper, or ShotPut Pro to ensure all data is written.
Figure 2 shows a very good ingestion backup system. Files are copied first to a backup drive, then to the primary location and another backup. A single visual inspection of the primary version also confirms the integrity of the first backup version.
Working
Working files present a special problem for backup. It’s more difficult to protect them because they are in a state of change. In particular, this makes it tough to maintain updated off-site backups.
As images or footage are being optimized, backups should be made and updated automatically. We suggest that you have a backup storage device connected to the imaging workstation for daily automated mirror backup. We also suggest that you have an offline backup system that can be updated periodically, such as at the end of each day or after each important download. We suggest using swapper drives as part of your working file backups.
Figure 3 shows a system for backing up working files. The dark blue drive is the primary copy of the data, and the backup drives are in light blue. The drive on the left is the automatic backup that is periodically updated by backup software. The Swapper drives on the right are connected at the end of a day's work and files are added or updated. They are rotated offsite to further protect works in progress.
Archive
Once images have been put into their long-term home, they should get a full 3-2-1 backup.
Figure 4 shows the permanent home of the image archive. An onsite primary archive is protected with an offsite copy of the images, as well as a write-once or tape backup.
Disaster recovery backups vs. rolling backups
Another useful concept in backup design is the distinction between disaster recovery and rolling backups. Some of the threats outlined above have the capacity of wiping out your entire media collection. Some of the threats might only affect the work done to recent images. It's possible to protect for both, but that's hard to do with a single backup device. We suggest that you look at this as two problems.
Disaster recovery backups
Disaster recovery backups typically require a degree of physical separation between the copies. This part is pretty obvious, since fire, flood, or theft can destroy everything in a single building. Disaster recovery backups also require some separation in time, and in handling. A backup that is immediately updated upon any change will fail to protect against hazards like viruses, volume or directory corruption, media degradation, or human error.
Rolling backup
The gap in time between updates of a disaster recovery backup leaves the collection exposed. This gap must be filled by a rolling backup: one that provides automatic updates to the backup whenever data is added or altered. This can protect today's or this week's editing work by transferring it to a second device.
The solutions we present in this website show how you can set up systems that offer both disaster recovery and rolling backup protection.
Propagating corruption
Backup systems are supposed to protect file integrity, but sometimes they are set up in ways that miss an important hazard. If your system only employs rolling backup, it runs the risk of replacing a valid backup of a file with a damaged copy. If the primary version of the file is damaged by media failure, directory corruption, transfer error, virus, or human error, then the act of updating the backup will damage the backup, and the file may be lost.
There are four basic ways of dealing with this issue:
Write-once media
Since write-once media can't be rewritten, it protects against any corruption that happens to the file at a later date. For a video workflow, we recommend backing up the entire card to write-once media after the first transfer.
Incremental backup
Some backup systems offer the ability to back up a set of files, and then add new copies of any changed file, while keeping the older version as well. Most digital tape backups work this way, and some hard drive backup software, such as Retrospect, SuperDuper, and Carbon Copy Cloner, offers this option as well.
For photography, incremental backup can be a challenge since the data set is so large. You may need your backup media to be more than ten times larger than the primary archive in order to accomplish incremental backup, which may make this system financially or technically unworkable.
For video, this method is more practical. With digital video you are rarely modifying the source media, only the project file which contains instructions. Chances are you will continue to add additional files like music, sound effects, graphics, and animation, but these files are not very large when compared to the original footage.
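The incremental idea described above can be sketched in a few lines. This is only an illustration, not how Retrospect or any other product works: the `incremental_backup` function, the `current` and `_versions` folder names, and the `stamp` label for each run are all assumptions made for this example. When a file has changed, the older backup copy is moved aside rather than overwritten. Files deleted from the primary are simply left in place in the backup.

```python
import filecmp
import shutil
import tempfile
from pathlib import Path


def incremental_backup(primary_dir, backup_dir, stamp):
    """Copy new or changed files, keeping the previous version of each."""
    primary_dir, backup_dir = Path(primary_dir), Path(backup_dir)
    versions_dir = backup_dir / "_versions" / stamp  # older copies for this run
    changed = []
    for source in sorted(primary_dir.rglob("*")):
        if not source.is_file():
            continue
        relative = source.relative_to(primary_dir)
        target = backup_dir / "current" / relative
        if target.exists():
            # shallow=False compares file contents, not just size and date.
            if filecmp.cmp(source, target, shallow=False):
                continue  # unchanged since the last run
            kept = versions_dir / relative
            kept.parent.mkdir(parents=True, exist_ok=True)
            shutil.move(str(target), str(kept))  # keep the older version
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(source, target)
        changed.append(relative.as_posix())
    return changed


# Demonstration: a project file is edited between two backup runs.
with tempfile.TemporaryDirectory() as tmp:
    primary, backup = Path(tmp, "primary"), Path(tmp, "backup")
    primary.mkdir()
    (primary / "project.txt").write_text("edit 1")
    first_run = incremental_backup(primary, backup, "run1")
    (primary / "project.txt").write_text("edit 2")
    second_run = incremental_backup(primary, backup, "run2")
    old_version = (backup / "_versions" / "run2" / "project.txt").read_text()
    current_version = (backup / "current" / "project.txt").read_text()
```

Because every changed file leaves an older copy behind, the backup grows with each run, which is the storage cost discussed above.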
Additive Backup
You can also structure part of your system to never update a copy of the backup files once they have been written.
For photography, this is most appropriate for a raw file archive, and can protect against most propagated corruption except viruses. If you are not using a second media type, we strongly recommend the use of an additive backup for one of your copies, in order to provide disaster recovery protection.
For video, this is most appropriate for a disk image based archive where you are creating cloned copies of each memory card after a shoot. If you are archiving your graphic and project files separately, we strongly recommend the use of an additive backup for one of your copies, in order to provide disaster recovery protection for your video footage.
Read more in the mirror section
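An additive backup can be approximated with a copy routine that refuses to overwrite anything already in the archive copy. The sketch below is an assumption-laden illustration; `additive_backup` is a made-up name, and matching files by relative path alone is a simplification.

```python
import shutil
import tempfile
from pathlib import Path


def additive_backup(primary_dir, archive_dir):
    """Copy files that are new to the archive; never touch existing ones."""
    primary_dir, archive_dir = Path(primary_dir), Path(archive_dir)
    added, skipped = [], []
    for source in sorted(primary_dir.rglob("*")):
        if not source.is_file():
            continue
        relative = source.relative_to(primary_dir)
        target = archive_dir / relative
        if target.exists():
            skipped.append(relative.as_posix())  # already archived: leave it alone
            continue
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(source, target)
        added.append(relative.as_posix())
    return added, skipped


# Demonstration: the primary copy is damaged after the first backup run.
with tempfile.TemporaryDirectory() as tmp:
    primary, archive = Path(tmp, "primary"), Path(tmp, "archive")
    primary.mkdir()
    (primary / "IMG_0001.raw").write_bytes(b"original")
    first_added, first_skipped = additive_backup(primary, archive)
    (primary / "IMG_0001.raw").write_bytes(b"damaged")   # simulated corruption
    (primary / "IMG_0002.raw").write_bytes(b"new file")
    second_added, second_skipped = additive_backup(primary, archive)
    archived_bytes = (archive / "IMG_0001.raw").read_bytes()
```

In the demonstration the damaged primary file does not replace the good archived copy, while the new file is still added.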
Validate before updating
You can protect against propagating many errors if you can be sure that the primary version of the files is in good condition before updating the backups. Unfortunately, this is not an easy thing to do at the moment. Check out the Data Validation section for more on that process.
Read more in the Data Validation section
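One hedged way to approximate a check before updating is to record a checksum for each file when it enters the archive, then compare the primary against that record before any backup is refreshed. The sketch below assumes SHA-256 checksums and uses the made-up names `build_manifest` and `changed_since`. It cannot tell a deliberate edit from corruption, so it is most useful for files that should never change, such as raw originals or source footage.

```python
import hashlib
import tempfile
from pathlib import Path


def build_manifest(folder):
    """Map each file's relative path to its SHA-256 checksum."""
    folder = Path(folder)
    # read_bytes() loads a whole file into memory; very large video files
    # would call for chunked reading instead.
    return {
        path.relative_to(folder).as_posix(): hashlib.sha256(path.read_bytes()).hexdigest()
        for path in sorted(folder.rglob("*"))
        if path.is_file()
    }


def changed_since(folder, manifest):
    """List files from the manifest that are now missing or different."""
    current = build_manifest(folder)
    return sorted(name for name, digest in manifest.items()
                  if current.get(name) != digest)


# Demonstration: one file is silently altered after the manifest is recorded.
with tempfile.TemporaryDirectory() as tmp:
    primary = Path(tmp)
    (primary / "IMG_0001.raw").write_bytes(b"first image")
    (primary / "IMG_0002.raw").write_bytes(b"second image")
    manifest = build_manifest(primary)
    clean_check = changed_since(primary, manifest)
    (primary / "IMG_0002.raw").write_bytes(b"sec0nd image")  # simulated corruption
    damaged_check = changed_since(primary, manifest)
```

If the second check returns any names, the backup update can be held back until those files have been inspected.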
Backup software
To keep your images safe, you'll need to run the appropriate backup software to manage the process. Backup software might be included with your operating system, but most of this is not geared to the problems encountered by photo and video professionals. There are several different types of tasks that backup software can perform. Some programs can do nearly all types of backup, and some types only do one or two. Here are the categories:
Basic mirror
The simplest kind of backup, basic mirror software can create a duplicate of the primary copy in a new location. There are a number of different ways that mirrors can be implemented.
Compressed mirror
In a compressed mirror backup, the files are copied to a new location, and then are compressed into a single gigantic file. This method is not very practical for video files, as there is not much space savings due to compression. It also requires that you restore the entire mirror file before you can recover a specific file.
Mirror plus incremental backup
In this method, an original copy of the data is updated by remembering changes to files. This can let the user "roll back" to different versions of the data.
Bootable clone
This mirror backup contains all the invisible configuration files necessary to let the computer boot up from the copy. Some software creates a compressed bootable copy of the drive that must first be extracted to a new drive before being used.
Read more about mirror configurations
Where does the backup media live?
One of the main ways backups can provide protection is to be physically separated from the primary copy of the data. If every copy of the files were in the same enclosure, then the loss of that enclosure would lose all the data. Let's look at the various configurations you could use for storage.
Same-Enclosure Storage
A backup drive could live in the same enclosure as the primary copy of the data. Usually this is part of a RAID1 system, or a periodic backup to an additional internal drive. While it can offer the easiest way to provide a rolling backup, the protection is less than optimal, since a single event can destroy both copies.
Attached storage
You can get additional protection by housing your backups in some kind of external attached storage, such as an external USB, eSATA, FireWire, or Thunderbolt drive, or a backup drive available on a network. It's nearly as convenient as same-enclosure backups, but has more protection because it's inside a different box. Ideally, these drives would be plugged into a different surge protector than the computer.
Offline storage
A drive that is unplugged from the power and connection cables is called offline storage. Offline storage provides excellent protection for many types of hazards, but the backup copy may not reflect the current state of the files because it's not being updated constantly. Be sure to plug in offline storage and let it spin up for at least 10 minutes every 6 months to prevent data corruption due to a loss of charge in the drive’s magnetic elements.
Offsite storage
The gold standard for disaster-recovery storage is off-site backup, since it can provide protection even if there is a total loss of the primary data. For photographic and digital video collections, this is generally accomplished by physically carrying the storage media to a different location. It is possible to back up some or all data over the Internet, but this will be impractical for many readers.
Swappers
We use the term Swappers to refer to a backup arrangement where a pair (or more) of backup drives rotates off-line or off-site for added protection.
Internet backups
Backup to commercial services like Amazon S3 can provide reliable offsite storage, but it can be expensive for large collections. Additional services like CrashPlan allow the user to utilize a mixture of online and local storage options. Internet backup can also be very slow to use, and generally does not offer data validation. For most professionals, Internet backups are generally not feasible for the entire archive - perhaps only for some of the best images or footage in the collection.
Do It Yourself (DIY) Collocation
In this arrangement, you place some internet-connected storage hardware offsite, and update it periodically. It can provide automatic off-site storage at a reasonable price, if you can navigate the technical issues. Devices like a DroboPro FS make this process easier.
Write-once backups
Write-once backups—also called write-once, read many (WORM) backups—refer to backups stored on media that is not updated. Write-once backups are an important part of any backup plan because they protect against any kind of error or corruption that is introduced to the data after original archiving. They are an essential way to protect against viruses, directory corruption, and accidental erasure. The most cost-effective way to get write-once protection is to use optical disks that are not rewritable. You can also configure LTO (Linear Tape-Open) tape to be write-once only. It’s even possible to make hard drives read-only; this will not remove the write capability from the drive, but creates an instruction to prevent overwriting existing data.
In most of computing, write-once media is considered disaster-recovery backup, as opposed to a general-purpose backup since it can’t be updated. In a parametric imaging or video editing environment, it’s better than that. For photography, all parametric changes to the media can be saved out as metadata. For video most of the changes to the media are actually contained within the project file created by the video editing software. This separation of source media from the editing instructions makes write-once backup very attractive for media backup.
We recommend the use of write-once backups for disaster recovery.
Backup vs. fault-tolerant storage
People sometimes confuse backup with fault-tolerant storage. A backup is a separate copy of a file, written to a different device. It should be able to survive even when the primary fails. Some storage, such as RAID devices, looks like backup at first blush, but they are really better described as fault-tolerant storage. Fault-tolerant storage has some internal protections against failure, but it does not provide enough protection to be considered a backup.
A drive spanning device such as a RAID 5 or Drobo can preserve the data in the event a single drive malfunctions, but the device itself is still a single unit, and has nearly all the vulnerabilities that a single copy of the data has.
Don't confuse fault-tolerance with true backup.
Encrypted backups
It's possible to encrypt entire drives or optical disks if you would like to make the data unreadable to others. This function is primarily used for other kinds of data backup, such as financial information or other sensitive communications, although certain kinds of photographs might be sensitive enough to warrant encryption. This would likely be most important for offsite storage where the data is not under your direct control.
Data validation
Just because a file shows up in the Finder or the Windows Explorer is no guarantee that the file has integrity. The file could have become corrupted, or it might not even be there at all. If you want to be sure that your files are safely backed up, you should implement some kind of data validation practices.
Read more about Data Validation
Restoration
The object of making backups is to restore the files in the event of a problem. If you have a whole-drive mirror, perhaps all you need to do is swap drives to get back up and running. If your computer system is more complex than that, the restoration might be a more complicated process. There is a saying among IT professionals that a backup is not a backup until you have done a trial restoration. It's not uncommon for some part of the backup system to be set up improperly. You don't want to find this out after it's too late to be corrected.
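Part of a trial restoration can be checked automatically by comparing the restored folder with the primary, file by file. The following is a minimal sketch under the assumption that both copies use the same folder structure; `compare_trees` is a hypothetical helper written for this example, not a feature of any backup product.

```python
import filecmp
import tempfile
from pathlib import Path


def compare_trees(primary_dir, restored_dir):
    """Return (missing, different) lists of relative paths in the restore."""
    primary_dir, restored_dir = Path(primary_dir), Path(restored_dir)
    missing, different = [], []
    for source in sorted(primary_dir.rglob("*")):
        if not source.is_file():
            continue
        relative = source.relative_to(primary_dir)
        restored = restored_dir / relative
        if not restored.is_file():
            missing.append(relative.as_posix())
        elif not filecmp.cmp(source, restored, shallow=False):
            # shallow=False compares file contents, not just size and date.
            different.append(relative.as_posix())
    return missing, different


# Demonstration: the restore lacks one file and holds a truncated copy of another.
with tempfile.TemporaryDirectory() as tmp:
    primary, restored = Path(tmp, "primary"), Path(tmp, "restored")
    for folder in (primary, restored):
        (folder / "shoot1").mkdir(parents=True)
    (primary / "shoot1" / "a.raw").write_bytes(b"aaaa")
    (primary / "shoot1" / "b.raw").write_bytes(b"bbbb")
    (primary / "shoot1" / "c.raw").write_bytes(b"cccc")
    (restored / "shoot1" / "a.raw").write_bytes(b"aaaa")
    (restored / "shoot1" / "c.raw").write_bytes(b"cc")  # truncated copy
    missing_files, different_files = compare_trees(primary, restored)
```

A check like this only confirms that the restored files match the primary; it does not replace opening a sample of the restored images or footage in your editing software.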
Relinking Media
Video editing software and Parametric Image Editors like Aperture and Lightroom separate the image edits from the media itself. If you need to restore your files from backup, it’s possible that you will need to relink the editing software to the media.
This is generally much easier if the filenames and folder structure are identical on the primary and backup copies. Like the other facets of restoration, make sure you know how this works before you actually need it.
Read more about Restoration