Author: | Martin Blais <blais@furius.ca> |
---|---|
Date: | 2005-07-23 |
Abstract
A design for my offsite backups. This is an internal design document and may not be up to date. These are just notes I took while implementing the project.
We want:
Some definitions:
- Client
- The machine that needs to be backed up.
- Server
- The machine where backed up data archives are to be sent.
A file listing the files that were last backed up, with a checksum for each one (e.g. an MD5 sum), is kept and maintained on the client. If this file is not available, a full backup should be run.
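As a rough illustration of the file list described above, here is a minimal sketch in Python. The function names (`md5_of_file`, `build_file_list`, `files_to_back_up`) and the in-memory dict representation are assumptions made for this example, not the project's actual format.

```python
import hashlib
import os


def md5_of_file(path, chunk_size=65536):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_file_list(root):
    """Map each file path (relative to root) to its MD5 digest."""
    file_list = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            file_list[os.path.relpath(full, root)] = md5_of_file(full)
    return file_list


def files_to_back_up(current, previous):
    """Return the paths that need archiving.

    If no previous list is available (previous is None), everything is
    returned, which corresponds to a full backup.
    """
    if previous is None:
        return sorted(current)
    return sorted(p for p, md5 in current.items() if previous.get(p) != md5)
```

Passing `None` as the previous list models the case where the file list is missing on the client, so every file is selected.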
We assume that we have a list of backup archives whose union contains at least one copy of every file in the file list stored in a backup archive.
Another idea for restore would be to untar all the archives in order over each other and then to remove the files that are not supposed to be present. This might actually be faster than the lookup I was planning to do.
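The overlay idea above could look roughly like the following sketch. The function name `restore_by_overlay` and its arguments are hypothetical; it assumes the archives are trusted, are given oldest first, and that the wanted file list uses paths relative to the restore directory.

```python
import os
import tarfile


def restore_by_overlay(archives_oldest_first, wanted_files, dest):
    """Extract every archive over the previous ones, then prune extras.

    Returns the sorted list of relative paths that were removed because
    they are not in wanted_files.
    """
    # Later archives overwrite earlier copies of the same file.
    for archive in archives_oldest_first:
        with tarfile.open(archive, "r:*") as tar:
            tar.extractall(dest)

    wanted = {os.path.normpath(p) for p in wanted_files}
    removed = []
    for dirpath, _dirnames, filenames in os.walk(dest):
        for name in filenames:
            full = os.path.join(dirpath, name)
            rel = os.path.relpath(full, dest)
            if rel not in wanted:
                os.remove(full)
                removed.append(rel)
    return sorted(removed)
```

This sketch only prunes files; directories left empty by the pruning are kept, since they may be legitimate empty directories.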
There are two options for the restore algorithm to sort the files by date:
Decision: We archive with information both in the filename and in the mtime of the history file contained therein. The restore script by default will rely on the filename to determine its timestamp, but optionally will be able to look into the history file in case the filenames have been munged somehow.
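A minimal sketch of that decision follows. The filename layout (a `YYYYMMDDTHHMMSS` stamp somewhere in the name), the UTC interpretation of the mtime, and the function names are all assumptions for illustration; the name of the history file inside the archive is passed in by the caller rather than guessed.

```python
import os
import re
import tarfile
from datetime import datetime, timezone

# Hypothetical filename layout: <prefix>YYYYMMDDTHHMMSS<suffix>
STAMP_RE = re.compile(r"(\d{8}T\d{6})")


def timestamp_from_filename(archive_path):
    """Parse the stamp embedded in the archive filename, or return None."""
    match = STAMP_RE.search(os.path.basename(archive_path))
    if match is None:
        return None
    return datetime.strptime(match.group(1), "%Y%m%dT%H%M%S")


def timestamp_from_history(archive_path, history_name):
    """Read the mtime of the history file stored inside the archive."""
    with tarfile.open(archive_path, "r:*") as tar:
        member = tar.getmember(history_name)  # KeyError if absent
    stamp = datetime.fromtimestamp(member.mtime, tz=timezone.utc)
    return stamp.replace(tzinfo=None)  # naive UTC, to match the filename


def archive_timestamp(archive_path, history_name, use_history=False):
    """Prefer the filename; open the archive only on request or as fallback."""
    if not use_history:
        stamp = timestamp_from_filename(archive_path)
        if stamp is not None:
            return stamp
    return timestamp_from_history(archive_path, history_name)
```

Falling back to the history file when the filename carries no stamp is a choice made for this sketch; the restore script could just as well report an error in that case.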
Decision: full backups should not be named specially. The restore script will be able to list which archives will be required to extract the list of files from the contained history file.
Check if .tar.bz2 can contain empty directories.
Answer: Yes! And it also stores the permissions of directories ONLY if the directories are stored as entries of their own. Otherwise, the permissions are not kept on the parent directories.
So you must store an entry for each directory as you go if you want to recover all the permissions.
Can empty directories be created with Python's tarfile module? Can I store permissions too? If so, then we should store the list of directories in the history file, to make sure that we can recreate them exactly the same, with the same permissions as well.
Yes! Extracting a directory file extracts the directory with all its permissions.
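To make the point about directory entries concrete, here is a small sketch using Python's `tarfile` module. The helper names `archive_tree` and `directory_modes` are made up for this example. Each directory is added as an entry of its own (with `recursive=False`), so empty directories and their permission bits end up in the archive.

```python
import os
import tarfile


def archive_tree(root, archive_path):
    """Archive root, storing every directory as its own entry."""
    with tarfile.open(archive_path, "w:bz2") as tar:
        for dirpath, dirnames, filenames in os.walk(root):
            for name in sorted(dirnames) + sorted(filenames):
                full = os.path.join(dirpath, name)
                # recursive=False adds only this entry, not its contents.
                tar.add(full, arcname=os.path.relpath(full, root),
                        recursive=False)


def directory_modes(archive_path):
    """Return {directory name: permission bits} as stored in the archive."""
    with tarfile.open(archive_path, "r:*") as tar:
        return {m.name: m.mode & 0o777
                for m in tar.getmembers() if m.isdir()}
```

The archive should be written outside of `root` so that it does not include itself. Whether the stored directory permissions are applied on extraction may depend on the Python version, since newer releases apply an extraction filter by default; checking the `tarfile` documentation for the version in use is advisable.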
- We could use the time specified on the arniehistory file inside the archive instead of the string that is embedded within the archive filename.
Dates between machines could be different, so choosing a date that is stored on the source machine seems appropriate and important.
Implement both: the date/time embedded within the filename (the default), or the date/time stored as the mtime of the contained history file (slower, since all the archives have to be opened to find out their date/times).
Attributes not supported: st_uid, st_gid, st_atime, st_mtime, st_ctime.
Add support for some of those.
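One possible way to capture these attributes is sketched below with `os.lstat`. The dict layout and the function names `collect_attributes` and `restore_times` are assumptions for illustration, not part of the existing history file format.

```python
import os


def collect_attributes(path):
    """Capture ownership and timestamps without following symlinks."""
    st = os.lstat(path)
    return {
        "mode": st.st_mode & 0o7777,
        "uid": st.st_uid,
        "gid": st.st_gid,
        "atime": st.st_atime,
        "mtime": st.st_mtime,
        "ctime": st.st_ctime,  # recorded only; it cannot be set back
    }


def restore_times(path, attrs):
    """Reapply the access and modification times saved earlier."""
    os.utime(path, (attrs["atime"], attrs["mtime"]))
```

Restoring `uid` and `gid` would need `os.chown`, which usually requires elevated privileges, so it is left out of this sketch.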