SUBJECT TO CHANGE WITHOUT NOTICE
The backup process is as follows:
During the discovery stage, the client finds all files that should be backed up, taking into account the exclude and include expressions stored in the client configuration, but ignoring previous backups and the current state of the server. Because this list can be very large, it is stored temporarily on disk together with each file's metadata.
[TODO: specification for the listing file]
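The discovery step above can be sketched as follows. This is a minimal illustration, not the specified behavior: it assumes the include and exclude expressions are shell-style glob patterns, that a file is selected when it matches at least one include pattern and no exclude pattern, and that the temporary listing is a tab-separated text file. The listing format is still a TODO, and the names should_back_up and discover are hypothetical.

```python
import fnmatch
import os


def should_back_up(path, includes, excludes):
    """Assumed rule: selected if any include pattern matches and no exclude pattern does."""
    included = any(fnmatch.fnmatch(path, pattern) for pattern in includes)
    excluded = any(fnmatch.fnmatch(path, pattern) for pattern in excludes)
    return included and not excluded


def discover(root, includes, excludes, listing):
    """Walk `root` and write one line per selected file to the open text file `listing`.

    The line layout (path, size, mtime in ns, octal mode) is a placeholder for the
    listing format, which the specification has not defined yet. Paths containing
    tabs or newlines would need escaping in a real implementation.
    """
    count = 0
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # deterministic traversal order
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            if not should_back_up(path, includes, excludes):
                continue
            st = os.lstat(path)  # lstat: do not follow symbolic links
            listing.write(f"{path}\t{st.st_size}\t{st.st_mtime_ns}\t{st.st_mode:o}\n")
            count += 1
    return count
```

Writing each record as soon as it is found keeps memory use flat, which matches the reason given above for keeping the listing on disk.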
During the selection stage, the client determines the exact work to be done during the backup. This includes:
Take the list of files to be backed up and compare it with the client index (the known state of the backup archive; see the specification below). Files that have been removed since the last backup are marked for deletion. Files that have changed are marked for either full archival or a metadata-only update, subject to the client configuration, which allows less important files to be backed up less often.
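A minimal sketch of this comparison follows. It makes several assumptions that the specification does not state: both the listing and the client index are modeled as in-memory dictionaries keyed by path, a content change is detected by a difference in size or modification time, and any other metadata difference triggers a metadata-only update. The per-file backup frequency from the client configuration is not modeled. Meta and plan_backup are hypothetical names.

```python
from typing import NamedTuple


class Meta(NamedTuple):
    """Illustrative subset of file metadata (not the real index layout)."""
    size: int
    mtime: int
    mode: int
    uid: int
    gid: int


def plan_backup(listing, index):
    """Classify paths into deletions, full archivals, and metadata-only updates.

    `listing` maps path -> Meta for files found on disk now.
    `index` maps path -> Meta as recorded at the last backup.
    """
    plan = {"delete": [], "full": [], "metadata": []}
    for path in index:
        if path not in listing:
            plan["delete"].append(path)  # removed since the last backup
    for path, new in listing.items():
        old = index.get(path)
        if old is None or (new.size, new.mtime) != (old.size, old.mtime):
            plan["full"].append(path)  # new file, or content assumed changed
        elif new != old:
            plan["metadata"].append(path)  # only mode or ownership changed
    return plan
```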
Next, files to be deleted are removed from the archive (in order to reduce usage on the server before sending new data).
Each deletion request is sent to the storage agent.
For each file, the packaging agent creates a cpio-formatted archive, compresses it (unless further compression would be pointless, as for files that are already compressed or contain random data), and encrypts it. It then checks the client index to see whether that file has already been assigned a name. If not, a random name is assigned to the file and stored in the client index. Optionally, the agent can create files containing random keys; these key files are also stored on the server and are used to encrypt files.
[TODO: handling of files that are hard links]
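The name assignment in the packaging step could look like the sketch below. It assumes the client index can be treated as a mapping from original path to random name, and that names are hexadecimal strings drawn from a cryptographic random source; the specification prescribes neither. The function archive_name and its nbytes parameter are hypothetical.

```python
import secrets


def archive_name(index, path, nbytes=16):
    """Return the random archive name for `path`, assigning one on first use.

    `index` is a dict standing in for the client index (path -> random name).
    The name is 2 * nbytes hexadecimal characters long.
    """
    name = index.get(path)
    if name is None:
        used = set(index.values())
        name = secrets.token_hex(nbytes)
        while name in used:  # collisions are very unlikely, but cheap to rule out
            name = secrets.token_hex(nbytes)
        index[path] = name
    return name
```

Because the name is looked up before it is generated, a file keeps the same name across backups, so a later version replaces or extends the same object on the server rather than creating an unrelated one.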
Each encrypted file is sent to the storage agent.
As the storage agent confirms each transaction (deletion or package), the state of each file is updated in the client index file. Deleted files are marked as deleted and are only removed from the client index after a configurable amount of time has passed (and the storage agent confirms that it has deleted all versions of the file).
The client index is not saved on the server because it can be very large.
If the client index exists, the client can:
The client index is stored as a linked list of variable length entries. Each entry contains the entry length, the original file name and its length, the metadata information and its length, and the random file name and its length.
By using variable length entries, it is possible to store long original file names without wasting disk space and also to expand the metadata information or support alternative client operating systems.
The structure of each entry is shown here:
entry length | unsigned 16-bit integer, little-endian |
filename length | unsigned 16-bit integer, little-endian |
filename string | char array, NUL terminated |
metadata length | unsigned 16-bit integer, little-endian |
metadata C struct | on Unix systems, struct stat |
random name length | unsigned 8-bit integer |
random name string | char array, NUL terminated |
The filename length and random name length values include the terminating NUL character.
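The entry layout above can be exercised with a short packing and unpacking sketch. Two points are assumptions rather than part of the specification: the entry length is taken to cover the whole entry including its own two bytes, and the metadata is treated as an opaque byte string instead of a real struct stat. pack_entry and unpack_entry are hypothetical names.

```python
import struct


def pack_entry(filename: bytes, metadata: bytes, random_name: bytes) -> bytes:
    """Serialize one client index entry in the field order of the table above."""
    name = filename + b"\0"      # length fields count the terminating NUL
    rname = random_name + b"\0"
    if len(rname) > 0xFF:
        raise ValueError("random name does not fit an 8-bit length field")
    # Assumption: entry length counts every byte of the entry, itself included.
    total = 2 + 2 + len(name) + 2 + len(metadata) + 1 + len(rname)
    if total > 0xFFFF:
        raise ValueError("entry does not fit a 16-bit length field")
    return (struct.pack("<H", total)
            + struct.pack("<H", len(name)) + name
            + struct.pack("<H", len(metadata)) + metadata
            + struct.pack("<B", len(rname)) + rname)


def unpack_entry(buf: bytes, offset: int = 0):
    """Parse the entry at `offset`; return ((filename, metadata, random_name), next_offset)."""
    (total,) = struct.unpack_from("<H", buf, offset)
    pos = offset + 2
    (name_len,) = struct.unpack_from("<H", buf, pos)
    pos += 2
    filename = buf[pos:pos + name_len - 1]  # drop the trailing NUL
    pos += name_len
    (meta_len,) = struct.unpack_from("<H", buf, pos)
    pos += 2
    metadata = buf[pos:pos + meta_len]
    pos += meta_len
    (rname_len,) = struct.unpack_from("<B", buf, pos)
    pos += 1
    random_name = buf[pos:pos + rname_len - 1]
    pos += rname_len
    if pos != offset + total:
        raise ValueError("entry length does not match its fields")
    return (filename, metadata, random_name), offset + total
```

Since every entry starts with its own length, a reader can step through the index by repeatedly calling unpack_entry with the returned offset, or skip an entry without parsing its fields.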
[TODO: allow length of zero for random name length so format can be reused for temporary listing during discovery stage?]
[TODO: sorting of list or use of hash or tree for faster lookups and comparison against temporary listing?]
The archive created by the packaging agent matches the standard "newc" or "crc" cpio formats, except that the trailer is deliberately omitted to save space.
[cpio archive] = [cpio file] | [cpio archive] + [cpio file]
[cpio file] = [cpio header] + [cpio file name] + [cpio file data]
[cpio header] =
All the fields in the header are ISO 646 strings of hexadecimal numbers (case does not matter, but lowercase is used), left padded with '0' character, not NUL terminated.
field name | bytes | notes |
[c_magic] | 6 | "070701" for "new" portable format; "070702" for CRC format |
[c_ino] | 8 | |
[c_mode] | 8 | definition below |
[c_uid] | 8 | |
[c_gid] | 8 | |
[c_nlink] | 8 | |
[c_mtime] | 8 | |
[c_filesize] | 8 | must be 0 for FIFOs and directories |
[c_maj] | 8 | |
[c_min] | 8 | |
[c_rmaj] | 8 | only valid for devices, 0 otherwise |
[c_rmin] | 8 | only valid for devices, 0 otherwise |
[c_namesize] | 8 | count includes terminating NUL |
[c_chksum] | 8 | 0 for "new" portable format; for CRC format the sum of all the bytes in the file |
[cpio file name] = [file name] + '\0' + [pad 4]
[cpio file data] = [file data] + [pad 4]
[pad 4] = 0-3 bytes to end field on 4-byte boundary
[c_mode] = c_mode bits defined as follows (in hexadecimal and traditional octal)
/* st_mode bits */
#define C_ISSOCK  0000c000  00000140000
#define C_ISLNK   0000a000  00000120000
#define C_ISCTG   00009000  00000110000
#define C_ISREG   00008000  00000100000
#define C_ISBLK   00006000  00000060000
#define C_ISDIR   00004000  00000040000
#define C_ISCHR   00002000  00000020000
#define C_ISFIFO  00001000  00000010000
#define C_ISUID   00000800  00000004000
#define C_ISGID   00000400  00000002000
#define C_ISVTX   00000200  00000001000
#define C_IRUSR   00000100  00000000400
#define C_IWUSR   00000080  00000000200
#define C_IXUSR   00000040  00000000100
#define C_IRGRP   00000020  00000000040
#define C_IWGRP   00000010  00000000020
#define C_IXGRP   00000008  00000000010
#define C_IROTH   00000004  00000000004
#define C_IWOTH   00000002  00000000002
#define C_IXOTH   00000001  00000000001
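As an illustration of the grammar above, the following sketch builds one [cpio file] record in the "new" portable format (magic "070701", checksum 0). It assumes padding is measured from the start of the record, as in the standard newc format, so the 110-byte header plus the file name ends on a 4-byte boundary. The function name cpio_record and its default argument values are hypothetical, and no trailer is written, in line with the format described here.

```python
def pad4(length: int) -> bytes:
    """Return the 0-3 NUL bytes needed to reach the next 4-byte boundary."""
    return b"\0" * (-length % 4)


def cpio_record(name: bytes, data: bytes, mode: int = 0o100644, ino: int = 0,
                uid: int = 0, gid: int = 0, nlink: int = 1, mtime: int = 0) -> bytes:
    """Build [cpio header] + [cpio file name] + [cpio file data] for one file."""
    fields = [
        ino, mode, uid, gid, nlink, mtime,
        len(data),      # c_filesize
        0, 0,           # c_maj, c_min
        0, 0,           # c_rmaj, c_rmin (0 because this is not a device)
        len(name) + 1,  # c_namesize counts the terminating NUL
        0,              # c_chksum is 0 in the "new" portable format
    ]
    if any(f < 0 or f > 0xFFFFFFFF for f in fields):
        raise ValueError("header field does not fit in 8 hexadecimal digits")
    # Each field is 8 lowercase hex digits, left padded with '0', no NUL.
    header = b"070701" + b"".join(b"%08x" % f for f in fields)
    record = header + name + b"\0"
    record += pad4(len(record))  # pad header + name to a 4-byte boundary
    record += data
    record += pad4(len(record))  # pad file data to a 4-byte boundary
    return record
```

For example, cpio_record(b"hello.txt", b"hi\n") produces a 124-byte record: a 110-byte header, 10 bytes of NUL-terminated name with no padding needed, 3 bytes of data, and 1 byte of padding.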