Redoubt: client specification version 0.1

SUBJECT TO CHANGE WITHOUT NOTICE

Backup Process

The backup process is as follows:

File Listing

Find all files that should be backed up, applying the exclude and include expressions stored in the client configuration. Previous backups and the current state of the server are not considered at this stage. Because the list can be very large, it is stored temporarily on disk, together with each file's metadata.
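
As a rough illustration of the listing stage, the sketch below walks a directory tree, applies the exclude and include expressions, and writes one record per file to a temporary listing. The expression syntax is not defined in this document, so shell-style globs are assumed, with an include match overriding an exclude match. The function names and the JSON-lines record format are hypothetical placeholders until the listing file is specified.

```python
import fnmatch
import json
import os
import tempfile


def is_selected(path, excludes, includes):
    """Return True if path should be backed up.

    Assumption: expressions are shell-style globs, and an include
    match overrides an exclude match.
    """
    if any(fnmatch.fnmatch(path, pat) for pat in includes):
        return True
    return not any(fnmatch.fnmatch(path, pat) for pat in excludes)


def write_listing(root, excludes, includes):
    """Walk root and write one JSON record per selected file.

    Returns the path of the temporary listing (hypothetical format).
    """
    fd, listing_path = tempfile.mkstemp(suffix=".listing")
    with os.fdopen(fd, "w") as out:
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in sorted(filenames):
                path = os.path.join(dirpath, name)
                if not is_selected(path, excludes, includes):
                    continue
                st = os.lstat(path)  # lstat: do not follow symlinks
                record = {"path": path, "size": st.st_size,
                          "mtime": int(st.st_mtime), "mode": st.st_mode}
                out.write(json.dumps(record) + "\n")
    return listing_path
```

Writing records as they are found keeps memory use flat regardless of how many files are listed, which is the reason the text gives for keeping the list on disk.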

[TODO: specification for the listing file]

File Selection

During the selection stage, the client determines the exact work to be done during the backup.

The client compares the list of files to be backed up with the client index (the known state of the backup archive; see the appendix below). Files that have been removed since the last backup are marked for deletion. Files that have changed are marked either for full archival or for a metadata-only update, depending on the client configuration, which allows less important files to be backed up less often.
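
The comparison can be sketched as follows. This is a minimal illustration only: both the listing and the client index are modelled as plain dicts mapping a path to a (size, mtime) tuple, and the `metadata_only` policy hook is a hypothetical stand-in for whatever the client configuration provides.

```python
def select_work(listing, index, metadata_only=lambda path: False):
    """Classify files into deletion, full archival, and metadata-only update.

    listing: dict path -> (size, mtime) from the listing stage
    index:   dict path -> (size, mtime) as recorded at the last backup
    metadata_only: hypothetical policy hook; True means a changed file
                   only gets a metadata update on this run.
    """
    # Files known to the index but absent from the listing were removed.
    to_delete = sorted(p for p in index if p not in listing)
    full, meta = [], []
    for path in sorted(listing):
        old = index.get(path)
        if old == listing[path]:
            continue  # unchanged since the last backup
        if old is not None and metadata_only(path):
            meta.append(path)  # changed, but policy defers full archival
        else:
            full.append(path)  # new file, or changed and due for archival
    return to_delete, full, meta
```

A file that is new to the index always goes to full archival in this sketch, since there is no archived copy for a metadata-only update to refer to.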

File Deletion

Next, files marked for deletion are removed from the archive. This is done first so that storage usage on the server is reduced before new data is sent.

Each deletion request is sent to the storage agent.

File Packaging

For each file, the packaging agent creates a cpio-formatted archive, compresses it (unless compression would be pointless, as for files that are already compressed or contain random data), and encrypts it. It then checks the client index to see whether the file has already been assigned a name. If not, a random name is assigned to the file and stored in the client index. Optionally, the agent can create key files containing random keys; these are also stored on the server and are used to encrypt files.
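
Two of these steps are sketched below: skipping compression when it does not help, and assigning a random name once per file. The specification names neither a compressor nor a naming scheme, so zlib, trial compression as the "pointless" test, and 32-character hexadecimal names are all assumptions. Encryption is left out because no cipher is specified.

```python
import secrets
import zlib


def maybe_compress(data):
    """Return (payload, compressed_flag), keeping the smaller form.

    Assumption: compression counts as pointless when a trial run with
    zlib does not make the data smaller.
    """
    packed = zlib.compress(data, 9)
    if len(packed) < len(data):
        return packed, True
    return data, False


def archive_name(index, path):
    """Return the random archive name for path, assigning one if needed.

    index is a plain dict standing in for the client index.
    Assumption: names are 32 lowercase hexadecimal characters.
    """
    if path not in index:
        index[path] = secrets.token_hex(16)
    return index[path]
```

Looking the name up before generating one means a file keeps the same random name across backups, as the text requires.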

[TODO: handling of files that are hard links]

Each encrypted file is sent to the storage agent.

Client Index Updates

As the storage agent confirms each transaction (deletion or package), the state of each file is updated in the client index file. Deleted files are marked as deleted and are only removed from the client index after a configurable amount of time has passed (and the storage agent confirms that it has deleted all versions of the file).
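
The delayed removal of deleted entries might look like the sketch below. It assumes a dict-based index whose entries carry a hypothetical `deleted_at` timestamp, and it models the storage agent's confirmation as a set of paths; none of these representations come from the specification.

```python
import time


def mark_deleted(index, path, now=None):
    """Record that the storage agent confirmed the deletion of path."""
    index[path]["deleted_at"] = time.time() if now is None else now


def purge_deleted(index, retention, fully_deleted, now=None):
    """Drop entries that were deleted at least `retention` seconds ago.

    fully_deleted: set of paths for which the storage agent has confirmed
    that all versions are gone (hypothetical representation).
    Returns the list of purged paths.
    """
    now = time.time() if now is None else now
    purged = [p for p, e in index.items()
              if e.get("deleted_at") is not None
              and now - e["deleted_at"] >= retention
              and p in fully_deleted]
    for p in purged:
        del index[p]
    return purged
```

An entry is purged only when both conditions hold: the retention period has passed and the storage agent has confirmed that every version is gone.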

The client index is not saved on the server because it can be very large.


Restoration Process

If the client index exists, the client can:

If the client index is lost, the client has several options to offer:

Appendix: client index specification

The client index is stored as a linked list of variable-length entries. Each entry contains the entry length, the original file name and its length, the metadata and its length, and the random file name and its length.

Variable-length entries make it possible to store long original file names without wasting disk space. They also allow the metadata to be extended later and alternative client operating systems to be supported.

The structure of each entry is shown here:

field name              format
entry length            unsigned 16-bit integer, little-endian
filename length         unsigned 16-bit integer, little-endian
filename string         char array, NUL terminated
metadata length         unsigned 16-bit integer, little-endian
metadata C struct       on Unix systems, struct stat
random name length      unsigned 8-bit integer
random name string      char array, NUL terminated

The length of each string includes its terminating NUL character.
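
To make the layout concrete, the following sketch serializes and parses a single entry with Python's `struct` module. The metadata is treated as an opaque byte string. The specification does not say whether the entry length counts the length field itself; this sketch assumes that it does, so that adding the entry length to an entry's offset gives the offset of the next entry.

```python
import struct


def pack_entry(filename, metadata, random_name):
    """Serialize one client index entry.

    filename and random_name are bytes without the trailing NUL;
    metadata is an opaque byte string (struct stat on Unix).
    Assumption: the entry length counts the whole entry, including
    the entry length field itself.
    """
    name = filename + b"\0"
    rname = random_name + b"\0"
    body = (struct.pack("<H", len(name)) + name +
            struct.pack("<H", len(metadata)) + metadata +
            struct.pack("<B", len(rname)) + rname)
    return struct.pack("<H", 2 + len(body)) + body


def unpack_entry(buf, offset=0):
    """Parse one entry; return (filename, metadata, random_name, next_offset)."""
    (entry_len,) = struct.unpack_from("<H", buf, offset)
    pos = offset + 2
    (name_len,) = struct.unpack_from("<H", buf, pos)
    pos += 2
    filename = buf[pos:pos + name_len - 1]  # drop the terminating NUL
    pos += name_len
    (meta_len,) = struct.unpack_from("<H", buf, pos)
    pos += 2
    metadata = buf[pos:pos + meta_len]
    pos += meta_len
    (rname_len,) = struct.unpack_from("<B", buf, pos)
    pos += 1
    random_name = buf[pos:pos + rname_len - 1]  # drop the terminating NUL
    return filename, metadata, random_name, offset + entry_len
```

Because the random name length is a single byte that includes the NUL, a random name can be at most 254 characters long; `struct.pack` raises an error for anything longer.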

[TODO: allow length of zero for random name length so format can be reused for temporary listing during discovery stage?]

[TODO: sorting of list or use of hash or tree for faster lookups and comparison against temporary listing?]

Appendix: cpio specification

This format matches the standard "newc" or "crc" cpio formats, except that the trailer (the final "TRAILER!!!" entry) is deliberately omitted to save space.

[cpio archive] = [cpio file] | [cpio archive] + [cpio file]

[cpio file] = [cpio header] + [cpio file name] + [cpio file data]

[cpio header] = All header fields are ISO 646 strings of
		hexadecimal digits (case does not matter, but
		lowercase is used), left-padded with '0' characters
		and not NUL terminated.

                field name	bytes	notes
                [c_magic]	6	"070701" for "new" portable format;
					"070702" for CRC format
		[c_ino]		8
		[c_mode]	8	definition below
		[c_uid]		8
		[c_gid]		8
		[c_nlink]	8
		[c_mtime]	8
		[c_filesize]	8	must be 0 for FIFOs and directories
		[c_maj]		8
		[c_min]		8
		[c_rmaj]	8	only valid for devices, 0 otherwise
		[c_rmin]	8	only valid for devices, 0 otherwise
		[c_namesize]	8	count includes terminating NUL
		[c_chksum]	8	0 for "new" portable format;
					for CRC format the sum of all the
					bytes in the file

[cpio file name] = [file name] + '\0' + [pad 4]

[cpio file data] = [file data] + [pad 4]

[pad 4] = 0-3 NUL bytes, so that the field ends on a 4-byte boundary
          (measured from the start of the cpio file)
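
The grammar above can be exercised with a short sketch that builds one cpio file in the "new" portable format. The function names and default argument values are illustrative only. The header is 110 bytes long (6 + 13 x 8), which is not a multiple of 4, so the padding after the file name is computed over header and name together, as standard cpio does.

```python
def pad4(n):
    """Number of NUL bytes needed to reach the next 4-byte boundary."""
    return -n % 4


def cpio_member(name, data, mode, ino=0, uid=0, gid=0, nlink=1, mtime=0):
    """Build one "newc" cpio file (header + name + data), without trailer.

    name is bytes without the trailing NUL. The device numbers and the
    checksum are left at 0 in this sketch.
    """
    name_z = name + b"\0"
    # Field order: c_ino, c_mode, c_uid, c_gid, c_nlink, c_mtime,
    # c_filesize, c_maj, c_min, c_rmaj, c_rmin, c_namesize, c_chksum.
    fields = [ino, mode, uid, gid, nlink, mtime, len(data),
              0, 0, 0, 0, len(name_z), 0]
    header = b"070701" + b"".join(b"%08x" % f for f in fields)
    out = header + name_z
    out += b"\0" * pad4(len(out))           # pad header + name
    out += data + b"\0" * pad4(len(data))   # pad file data
    return out
```

Since every cpio file built this way is a multiple of 4 bytes long, files can simply be concatenated to form an archive.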

[c_mode] = c_mode bits defined as follows (in hexadecimal and
           traditional octal)

/* st_mode bits */
#define C_ISSOCK	0000c000	00000140000
#define C_ISLNK		0000a000	00000120000
#define C_ISCTG         00009000	00000110000
#define C_ISREG         00008000	00000100000
#define C_ISBLK         00006000	00000060000
#define C_ISDIR         00004000	00000040000
#define C_ISCHR         00002000	00000020000
#define C_ISFIFO        00001000	00000010000
#define C_ISUID         00000800	00000004000
#define C_ISGID         00000400	00000002000
#define C_ISVTX         00000200	00000001000
#define C_IRUSR         00000100	00000000400
#define C_IWUSR         00000080	00000000200
#define C_IXUSR         00000040	00000000100
#define C_IRGRP         00000020	00000000040
#define C_IWGRP         00000010	00000000020
#define C_IXGRP         00000008	00000000010
#define C_IROTH         00000004	00000000004
#define C_IWOTH         00000002	00000000002
#define C_IXOTH         00000001	00000000001
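
As a small worked example, the sketch below decodes a c_mode header field into a file type and permission bits, using the octal values from the table. The 0170000 type mask is not listed in the table; it is the conventional mask that covers all of the file type values, and the function name is illustrative.

```python
TYPE_MASK = 0o170000  # assumption: conventional mask for the file type bits

FILE_TYPES = {
    0o140000: "C_ISSOCK",
    0o120000: "C_ISLNK",
    0o110000: "C_ISCTG",
    0o100000: "C_ISREG",
    0o060000: "C_ISBLK",
    0o040000: "C_ISDIR",
    0o020000: "C_ISCHR",
    0o010000: "C_ISFIFO",
}


def parse_mode(hex_field):
    """Split an 8-character c_mode field into (type name, permission bits).

    The permission bits include C_ISUID, C_ISGID and C_ISVTX.
    """
    mode = int(hex_field, 16)  # int() accepts upper and lower case hex
    return FILE_TYPES.get(mode & TYPE_MASK, "unknown"), mode & 0o7777
```

For instance, the field "000081a4" decodes to a regular file with permissions 0644.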