File System Store (draft, to be re-read)
Condensation objects and accounts can be stored in a folder on almost any file system. This document describes the structure and operations of a Condensation store kept as a folder on a file system.
For the remainder of this text, we assume that the base folder is called base-folder.
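Objects are stored as files named base-folder/objects/HH/H...H, where each H is a lowercase hex digit of the object's SHA-256 hash. The first two digits are used as the subfolder name, while the remaining 62 digits form the file name. For example, an object whose hash begins with 4f2a is stored in the folder base-folder/objects/4f, in a file whose name begins with 2a.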
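As a small illustration of this naming rule, the following Python sketch maps a hash to its file path. The function name object_path and the strict lowercase check are illustrative choices, not part of any Condensation library.

```python
import os
import re

# An object hash: exactly 64 lowercase hex digits (SHA-256).
HASH_RE = re.compile(r"[0-9a-f]{64}")


def object_path(base_folder: str, hash_hex: str) -> str:
    """Return base-folder/objects/HH/<remaining 62 digits> for an object hash."""
    if not HASH_RE.fullmatch(hash_hex):
        raise ValueError("hash must consist of 64 lowercase hex digits")
    # The first two digits name the subfolder, the other 62 the file.
    return os.path.join(base_folder, "objects", hash_hex[:2], hash_hex[2:])
```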
To get an object, simply read the corresponding file.
To add an object, create the destination folder (base-folder/objects/HH) if necessary, and write the contents as a temporary file within this folder. Then rename that file to its final name.
On all major operating systems, renaming a file within the same folder is atomic, so readers never see a partially written object file. If the object already exists (and its contents match the expected SHA-256 sum), no new file needs to be written, but the existing file must be touched to set its modification date to now.
To book an object, touch the corresponding file (i.e. set its modification date to now).
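The get, add, and book operations above can be sketched in Python as follows. This is a minimal, hypothetical implementation using only the standard library; the temporary-file prefix .tmp- and the default file mode 0644 (taken from the private-store permission table) are assumptions of this example.

```python
import hashlib
import os
import tempfile


def object_path(base_folder, hash_hex):
    return os.path.join(base_folder, "objects", hash_hex[:2], hash_hex[2:])


def get_object(base_folder, hash_hex):
    """Read an object file; return None if the object is not on the store."""
    try:
        with open(object_path(base_folder, hash_hex), "rb") as f:
            return f.read()
    except FileNotFoundError:
        return None


def book_object(base_folder, hash_hex):
    """Touch the object file (mtime := now); return False if it does not exist."""
    try:
        os.utime(object_path(base_folder, hash_hex))
        return True
    except FileNotFoundError:
        return False


def put_object(base_folder, data, mode=0o644):
    """Store data under its SHA-256 hash and return the hash."""
    hash_hex = hashlib.sha256(data).hexdigest()
    existing = get_object(base_folder, hash_hex)
    if existing is not None and hashlib.sha256(existing).hexdigest() == hash_hex:
        # Already present with the right contents: only refresh the modification date.
        book_object(base_folder, hash_hex)
        return hash_hex
    path = object_path(base_folder, hash_hex)
    folder = os.path.dirname(path)
    os.makedirs(folder, exist_ok=True)
    # Write a temporary file in the same folder, then rename it to its final name.
    fd, tmp = tempfile.mkstemp(dir=folder, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
        os.chmod(tmp, mode)  # mkstemp creates the file with mode 0600
        os.replace(tmp, path)  # atomic within one folder; also replaces a corrupt file
    except BaseException:
        if os.path.exists(tmp):
            os.remove(tmp)
        raise
    return hash_hex
```

Writing through a uniquely named temporary file means that concurrent writers of the same object never clobber each other's partial data; whichever rename happens last wins, and both produce identical contents.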
Accounts and boxes
Each box is a folder named base-folder/accounts/ACCOUNT/BOX, where ACCOUNT consists of 64 hex digits and BOX is either in-queue, private, or public. Each hash within a box is stored as an empty file named after the hash (64 lowercase hex digits).
A store with one account may look as follows:
/srv/condensation
    accounts
        eae220..c6
            in-queue
                465545..da
                543c50..1a
            private
            public
                29767d..da
To list a box, enumerate the files of the corresponding folder, and report all file names that consist of exactly 64 hex digits.
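A sketch of box listing, assuming box folders are laid out as base-folder/accounts/ACCOUNT/BOX. Treating a missing box folder as an empty box is an assumption of this example, and the function names are hypothetical.

```python
import os
import re

# Box entries are named after a hash: 64 lowercase hex digits.
HASH_RE = re.compile(r"[0-9a-f]{64}")


def box_folder(base_folder, account, box):
    """Return base-folder/accounts/ACCOUNT/BOX."""
    if box not in ("in-queue", "private", "public"):
        raise ValueError("unknown box: " + box)
    return os.path.join(base_folder, "accounts", account, box)


def list_box(base_folder, account, box):
    """Return the hashes in a box; a missing box folder counts as empty."""
    try:
        names = os.listdir(box_folder(base_folder, account, box))
    except FileNotFoundError:
        return []
    # Skip anything that is not exactly 64 lowercase hex digits (e.g. temporary files).
    return sorted(name for name in names if HASH_RE.fullmatch(name))
```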
To add an envelope, put the object onto the store, and create a hash file in the corresponding box folder.
To remove an envelope from a box, simply delete the corresponding hash file from the box folder. Return success irrespective of whether deletion succeeded or not. The envelope remains on the store until garbage is collected.
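The sketch below assumes the envelope object has already been written as described for adding objects, and only handles the box side: creating the empty hash file and removing it again. The function names, and the existence check before the hash file is created, are illustrative additions rather than part of the protocol.

```python
import os


def _box_folder(base_folder, account, box):
    return os.path.join(base_folder, "accounts", account, box)


def add_envelope_hash(base_folder, account, box, envelope_hash):
    """Create the empty hash file; the envelope object must already be on the store."""
    obj = os.path.join(base_folder, "objects", envelope_hash[:2], envelope_hash[2:])
    if not os.path.isfile(obj):
        # Put the object first, so a box never lists a hash that cannot be fetched.
        raise FileNotFoundError("envelope object missing: " + obj)
    folder = _box_folder(base_folder, account, box)
    os.makedirs(folder, exist_ok=True)
    # Opening in append mode creates the empty file, or leaves an existing entry untouched.
    with open(os.path.join(folder, envelope_hash), "a"):
        pass


def remove_envelope(base_folder, account, box, envelope_hash):
    """Delete the hash file and report success whatever the outcome."""
    try:
        os.remove(os.path.join(_box_folder(base_folder, account, box), envelope_hash))
    except OSError:
        pass  # e.g. the entry was already gone
    return True
```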
Path length considerations
A Condensation object path is always 74 ASCII characters long, while a box entry requires up to 165 ASCII characters (in-queue box).
On Windows, regular paths are limited to MAX_PATH (260 characters), so it is recommended to use the "\\?\" prefix for extended-length paths.
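A hypothetical helper for building such paths, following Microsoft's rules for extended-length paths: the path must already be absolute and use backslashes, and UNC paths take the form \\?\UNC\server\share. It only manipulates strings, so it can be tried on any platform.

```python
def extended_length_path(path: str) -> str:
    r"""Prefix an absolute Windows path with \\?\ (or \\?\UNC\ for UNC paths)."""
    if path.startswith("\\\\?\\"):
        return path  # already an extended-length path
    if path.startswith("\\\\"):
        # \\server\share\... becomes \\?\UNC\server\share\...
        return "\\\\?\\UNC\\" + path[2:]
    if len(path) >= 3 and path[1] == ":" and path[2] == "\\":
        # Drive-letter path such as C:\srv\condensation
        return "\\\\?\\" + path
    raise ValueError("expected an absolute path with backslashes: " + path)
```

Windows does not normalize paths carrying this prefix, so forward slashes, "." and ".." must be resolved before the prefix is added.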
File systems limited to 8.3 file names cannot be used with this protocol, since object file names are 62 characters long and box entries 64 characters.
Recognizing Condensation folders
A folder containing the sub folders objects and accounts (written with lowercase characters) is considered a Condensation store folder. Note that other files and folders may be present as well.
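A possible check in Python. Listing the folder, rather than testing for the subfolders directly, keeps the comparison case-sensitive even on case-insensitive file systems; the function name is hypothetical.

```python
import os


def is_condensation_store(folder: str) -> bool:
    """True if folder contains the lowercase subfolders 'objects' and 'accounts'."""
    try:
        names = set(os.listdir(folder))  # actual on-disk spelling of each entry
    except OSError:
        return False
    return all(
        name in names and os.path.isdir(os.path.join(folder, name))
        for name in ("objects", "accounts")
    )
```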
POSIX permissions (private store)
A store used by POSIX user U should use the following permissions and ownership:
|Entry||Mode||Owner|
|Object folders||0711 (rwx, --x, --x)||User U|
|Object files||0644 (rw-, r--, r--)||User U|
|Account folders||0711 (rwx, --x, --x)||User U|
|In-queue box folder||0700 (rwx, ---, ---)||User U|
|Private box folder||0700 (rwx, ---, ---)||User U|
|Public box folder||0755 (rwx, r-x, r-x)||User U|
|In-queue box files||0600 (rw-, ---, ---)||User U|
|Private box files||0600 (rw-, ---, ---)||User U|
|Public box files||0644 (rw-, r--, r--)||User U|
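The table can be applied to an existing private store with a short walk over base-folder. This sketch assumes that the top-level objects and accounts folders take the object and account folder modes, leaves ownership untouched (changing the owner requires privileges), and ignores paths the table does not cover; the function names are hypothetical.

```python
import os

# box -> (folder mode, file mode), from the private-store table
BOX_MODES = {
    "in-queue": (0o700, 0o600),
    "private": (0o700, 0o600),
    "public": (0o755, 0o644),
}


def private_mode(rel_path):
    """Mode for a path relative to base-folder, or None if the table does not cover it."""
    parts = rel_path.replace(os.sep, "/").split("/")
    if parts[0] == "objects":
        if len(parts) <= 2:
            return 0o711  # objects and objects/HH
        if len(parts) == 3:
            return 0o644  # object file
    elif parts[0] == "accounts":
        if len(parts) <= 2:
            return 0o711  # accounts and accounts/ACCOUNT
        if parts[2] in BOX_MODES and len(parts) <= 4:
            folder_mode, file_mode = BOX_MODES[parts[2]]
            return folder_mode if len(parts) == 3 else file_mode
    return None


def apply_private_permissions(base_folder):
    """chmod every covered entry below base_folder (top-down walk)."""
    for root, dirs, files in os.walk(base_folder):
        for name in dirs + files:
            path = os.path.join(root, name)
            mode = private_mode(os.path.relpath(path, base_folder))
            if mode is not None:
                os.chmod(path, mode)
```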
In general, everything belongs to user U. To share objects with other people, the object store must be publicly readable.
Note that it is not possible to receive messages from other people through a private store, as they cannot post envelopes. Hence, the in-queue box can remain private.
POSIX permissions (shared store)
To share a store among multiple users, add all users to a group G, and use the following permission scheme:
|Entry||Mode||Owner, group|
|Object folders||0771 (rwx, rwx, --x)||Any user, group G|
|Object files||0664 (rw-, rw-, r--)||Any user, group G|
|Account folders||0771 (rwx, rwx, --x)||Any user, group G|
|In-queue box folder||0770 (rwx, rwx, ---)||Any user, group G|
|Private box folder||0770 (rwx, rwx, ---)||Any user, group G|
|Public box folder||0775 (rwx, rwx, r-x)||Any user, group G|
|In-queue box files||0660 (rw-, rw-, ---)||Any user, group G|
|Private box files||0660 (rw-, rw-, ---)||Any user, group G|
|Public box files||0664 (rw-, rw-, r--)||Any user, group G|
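Comparing the two tables, each shared-store mode equals the private-store mode with the owner's permission bits copied into the group bits, while the bits for others stay the same. The helper below encodes that observation; it is derived from the tables rather than stated by the protocol, and the names are hypothetical. Changing the group with os.chown only succeeds if the calling user owns the entry and belongs to G.

```python
import grp
import os


def shared_mode(private_mode: int) -> int:
    """Shared-store mode: the group gets the owner's bits; others are unchanged."""
    owner_bits = private_mode & 0o700
    return (private_mode & ~0o070) | (owner_bits >> 3)


def share_with_group(path: str, private_mode: int, group_name: str) -> None:
    """Hand the entry to group G and apply the shared mode; the owner is kept."""
    gid = grp.getgrnam(group_name).gr_gid
    os.chown(path, -1, gid)  # -1 leaves the owning user unchanged
    os.chmod(path, shared_mode(private_mode))
```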
Shared folder stores allow users to communicate within the group.
Users must trust each other to a minimal degree. They cannot read or modify each other's data (beyond what they share with each other), but they can delete each other's accounts and objects.
To protect private data against deletion, users may keep their private data on their private store, and use a shared store for communication only. An actor then announces itself on both the private and the shared store. Messages are sent and read through the shared store, while private data is stored on the private store:
|Private store||Shared store|
Centralized garbage collection through tree traversal
Garbage collection can be performed by an external program (or by any user) while no user is actively writing to the store. Start with the hashes listed in the boxes, and follow the hashes referenced by each object down the tree. Keep a list of the objects seen, and delete all other objects once all trees have been traversed.
Since every object needs to be opened to read the header, this procedure can take a few seconds for a store with several thousand objects.
Conceptually, this is a centralized strategy: the garbage collector must be able to follow all trees. If an intermediate node is missing, the whole subtree below it is deleted, since the garbage collector cannot traverse past the missing node.
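A minimal mark-and-sweep sketch of this procedure. The object header layout used here (a 4-byte big-endian hash count followed by raw 32-byte hashes) is an assumption of this example and should be checked against the Condensation object format; all function names are hypothetical.

```python
import os
import re
import struct

HASH_RE = re.compile(r"[0-9a-f]{64}")
BOXES = ("in-queue", "private", "public")


def object_path(base, hash_hex):
    return os.path.join(base, "objects", hash_hex[:2], hash_hex[2:])


def header_hashes(path):
    """Hashes referenced by an object. ASSUMED header layout: a 4-byte big-endian
    hash count followed by that many raw 32-byte SHA-256 hashes."""
    with open(path, "rb") as f:
        raw = f.read(4)
        if len(raw) < 4:
            return []
        (count,) = struct.unpack(">I", raw)
        raw = f.read(32 * count)
    # Keep complete 32-byte hashes only (a truncated header yields fewer).
    return [raw[i:i + 32].hex() for i in range(0, len(raw) - 31, 32)]


def root_hashes(base):
    """The hashes listed in all boxes of all accounts: the starting points."""
    accounts = os.path.join(base, "accounts")
    for account in os.listdir(accounts):
        for box in BOXES:
            folder = os.path.join(accounts, account, box)
            if os.path.isdir(folder):
                yield from (n for n in os.listdir(folder) if HASH_RE.fullmatch(n))


def collect_garbage(base):
    """Mark every object reachable from a box, then delete the rest.
    Run only while no client is writing. Returns the number of files deleted."""
    seen = set()
    stack = list(root_hashes(base))
    while stack:
        hash_hex = stack.pop()
        if hash_hex in seen:
            continue
        seen.add(hash_hex)
        try:
            stack.extend(header_hashes(object_path(base, hash_hex)))
        except FileNotFoundError:
            pass  # missing node: whatever lies below it cannot be reached
    deleted = 0
    objects = os.path.join(base, "objects")
    for prefix in os.listdir(objects):
        folder = os.path.join(objects, prefix)
        if not os.path.isdir(folder):
            continue
        for name in os.listdir(folder):
            if prefix + name not in seen:  # also removes stray temporary files
                os.remove(os.path.join(folder, name))
                deleted += 1
    return deleted
```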
Client-driven garbage collection through stages
With client-driven garbage collection, the store is divided into stage folders. Each stage folder is named after its creation time (UTC timestamp in ISO 8601 basic format), and contains a Condensation store folder (i.e., the objects and accounts folders described above). As an example, consider the following store with two stage folders:
base-folder
    20140610T174211Z
        objects
        accounts
    20140712T112958Z
        objects
        accounts
Each client:
- writes new objects to the most recent stage
- moves all of the user's objects from older stages to the most recent stage, and then moves the user's account to the newest stage
- looks up objects in all stages (get object)
- deletes old stages that do not contain any accounts
- creates a new stage if the newest stage is more than 30 days old
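The stage bookkeeping in the list above can be sketched as follows. Moving a user's objects and account between stages is omitted, since it requires traversing the user's trees. The folder name format and the 30-day limit come from the text; the function names are hypothetical.

```python
import os
import re
import shutil
from datetime import datetime, timedelta, timezone

STAGE_FORMAT = "%Y%m%dT%H%M%SZ"  # e.g. 20140610T174211Z
STAGE_RE = re.compile(r"\d{8}T\d{6}Z")
MAX_STAGE_AGE = timedelta(days=30)


def stage_name(when: datetime) -> str:
    """ISO 8601 basic-format UTC timestamp used as the stage folder name."""
    return when.astimezone(timezone.utc).strftime(STAGE_FORMAT)


def list_stages(base):
    """Stage folder names, oldest first (this name format sorts chronologically)."""
    return sorted(n for n in os.listdir(base)
                  if STAGE_RE.fullmatch(n) and os.path.isdir(os.path.join(base, n)))


def current_stage(base, now=None):
    """Return the newest stage, creating a new one if none exists or it is too old."""
    if now is None:
        now = datetime.now(timezone.utc)
    stages = list_stages(base)
    if stages:
        created = datetime.strptime(stages[-1], STAGE_FORMAT).replace(tzinfo=timezone.utc)
        if now - created <= MAX_STAGE_AGE:
            return stages[-1]
    name = stage_name(now)
    for sub in ("objects", "accounts"):
        os.makedirs(os.path.join(base, name, sub), exist_ok=True)
    return name


def find_object(base, hash_hex):
    """Look an object up in all stages, newest first; return its path or None."""
    for stage in reversed(list_stages(base)):
        path = os.path.join(base, stage, "objects", hash_hex[:2], hash_hex[2:])
        if os.path.isfile(path):
            return path
    return None


def delete_empty_stages(base):
    """Delete every stage except the newest whose accounts folder is empty."""
    for stage in list_stages(base)[:-1]:
        accounts = os.path.join(base, stage, "accounts")
        if not os.path.isdir(accounts) or not os.listdir(accounts):
            shutil.rmtree(os.path.join(base, stage))
```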
This garbage collection scheme works in a completely distributed setting and is fault-tolerant. It requires the cooperation of all users, however: a user who does not connect for a prolonged period leaves his/her account in an old stage, which blocks the deletion of that stage.