A Condensation object is a byte sequence made up of three parts, in this order: a header H, a hash list, and a data part.
H is a 4-byte big-endian integer denoting the number of items in the hash list. Each of these items is a 32-byte hash pointing to another object. Through their hash lists, objects can form a tree.
The data part is a byte sequence. It often holds a record, but can carry any type of data.
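The layout described above can be sketched in a few lines of Python. This is a minimal illustration, assuming the three parts follow each other without any separators; the function names `build_object` and `parse_object` are made up for this example and are not part of any Condensation library.

```python
import struct

HASH_LENGTH = 32  # each hash list entry is a 32-byte hash


def build_object(hashes, data):
    """Serialize a hash list and a data part into object bytes."""
    for h in hashes:
        if len(h) != HASH_LENGTH:
            raise ValueError("each hash must be exactly 32 bytes")
    header = struct.pack(">I", len(hashes))  # H: 4-byte big-endian count
    return header + b"".join(hashes) + data


def parse_object(obj):
    """Split object bytes into (list of hashes, data part)."""
    if len(obj) < 4:
        raise ValueError("object is shorter than the 4-byte header")
    (count,) = struct.unpack_from(">I", obj, 0)
    data_start = 4 + count * HASH_LENGTH
    if data_start > len(obj):
        raise ValueError("hash list extends beyond the end of the object")
    hashes = [
        obj[4 + i * HASH_LENGTH:4 + (i + 1) * HASH_LENGTH]
        for i in range(count)
    ]
    return hashes, obj[data_start:]
```

An object without children has H = 0, so its first four bytes are zero and everything after them is data.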
Integrity and atomicity
Objects are identified by the SHA-256 hash of their complete byte sequence.
That hash can also be used to verify an object's integrity after transmission or retrieval from a store.
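As a rough sketch, identifying and verifying an object needs nothing beyond a SHA-256 implementation. The helper names `object_hash` and `verify_object` are hypothetical and chosen for this example only.

```python
import hashlib


def object_hash(obj):
    """Return the 32-byte SHA-256 hash that identifies the object."""
    return hashlib.sha256(obj).digest()


def verify_object(obj, expected_hash):
    """Check that received bytes match the hash they were requested by."""
    return object_hash(obj) == expected_hash
```

A client that requested an object by its hash would call `verify_object` on the received bytes and discard them if the check fails.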
Although some protocols allow partial downloads (e.g., HTTP range requests), the integrity can only be checked if the whole object is available. Objects are simply not meant to be divided further. Large amounts of data should be structured as a tree of medium-sized objects.
In practice, objects are often between 1 KiB and 100 MiB in size.
The optimal object size depends on how the data is accessed. In general, objects should be smaller the more often the underlying data is modified. In addition, data should be split into several objects if only small parts are needed at a time. This avoids downloading a large object to extract a small piece of information.
On the other hand, each object incurs a storage overhead of about 100 bytes, and a similar transmission overhead.
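To get a feeling for this trade-off, the per-object overhead can be estimated with simple arithmetic. The sketch below assumes a flat overhead of 100 bytes per object (the approximate figure given above) and ignores the additional parent objects and 32-byte hash list entries that a tree would need; `split_overhead` is a hypothetical helper.

```python
PER_OBJECT_OVERHEAD = 100  # bytes, the approximate figure quoted above


def split_overhead(total_size, object_size):
    """Return (object count, overhead bytes, overhead fraction) for a split."""
    count = -(-total_size // object_size)  # ceiling division on integers
    overhead = count * PER_OBJECT_OVERHEAD
    return count, overhead, overhead / total_size
```

Under these assumptions, splitting 1 GiB into 1 KiB objects yields 1048576 objects and roughly 100 MiB of overhead (about 9.8 %), whereas 1 MiB objects yield 1024 objects and about 100 KiB of overhead.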
The size of an object is not inherently limited. While the hash list may contain at most 2^32 - 1 hashes (the largest count a 4-byte integer can express), the data section may be arbitrarily long.
Object stores, however, impose practical limits, often somewhere between 10 GiB and 1 TiB. The maximum object size is limited either by the remaining space or by some parameter intrinsic to the storage system. Some stores also keep objects fully in memory during the transfer, and may therefore be limited by the available memory. Object stores may even refuse exceedingly large objects.
It is safe to assume that an object store can handle objects up to 1 GiB in size.
The majority of objects are encrypted.
The data section is encrypted using AES-256 in CTR mode with a random 256-bit key. The CTR counter starts at 0, and is incremented by 1 for each AES block (16 bytes). The last block may be truncated. No padding is applied.
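The counter handling can be illustrated with a small sketch of CTR mode. Two things here are assumptions rather than statements of the specification: the counter is encoded as a 16-byte big-endian integer, and the block function is a SHA-256 based stand-in (not AES) so that the example runs without a cryptography library. A real implementation would pass single-block AES-256 encryption under the object's key as `encrypt_block`.

```python
import hashlib

BLOCK_SIZE = 16  # AES block size in bytes


def ctr_transform(data, encrypt_block):
    """Encrypt or decrypt data in CTR mode; both directions are identical.

    encrypt_block maps one 16-byte counter block to 16 bytes of keystream.
    """
    out = bytearray()
    # The counter starts at 0 and increases by 1 per 16-byte block.
    for counter, offset in enumerate(range(0, len(data), BLOCK_SIZE)):
        counter_block = counter.to_bytes(BLOCK_SIZE, "big")  # assumed encoding
        keystream = encrypt_block(counter_block)
        chunk = data[offset:offset + BLOCK_SIZE]  # last chunk may be shorter
        # zip stops at the shorter input, so no padding is ever added.
        out.extend(a ^ b for a, b in zip(chunk, keystream))
    return bytes(out)


def make_stand_in_block_function(key):
    """NOT AES: a placeholder block function for demonstration only."""
    def encrypt_block(counter_block):
        return hashlib.sha256(key + counter_block).digest()[:BLOCK_SIZE]
    return encrypt_block
```

Because the keystream is simply XORed onto the data, the ciphertext has exactly the same length as the plaintext, and applying `ctr_transform` a second time with the same block function restores the original bytes.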
The hash list remains unencrypted. This allows stores to know which objects are still in use.
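Since the hash list is readable without any key, a store can determine which objects are still referenced by walking the hash lists alone. The following is a minimal sketch of such a traversal; the in-memory `store` dictionary (hash to object bytes) and the function names are hypothetical, and a real store may track usage quite differently.

```python
import struct


def referenced_hashes(obj):
    """Read the unencrypted hash list of an object; no key is needed."""
    (count,) = struct.unpack_from(">I", obj, 0)
    return [obj[4 + 32 * i:36 + 32 * i] for i in range(count)]


def reachable(store, roots):
    """Return the hashes of all stored objects reachable from the roots.

    store is a hypothetical in-memory dict mapping hash -> object bytes.
    """
    seen = set()
    pending = list(roots)
    while pending:
        h = pending.pop()
        if h in seen or h not in store:
            continue  # already visited, or not held by this store
        seen.add(h)
        pending.extend(referenced_hashes(store[h]))
    return seen
```

Any stored object whose hash is not in the returned set is not referenced from the given roots.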
The object's hash is calculated over the encrypted object. Hence, the integrity of an object can be verified without decrypting it.
Within a tree, an object's encryption key is stored in its parent object.