Data integrity draft
Object trees are Merkle hash trees: each object includes the SHA-256 hashes of all children.
The integrity of such trees can be checked by verifying the SHA-256 hash of all objects, usually starting with the root object. This is often done automatically, after loading objects from a store, but can also be done manually.
Hence, the integrity of a whole tree can be verified knowing only the hash of the root object (tree hash). Conversely, modifying any data in the tree will almost certainly change its root hash.
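As an illustration, the check described above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the actual implementation: the object encoding (a 4-byte child count, followed by the 32-byte child hashes, followed by the payload), the in-memory dictionary used as a store, and the names encode_object, child_hashes_of, verify_tree and put are all hypothetical and chosen only for this example.

```python
import hashlib
import struct

HASH_LEN = 32  # size of a SHA-256 digest in bytes


def encode_object(child_hashes, payload):
    """Toy encoding (an assumption): 4-byte count, child hashes, payload."""
    return struct.pack(">I", len(child_hashes)) + b"".join(child_hashes) + payload


def child_hashes_of(obj):
    """Extract the child hashes from the toy encoding above."""
    (count,) = struct.unpack(">I", obj[:4])
    return [obj[4 + i * HASH_LEN: 4 + (i + 1) * HASH_LEN] for i in range(count)]


def verify_tree(store, root_hash):
    """Return True if every object reachable from root_hash is present
    and its bytes hash to the hash under which it is referenced."""
    pending = [root_hash]
    seen = set()
    while pending:
        h = pending.pop()
        if h in seen:
            continue
        seen.add(h)
        obj = store.get(h)
        if obj is None or hashlib.sha256(obj).digest() != h:
            return False
        # Only verified bytes are parsed for further child hashes.
        pending.extend(child_hashes_of(obj))
    return True


# Hypothetical in-memory store: maps hash -> object bytes.
store = {}


def put(obj):
    h = hashlib.sha256(obj).digest()
    store[h] = obj
    return h


leaf_a = put(encode_object([], b"hello"))
leaf_b = put(encode_object([], b"world"))
root = put(encode_object([leaf_a, leaf_b], b"root"))

print(verify_tree(store, root))  # True

# Simulate a storage error: the bytes stored for leaf_b change.
store[leaf_b] = encode_object([], b"w0rld")
print(verify_tree(store, root))  # False
```

Only the root hash is passed to verify_tree; every other hash is taken from an object that has already been verified, which is what makes the root hash sufficient.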
Manual integrity check
An object's integrity can be checked using command line tools like curl, wget, shasum, sha256sum, or openssl sha256:
    >>> curl http://condensation.io/objects/b94d27b9934d3e08a52e52d7da7dabfac484efe37a5380ee9088f7ace2efcde9 > object
    ...
    >>> shasum -a 256 object
    b94d27b9934d3e08a52e52d7da7dabfac484efe37a5380ee9088f7ace2efcde9  object
If the SHA-256 sum printed by shasum matches the object hash in the URL, the object is valid.
To verify a whole tree, repeat the above procedure for all objects of the tree.
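The same single-object check can also be scripted. The following is a small sketch using Python's standard hashlib module; the function name object_is_valid and the sample bytes are assumptions made for this example.

```python
import hashlib


def object_is_valid(object_bytes: bytes, expected_hex: str) -> bool:
    """Compare the SHA-256 of the object bytes with the expected hex hash."""
    actual = hashlib.sha256(object_bytes).hexdigest()
    return actual == expected_hex.lower()


# Self-contained demonstration with sample bytes (not a real object).
data = b"example object bytes"
expected = hashlib.sha256(data).hexdigest()

print(object_is_valid(data, expected))         # True
print(object_is_valid(data + b"!", expected))  # False
```

To check a downloaded object, read the file in binary mode (for example open("object", "rb").read()) and pass the hash taken from the object URL as expected_hex.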
Periodic integrity checks
A store may asynchronously check the integrity of the user's data – e.g. as part of garbage collection – and notify the user about data loss if necessary.
Error causes
A wrong object hash could have one of the following causes:
- Storage error: the object bytes read from the storage system are not the same as those written earlier.
- Transmission error: the object bytes received are not the same as those sent.
- Malicious modification: an attacker has modified objects on the storage system or during transmission.
To thwart attacks, the whole subtree below a faulty object must be considered faulty, and discarded.
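One way to apply this rule is to never descend through an object that fails verification, so that nothing reachable only through it is ever loaded. The sketch below illustrates the idea under the same kind of assumptions as any toy model: the object encoding (4-byte child count, 32-byte child hashes, payload), the dictionary store, and the names load_verified, encode_object, child_hashes_of and put are hypothetical. Missing objects are treated like faulty ones here, which is also an assumption.

```python
import hashlib
import struct

HASH_LEN = 32  # size of a SHA-256 digest in bytes


def encode_object(child_hashes, payload):
    """Toy encoding (an assumption): 4-byte count, child hashes, payload."""
    return struct.pack(">I", len(child_hashes)) + b"".join(child_hashes) + payload


def child_hashes_of(obj):
    (count,) = struct.unpack(">I", obj[:4])
    return [obj[4 + i * HASH_LEN: 4 + (i + 1) * HASH_LEN] for i in range(count)]


def load_verified(store, root_hash):
    """Return (valid, faulty): verified objects by hash, and the hashes
    of objects that were missing or failed the SHA-256 check."""
    valid, faulty = {}, set()
    pending = [root_hash]
    while pending:
        h = pending.pop()
        if h in valid or h in faulty:
            continue
        obj = store.get(h)
        if obj is None or hashlib.sha256(obj).digest() != h:
            # The child list of a faulty object is untrusted: do not descend.
            faulty.add(h)
            continue
        valid[h] = obj
        pending.extend(child_hashes_of(obj))
    return valid, faulty


store = {}


def put(obj):
    h = hashlib.sha256(obj).digest()
    store[h] = obj
    return h


leaf_a = put(encode_object([], b"a"))
leaf_c = put(encode_object([], b"c"))
mid = put(encode_object([leaf_a], b"mid"))
root = put(encode_object([mid, leaf_c], b"root"))

# Simulate a malicious modification of the intermediate object.
store[mid] = encode_object([leaf_a], b"tampered")

valid, faulty = load_verified(store, root)
print(len(valid), len(faulty))  # 2 1
print(leaf_a in valid)          # False
```

In this example leaf_a is intact in the store, but it is reachable only through the modified object, so it is never accepted.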
Security considerations
The SHA-256 algorithm is believed to offer about 128 bits of collision resistance and about 256 bits of preimage resistance against attacks, and a false positive rate of 2^-256 with random errors.
Root hashes are stored in signed envelopes. RSA-2048 signatures are commonly estimated to provide about 112 bits of security.