Features

While mostly correct, this section is a bit dry (no pictures, just linear text).

Conflict-free synchronization

Condensation is built around a conflict-free merge function (semilattice join). Any two states of the data can thereby be merged in a way that no merge conflicts occur.

Unlike distributed revision control systems, Condensation does not need to keep the complete history of all transactions. The mathematical properties of the merge function guarantee that every transaction is applied exactly once, no matter in which order they are merged.

Furthermore, the merge function is transaction-aware. Changes applied atomically are merged atomically.

Data synchronization is efficient. Only changes are physically transported over the network, and these changes can be determined efficiently.

Distributed

Unlike most file systems and databases, Condensation is a distributed data system. Condensation clients are autonomous. There is no central server.

If two clients are connected to each other, they can share and synchronize data with each other, even if neither of them is connected to the internet.

Federated

Every user or organization can set up their own Condensation system, and interoperate with other users and organizations.

End-to-end secure

Condensation uses SHA 256 to verify data integrity, AES 256 to encrypt data, and RSA 2048 to encrypt and sign envelopes.

Data is encrypted and decrypted inside the application, and therefore safe even if a network or storage system is compromised. Neither an infected hard disk firmware nor a malicious storage system administrator get to see (or modify) the data.

Zero-configuration, no passwords

Apart from selecting a store, Condensation does not require any configuration, and is therefore easy to set up.

Users identify themselves with their RSA 2048 keys. While this key may be protected with a password when stored on the device, the Condensation system itself does not use any passwords.

Flexible storage

The main structure of a Condensation store is a hash table with immutable key-value pairs. This can be implemented on virtually all storage systems – from a raw disk partition to a cloud storage service.

A simple Condensation store can be as short as 200 lines of code.

System scalability

The distributed and federated nature of Condensation makes it highly scalable. There is no intrinsic bottleneck, such as a central server, in the system.

Store scalability

A hash table with immutable entries – the primary data structure of stores – may span thousands of disks with little overhead, and excellent performance. Hence, a single store can be arbitrarily large.

In addition, a set of Condensation stores can be combined to a single Condensation store.

Lock-free transactions

Transactions are carried out in space rather than in time. No locking is required, and as a consequence, no deadlocks or livelocks can occur.

Versioning

Versioning (or keeping different snapshots or branches of the data) is intrinsic to the protocol, and incurs no additional overhead.

Passive deduplication

If two data objects are exactly the same, they are stored only once on the physical storage system.

However, if the same data is encrypted with two different encryption keys, the resulting data objects will be different, and therefore stored separately. Since the store does not know the encryption keys, it cannot possibly know that these objects contain the same data.

Data objects can be shared without re-encrypting them. Hence, deduplication works most of the time within a group of people sharing data.

Encryption instead of access rights management

Condensation users and services share data through encrypted message-passing rather than access rules. This is easier, and less error-prone.

One protocol for storage and messaging

On most computer systems, storage (file system, database) and communication (e-mail, chat, ..., synchronization protocols) systems are separated. Condensation combines data storage and communication in one simple and efficient protocol.

Simplicity

A fully operational Condensation system consists of barely 20'000 lines of code. Only 5'000 to 10'000 lines thereof are relevant to data security. Errors outside of this core code may have nasty effects, but will not compromise data security.

Such a small footprint reduces the probability of implementation errors, and facilitates code audit.

Free and open

Condensation is an open protocol. Source code and documentation are freely available.

Compliance

Condensation can be configured to comply with various data policies. The data of employees may automatically be shared with a trusted third party, for instance, so that it can be accessed in the case of a legal investigation.