Concepts Specifications API Downloads
ConceptsForward-moving data structures

Forward-moving data structures

Technically, any type of data can be stored in a Condensation tree. However, conflict-free synchronization is possible only with forward-moving data structures.

With forward-moving data structures, data always moves forwards when applying changes, according to some defined order. For example, we could compare different versions of a documents by their last modification time, and only keep the newest version:

May 6 May 17 May 19

Whenever we modify the document, we move forwards to a new modification date. We can never move backwards to an old date. Note that we can still move to an old version of the document if we make it look newer by assigning a newer modification date to it.

While keeping only the most recent version of a document is forward-moving, there are much more powerful forward-moving structures.

Formal definition

A forward-moving data structure with a merge function ⊗ and a comparison function ≤ has the following properties:

Commutativity ensures that two versions of the data can be merged in either way, while the associativity rule says that different versions can be merged in any order. That's important in a distributed setting where no global total order is imposed.

Itempotency ensures that merging a version with itself does not change anything. This means that we can merge the same version as often as we want, and do not need to keep track about what has been merged, and what not:

A ⊗ B ⊗ B ⊗ ... ⊗ B = A ⊗ B

This also ensures that two versions with a common past are merged correctly:

A B C D ΔB ΔC
D = B ⊗ C
  = (A ⊗ ΔB) ⊗ (A ⊗ ΔC)
  = A ⊗ A ⊗ ΔB ⊗ ΔC
  = A ⊗ ΔB ⊗ ΔC

The order ensures that merging is forward-moving with respect to the defined order, and can also be written as follows:

A ⊗ B = C   ⇔   A ≤ C and B ≤ C

Mathematical background

Forward-mergeability is closely related to the mathematical concept of semilattices. A forward-moving data structure forms a join-semilattice, and any version of the data is an element of the semilattice (set). Merging two versions corresponds to the join operation of the semilattice.

Maximum

Among the simplest forward-moving structures are numbers merged by taking their maximum:

A = 3
B = 15
A ⊗ B = max(3, 15) = 15

It is obvious that the result can never be smaller than any input value. Hence, merging never moves the state — here expressed as a number — backwards.

Boolean OR

Similarly, boolean values are forward-moving under the OR function and the order [false → true]:

A =               false  true   false  true
B =               false  false  true   true
A ⊗ B = A or B =  false  true   true   true

In fact, OR-merging can be mapped to maximum-merging by mapping false to the number 0, and true to 1.

Similarly, AND is forward-moving under the order [true → false]. XOR, however, is not trivially forward-moving.

Set union

Sets are forward-moving under union and set size.

A = {apple, banana}
B = {banana, lemon}
A ⊗ B = {apple, banana} ∪ {banana, lemon} = {apple, banana, lemon}

When unioning sets, the result necessarily has at least as many elements as any input set. Thus, we can never move backwards to a set with fewer elements.

Maximum revisited

Maximum-merge can be mapped to union-merge by putting all numbers into a set, and considering the highest number only:

A = {3}
B = {15}
A ⊗ B = {3, 15}
C = {4, 17}
A ⊗ B ⊗ C = {3, 4, 15, 17}

This is conceptually equivalent, but not very efficient, since the sets contain numbers that are not used, and will never be used again. In the above example, the number 4 could be removed from C, and the end result — 17 — would still be the same.

By merging, we cannot remove the number 4. If we "forgot" it, or "lost" it, however, nobody would notice.

Union-forget structures

A union-forget structure is a forward-moving data structure based on union-merge, and a set of rules defining which values are still in use.

Maximum-merge, for example, can be seen as a union-forget structure where only the highest value of a set is in use. All other values can be forgotten, as they won't make a difference in the future.

Similarly, we can build forward-moving structures to keep the 5 largest values, all timestamps not older than a week, and so on.

Broom merge

TBD