Stores Actors Data Tree Downloads
AnalysisMerging

Merging preliminary draft, to be adapted to semilattice theory

Data is often modified in several places and then merged together at a later point in time. In a distributed setting, a conflict-free merge operation must fulfill three criteria:

Commutativity and associativity are necessary because the different data versions have no intrinsic order, i.e. one cannot say a priori that A is more important than B.

Stability is necessary because data versions may be merged multiple times, such as in the following example:

A B C D ΔB ΔC

B has been derived from A, and can therefore be written as B = A ⊕ ΔB, where ΔB denotes the changes. The same goes for C. Merging B and C therefore becomes:

D = B ⊕ C = (A ⊕ ΔB) ⊕ (A ⊕ ΔC) = A ⊕ A ⊕ ΔB ⊕ ΔC = A ⊕ ΔB ⊕ ΔC

Stability can be obtained by keeping the history of all changes, i.e., if it is known that both B and C have been derived from A, the merge algorithm can compensate for that. However, keeping the history of all changes is often impractical, and can be complex.

History-free merge operations

Fortunately, there are two merge operations that do not require any history: set union, and top-N merge.

Set union merge

Combining two sets using set union is commutative, associative, and stable.

This merge operation is typically used for dictionary keys.

Top-N merge

If the elements in the two sets have an order, one can throw away all but the top N elements after union merge.

As a special case, one can keep the most recent element only (N = 1, ordering by date).

Merging with some history to be written