Merging preliminary draft, to be adapted to semilattice theory
Data is often modified in several places and then merged together at a later point in time. In a distributed setting, a conflict-free merge operation must fulfill three criteria:
- Commutativity: Merging A ⊕ B must yield the same as merging B ⊕ A.
- Associativity: Merging (A ⊕ B) ⊕ C must yield the same as A ⊕ (B ⊕ C).
- Stability: Merging A ⊕ A must yield A.
Commutativity and associativity are necessary because the different data versions have no intrinsic order, i.e. one cannot say a priori that A is more important than B.
Stability is necessary because data versions may be merged multiple times, such as in the following example:
B has been derived from A, and can therefore be written as B = A ⊕ ΔB, where ΔB denotes the changes. The same goes for C. Merging B and C therefore becomes:
D = B ⊕ C = (A ⊕ ΔB) ⊕ (A ⊕ ΔC) = A ⊕ A ⊕ ΔB ⊕ ΔC = A ⊕ ΔB ⊕ ΔC
Stability can be obtained by keeping the history of all changes, i.e., if it is known that both B and C have been derived from A, the merge algorithm can compensate for that. However, keeping the history of all changes is often impractical, and can be complex.
History-free merge operations
Fortunately, there are two merge operations that do not require any history: set union, and top-N merge.
Set union merge
This merge operation is typically used for dictionary keys.
If the elements in the two sets have an order, one can throw away all but the top N elements after union merge.
As a special case, one can keep the most recent element only (N = 1, ordering by date).