Leaderless Replication: The Dynamo Approach
Leaderless replication abandons the concept of a designated leader entirely. Any node can accept both reads and writes. This design was popularized by Amazon's Dynamo paper (2007) and inspired Cassandra, Riak, and Voldemort.
How Writes Work
When a client writes a value, the client (or a coordinator node) sends the write to all N replicas that own the key, in parallel. The write is considered successful once at least W nodes acknowledge it; the client does not wait for all N nodes, only W.
If some nodes are unreachable, the write still succeeds as long as W nodes respond. There is no single point of failure: any node going down does not block writes.
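The write path above can be sketched in Python. The `Replica` class and the thread-pool fan-out are illustrative stand-ins for real remote nodes and RPCs; `N` and `W` are example values, not fixed by the protocol:

```python
import concurrent.futures

N, W = 3, 2  # replication factor and write quorum (illustrative values)

class Replica:
    """Toy in-memory replica; a real node would be a remote process."""
    def __init__(self, up=True):
        self.up = up
        self.data = {}

    def store(self, key, value, version):
        if not self.up:
            raise ConnectionError("replica unreachable")
        self.data[key] = (value, version)
        return True

def quorum_write(replicas, key, value, version):
    """Send the write to all replicas in parallel; succeed once W acknowledge."""
    acks = 0
    with concurrent.futures.ThreadPoolExecutor(max_workers=len(replicas)) as pool:
        futures = [pool.submit(r.store, key, value, version) for r in replicas]
        for f in concurrent.futures.as_completed(futures):
            try:
                if f.result():
                    acks += 1
            except ConnectionError:
                pass  # a down replica does not block the write
            if acks >= W:
                return True  # quorum reached; stragglers complete asynchronously
    return False  # fewer than W replicas acknowledged
```

Note that a down replica only raises for its own future; the write succeeds as soon as any W of the N acknowledgements arrive.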
How Reads Work
The client reads from R nodes in parallel and takes the value with the highest version number. If responses disagree (some nodes have older data), the most recent version is returned to the client.
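A minimal sketch of the read-side selection, assuming each reply carries a single version number as described above (the `node`/`value`/`version` reply shape is a hypothetical wire format for illustration). Returning the list of stale nodes is how read repair would know where to write the winner back:

```python
def quorum_read(replies):
    """Pick the reply with the highest version; report stale replicas."""
    winner = max(replies, key=lambda r: r["version"])
    stale = [r["node"] for r in replies if r["version"] < winner["version"]]
    return winner["value"], stale  # stale nodes get the winner written back
```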
Sloppy Quorum
In a strict quorum, the W writes and R reads must go to the designated N nodes for that key. In a sloppy quorum, if some designated nodes are unavailable, the write can go to any available node, even if that node isn't normally responsible for the key. This dramatically improves write availability during partial failures. The substitute node stores the data along with a "hint" recording which node it is meant for, and forwards it once that node recovers (hinted handoff).
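One way to sketch sloppy-quorum target selection, assuming a preference list produced by walking the hash ring (the function and variable names here are hypothetical, not from any particular implementation):

```python
def choose_write_targets(preference_list, alive, n=3):
    """Sloppy quorum: pick n alive nodes, preferring the designated ones.

    When a designated node is down, walk further along the preference
    list and record a hint mapping the substitute to the intended home.
    """
    targets = []
    hints = {}  # substitute node -> intended home node (hinted handoff)
    designated = preference_list[:n]
    fallback = iter(preference_list[n:])  # shared walk past the designated set
    for home in designated:
        if home in alive:
            targets.append(home)
        else:
            for sub in fallback:
                if sub in alive:
                    targets.append(sub)
                    hints[sub] = home
                    break
            # if the ring walk is exhausted, we accept fewer than n targets
    return targets, hints
```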
Version Vectors (Vector Clocks)
Since writes can arrive at different nodes in different orders, the system needs to track causality. A version vector assigns a counter per node. When node A processes a write, it increments its own counter: {A:3, B:2, C:1}. When comparing two version vectors:
- If every counter in V1 is ≥ the corresponding counter in V2 (and at least one is strictly greater), then V1 dominates: it is a causal successor of V2. If all counters are equal, the vectors describe the same version.
- If neither dominates, the writes are concurrent and must be resolved (siblings in Riak, or LWW in Cassandra).
Anti-Entropy
A background process that compares data across replicas and repairs inconsistencies. Unlike read repair (which only fixes data that is actually read), anti-entropy proactively synchronizes all data, ensuring eventual convergence even for infrequently-read keys.
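A toy anti-entropy pass, assuming each replica is modeled as a dict of key → (value, version). Hashing the entire key space, as below, is only for illustration; real systems use Merkle trees so that only diverging subranges need to be compared and exchanged:

```python
import hashlib

def digest(store):
    """Hash a replica's full key space (stand-in for a Merkle tree root)."""
    blob = repr(sorted(store.items())).encode()
    return hashlib.sha256(blob).hexdigest()

def anti_entropy(a, b):
    """If digests differ, find diverging keys and keep the higher version."""
    if digest(a) == digest(b):
        return  # replicas already consistent
    for key in set(a) | set(b):
        va = a.get(key, (None, 0))
        vb = b.get(key, (None, 0))
        winner = va if va[1] >= vb[1] else vb  # (value, version) pairs
        a[key] = b[key] = winner
```

Because this runs in the background over all keys, even a key that is never read converges across replicas.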
Real-Life: Amazon DynamoDB and Cassandra
Amazon DynamoDB evolved from the original Dynamo paper:
- Each item is replicated across three Availability Zones within a region.
- Writes use a Paxos-based protocol for the "leader" of each partition, but the system appears leaderless to the client since any node can coordinate.
- Eventually consistent reads (default): read from any one replica, fast but possibly stale.
- Strongly consistent reads: read from the partition leader, guaranteeing the latest value.
Apache Cassandra implements Dynamo-style leaderless replication:
- Configurable consistency levels per query: ONE, QUORUM, ALL, LOCAL_QUORUM.
- QUORUM means floor(N/2) + 1 nodes must respond. With replication factor 3, QUORUM = 2.
- A write at QUORUM plus a read at QUORUM guarantees overlap: at least one node in the read set has the latest write.
- Lightweight transactions (LWT): Cassandra supports IF NOT EXISTS using Paxos for linearizable operations when needed.
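The overlap guarantee is just the pigeonhole principle: quorums intersect whenever W + R > N. For small N this can be checked exhaustively; the sketch below is illustrative arithmetic, not Cassandra code:

```python
from itertools import combinations

def quorums_always_overlap(n, w, r):
    """True iff every write quorum of size w shares at least one node
    with every read quorum of size r among n replicas (i.e. w + r > n)."""
    nodes = range(n)
    return all(set(ws) & set(rs)
               for ws in combinations(nodes, w)
               for rs in combinations(nodes, r))
```

With N = 3, writing and reading at QUORUM (2 nodes each) always overlaps, while ONE + ONE can miss the latest write entirely.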
Riak (now Riak KV): Stores conflicting concurrent writes as siblings and lets the application resolve them. This avoids data loss from LWW but requires application logic.