Why Multi-Leader Replication?
In single-leader replication, every write must go through one node. This creates a bottleneck: if the leader is in US-East, users in Europe and Asia experience high write latency due to cross-datacenter round trips. Multi-leader replication (also called multi-master or active-active) allows multiple nodes to accept writes, each acting as a leader for its local region.
Use Cases
- Multi-datacenter operation: Each datacenter has its own leader. Writes in Europe go to the European leader, writes in Asia go to the Asian leader. Each leader replicates to the others asynchronously. This dramatically reduces write latency for geographically distributed users. If one datacenter fails, others continue operating independently.
- Collaborative editing: Applications like Google Docs allow multiple users to edit the same document simultaneously. Each user's local state acts as a "leader" that accepts changes locally and replicates to others.
- Offline-capable applications: A mobile app that works offline and syncs when reconnected is essentially a multi-leader system. Each device is a leader that accepts writes locally. When devices reconnect, their changes must be merged.
Conflict Detection
The fundamental challenge: two leaders can concurrently modify the same row. In single-leader replication, writes are serialized through one node, so this kind of conflict is impossible. In multi-leader replication, conflicts must be detected and resolved.
- Synchronous conflict detection: Before committing a write, check with all other leaders. This negates the benefit of multi-leader (you'd wait for cross-datacenter round trips).
- Asynchronous conflict detection: Allow both writes to succeed locally, detect the conflict when replication delivers both changes to the same node. This is the practical approach but requires a conflict resolution strategy.
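One common (if lossy) resolution strategy is last-write-wins (LWW): each write carries a timestamp, and when replication delivers two versions of the same row, the later timestamp wins. Here is a minimal sketch, with hypothetical names; real systems such as Cassandra use a similar per-write timestamp, typically with a node ID as tie-breaker:

```python
# Sketch of asynchronous conflict detection with last-write-wins (LWW)
# resolution. Names here are illustrative, not from any specific system.
from dataclasses import dataclass

@dataclass
class Write:
    key: str
    value: str
    timestamp: float   # wall-clock time at the originating leader
    leader_id: str     # tie-breaker when timestamps are equal

class Replica:
    def __init__(self):
        self.store = {}  # key -> winning Write

    def apply(self, w: Write):
        """Apply a local or replicated write; resolve conflicts by LWW."""
        current = self.store.get(w.key)
        if current is None or (w.timestamp, w.leader_id) > (current.timestamp, current.leader_id):
            # Newer write wins; the older one is silently discarded,
            # which is why LWW can lose data under concurrency.
            self.store[w.key] = w

# Two leaders accept concurrent writes to the same key, then replicate
# to each other. Both converge on the same winner despite seeing the
# writes in different orders.
a, b = Replica(), Replica()
w1 = Write("profile:42", "alice@old.example", 100.0, "us-east")
w2 = Write("profile:42", "alice@new.example", 100.5, "eu-west")
a.apply(w1); a.apply(w2)   # us-east order
b.apply(w2); b.apply(w1)   # eu-west order
assert a.store["profile:42"].value == b.store["profile:42"].value
```

The key property is convergence: both replicas end up with the same value regardless of delivery order. The cost is that one of the concurrent writes is silently dropped.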
Replication Topologies
How changes flow between leaders:
- Circular: Each leader forwards to the next in a ring. Simple but a single node failure breaks the chain.
- Star (hub-and-spoke): One central node relays changes between all others. Central node is a single point of failure.
- All-to-all: Every leader replicates directly to every other leader. Most fault-tolerant. However, different replication paths have different latencies, so changes may arrive out of order. Causality must be tracked (e.g., using version vectors) to ensure correct ordering.
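To make the causality point concrete, a version vector records, per node, how many updates from that node a given version reflects. Comparing two vectors tells you whether one version happened before the other (safe to overwrite) or they are concurrent (a true conflict). A minimal sketch:

```python
# Minimal version-vector comparison, illustrating how all-to-all
# replication can distinguish "happened-before" from "concurrent".
def compare(vv_a: dict, vv_b: dict) -> str:
    """Return 'before', 'after', 'equal', or 'concurrent'."""
    nodes = set(vv_a) | set(vv_b)
    a_le_b = all(vv_a.get(n, 0) <= vv_b.get(n, 0) for n in nodes)
    b_le_a = all(vv_b.get(n, 0) <= vv_a.get(n, 0) for n in nodes)
    if a_le_b and b_le_a:
        return "equal"
    if a_le_b:
        return "before"      # a happened before b: b may overwrite a
    if b_le_a:
        return "after"
    return "concurrent"      # neither dominates: a true conflict

# Leader L1 wrote twice, leader L2 once, and neither has seen the other:
print(compare({"L1": 2}, {"L2": 1}))           # concurrent
# L2's version includes everything L1's version reflects:
print(compare({"L1": 2}, {"L1": 2, "L2": 1}))  # before
```

A replica receiving an update whose vector is "concurrent" with its current version knows it must invoke conflict resolution rather than blindly overwrite.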
Real-Life: Multi-Datacenter with CockroachDB
CockroachDB and YugabyteDB are distributed SQL databases that support multi-region deployments with a form of multi-leader writes:
- Regional tables: Data is partitioned by region. Each region's data has a leader in that region, so local writes are fast. Cross-region reads fall back to the nearest replica.
- Global tables: Replicated to all regions with follower reads. Writes go through a consensus protocol across regions (higher latency).
Other real-world examples:
- MySQL Group Replication: Supports multi-primary mode where any node can accept writes. Conflicts are detected at commit time using certification-based conflict detection (row-level hashing).
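The core idea behind certification can be sketched as follows. This is not MySQL's actual algorithm, just an illustration of the principle: each transaction carries hashes of the rows it modified plus the database version it read from, and it is certified only if no already-certified transaction touched the same rows after that snapshot:

```python
# Illustrative sketch of certification-based conflict detection, in the
# spirit of write-set certification. All names are hypothetical.
import hashlib

class Certifier:
    def __init__(self):
        self.seq = 0             # global certification order
        self.last_modified = {}  # row hash -> seq at which row was last certified

    @staticmethod
    def row_hash(table: str, pk) -> str:
        """Identify a row by a hash of its table and primary key."""
        return hashlib.sha256(f"{table}:{pk}".encode()).hexdigest()

    def certify(self, write_set: list, snapshot_seq: int) -> bool:
        """write_set: hashes of rows modified; snapshot_seq: version the txn read."""
        for h in write_set:
            if self.last_modified.get(h, -1) > snapshot_seq:
                return False     # conflict: row changed since this txn's snapshot
        self.seq += 1
        for h in write_set:
            self.last_modified[h] = self.seq
        return True

c = Certifier()
ws = [Certifier.row_hash("users", 42)]
assert c.certify(ws, snapshot_seq=0)       # first writer is certified
assert not c.certify(ws, snapshot_seq=0)   # concurrent writer is rejected
```

Because every node runs the same deterministic check in the same total order, all nodes independently reach the same commit/abort decision without a round trip.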
- Cassandra: While technically leaderless, its multi-datacenter replication behaves similarly: writes go to a local coordinator and replicate asynchronously to other datacenters.
- Google Docs / Figma: Use operational transformation or CRDTs to merge concurrent edits from multiple users, each of whom acts as a local "leader" for their changes.
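The appeal of CRDTs is that merging is commutative and idempotent, so concurrent edits converge without coordination. A grow-only counter (G-Counter) is one of the simplest examples: each "leader" increments its own slot, and merge takes the per-node maximum:

```python
# A grow-only counter (G-Counter), one of the simplest CRDTs. Each
# replica increments only its own slot; merge takes the per-node max,
# so replicas converge regardless of sync order.
class GCounter:
    def __init__(self, node_id: str):
        self.node_id = node_id
        self.counts = {}   # node_id -> increments observed from that node

    def increment(self):
        self.counts[self.node_id] = self.counts.get(self.node_id, 0) + 1

    def value(self) -> int:
        return sum(self.counts.values())

    def merge(self, other: "GCounter"):
        for node, n in other.counts.items():
            self.counts[node] = max(self.counts.get(node, 0), n)

a, b = GCounter("alice"), GCounter("bob")
a.increment(); a.increment()   # alice edits twice while offline
b.increment()                  # bob edits once while offline
a.merge(b); b.merge(a)         # sync in either direction, any order
assert a.value() == b.value() == 3
```

Text-editing CRDTs (and operational transformation) are far more involved, but they rest on the same property: a merge function on which all replicas agree, no matter the order in which changes arrive.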