
Consistent Hashing for Distributed Replication

Consistent hashing maps both keys and nodes onto a circular hash space (a "ring"). Each key is stored on the first N nodes found by walking clockwise from the key's hash position. This creates a natural mapping of keys to replicas that is resilient to cluster changes.

The Hash Ring with Replication

  1. Hash each node's identifier onto the ring at one or more positions.
  2. Hash each key onto the ring.
  3. Walk clockwise from the key's position. The first N distinct physical nodes encountered form the key's preference list—the set of nodes responsible for storing replicas of that key.
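The clockwise walk above can be sketched in a few lines of Python. This is a minimal illustration, not a production implementation: MD5 stands in for whatever hash the system actually uses, and the names `ring_hash`, `build_ring`, and `preference_list` are made up for this sketch.

```python
import bisect
import hashlib

def ring_hash(value: str) -> int:
    # Stand-in hash for illustration; real systems use e.g. Murmur3.
    return int(hashlib.md5(value.encode()).hexdigest(), 16)

def build_ring(nodes):
    # One position per node for simplicity (vnodes are covered below).
    return sorted((ring_hash(n), n) for n in nodes)

def preference_list(ring, key, n=3):
    # Walk clockwise from the key's position, collecting the
    # first n distinct nodes; wrap around the ring if needed.
    positions = [pos for pos, _ in ring]
    start = bisect.bisect_right(positions, ring_hash(key))
    found = []
    for i in range(start, start + len(ring)):
        node = ring[i % len(ring)][1]
        if node not in found:
            found.append(node)
        if len(found) == n:
            break
    return found

ring = build_ring(["A", "B", "C", "D"])
print(preference_list(ring, "k1", n=3))
```

`bisect_right` finds the first ring position clockwise of the key's hash; the modulo handles wrap-around past the top of the ring.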

Preference List

The preference list is an ordered list of nodes responsible for a given key. The first node is the "coordinator" for writes. If that node is down, the next node in the list takes over (sloppy quorum). The preference list ensures that every key has a deterministic set of N replicas.
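The failover rule described above can be sketched as follows. This is a simplification: the `alive` set is a hypothetical view of node liveness, and a real sloppy-quorum implementation (e.g. Dynamo-style hinted handoff) involves more machinery than picking the first healthy node.

```python
def coordinator(pref_list, alive):
    # The first reachable node in the preference list coordinates the
    # request; if it is down, the next node in the list takes over.
    for node in pref_list:
        if node in alive:
            return node
    raise RuntimeError("no replica in the preference list is reachable")

print(coordinator(["A", "B", "C"], alive={"A", "B", "C"}))  # prints "A"
print(coordinator(["A", "B", "C"], alive={"B", "C"}))       # prints "B": A is down
```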

Virtual Nodes (Vnodes)

Rather than mapping each physical node to a single point on the ring, assign it multiple positions (virtual nodes). A node with 256 vnodes has 256 positions on the ring, each responsible for a small key range. Benefits:

  • Better load balance: Keys are distributed more uniformly across physical nodes. Without vnodes, a new node only takes over one contiguous range; with vnodes, it absorbs load from many existing nodes.
  • Proportional assignment: A more powerful machine can be given more vnodes, receiving a proportionally larger share of keys.
  • Smoother rebalancing: When a node joins or leaves, only the key ranges of its vnodes are affected, spreading the data movement across many other nodes.
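The load-balance benefit can be checked with a small experiment: give each physical node several ring positions derived from names like "A#0", and count how many of 10,000 sample keys land on each node. MD5 again stands in for the real hash, and the vnode-naming scheme is illustrative.

```python
import bisect
import hashlib

def ring_hash(value: str) -> int:
    # Illustrative stand-in for a production hash function.
    return int(hashlib.md5(value.encode()).hexdigest(), 16)

def build_vnode_ring(nodes, vnodes_per_node):
    # Each physical node claims several positions, one per vnode.
    return sorted((ring_hash(f"{n}#{i}"), n)
                  for n in nodes for i in range(vnodes_per_node))

def load_counts(nodes, vnodes_per_node, num_keys=10_000):
    ring = build_vnode_ring(nodes, vnodes_per_node)
    positions = [pos for pos, _ in ring]
    counts = {n: 0 for n in nodes}
    for i in range(num_keys):
        idx = bisect.bisect_right(positions, ring_hash(f"key-{i}")) % len(ring)
        counts[ring[idx][1]] += 1
    return counts

nodes = ["A", "B", "C", "D"]
for v in (1, 128):
    print(f"{v:>3} vnodes/node: {load_counts(nodes, v)}")
```

With one position per node the four arcs are whatever the hash happens to produce and can be badly skewed; with 128 vnodes per node the per-node totals cluster much closer to an even quarter each.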

Node Joins and Leaves

When a node joins: it claims positions on the ring (vnodes). Only the key ranges adjacent to those positions are affected. Data is transferred from the nodes that previously owned those ranges. All other keys are untouched.

When a node leaves: its key ranges are absorbed by the next nodes clockwise. Only the departing node's data needs to be redistributed.

This is a dramatic improvement over naive hash(key) % N, where changing N reshuffles nearly every key.
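Both halves of this claim can be measured directly. The sketch below places 10,000 keys under each scheme and counts how many change owners when a 5th node is added; MD5 stands in for the real hash, and the vnode count and key set are arbitrary choices for the experiment.

```python
import bisect
import hashlib

def ring_hash(value: str) -> int:
    # Illustrative stand-in hash.
    return int(hashlib.md5(value.encode()).hexdigest(), 16)

def build_ring(nodes, vnodes_per_node=64):
    return sorted((ring_hash(f"{n}#{i}"), n)
                  for n in nodes for i in range(vnodes_per_node))

def owner(ring, positions, key):
    idx = bisect.bisect_right(positions, ring_hash(key)) % len(ring)
    return ring[idx][1]

keys = [f"key-{i}" for i in range(10_000)]

# Modulo placement: going from 4 to 5 nodes changes almost every key's owner.
moved_mod = sum(1 for k in keys if ring_hash(k) % 4 != ring_hash(k) % 5)

# Consistent hashing: only keys that fall into the new node's ranges move.
ring4 = build_ring(["A", "B", "C", "D"])
ring5 = build_ring(["A", "B", "C", "D", "E"])
pos4 = [p for p, _ in ring4]
pos5 = [p for p, _ in ring5]
moved_ring = sum(1 for k in keys
                 if owner(ring4, pos4, k) != owner(ring5, pos5, k))

print(f"hash % N:           {moved_mod / len(keys):.0%} of keys moved")
print(f"consistent hashing: {moved_ring / len(keys):.0%} of keys moved")
```

The existing nodes' ring positions are identical in `ring4` and `ring5`, so only keys whose clockwise successor is now one of E's vnodes change hands, roughly 1/(N+1) of them.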

Production Usage

  • Cassandra: Uses vnodes (default 256 per node) for token-based partitioning. Each vnode owns a token range on the ring.
  • DynamoDB: Uses consistent hashing internally to assign partitions to storage nodes.
  • Riak: Divides the ring into a fixed number of partitions, distributed evenly across nodes.

Real-World Example: Cassandra Token Ring

Cassandra's token ring is the canonical example of consistent hashing with replication:

  • The hash space is the range of a Murmur3 hash: -2^63 to 2^63 - 1.
  • Each vnode is assigned a token (a position on the ring). The vnode is responsible for all keys whose hash falls between the previous token and its own token.
  • With replication factor (RF) = 3, each key is stored on the 3 nodes whose vnodes are the first 3 distinct nodes clockwise from the key's hash.
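The token-range placement above can be sketched like this. It is a simplified model, not Cassandra's actual code: an MD5 digest folded into the signed 64-bit range stands in for Murmur3, the vnode names are invented, and the replica walk corresponds to SimpleStrategy-style placement (no rack or datacenter awareness).

```python
import bisect
import hashlib

MIN_TOKEN, MAX_TOKEN = -2**63, 2**63 - 1

def token(key: str) -> int:
    # Stand-in for Murmur3: fold a digest into the signed 64-bit token space.
    return (int(hashlib.md5(key.encode()).hexdigest(), 16) % 2**64) - 2**63

def build_token_ring(nodes, vnodes_per_node=8):
    # Each vnode gets a token; names like "A#0" are illustrative.
    return sorted((token(f"{n}#{i}"), n)
                  for n in nodes for i in range(vnodes_per_node))

def replicas(token_ring, key, rf=3):
    # A vnode owns (previous_token, its_token], so the key belongs to the
    # first token >= token(key); walk clockwise for rf distinct nodes.
    tokens = [t for t, _ in token_ring]
    start = bisect.bisect_left(tokens, token(key))
    out = []
    for i in range(start, start + len(token_ring)):
        node = token_ring[i % len(token_ring)][1]
        if node not in out:
            out.append(node)
        if len(out) == rf:
            break
    return out

ring = build_token_ring(["A", "B", "C", "D"])
print(replicas(ring, "user:42", rf=3))
```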

Adding a node to a 4-node cluster:

  1. The new node (E) is assigned 256 vnodes spread across the ring.
  2. For each vnode, E takes over a small key range from the node that previously owned it.
  3. Data streams from existing nodes to E. During streaming, reads still work (existing nodes still have the data).
  4. Once streaming completes, E starts serving reads and writes for its ranges.
  5. Total data moved: approximately 1/(N+1) of the total data, spread across all existing nodes.
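Steps 2 and 5 can be checked with a small simulation: for every key E now owns, look up its old owner in the 4-node ring. The numbers (64 vnodes, 10,000 keys) and the MD5 stand-in hash are arbitrary choices for the experiment.

```python
import bisect
import hashlib
from collections import Counter

def ring_hash(value: str) -> int:
    # Illustrative stand-in hash.
    return int(hashlib.md5(value.encode()).hexdigest(), 16)

def build_ring(nodes, vnodes_per_node=64):
    return sorted((ring_hash(f"{n}#{i}"), n)
                  for n in nodes for i in range(vnodes_per_node))

def owner(ring, positions, key):
    idx = bisect.bisect_right(positions, ring_hash(key)) % len(ring)
    return ring[idx][1]

keys = [f"key-{i}" for i in range(10_000)]
ring4 = build_ring(["A", "B", "C", "D"])
ring5 = build_ring(["A", "B", "C", "D", "E"])
pos4 = [p for p, _ in ring4]
pos5 = [p for p, _ in ring5]

# Which nodes stream data to E? For every key E now owns, find its old owner.
sources = Counter(owner(ring4, pos4, k) for k in keys
                  if owner(ring5, pos5, k) == "E")
moved = sum(sources.values())
print(f"E received {moved / len(keys):.0%} of all keys, "
      f"streamed from {sorted(sources)}")
```

Because E's 256-ish vnodes are scattered around the ring, the keys it receives come a little from every existing node rather than all from one neighbor, which is exactly why the streaming load is spread out.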

Contrast with hash(key) % N: Adding a 5th node to a 4-node cluster would remap ~80% of keys. With consistent hashing, only ~20% of keys move, and the movement is localized.

Consistent Hash Ring with Replication (N=3)

[Diagram: a ring with nodes A, B, C, D and a key k1. The preference list for k1 (N=3) is A (1st, coordinator), B (2nd replica), C (3rd replica). An inset shows physical node A holding 4 vnodes (A-0 through A-3), a node join affecting only adjacent ranges, and the contrast between hash(key) % N (~80% of keys remapped) and consistent hashing (~1/(N+1) of keys moved).]