Hash Ring

What is a Hash Ring?

A hash ring (also called consistent hashing) is a technique for distributing data across a dynamic set of nodes in a way that minimizes disruption when nodes are added or removed. Both keys and nodes are mapped to positions on a circular hash space (imagine a ring from 0 to 2^32 - 1). Each key is assigned to the first node encountered clockwise from the key's position on the ring.

The Problem with Simple Hashing

With simple modular hashing (hash(key) % N where N is the number of nodes), adding or removing a single node changes the modulus, causing almost every key to be reassigned to a different node. For a distributed cache with 100 million keys and 10 nodes, adding an 11th node invalidates ~90% of the cache — a catastrophic "thundering herd" of cache misses hitting the database.
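The damage is easy to measure directly. A small sketch (key names and counts are illustrative, not from any particular system):

```python
import hashlib

def bucket(key: str, n: int) -> int:
    """Simple modular hashing: hash the key, then take it mod the node count."""
    digest = int(hashlib.sha1(key.encode()).hexdigest(), 16)
    return digest % n

keys = [f"user:{i}" for i in range(100_000)]
# Fraction of keys whose bucket changes when the cluster grows from 10 to 11 nodes.
moved = sum(1 for k in keys if bucket(k, 10) != bucket(k, 11))
print(f"{moved / len(keys):.1%} of keys remapped")  # roughly 90%
```

A key keeps its bucket only when hash mod 10 and hash mod 11 happen to agree, which occurs for about 1 in 11 keys, hence the ~90% churn.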

How Consistent Hashing Works

  1. Hash both nodes and keys to positions on the ring using the same hash function (e.g., SHA-1, MurmurHash, or MD5).
  2. Assign each key to the first node clockwise from the key's ring position. This is the key's "owner."
  3. Adding a node: the new node takes over keys in the range between itself and its predecessor. Only keys in that one arc are affected — all other keys stay where they are.
  4. Removing a node: its keys are redistributed to the next node clockwise. Again, only keys from that one arc move.

With N nodes, adding or removing one node only moves approximately 1/N of the keys — far better than the nearly 100% reshuffling of modular hashing.
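The steps above can be sketched as a minimal ring without virtual nodes yet; SHA-1 and the node names here are illustrative choices, not a prescribed implementation:

```python
import bisect
import hashlib

def ring_hash(s: str) -> int:
    """Map a node name or key to a position on the ring."""
    return int(hashlib.sha1(s.encode()).hexdigest(), 16)

class HashRing:
    def __init__(self, nodes):
        # Sorted list of (position, node) pairs -- the "ring".
        self._ring = sorted((ring_hash(n), n) for n in nodes)

    def add(self, node):
        bisect.insort(self._ring, (ring_hash(node), node))

    def remove(self, node):
        self._ring.remove((ring_hash(node), node))

    def owner(self, key):
        """First node clockwise from the key's position, wrapping past the top."""
        i = bisect.bisect_right(self._ring, (ring_hash(key), ""))
        return self._ring[i % len(self._ring)][1]

ring = HashRing(["A", "B", "C"])
before = {f"k{i}": ring.owner(f"k{i}") for i in range(10_000)}
ring.add("D")
# Every key that changed hands moved to the new node; nothing else churned.
moved = [k for k, n in before.items() if ring.owner(k) != n]
assert all(ring.owner(k) == "D" for k in moved)
```

With a single position per node the size of D's arc is noisy, so the moved fraction only averages 1/N; virtual nodes (next section) tighten that distribution.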

Virtual Nodes

With only a few physical nodes, the ring partitions are often uneven. One node might be responsible for a large arc while another has a tiny one, causing load imbalance. Virtual nodes (vnodes) solve this: each physical node is assigned multiple positions on the ring (e.g., 100-200 virtual nodes each). Keys are assigned to virtual nodes, and each virtual node maps back to its physical node. This spreads the load more evenly because each physical node's virtual nodes are distributed across the ring.

Load Balancing with Virtual Nodes

Without virtual nodes and 3 physical nodes, one node might handle 50% of keys while the others handle 25% each. With 150 virtual nodes per physical node (450 total positions), the standard deviation of load drops dramatically. Cassandra historically defaulted to 256 virtual nodes (the num_tokens setting) per physical node; Cassandra 4.0 lowered the default to 16. When a physical node is added, its virtual nodes are spread across the ring, taking small slices from many other nodes rather than one big chunk from a single neighbor.
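The smoothing effect can be seen in a few lines. A sketch (node names, key counts, and the 150-vnode figure mirror the numbers above; exact percentages depend on the hash):

```python
import bisect
import hashlib
from collections import Counter

def ring_hash(s: str) -> int:
    return int(hashlib.sha1(s.encode()).hexdigest(), 16)

def build_ring(nodes, vnodes_per_node):
    # "A#0", "A#1", ... give each physical node many positions on the ring.
    return sorted((ring_hash(f"{n}#{i}"), n)
                  for n in nodes for i in range(vnodes_per_node))

def owner(ring, key):
    i = bisect.bisect_right(ring, (ring_hash(key), ""))
    return ring[i % len(ring)][1]

keys = [f"k{i}" for i in range(30_000)]
for v in (1, 150):
    ring = build_ring(["A", "B", "C"], v)
    load = Counter(owner(ring, k) for k in keys)
    shares = {n: f"{c / len(keys):.0%}" for n, c in sorted(load.items())}
    print(f"{v:>3} vnodes/node: {shares}")
```

With one position per node the shares can be badly skewed; with 150 each node lands close to the even 33% split.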

Replication on the Ring

For fault tolerance, each key is typically replicated to the next R-1 distinct nodes clockwise on the ring (where R is the replication factor); with virtual nodes, the walk skips vnodes that map to a physical node already holding a copy. If the primary node fails, the replicas can serve the data. The ring structure makes it natural to identify replica locations — just walk clockwise.
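That clockwise walk might look like this sketch (the helper name `preference_list` and the ring layout are assumptions for illustration):

```python
import bisect
import hashlib

def ring_hash(s: str) -> int:
    return int(hashlib.sha1(s.encode()).hexdigest(), 16)

def preference_list(ring, key, r):
    """First r distinct physical nodes clockwise from the key's position.

    `ring` is a sorted list of (position, physical_node) pairs, possibly
    with several positions (vnodes) per physical node.
    """
    start = bisect.bisect_right(ring, (ring_hash(key), ""))
    replicas = []
    for step in range(len(ring)):  # at most one full lap around the ring
        node = ring[(start + step) % len(ring)][1]
        if node not in replicas:   # skip extra vnodes of nodes already chosen
            replicas.append(node)
        if len(replicas) == r:
            break
    return replicas

# 4 physical nodes, 3 vnodes each; the first entry is the key's primary owner.
ring = sorted((ring_hash(f"{n}#{i}"), n) for n in "ABCD" for i in range(3))
print(preference_list(ring, "user:42", 3))
```

The first element is the key's owner; the rest are the replicas that take over if it fails.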

Real-World Usage

  • Amazon Dynamo: the design described in Amazon's 2007 Dynamo paper (the inspiration for DynamoDB) partitions data with consistent hashing and virtual nodes
  • Apache Cassandra: consistent hashing with configurable virtual nodes
  • Memcached: client-side consistent hashing for cache distribution
  • Akka Cluster: uses a hash ring for actor shard distribution
  • CDN load balancing: some CDNs use hash rings to route requests to edge servers

Real-World Example: Adding a Node to a Distributed Cache

A distributed cache has 4 nodes (A, B, C, D) arranged on a hash ring, each with 3 virtual nodes (12 positions total). The ring looks like:

Position 0-99: vnode A1, Position 100-199: vnode B1, Position 200-299: vnode C1, ... (simplified)

Initial state: 1 million cached keys evenly distributed. Each node holds ~250,000 keys.

Adding node E (with 3 vnodes):

  1. E's 3 virtual nodes are hashed to positions, say 150, 350, and 450.
  2. Keys between positions 100-150 (previously owned by B1) now map to E's vnode at 150.
  3. Keys between positions 300-350 and 400-450 similarly move to E.
  4. Only ~200,000 keys move (20% = 1/5, since N goes from 4 to 5). The other 800,000 keys remain on their original nodes — cache hits!

With simple modular hashing (hash(key) % N): Changing N from 4 to 5 would remap ~800,000 keys (80%) to different nodes. This causes 800,000 cache misses, overwhelming the backend database.
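The whole scenario can be replayed in a few lines: 4 nodes with 3 vnodes each, then node E joins. Key counts are scaled down and the exact fractions will vary with the hash, so treat the percentages as illustrative:

```python
import bisect
import hashlib

def ring_hash(s: str) -> int:
    return int(hashlib.sha1(s.encode()).hexdigest(), 16)

def build_ring(nodes, vnodes=3):
    return sorted((ring_hash(f"{n}#{i}"), n) for n in nodes for i in range(vnodes))

def owner(ring, key):
    i = bisect.bisect_right(ring, (ring_hash(key), ""))
    return ring[i % len(ring)][1]

keys = [f"k{i}" for i in range(50_000)]
old = build_ring(["A", "B", "C", "D"])
new = build_ring(["A", "B", "C", "D", "E"])

# Every moved key lands on E; the rest stay put -- those are cache hits.
moved = [k for k in keys if owner(old, k) != owner(new, k)]
assert all(owner(new, k) == "E" for k in moved)
print(f"{len(moved) / len(keys):.0%} of keys moved to E")

# Compare with modular hashing: almost everything is remapped.
remapped = sum(1 for k in keys if ring_hash(k) % 4 != ring_hash(k) % 5)
print(f"{remapped / len(keys):.0%} remapped under hash(key) % N")
```

The consistent-hashing line should hover near the 20% the example predicts, while the modular line sits near 80%: only 1 in 5 keys satisfies hash mod 4 == hash mod 5.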

Removing node C:

  1. C's 3 virtual nodes are removed from the ring.
  2. Keys that were owned by C's vnodes are absorbed by the next clockwise vnodes.
  3. Only C's ~250,000 keys are redistributed. The rest stay put.

Cassandra in production: Cassandra historically shipped with 256 virtual nodes per physical node by default (lowered to 16 in Cassandra 4.0). When a new node joins the cluster, it receives a proportional share of data from many other nodes simultaneously, enabling fast rebalancing through parallel streaming.

Hash Ring with Virtual Nodes

[Diagram: a hash ring with vnodes A1, A2, B1, B2, C1, C2, where keys k1 → B1, k2 → C1, and k3 → A1, each mapping to the first vnode clockwise. Inserting node D's vnodes moves only the keys between D and its predecessor, ~1/N of keys (25% for N = 4). With naive hash(key) % N, changing N from 3 to 4 changes the modulus for almost every key, reassigning ~75% of keys.]
