Partitioning / Sharding

Splitting Data Across Nodes

Partitioning (also called sharding) divides a dataset across multiple nodes so that each node stores and processes only a subset of the data. This enables horizontal scaling: as data grows, add more nodes with more partitions. Each partition is a fully independent subset of the data.

Range Partitioning

Assign each partition a contiguous range of keys. For example, partition 1 owns keys A–F, partition 2 owns G–N, partition 3 owns O–Z. Range partitioning is excellent for range queries: "give me all users with last names starting with M" hits only one partition.

Risk: Hotspots. If keys are not uniformly distributed (e.g., many names start with S), one partition handles disproportionate load. Time-stamped keys are particularly dangerous—all current writes go to the latest time range.
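A minimal sketch of range-based routing: partition boundaries live in a small sorted list, and a binary search over that list finds the owning partition. The boundary letters below mirror the A–F / G–N / O–Z example; the function name is illustrative.

```python
import bisect

# Upper boundaries (exclusive) of each partition's key range.
# P0 owns A-F, P1 owns G-N, P2 owns O-Z.
BOUNDARIES = ["G", "O"]
PARTITIONS = ["P0", "P1", "P2"]

def partition_for(key: str) -> str:
    """Binary-search the boundary list to find the owning partition."""
    return PARTITIONS[bisect.bisect_right(BOUNDARIES, key[0].upper())]

print(partition_for("Alice"))  # P0
print(partition_for("Mike"))   # P1
print(partition_for("Sam"))    # P2
```

Because the boundary list is sorted, a range query maps to a contiguous run of partitions rather than all of them.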

Hash Partitioning

Apply a hash function to each key and use the hash to determine the partition: partition = hash(key) % numPartitions. This distributes keys uniformly regardless of the key distribution, eliminating hotspots.

Tradeoff: Efficient range queries are lost because adjacent keys are scattered across different partitions. A range scan must query every partition and merge the results (scatter-gather).
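The `hash(key) % numPartitions` rule above can be sketched as follows. One assumption worth flagging: a stable hash (here MD5) is used instead of Python's built-in `hash()`, which is salted per process and would route the same key differently across restarts.

```python
import hashlib

NUM_PARTITIONS = 3

def partition_for(key: str) -> int:
    """partition = hash(key) % numPartitions, with a stable hash."""
    digest = hashlib.md5(key.encode()).digest()
    return int.from_bytes(digest[:8], "big") % NUM_PARTITIONS

# Adjacent keys scatter: a range scan over "A"-"H" must query
# every partition and merge the results (scatter-gather).
for key in ["Alice", "Bob", "Eve"]:
    print(key, "->", partition_for(key))
```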

Compound (Hybrid) Partitioning

Combine both approaches. For example, in Cassandra, a compound primary key (user_id, timestamp) hashes user_id to determine the partition, then stores rows sorted by timestamp within that partition. This enables range queries on timestamp within a given user_id.
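A toy model of the compound-key idea: route on the first key component, keep rows sorted by the second within the partition. Using `user_id % NUM_PARTITIONS` as a stand-in for a real hash keeps the sketch short; the structure, not the hash, is the point.

```python
import bisect
from collections import defaultdict

NUM_PARTITIONS = 4

# partition id -> user_id -> rows kept sorted by timestamp
partitions = defaultdict(lambda: defaultdict(list))

def insert(user_id: int, ts: int, value: str) -> None:
    p = user_id % NUM_PARTITIONS  # stand-in for hash(user_id)
    bisect.insort(partitions[p][user_id], (ts, value))

def range_query(user_id: int, t_start: int, t_end: int):
    """Timestamp range scan for one user touches exactly one partition."""
    p = user_id % NUM_PARTITIONS
    rows = partitions[p][user_id]
    lo = bisect.bisect_left(rows, (t_start,))
    hi = bisect.bisect_left(rows, (t_end + 1,))
    return rows[lo:hi]

insert(42, 1, "login")
insert(42, 3, "purchase")
insert(42, 2, "click")
print(range_query(42, 1, 2))  # [(1, 'login'), (2, 'click')]
```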

Resharding

When the dataset outgrows the current number of partitions, new partitions must be created and data redistributed. A large partition is split into two by dividing its key range or hash range at some boundary. This is disruptive because data must be moved while the system continues serving requests. Some systems (e.g., DynamoDB) perform splits automatically in the background.
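The split step for a range partition can be sketched as: sort the keys, pick the median as the new boundary, and move everything at or above it to a new partition. The sample data is hypothetical; real systems also have to stream ongoing writes and cut traffic over, which this sketch omits.

```python
def split(partition: dict):
    """Split a range partition at its median key; the returned boundary
    becomes the start of the new upper partition's key range."""
    keys = sorted(partition)
    boundary = keys[len(keys) // 2]
    lower = {k: v for k, v in partition.items() if k < boundary}
    upper = {k: v for k, v in partition.items() if k >= boundary}
    return lower, upper, boundary

# Hypothetical oversized partition owning keys G-N.
p1 = {"George": 1, "Helen": 2, "John": 3, "Mike": 4, "Nora": 5}
lower, upper, boundary = split(p1)
print(boundary, sorted(lower), sorted(upper))
# John ['George', 'Helen'] ['John', 'Mike', 'Nora']
```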

Hot Partitions

Even with hash partitioning, a single extremely popular key (e.g., a celebrity's user ID during a viral event) can overload one partition. Mitigations include key salting (appending a random suffix to spread writes across partitions) and application-level load shedding.
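Key salting can be sketched in a few lines. The suffix count (`NUM_SALTS`) and the `#` separator are illustrative choices; the tradeoff to note is that writes spread cheaply, but reads must fan out over every suffix and merge.

```python
import random

NUM_SALTS = 8  # assumed fan-out factor; tune to the observed hotspot

def salted_key(hot_key: str) -> str:
    """On write: append a random suffix so one hot key spreads across
    up to NUM_SALTS hash partitions instead of hammering one."""
    return f"{hot_key}#{random.randrange(NUM_SALTS)}"

def salted_keys_for_read(hot_key: str) -> list:
    """On read: fan out over every possible suffix and merge results."""
    return [f"{hot_key}#{i}" for i in range(NUM_SALTS)]
```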

Partition Pruning

The query optimizer analyzes query predicates to determine which partitions are relevant. A query with WHERE region = 'US' on a table partitioned by region only reads the US partition, skipping all others. This is called partition pruning and dramatically reduces I/O for selective queries.
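A toy model of pruning, with a table list-partitioned by region: an equality predicate on the partition key narrows the scan to one partition, while an unqualified query reads them all. The table contents and function name are illustrative, not any real planner's API.

```python
# A table "partitioned" by region: one bucket of rows per region.
TABLE = {
    "US":   [{"id": 1, "region": "US"}, {"id": 2, "region": "US"}],
    "EU":   [{"id": 3, "region": "EU"}],
    "APAC": [{"id": 4, "region": "APAC"}],
}

def scan(region=None):
    """With an equality predicate on the partition key, read exactly
    one partition (pruning); with no predicate, scan them all."""
    parts = [TABLE[region]] if region is not None else list(TABLE.values())
    return [row for part in parts for row in part]

print(len(scan("US")))  # 2 rows, one partition read
print(len(scan()))      # 4 rows, all partitions read
```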

Real-Life: Sharding at Scale

Vitess (YouTube/Slack):

  • Shards MySQL databases by routing queries based on a "vindex" (virtual index) that maps keys to shards.
  • Supports range-based and hash-based vindexes. Custom vindexes can implement application-specific partitioning.
  • Resharding: Vitess can split a shard into two while the system is live, using a combination of schema copy, binlog streaming, and traffic cutover.

DynamoDB:

  • Automatically partitions tables as data grows. Each partition handles up to 10 GB of data and 3,000 RCU / 1,000 WCU.
  • When a partition exceeds these limits, DynamoDB transparently splits it. This is invisible to the application.
  • Adaptive capacity: if one partition is hot, DynamoDB automatically reallocates throughput from underused partitions.

PostgreSQL (Declarative Partitioning):

  • Supports RANGE, LIST, and HASH partitioning natively since version 10.
  • Partition pruning is automatic: SELECT * FROM orders WHERE order_date >= '2024-01-01' on a range-partitioned table only scans the relevant partition.
  • Indexes can be created per partition, allowing parallel index scans.

Range vs Hash Partitioning

Range Partitioning:

  • P0 (A–H): Alice, Bob, Eve
  • P1 (I–P): John, Mike, Nora
  • P2 (Q–Z): Quinn, Sam, Zara
  • Range query "A–H" hits P0 only; a hotspot forms if many keys fall in I–P.

Hash Partitioning:

  • P0 (hash%3=0): Bob, Mike, Zara
  • P1 (hash%3=1): Alice, Quinn, Sam
  • P2 (hash%3=2): Eve, John, Nora
  • Range query "A–H" must hit ALL partitions.

Compound Key (user_id, timestamp):

  • hash(user_id) determines the partition; within the partition, rows are sorted by timestamp (e.g., user=42: t=1, t=2, t=3).
  • A range query on timestamp within a single user is therefore efficient.