What is Serializable Snapshot Isolation?

Serializable Snapshot Isolation (SSI) extends standard Snapshot Isolation (SI) by detecting and aborting transactions that would produce non-serializable outcomes. It provides true serializability — the strongest isolation level — while preserving the key MVCC benefit that readers never block writers. SSI was introduced in a 2008 paper by Cahill, Röhm, and Fekete, and has been the implementation behind PostgreSQL's SERIALIZABLE isolation level since version 9.1.

The Problem with Plain Snapshot Isolation

Snapshot Isolation prevents dirty reads, non-repeatable reads, and phantoms. However, it is vulnerable to write skew anomalies. Write skew occurs when two concurrent transactions each read a shared dataset, make disjoint writes based on what they read, and the combined result violates an invariant that neither transaction alone would break.

Classic example: a hospital requires at least one doctor on call. Doctors Alice and Bob each query the on-call table, see that two doctors are on call, and each independently signs off. Under SI, both commits succeed — leaving zero doctors on call.
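The on-call scenario can be reproduced with a toy in-memory model. The sketch below is not how any real database implements Snapshot Isolation; the `Database`, `Transaction`, and `sign_off` names are hypothetical and exist only for illustration. It assumes each transaction reads from a private copy taken at start and that commit checks only write-write conflicts (first committer wins).

```python
from dataclasses import dataclass, field


class WriteConflict(Exception):
    """Raised when two concurrent transactions wrote the same key."""


@dataclass
class Database:
    data: dict
    version: int = 0                                  # bumped on every commit
    last_write: dict = field(default_factory=dict)    # key -> commit version

    def begin(self):
        # Each transaction gets a private point-in-time copy of the data.
        return Transaction(self, dict(self.data), self.version)


@dataclass
class Transaction:
    db: Database
    snapshot: dict
    start_version: int
    writes: dict = field(default_factory=dict)

    def read(self, key):
        return self.writes.get(key, self.snapshot[key])

    def write(self, key, value):
        self.writes[key] = value

    def commit(self):
        # Plain SI checks only write-write conflicts (first committer wins).
        for key in self.writes:
            if self.db.last_write.get(key, 0) > self.start_version:
                raise WriteConflict(key)
        self.db.version += 1
        for key, value in self.writes.items():
            self.db.data[key] = value
            self.db.last_write[key] = self.db.version


def sign_off(txn, doctor):
    """Go off duty only if the snapshot shows at least two doctors on call."""
    on_call = [d for d in ("Alice", "Bob") if txn.read(d)]
    if len(on_call) >= 2:
        txn.write(doctor, False)


def run_write_skew():
    db = Database({"Alice": True, "Bob": True})
    t1, t2 = db.begin(), db.begin()   # concurrent: both see the same snapshot
    sign_off(t1, "Alice")
    sign_off(t2, "Bob")
    t1.commit()
    t2.commit()                        # succeeds because the write sets are disjoint
    return db.data
```

Calling `run_write_skew()` leaves both doctors off duty: neither commit raises `WriteConflict`, because the two transactions never wrote the same key.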

RW-Dependencies (Anti-Dependencies)

SSI tracks rw-dependencies (also called anti-dependencies): a directed edge from T1 to T2 meaning "T1 read a version of data that T2 later overwrote." In other words, T1's read was based on data that T2's write has since changed. A single rw-dependency is not a problem. The danger arises when two consecutive rw-dependencies form a specific pattern.
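As a rough illustration of the definition, rw-dependencies can be derived after the fact from each transaction's read and write sets. The `Txn` record, the logical `start`/`commit` timestamps, and the function names below are assumptions made for this sketch; a real engine records these edges incrementally while transactions run.

```python
from typing import NamedTuple


class Txn(NamedTuple):
    name: str
    start: int            # logical time the snapshot was taken
    commit: int           # logical commit time
    reads: frozenset      # keys read
    writes: frozenset     # keys written


def concurrent(a, b):
    """Two transactions are concurrent if their lifetimes overlap."""
    return a.start < b.commit and b.start < a.commit


def rw_dependencies(txns):
    """Return edges (reader, writer): the reader read a key that a concurrent writer overwrote."""
    edges = set()
    for reader in txns:
        for writer in txns:
            if reader is writer or not concurrent(reader, writer):
                continue
            # Under SI the reader's snapshot cannot contain the concurrent write,
            # so any overlap means the reader saw the older version.
            if reader.reads & writer.writes:
                edges.add((reader.name, writer.name))
    return edges


t1 = Txn("T1", 0, 3, frozenset({"Alice", "Bob"}), frozenset({"Alice"}))
t2 = Txn("T2", 1, 4, frozenset({"Alice", "Bob"}), frozenset({"Bob"}))
edges = rw_dependencies([t1, t2])   # contains ("T1", "T2") and ("T2", "T1")
```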

The Dangerous Structure

A serialization anomaly under SI can only occur when there is a dangerous structure: a cycle in the serialization graph that contains two consecutive rw-dependency edges meeting at a pivot transaction. Formally, if T1 --rw--> T2 --rw--> T3, where T1 is concurrent with T2 and T2 is concurrent with T3 (T1 and T3 may be the same transaction), then T2 is a pivot and an anomaly is possible. When SSI detects this pattern, it aborts one of the transactions involved (typically the pivot or the one that tries to commit last).
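Given a set of rw-dependency edges, the pattern can be found by pairing every edge that ends at a transaction with every edge that starts from it. This is a minimal sketch over a static edge list, with the hypothetical function name `dangerous_structures`; it assumes every edge passed in is already an rw-dependency between concurrent transactions.

```python
def dangerous_structures(rw_edges):
    """Return sorted (t1, pivot, t3) triples for consecutive rw edges t1 -> pivot -> t3."""
    result = []
    for t1, pivot in rw_edges:
        for src, t3 in rw_edges:
            if src == pivot:
                result.append((t1, pivot, t3))
    return sorted(result)


# Write skew: each transaction is a pivot, and T1 plays the role of T3 as well.
skew = dangerous_structures({("T1", "T2"), ("T2", "T1")})

# A single rw-dependency on its own produces no dangerous structure.
single = dangerous_structures({("T1", "T2")})
```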

How SSI Tracks Dependencies

The SSI implementation maintains two types of conflict information for each transaction:

inConflict: another transaction read data that this transaction later modified (someone has an rw-dependency pointing TO this transaction).
outConflict: this transaction read data that another transaction later modified (this transaction has an rw-dependency pointing FROM it).

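When a transaction has both an inConflict and an outConflict, it sits in the middle of two consecutive rw-dependencies, which makes it the pivot of a dangerous structure. The basic SSI algorithm aborts it; an implementation may instead pick another transaction in the structure as the victim.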
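The two flags can be sketched as booleans that are set whenever an rw-dependency is recorded. The class and function names here are hypothetical, and the victim choice (check the writer first, then the reader) is a simplification chosen for this example rather than the policy of any particular database.

```python
class SerializationFailure(Exception):
    """Raised when a transaction becomes the pivot of a dangerous structure."""


class SsiTxn:
    def __init__(self, name):
        self.name = name
        self.in_conflict = False    # some transaction has an rw edge pointing TO this one
        self.out_conflict = False   # this transaction has an rw edge pointing FROM it
        self.aborted = False


def record_rw_dependency(reader, writer):
    """Record the edge reader --rw--> writer and abort a transaction that has both flags."""
    reader.out_conflict = True
    writer.in_conflict = True
    # Assumption for this sketch: prefer aborting the writer that triggered detection.
    for txn in (writer, reader):
        if txn.in_conflict and txn.out_conflict:
            txn.aborted = True
            raise SerializationFailure(txn.name)
```

Replaying the on-call scenario, `record_rw_dependency(reader=t2, writer=t1)` sets one flag on each transaction and raises nothing. The later `record_rw_dependency(reader=t1, writer=t2)` gives both transactions both flags, and this sketch aborts `t2`.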

False Positives

SSI may sometimes abort transactions that would not actually cause an anomaly. The detection is conservative: it identifies the dangerous structure but does not verify that the two rw-dependencies are part of a complete cycle in the serialization graph. SSI can therefore produce false positive aborts. In practice, the false positive rate is low, and the benefit of avoiding blocking read locks usually outweighs the cost of occasional unnecessary aborts.
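A small graph check makes the gap between "has a pivot" and "has a cycle" concrete. In this sketch (hypothetical helper names, and every edge in the toy graph assumed to be an rw-dependency), the chain A -> B -> C contains a pivot but no cycle, so an abort of B based on the flags alone would be a false positive.

```python
def has_pivot(edges):
    """True if some transaction has both an incoming and an outgoing edge."""
    sources = {src for src, _ in edges}
    targets = {dst for _, dst in edges}
    return bool(sources & targets)


def has_cycle(edges):
    """Depth-first search for a cycle in the directed dependency graph."""
    graph = {}
    for src, dst in edges:
        graph.setdefault(src, []).append(dst)
    visiting, done = set(), set()

    def visit(node):
        if node in done:
            return False
        if node in visiting:
            return True           # reached a node already on the current path
        visiting.add(node)
        found = any(visit(nxt) for nxt in graph.get(node, []))
        visiting.discard(node)
        done.add(node)
        return found

    return any(visit(node) for node in list(graph))


chain = [("A", "B"), ("B", "C")]          # pivot B, but serializable as A, B, C
write_skew = [("T1", "T2"), ("T2", "T1")]  # pivot and a real cycle
```

Here `has_pivot(chain)` is true while `has_cycle(chain)` is false, whereas both functions return true for `write_skew`.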

SSI vs 2PL

Compared to Two-Phase Locking (2PL), SSI has important tradeoffs:

SSI: reads take no blocking locks, which gives better read throughput, but aborted transactions must be retried.
2PL: readers block writers (and vice versa), but there are no false positive aborts; a transaction is aborted only to break a deadlock.
SSI performs better under read-heavy workloads; 2PL can be better for write-heavy workloads with many conflicts.

Real-Life: Write Skew Detection with SSI

Consider the hospital on-call scenario:

Table: oncall(doctor, on_duty)

Alice: on_duty = true
Bob: on_duty = true

Invariant: at least one doctor must remain on duty.

Without SSI (plain Snapshot Isolation):

T1 (Alice): reads the table, sees both on duty. Decides it is safe to go off duty.
T2 (Bob): reads the table, sees both on duty. Decides it is safe to go off duty.
T1: UPDATE oncall SET on_duty = false WHERE doctor = 'Alice' — commits.
T2: UPDATE oncall SET on_duty = false WHERE doctor = 'Bob' — commits.
Result: zero doctors on call. Invariant violated.

With SSI (PostgreSQL SERIALIZABLE):

T1 reads the on-call rows. SSI notes T1 read both rows.
T2 reads the on-call rows. SSI notes T2 read both rows.
T1 writes Alice's row. SSI detects: T2 read data that T1 modified (rw-dep T2 -> T1).
T2 writes Bob's row. SSI detects: T1 read data that T2 modified (rw-dep T1 -> T2).
Dangerous structure detected: T1 --rw--> T2 --rw--> T1 (two consecutive rw-deps forming a cycle).
T2 is aborted with a "could not serialize access" error (SQLSTATE 40001). T1 commits successfully.
Application retries T2, which now sees Alice is off duty and keeps Bob on duty.

PostgreSQL SERIALIZABLE in practice:

Uses SSI since version 9.1
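Applications should be written to retry transactions that fail with a serialization failure (SQLSTATE 40001)
Reduced overhead for read-only transactions: PostgreSQL can often prove them safe, and a READ ONLY DEFERRABLE transaction waits for a safe snapshot and then runs without SSI overhead or risk of serialization failure
No lock waits for reads, so read latency does not depend on concurrent writers, although retries after aborts add latency under contention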
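A retry wrapper is the usual way to meet the retry requirement. The sketch below is driver-agnostic: `SerializationFailure` is a stand-in for whatever exception a real database driver raises for SQLSTATE 40001, and `run_with_retry`, `max_attempts`, and `base_delay` are hypothetical names chosen for this example. The transaction body is assumed to open, run, and commit a fresh transaction on every call.

```python
import random
import time


class SerializationFailure(Exception):
    """Stand-in for a driver error that carries SQLSTATE 40001."""


def run_with_retry(txn_body, max_attempts=5, base_delay=0.01, sleep=time.sleep):
    """Call txn_body until it succeeds or max_attempts is reached."""
    for attempt in range(1, max_attempts + 1):
        try:
            return txn_body()
        except SerializationFailure:
            if attempt == max_attempts:
                raise
            # Randomized exponential backoff so retried transactions are less
            # likely to collide with each other again.
            sleep(base_delay * (2 ** (attempt - 1)) * random.random())
```

Because the whole body is re-executed, a retried transaction reads from a new snapshot. In the on-call scenario, Bob's retry would see that Alice is already off duty and leave his row unchanged.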
