Back to DAG

MVCC

databases

What is Multi-Version Concurrency Control?

Multi-Version Concurrency Control (MVCC) is a concurrency control technique where writers create new versions of data instead of overwriting existing values in place. Each write produces a new version tagged with the writing transaction's ID, while old versions are preserved. Readers access the version consistent with their snapshot, meaning they see a consistent point-in-time view of the data without ever needing to acquire read locks.

The Core Principle

The fundamental guarantee of MVCC is: readers never block writers, and writers never block readers. A read-only query can scan millions of rows while concurrent transactions insert and update data freely. The reader simply ignores versions created by transactions that started after its snapshot. Writers only block other writers when they attempt to modify the same row (write-write conflict).

Version Chains

Each logical row in an MVCC database has a version chain: a sequence of physical row versions, each tagged with the transaction ID that created it. When a transaction updates a row, it creates a new version and links it into the chain. The chain may be ordered newest-first or oldest-first depending on the implementation.

Visibility Check

When a transaction reads a row, it walks the version chain and selects the most recent version that is visible according to its snapshot. A version is visible if:

  1. The version's creating transaction committed before the reader's snapshot.
  2. The version's creating transaction is not in the reader's active-transaction set (xip[]).
  3. The version has not been deleted (or the deleting transaction has not yet committed from the reader's perspective).

PostgreSQL's MVCC Implementation

PostgreSQL stores all versions directly in the table's heap pages. Each tuple (row version) has two hidden columns:

  • xmin: the transaction ID that created this version (via INSERT or UPDATE).
  • xmax: the transaction ID that deleted or replaced this version (via DELETE or UPDATE). If xmax is 0 or the deleting transaction has not committed, the version is still "live."

When a transaction updates a row, PostgreSQL inserts a new tuple with the new values and sets the old tuple's xmax to the updating transaction's ID. The old tuple remains in the heap until it is removed by VACUUM.

InnoDB's MVCC Implementation

MySQL InnoDB takes a different approach. The primary table always stores only the latest committed version of each row. Older versions are stored in the undo log (rollback segment). When a reader needs to see an older version, InnoDB follows undo pointers from the current row backward through the undo log to reconstruct the version visible at the reader's snapshot. This is called a consistent read.

InnoDB's read view is its snapshot equivalent. It records: the list of active transaction IDs, the low-water mark (oldest active txnId), and the high-water mark (next txnId to be assigned). The visibility check compares the row's creating transaction ID against these boundaries.

Write-Write Conflicts

MVCC does not eliminate all conflicts. When two transactions attempt to update the same row, only one can succeed. The second writer either blocks (waiting for the first to commit or abort) or receives an error, depending on the isolation level and database implementation.

Real-Life: MVCC in Action with Concurrent Reads and Writes

Real-World Example

Consider a product inventory table with a row: Product "Widget", quantity = 100.

Timeline with MVCC (PostgreSQL-style):

  1. T10 starts a REPEATABLE READ transaction (snapshot taken at T10).
  2. T11 starts and executes: UPDATE products SET qty = 80 WHERE name = 'Widget'.
    • PostgreSQL creates a new tuple (xmin=T11, qty=80) and sets old tuple's xmax=T11.
  3. T11 commits.
  4. T10 reads: SELECT qty FROM products WHERE name = 'Widget'.
    • T10's snapshot was taken before T11 committed. It walks the version chain.
    • New version (xmin=T11): T11 was in-progress when T10's snapshot was taken, so NOT visible.
    • Old version (xmin=T5, xmax=T11): T5 committed before snapshot, and T11 (the deleter) was in-progress at snapshot time, so the deletion is not yet visible. This version IS visible.
    • T10 sees qty = 100 (the old value).
  5. T12 starts a new transaction after T11 committed.
  6. T12 reads the same row — sees qty = 80 (T11's version is now visible).

Key insight: T10 and T12 read the same row at the same time but see different values, each consistent with their own snapshot. No locks were needed for any read.

Real systems using MVCC:

  • PostgreSQL: heap-based MVCC with xmin/xmax on every tuple
  • MySQL InnoDB: undo-log-based MVCC with read views
  • Oracle: uses rollback segments (similar to InnoDB's undo log) for consistent reads
  • CockroachDB: MVCC with hybrid-logical clock timestamps

MVCC Version Chain and Visibility

MVCC: Version Chain for Row "Widget" Version 1 xmin=T5 xmax=T11 qty = 100 Version 2 xmin=T11 xmax=T15 qty = 80 Version 3 (latest) xmin=T15 xmax=0 qty = 50 Reader T10 (snapshot before T11 commit) Sees Version 1: qty = 100 Reader T13 (snapshot after T11 commit) Sees Version 2: qty = 80 PostgreSQL: all versions in heap Needs VACUUM to remove dead tuples InnoDB: latest in table, old in undo log Follows undo pointers to find old versions

Interactive MVCC

Loading demo...
Step 1 of 3