Back to DAG

Version Chain (undo log)

databases

What is a Version Chain?

A version chain is a linked list of row versions that enables Multi-Version Concurrency Control (MVCC). When a row is updated, the database does not overwrite the old value in place. Instead, it creates a new version and links the old version into a chain, allowing concurrent readers to see a consistent snapshot of the data without blocking writers.

How version chains work

Each row version contains:

  • The actual data (column values)
  • A transaction ID (xid) — the ID of the transaction that created this version
  • A pointer to the previous version (the undo pointer)

When a transaction updates a row:

  1. The current (latest) version's before-image is copied to the undo log (a separate storage area).
  2. The row is updated in place with the new values and the current transaction's ID.
  3. A pointer from the new version links back to the old version in the undo log.

When a transaction reads a row:

  1. Start at the latest version.
  2. Check if this version is visible to the reader's snapshot (i.e., the creating transaction committed before the reader's snapshot timestamp).
  3. If not visible, follow the undo pointer to the previous version and repeat.
  4. Return the first version that is visible.

Undo log

The undo log stores before-images of modified rows. It serves two purposes:

  1. MVCC reads: readers traverse the version chain to find their visible version.
  2. Rollback: if a transaction aborts, the database follows the version chain and restores the previous version. The undo log contains exactly the data needed to "undo" each change.

Visibility rules

A version created by transaction T is visible to reader R if:

  • T committed before R's snapshot was taken, OR
  • T is the same transaction as R (a transaction can see its own uncommitted changes)

A version is not visible if:

  • T is still active (uncommitted) when R reads
  • T committed after R's snapshot (R should not see future changes)

Version chain length and performance

Long version chains hurt read performance. If a row has been updated 100 times, a reader with an old snapshot must traverse up to 100 versions to find the one it can see. This is why:

  • Long-running transactions are problematic — they prevent old versions from being garbage collected.
  • Vacuum/purge processes periodically clean up versions that are no longer visible to any active transaction.
  • PostgreSQL stores old versions in the main heap (no separate undo log), making vacuum even more critical.
  • MySQL/InnoDB stores old versions in a dedicated undo tablespace and has a background purge thread.

Approaches to version storage

ApproachWhere old versions liveUsed by
Append-only (O2N)Main table, oldest-to-newest chainPostgreSQL
Append-only (N2O)Main table, newest-to-oldest chain(less common)
Delta storageUndo log (only changed columns)MySQL/InnoDB, Oracle
Full undoUndo log (complete row copies)SQL Server

Delta storage is the most space-efficient: if only one column out of 20 changes, only that column's before-image is stored in the undo log. PostgreSQL's approach creates a full copy of the row for every update, which can cause significant table bloat (known as "bloat" or "dead tuples").

Rollback via version chain

When a transaction aborts:

  1. Follow the version chain for each modified row.
  2. Restore the previous version (the before-image from the undo log).
  3. Remove the aborted version.

This is efficient because the undo log already contains exactly the data needed.

Real-Life: MVCC in PostgreSQL and MySQL

Real-World Example

PostgreSQL's version chain (append-only):

-- Transaction A (xid=100) inserts a row
INSERT INTO accounts (id, balance) VALUES (1, 1000);
-- Row version: {id:1, balance:1000, xmin:100, xmax:null}

-- Transaction B (xid=101) updates the balance
UPDATE accounts SET balance = 900 WHERE id = 1;
-- Old version: {id:1, balance:1000, xmin:100, xmax:101}
-- New version: {id:1, balance:900, xmin:101, xmax:null}

Both versions exist in the heap. A concurrent reader with snapshot before xid=101 still sees balance=1000. PostgreSQL's VACUUM later removes the old version when no transaction needs it.

MySQL/InnoDB's version chain (undo log):

In InnoDB, the latest version lives in the clustered index (the main table). Old versions are pushed to the undo log:

  1. Row in table: {id:1, balance:900, trx_id:101, roll_ptr: → undo_log_entry}
  2. Undo log entry: {id:1, balance:1000, trx_id:100, roll_ptr: → older_entry_or_null}

A reader with an old snapshot follows roll_ptr to find its visible version.

The cost of long chains: If a table row is updated once per second and a reporting query takes 5 minutes, the query might need to traverse up to 300 versions of that row. This is why DBAs warn against long-running transactions in OLTP databases — they cause version chain buildup, increase undo log size, and slow down reads.

Practical tip: Monitor pg_stat_user_tables.n_dead_tup in PostgreSQL or trx_undo_history_len in MySQL to detect version chain buildup.

Version Chain and Visibility

Main Table (latest version) Version 3 (current) balance = 700 | xid = 103 undo_ptr → v2 Undo Log Version 2 balance = 900 | xid = 102 | ptr → v1 Version 1 (original) balance = 1000 | xid = 100 | ptr → null Reader Visibility Reader A (snapshot @ 103) Sees v3: balance = 700 Reader B (snapshot @ 102) Sees v2: balance = 900 Reader C (snapshot @ 100) Sees v1: balance = 1000 Timeline xid 100 xid 102 xid 103 Rollback Follow chain, restore v2
Step 1 of 2