Back to DAG

Linked List (overflow pages)

databases

Linked Lists in Database Storage

In databases, linked lists appear not as in-memory node structures, but as chains of disk pages connected by page pointers. When a single page cannot hold all the data it needs to, the database chains a new overflow page and links them together.

Where overflow page chains appear

  1. Hash index overflow buckets: In a static hash index, each bucket is a page. When a bucket fills up, a new overflow page is allocated and linked from the original. Searching a bucket means traversing the chain — each hop is potentially a random disk I/O.

  2. TOAST chains: When a large column value is stored out-of-line (via TOAST in PostgreSQL), it is split into chunks stored across multiple pages. These chunks are accessed sequentially — effectively a singly-linked chain of data.

  3. Large Object (LOB) storage: Binary large objects (images, documents) stored in a database are often broken into fixed-size chunks linked together.

  4. Heap file free-space lists: Some databases organize pages into two linked lists: pages with free space and full pages. Inserting a record means grabbing the head of the free list.

Singly-linked vs doubly-linked

  • Singly-linked overflow pages: each page has a "next page" pointer. Traversal is forward-only. Used in hash overflow chains and TOAST.
  • Doubly-linked leaf pages: B+ tree leaf nodes are connected in a doubly-linked list so range scans can go in either direction. Each leaf has both a "next leaf" and "prev leaf" pointer.

The performance problem: pointer chasing

Every link traversal is a potential random disk I/O. If a hash bucket has 5 overflow pages, reading all entries requires 5 separate page fetches — and if those pages are scattered across the disk, each fetch may cost ~10 ms on an HDD. This is why databases strongly prefer B-trees over linked structures for indexes: a B-tree with fanout 500 can index billions of rows in just 3-4 levels (3-4 I/Os), while a linked chain of 1000 overflow pages requires 1000 I/Os.

When linked structures are acceptable

  • Short chains: if overflow is rare (good hash function, low load factor), chains stay short (1-2 pages).
  • Sequential access: if you always read the entire chain (e.g., reading a full LOB), the I/O pattern is sequential, which is much faster.
  • B+ tree leaf chains: leaf-to-leaf traversal is sequential during range scans, and the database can prefetch the next leaf.

Complexity

OperationSingly-linkedDoubly-linked
Traverse nextO(1) per hopO(1) per hop
Traverse prevO(n) from headO(1) per hop
Insert at headO(1)O(1)
Delete (given pointer)O(n)O(1)

Each "O(1) per hop" may still be a disk I/O, which is why hop count matters enormously.

Real-Life: Hash Index Overflow in PostgreSQL

Real-World Example

PostgreSQL's hash index maps hash(key) to a bucket page. When a bucket overflows, PostgreSQL allocates an overflow page and chains it:

Bucket 7 → Overflow Page 1 → Overflow Page 2 → null

To find key "alice" with hash("alice") = bucket 7:

  1. Read bucket page 7. Scan all entries. Not found.
  2. Follow pointer to overflow page 1. Scan entries. Not found.
  3. Follow pointer to overflow page 2. Scan entries. Found.

That's 3 random page reads. If the hash function distributes well and the table is not overloaded, most buckets have zero overflow pages, making lookups a single page read.

B+ tree leaf-level linked list:

In contrast, a B+ tree's leaf nodes form a doubly-linked list: ← Leaf[A-D] ↔ Leaf[E-K] ↔ Leaf[L-R] ↔ Leaf[S-Z] →

A range query like WHERE name BETWEEN 'F' AND 'P' descends the tree to find Leaf[E-K], then follows the "next" pointer to Leaf[L-R], reading just 2 leaves sequentially. The doubly-linked structure allows reverse scans (ORDER BY name DESC) by following "prev" pointers.

Overflow Page Chain vs B+ Tree Leaf Chain

Hash Index: Overflow Page Chain (singly-linked) Each hop = potential random I/O Bucket 7 k1, k2, k3 (page full) next Overflow 1 k4, k5, k6 (page full) next Overflow 2 k7 (has space) null 3 page reads to find k7 — pointer chasing! B+ Tree Leaf Level: Doubly-Linked List Sequential scan for range queries Leaf [A-D] a=1, b=2, d=4 Leaf [E-K] e=5, g=7, k=11 Leaf [L-R] l=12, n=14, r=18 Leaf [S-Z] s=19, z=26 Range scan WHERE key BETWEEN 'E' AND 'R': read 2 leaves sequentially scan range: Leaf[E-K] → Leaf[L-R]
Step 1 of 2