Linked Lists in Database Storage
In databases, linked lists appear not as in-memory node structures, but as chains of disk pages connected by page pointers. When a single page cannot hold all of its data, the database allocates a new overflow page and links it to the original.
Where overflow page chains appear
- Hash index overflow buckets: In a static hash index, each bucket is a page. When a bucket fills up, a new overflow page is allocated and linked from the original. Searching a bucket means traversing the chain; each hop is potentially a random disk I/O.
- TOAST chains: When a large column value is stored out-of-line (via TOAST in PostgreSQL), it is split into chunks stored across multiple pages. These chunks are accessed sequentially, effectively a singly-linked chain of data.
- Large Object (LOB) storage: Binary large objects (images, documents) stored in a database are often broken into fixed-size chunks linked together.
- Heap file free-space lists: Some databases organize pages into two linked lists: pages with free space and full pages. Inserting a record means grabbing the head of the free list.
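All four cases share one shape: a page that stores some records plus a pointer to the next page in its chain. A minimal in-memory model of that shape (names like `Page` and `chain_scan` are illustrative, not any database's actual API):

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Page:
    """A disk page: some records plus a 'next page' pointer in its header."""
    records: list = field(default_factory=list)
    next: Optional["Page"] = None

def chain_scan(head: Page, key) -> Optional[int]:
    """Walk the chain looking for key; on disk, each hop is a page fetch.

    Returns the number of pages read to find the key, or None if absent.
    """
    page, reads = head, 0
    while page is not None:
        reads += 1
        if key in page.records:
            return reads
        page = page.next
    return None
```

A key on the third page of the chain costs three page reads, which is exactly the pointer-chasing problem discussed below.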
Singly-linked vs doubly-linked
- Singly-linked overflow pages: each page has a "next page" pointer. Traversal is forward-only. Used in hash overflow chains and TOAST.
- Doubly-linked leaf pages: B+ tree leaf nodes are connected in a doubly-linked list so range scans can go in either direction. Each leaf has both a "next leaf" and "prev leaf" pointer.
The performance problem: pointer chasing
Every link traversal is a potential random disk I/O. If a hash bucket has 5 overflow pages, reading all entries requires 5 separate page fetches — and if those pages are scattered across the disk, each fetch may cost ~10 ms on an HDD. This is why databases strongly prefer B-trees over linked structures for indexes: a B-tree with fanout 500 can index billions of rows in just 3-4 levels (3-4 I/Os), while a linked chain of 1000 overflow pages requires 1000 I/Os.
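The arithmetic behind that comparison can be made concrete. A sketch using the numbers above (fanout 500, ~10 ms per random HDD read; both are assumed round figures, not measurements):

```python
import math

SEEK_MS = 10  # assumed cost of one random page read on an HDD

def btree_levels(n_rows: int, fanout: int = 500) -> int:
    """Levels a B-tree with the given fanout needs to index n_rows."""
    return math.ceil(math.log(n_rows, fanout))

# 2 billion rows fit in 4 levels: ~40 ms of random I/O worst case.
levels = btree_levels(2_000_000_000)
btree_ms = levels * SEEK_MS

# A 1000-page overflow chain costs 1000 random reads: ~10 seconds.
chain_ms = 1000 * SEEK_MS
```

Four reads versus a thousand is the whole argument for tree-structured indexes in one comparison.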
When linked structures are acceptable
- Short chains: if overflow is rare (good hash function, low load factor), chains stay short (1-2 pages).
- Sequential access: if you always read the entire chain (e.g., reading a full LOB), the I/O pattern is sequential, which is much faster.
- B+ tree leaf chains: leaf-to-leaf traversal is sequential during range scans, and the database can prefetch the next leaf.
Complexity
| Operation | Singly-linked | Doubly-linked |
|---|---|---|
| Traverse next | O(1) per hop | O(1) per hop |
| Traverse prev | O(n) from head | O(1) per hop |
| Insert at head | O(1) | O(1) |
| Delete (given pointer) | O(n) | O(1) |
Each "O(1) per hop" may still be a disk I/O, which is why hop count matters enormously.
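The delete row is where the two layouts differ most. A sketch of why (generic in-memory nodes, not any database's page format): with prev pointers, a node can unlink itself in O(1); without them, the predecessor must be found by walking from the head.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    val: object
    prev: Optional["Node"] = None
    next: Optional["Node"] = None

def unlink_doubly(node: Node) -> None:
    """Doubly-linked delete: O(1) given a pointer to the node itself."""
    if node.prev:
        node.prev.next = node.next
    if node.next:
        node.next.prev = node.prev

def unlink_singly(head: Node, node: Node) -> Optional[Node]:
    """Singly-linked delete: walk from head to find the predecessor, O(n)."""
    if head is node:
        return head.next
    pred = head
    while pred.next is not node:
        pred = pred.next
    pred.next = node.next
    return head
```

On disk the same asymmetry appears when a page is removed from a chain: a doubly-linked chain patches its two neighbors, while a singly-linked chain must be traversed to find the page that points at the victim.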
Real-Life: Hash Index Overflow in PostgreSQL
PostgreSQL's hash index maps hash(key) to a bucket page. When a bucket overflows, PostgreSQL allocates an overflow page and chains it:
Bucket 7 → Overflow Page 1 → Overflow Page 2 → null
To find key "alice" with hash("alice") = bucket 7:
- Read bucket page 7. Scan all entries. Not found.
- Follow pointer to overflow page 1. Scan entries. Not found.
- Follow pointer to overflow page 2. Scan entries. Found.
That's 3 random page reads. If the hash function distributes well and the table is not overloaded, most buckets have zero overflow pages, making lookups a single page read.
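The three-hop lookup above can be simulated with a toy model (a bucket as a list of pages, each page a list of keys; this is not PostgreSQL's actual on-disk format):

```python
# Bucket 7's chain: the bucket page followed by its two overflow pages.
bucket_7 = [
    ["bob", "carol"],     # bucket page
    ["dave", "erin"],     # overflow page 1
    ["alice", "frank"],   # overflow page 2
]

def hash_lookup(chain: list[list[str]], key: str) -> tuple[bool, int]:
    """Scan a bucket's page chain; return (found, pages_read)."""
    reads = 0
    for page in chain:
        reads += 1  # each hop is a (potentially random) page fetch
        if key in page:
            return True, reads
    return False, reads
```

Looking up "alice" reports 3 page reads; a key on the bucket page itself costs 1, and a miss still pays for the entire chain.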
B+ tree leaf-level linked list:
In contrast, a B+ tree's leaf nodes form a doubly-linked list:

Leaf[A-D] ↔ Leaf[E-K] ↔ Leaf[L-R] ↔ Leaf[S-Z]
A range query like WHERE name BETWEEN 'F' AND 'P' descends the tree to find Leaf[E-K], then follows the "next" pointer to Leaf[L-R], reading just 2 leaves sequentially. The doubly-linked structure allows reverse scans (ORDER BY name DESC) by following "prev" pointers.
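That leaf-level walk can be sketched as follows (a toy leaf structure with `next`/`prev` pointers; real B+ tree leaves hold sorted key arrays plus row pointers, and the starting leaf is found by descending the tree):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Leaf:
    keys: list
    next: Optional["Leaf"] = None
    prev: Optional["Leaf"] = None

def link(leaves: list[Leaf]) -> list[Leaf]:
    """Wire up the doubly-linked leaf chain."""
    for a, b in zip(leaves, leaves[1:]):
        a.next, b.prev = b, a
    return leaves

leaves = link([Leaf(["A", "B", "C", "D"]), Leaf(["E", "F", "K"]),
               Leaf(["L", "N", "P", "R"]), Leaf(["S", "T", "Z"])])

def range_scan(start_leaf: Leaf, lo: str, hi: str) -> list:
    """Collect keys in [lo, hi] by following next-pointers leaf to leaf."""
    out, leaf = [], start_leaf
    while leaf is not None and leaf.keys and leaf.keys[0] <= hi:
        out.extend(k for k in leaf.keys if lo <= k <= hi)
        leaf = leaf.next
    return out
```

Starting from Leaf[E-K], `range_scan(leaves[1], "F", "P")` touches only two leaves; a reverse scan would follow `prev` pointers the same way.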