Linked Lists in Database Storage
In databases, linked lists appear not as in-memory node structures, but as chains of disk pages connected by page pointers. When a single page cannot hold all of its data, the database allocates a new overflow page and links it to the original.
Where overflow page chains appear
- Hash index overflow buckets: In a static hash index, each bucket is a page. When a bucket fills up, a new overflow page is allocated and linked from the original. Searching a bucket means traversing the chain; each hop is potentially a random disk I/O.
- TOAST chains: When a large column value is stored out-of-line (via TOAST in PostgreSQL), it is split into chunks stored across multiple pages. These chunks are accessed sequentially, effectively a singly-linked chain of data.
- Large Object (LOB) storage: Binary large objects (images, documents) stored in a database are often broken into fixed-size chunks linked together.
- Heap file free-space lists: Some databases organize pages into two linked lists: pages with free space and full pages. Inserting a record means grabbing the head of the free list.
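All four cases share one shape: a page that stores some records plus a pointer to the next page in its chain. A minimal in-memory model of that shape (names like `Page` and `chain_scan` are illustrative, not any database's actual API):

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Page:
    """A disk page: some records plus a 'next page' pointer in its header."""
    records: list = field(default_factory=list)
    next: Optional["Page"] = None

def chain_scan(head: Page, key) -> Optional[int]:
    """Walk the chain looking for key; on disk, each hop is a page fetch.

    Returns the number of pages read to find the key, or None if absent.
    """
    page, reads = head, 0
    while page is not None:
        reads += 1
        if key in page.records:
            return reads
        page = page.next
    return None
```

A key on the third page of the chain costs three page reads, which is exactly the pointer-chasing problem discussed below.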
Singly-linked vs doubly-linked
- Singly-linked overflow pages: each page has a "next page" pointer. Traversal is forward-only. Used in hash overflow chains and TOAST.
- Doubly-linked leaf pages: B+ tree leaf nodes are connected in a doubly-linked list so range scans can go in either direction. Each leaf has both a "next leaf" and "prev leaf" pointer.
The performance problem: pointer chasing
Every link traversal is a potential random disk I/O. If a hash bucket has 5 overflow pages, reading all entries requires 5 separate page fetches — and if those pages are scattered across the disk, each fetch may cost ~10 ms on an HDD. This is why databases strongly prefer B-trees over linked structures for indexes: a B-tree with fanout 500 can index billions of rows in just 3-4 levels (3-4 I/Os), while a linked chain of 1000 overflow pages requires 1000 I/Os.
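The arithmetic behind that comparison can be made concrete. A sketch using the numbers above (fanout 500, ~10 ms per random HDD read; both are assumed round figures, not measurements):

```python
import math

SEEK_MS = 10  # assumed cost of one random page read on an HDD

def btree_levels(n_rows: int, fanout: int = 500) -> int:
    """Levels a B-tree with the given fanout needs to index n_rows."""
    return math.ceil(math.log(n_rows, fanout))

# 2 billion rows fit in 4 levels: ~40 ms of random I/O worst case.
levels = btree_levels(2_000_000_000)
btree_ms = levels * SEEK_MS

# A 1000-page overflow chain costs 1000 random reads: ~10 seconds.
chain_ms = 1000 * SEEK_MS
```

Four reads versus a thousand is the whole argument for tree-structured indexes in one comparison.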
When linked structures are acceptable
- Short chains: if overflow is rare (good hash function, low load factor), chains stay short (1-2 pages).
- Sequential access: if you always read the entire chain (e.g., reading a full LOB), the I/O pattern is sequential, which is much faster.
- B+ tree leaf chains: leaf-to-leaf traversal is sequential during range scans, and the database can prefetch the next leaf.
Complexity
| Operation | Singly-linked | Doubly-linked |
|---|---|---|
| Traverse next | O(1) per hop | O(1) per hop |
| Traverse prev | O(n) from head | O(1) per hop |
| Insert at head | O(1) | O(1) |
| Delete (given pointer) | O(n) | O(1) |
Each "O(1) per hop" may still be a disk I/O, which is why hop count matters enormously.
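The delete row is where the two layouts differ most. A sketch of why (generic in-memory nodes, not any database's page format): with prev pointers, a node can unlink itself in O(1); without them, the predecessor must be found by walking from the head.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    val: object
    prev: Optional["Node"] = None
    next: Optional["Node"] = None

def unlink_doubly(node: Node) -> None:
    """Doubly-linked delete: O(1) given a pointer to the node itself."""
    if node.prev:
        node.prev.next = node.next
    if node.next:
        node.next.prev = node.prev

def unlink_singly(head: Node, node: Node) -> Optional[Node]:
    """Singly-linked delete: walk from head to find the predecessor, O(n)."""
    if head is node:
        return head.next
    pred = head
    while pred.next is not node:
        pred = pred.next
    pred.next = node.next
    return head
```

On disk the same asymmetry appears when a page is removed from a chain: a doubly-linked chain patches its two neighbors, while a singly-linked chain must be traversed to find the page that points at the victim.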
Real-Life: Hash Index Overflow in PostgreSQL
PostgreSQL's hash index maps hash(key) to a bucket page. When a bucket overflows, PostgreSQL allocates an overflow page and chains it:
Bucket 7 → Overflow Page 1 → Overflow Page 2 → null
To find key "alice" with hash("alice") = bucket 7:
- Read bucket page 7. Scan all entries. Not found.
- Follow pointer to overflow page 1. Scan entries. Not found.
- Follow pointer to overflow page 2. Scan entries. Found.
That's 3 random page reads. If the hash function distributes well and the table is not overloaded, most buckets have zero overflow pages, making lookups a single page read.
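The three-hop lookup above can be simulated with a toy model (a bucket as a list of pages, each page a list of keys; this is not PostgreSQL's actual on-disk format):

```python
# Bucket 7's chain: the bucket page followed by its two overflow pages.
bucket_7 = [
    ["bob", "carol"],     # bucket page
    ["dave", "erin"],     # overflow page 1
    ["alice", "frank"],   # overflow page 2
]

def hash_lookup(chain: list[list[str]], key: str) -> tuple[bool, int]:
    """Scan a bucket's page chain; return (found, pages_read)."""
    reads = 0
    for page in chain:
        reads += 1  # each hop is a (potentially random) page fetch
        if key in page:
            return True, reads
    return False, reads
```

Looking up "alice" reports 3 page reads; a key on the bucket page itself costs 1, and a miss still pays for the entire chain.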
B+ tree leaf-level linked list:
In contrast, a B+ tree's leaf nodes form a doubly-linked list:

Leaf[A-D] ↔ Leaf[E-K] ↔ Leaf[L-R] ↔ Leaf[S-Z]
A range query like WHERE name BETWEEN 'F' AND 'P' descends the tree to find Leaf[E-K], then follows the "next" pointer to Leaf[L-R], reading just 2 leaves sequentially. The doubly-linked structure allows reverse scans (ORDER BY name DESC) by following "prev" pointers.
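That leaf-level walk can be sketched as follows (a toy leaf structure with `next`/`prev` pointers; real B+ tree leaves hold sorted key arrays plus row pointers, and the starting leaf is found by descending the tree):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Leaf:
    keys: list
    next: Optional["Leaf"] = None
    prev: Optional["Leaf"] = None

def link(leaves: list[Leaf]) -> list[Leaf]:
    """Wire up the doubly-linked leaf chain."""
    for a, b in zip(leaves, leaves[1:]):
        a.next, b.prev = b, a
    return leaves

leaves = link([Leaf(["A", "B", "C", "D"]), Leaf(["E", "F", "K"]),
               Leaf(["L", "N", "P", "R"]), Leaf(["S", "T", "Z"])])

def range_scan(start_leaf: Leaf, lo: str, hi: str) -> list:
    """Collect keys in [lo, hi] by following next-pointers leaf to leaf."""
    out, leaf = [], start_leaf
    while leaf is not None and leaf.keys and leaf.keys[0] <= hi:
        out.extend(k for k in leaf.keys if lo <= k <= hi)
        leaf = leaf.next
    return out
```

Starting from Leaf[E-K], `range_scan(leaves[1], "F", "P")` touches only two leaves; a reverse scan would follow `prev` pointers the same way.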