Anatomy of a B-Tree Node
A B-tree node is the fundamental building block of B-tree and B+ tree indexes — the most widely used index structure in databases. Understanding the internal layout of a single node is essential before studying full B-tree operations.
Node structure
A B-tree node contains:
- An array of keys: stored in sorted order. A node with k keys divides the key space into k+1 regions.
- An array of child pointers (for internal nodes): a node with k keys has k+1 children. Child pointer i points to the subtree containing all keys between key[i-1] and key[i].
- An array of values/record pointers (for leaf nodes): each key is associated with a value or a pointer to the actual record.
- Metadata: whether the node is a leaf, the current number of keys, a pointer to the parent (optional).
Order and degree
The minimum degree (often called t) defines the capacity constraints:
- Each non-root node has at least t-1 keys and at most 2t-1 keys.
- Each non-root internal node has at least t children and at most 2t children.
- The root can have as few as 1 key (or 0 if the tree is empty).
For example, with t=3: each node has 2-5 keys and 3-6 children. A typical database page (8 KB) can fit hundreds of keys, giving t values in the hundreds.
Key search within a node
When the B-tree search arrives at a node, it must find where the target key falls among the node's sorted keys. Two strategies:
- Linear scan: O(k) where k is the number of keys. Simple, but slow for large nodes.
- Binary search: O(log k). For nodes with hundreds of keys (which is typical), this is significantly faster.
The search determines either:
- The key is found at index i → return the associated value (or descend to the correct child in a B+ tree).
- The key falls between key[i-1] and key[i] → descend into child[i].
Why high fanout matters
A B-tree with minimum degree t has a maximum fanout of 2t. A node that fits in a single disk page can hold hundreds of keys, giving a fanout of hundreds. This means:
- A tree with fanout 500 can index 500^3 = 125 million keys in just 3 levels.
- Each level requires one disk I/O, so any key can be found in 3 disk reads.
- Compare this to a binary search tree: log2(125 million) = 27 levels = 27 disk reads.
This is why B-trees dominate database indexing: they minimize the number of disk I/Os by maximizing the number of keys per node (per disk page).
Node types
- Internal node: has keys and child pointers, no record data (in B+ trees) or has record data (in B-trees).
- Leaf node: has keys and record pointers/values, no children. In B+ trees, leaf nodes also have sibling pointers for range scans.
- Root node: can be internal or leaf. In a small tree, the root is the only node and is a leaf.
Real-Life: PostgreSQL B-Tree Index Page
In PostgreSQL, a B-tree index page (8 KB) is laid out using the slotted page format, with index tuples instead of heap tuples.
Internal node page:
- Each index tuple contains: a key value + a pointer to a child page.
- With 8-byte keys and 6-byte page pointers, each entry is ~14 bytes + overhead.
- An 8 KB page can fit roughly 300-400 entries — that is the fanout.
- A 3-level tree can index 400^3 = 64 million rows.
Leaf node page:
- Each index tuple contains: a key value + a TID (tuple identifier = ctid pointing to the heap page and slot).
- Leaf pages are linked in a doubly-linked list for range scans.
Binary search within a page:
PostgreSQL performs binary search over the index tuples within a page to find the target key or the correct child pointer. This is the _bt_binsrch() function in the source code.
Example: searching for key 42 in a 3-level B-tree:
- Read root page. Binary search finds 42 is between key 30 and key 50. Follow child pointer for that range.
- Read internal page. Binary search finds 42 is between key 40 and key 45. Follow child pointer.
- Read leaf page. Binary search finds key 42. Return the TID (page 17, slot 3). Fetch the heap tuple.
Total: 3 index page reads + 1 heap page read = 4 I/Os.