What is a B-Tree?
A B-Tree is a self-balancing, multi-way search tree designed specifically for systems that read and write large blocks of data — most importantly, disk-based storage systems like databases and file systems.
Why not a binary tree?
A binary search tree has at most 2 children per node, meaning its height is O(log₂ n). For a table with 1 billion rows, that is about 30 levels — and each level is a separate disk I/O. Disk seeks take ~10ms, so 30 levels means 300ms per lookup. Unacceptable.
A B-Tree solves this by using a high fanout: each node can hold hundreds or even thousands of keys. A node is sized to fit exactly one disk page (typically 4–16 KB). With a fanout of 500, a 3-level B-Tree can index 500³ = 125 million keys with only 3 disk reads.
Structure and properties
A B-Tree of order m (also called the branching factor) satisfies:
- Every node has at most m children and m-1 keys.
- Every non-root internal node has at least ⌈m/2⌉ children (minimum occupancy guarantee).
- All leaves are at the same depth — the tree is perfectly balanced.
- Keys within a node are stored in sorted order.
Search algorithm
- Start at the root node.
- Binary search (or linear scan) within the node's keys to find the correct child pointer.
- Follow that child pointer to the next level.
- Repeat until you reach a leaf. If the key is found at any level, return it.
The height of a B-Tree is O(log_m(n)) where m is the order and n is the number of keys. With m = 1000, even 1 trillion keys require only 4 levels.
Insert algorithm (simplified)
- Search for the correct leaf node.
- Insert the key in sorted position within the leaf.
- If the leaf now has more than m-1 keys, it overflows and must be split (covered in the Split & Merge tutorial).
Why B-Trees dominate database indexing
- Minimizes disk I/O: each node = one page read.
- High fanout: fewer levels than binary trees.
- Sorted keys: supports both equality and range queries.
- Balanced: worst-case search is still O(log_m(n)).
- Used in PostgreSQL, MySQL, SQLite, Oracle, SQL Server, and virtually every RDBMS.
Real-Life: Database Index Lookup
When you run SELECT * FROM users WHERE id = 42 on a table with a B-Tree index on id, here is what happens:
- The database reads the root page of the B-Tree index from the buffer pool (likely already cached in RAM).
- The root contains keys like [100, 500, 900]. Since 42 < 100, follow the leftmost child pointer.
- The next node contains [20, 50, 80]. Since 42 is between 20 and 50, follow the second child pointer.
- The leaf node contains [35, 38, 42, 47]. Binary search finds 42, and the associated value is a pointer (Record ID) to the actual row on the heap page.
- The database reads that heap page and returns the row.
Total disk reads: 3–4 (root + 1–2 internal levels + leaf + heap page). Compare this to a full table scan that might read thousands of pages.
File systems use B-Trees too. The ext4 file system uses an HTree (a specialized B-Tree variant) for directory indexing, and NTFS uses B+Trees for its Master File Table. Btrfs is literally named after B-Trees.