Back to DAG

B-Tree

databases

What is a B-Tree?

A B-Tree is a self-balancing, multi-way search tree designed specifically for systems that read and write large blocks of data — most importantly, disk-based storage systems like databases and file systems.

Why not a binary tree?

A binary search tree has at most 2 children per node, meaning its height is O(log₂ n). For a table with 1 billion rows, that is about 30 levels — and each level is a separate disk I/O. Disk seeks take ~10ms, so 30 levels means 300ms per lookup. Unacceptable.

A B-Tree solves this by using a high fanout: each node can hold hundreds or even thousands of keys. A node is sized to fit exactly one disk page (typically 4–16 KB). With a fanout of 500, a 3-level B-Tree can index 500³ = 125 million keys with only 3 disk reads.

Structure and properties

A B-Tree of order m (also called the branching factor) satisfies:

  • Every node has at most m children and m-1 keys.
  • Every non-root internal node has at least ⌈m/2⌉ children (minimum occupancy guarantee).
  • All leaves are at the same depth — the tree is perfectly balanced.
  • Keys within a node are stored in sorted order.

Search algorithm

  1. Start at the root node.
  2. Binary search (or linear scan) within the node's keys to find the correct child pointer.
  3. Follow that child pointer to the next level.
  4. Repeat until you reach a leaf. If the key is found at any level, return it.

The height of a B-Tree is O(log_m(n)) where m is the order and n is the number of keys. With m = 1000, even 1 trillion keys require only 4 levels.

Insert algorithm (simplified)

  1. Search for the correct leaf node.
  2. Insert the key in sorted position within the leaf.
  3. If the leaf now has more than m-1 keys, it overflows and must be split (covered in the Split & Merge tutorial).

Why B-Trees dominate database indexing

  • Minimizes disk I/O: each node = one page read.
  • High fanout: fewer levels than binary trees.
  • Sorted keys: supports both equality and range queries.
  • Balanced: worst-case search is still O(log_m(n)).
  • Used in PostgreSQL, MySQL, SQLite, Oracle, SQL Server, and virtually every RDBMS.

Real-Life: Database Index Lookup

Real-World Example

When you run SELECT * FROM users WHERE id = 42 on a table with a B-Tree index on id, here is what happens:

  1. The database reads the root page of the B-Tree index from the buffer pool (likely already cached in RAM).
  2. The root contains keys like [100, 500, 900]. Since 42 < 100, follow the leftmost child pointer.
  3. The next node contains [20, 50, 80]. Since 42 is between 20 and 50, follow the second child pointer.
  4. The leaf node contains [35, 38, 42, 47]. Binary search finds 42, and the associated value is a pointer (Record ID) to the actual row on the heap page.
  5. The database reads that heap page and returns the row.

Total disk reads: 3–4 (root + 1–2 internal levels + leaf + heap page). Compare this to a full table scan that might read thousands of pages.

File systems use B-Trees too. The ext4 file system uses an HTree (a specialized B-Tree variant) for directory indexing, and NTFS uses B+Trees for its Master File Table. Btrfs is literally named after B-Trees.

B-Tree Structure (Order 4)

B-Tree of order 4 (max 3 keys per node) Search path for key 42 shown in green 30 60 42 > 30 10 20 35 45 50 70 80 42 > 35 38 42 Found! 44 48 52 55 Height = 3 levels. Only 3 disk I/Os to find any key among dozens of entries. Real B-Trees: order 500+, billions of keys in 3-4 levels.

Interactive B-Tree

Loading demo...
Step 1 of 3