Back to DAG

B+ Tree

databases

What is a B+ Tree?

A B+ Tree is a variant of the B-Tree that has become the default index structure in virtually every major relational database — PostgreSQL, MySQL InnoDB, SQLite, Oracle, and SQL Server all use B+ Trees for their primary indexes.

Key difference from B-Trees

In a standard B-Tree, both internal nodes and leaf nodes can store data (key-value pairs). In a B+ Tree:

  • Internal nodes store only keys — they are pure routing/index nodes. They contain keys and child pointers, but no data values.
  • ALL data lives in the leaf nodes — every key-value pair is stored at the leaf level.
  • Leaf nodes are linked together in a doubly-linked list, enabling efficient sequential access.

Why this design is superior for databases

1. Higher fanout in internal nodes. Because internal nodes do not store values (which can be large — think row pointers, or in clustered indexes, entire rows), more keys fit per page. More keys per node means higher fanout, which means fewer tree levels, which means fewer disk I/Os per lookup.

2. Efficient range scans. Finding all users with ages between 25 and 35 requires: (a) one root-to-leaf traversal to find the first leaf with age >= 25, then (b) a sequential scan through the linked leaf list until age > 35. No backtracking up the tree is needed. This is dramatically faster than performing individual searches for each key in the range.

3. Predictable performance. Every search traverses exactly the same number of levels (root to leaf), because all data is at the leaf level. In a regular B-Tree, some searches terminate at internal nodes while others must go to leaves, making performance less uniform.

Structure

  • Internal node: [P₀ | K₁ | P₁ | K₂ | P₂ | ... | Kₙ | Pₙ] where Pᵢ is a child pointer and Kᵢ is a routing key. All keys in subtree Pᵢ₋₁ are < Kᵢ, and all keys in subtree Pᵢ are >= Kᵢ.
  • Leaf node: [K₁,V₁ | K₂,V₂ | ... | Kₙ,Vₙ | →next] where each entry is a key-value pair and →next points to the next leaf.

Search: O(log_m(n))

Same as B-Tree traversal, but the search always ends at a leaf node.

Range query: O(log_m(n) + k)

Where k is the number of keys in the range. The log_m(n) is for finding the start leaf, and k is the sequential scan through linked leaves.

Real-Life: SQL Range Query Execution

Real-World Example

Consider running SELECT * FROM orders WHERE total BETWEEN 100 AND 500 ORDER BY total on a table with 10 million rows and a B+ Tree index on total.

Without the B+ Tree index (full table scan):

  • Read all ~125,000 data pages (at 80 rows per 8KB page).
  • Filter each row. Return matching ones. Then sort them.
  • Cost: 125,000 disk reads + sort overhead.

With the B+ Tree index:

  1. Traverse from root to the leaf containing total = 100. This takes 3–4 page reads.
  2. Scan right through the linked leaf list, collecting all entries where total <= 500. If 50,000 orders match, and each leaf holds 200 entries, that is about 250 leaf pages.
  3. For each matching entry, use the stored Record ID (page_id, slot) to fetch the actual row from the heap file.
  4. Results come out already sorted by total — no extra sort needed!
  • Cost: ~4 + 250 + 50,000 heap reads (or much fewer with sequential I/O optimizations).

MySQL InnoDB's clustered index is a B+ Tree where the leaf nodes store the entire row, not just a pointer. This means range scans on the primary key do not need a separate heap fetch — the data is right there in the leaves.

B+ Tree with Linked Leaves

B+ Tree — internal nodes are routing only, all data in leaves Range scan [20..45] follows green arrows 30 60 routing keys only 15 22 40 50 70 85 5 10 d1 d2 15 20 22 d3 d4 d5 30 35 d6 d7 40 45 d8 d9 50 55 d10 d11 60 70 85 Range [20..45]: find leaf with 20, scan right through linked leaves Leaves with green borders are visited during the range scan.
Step 1 of 3