Back to DAG

Composite Index

databases

What is a Composite Index?

A composite index (also called a multi-column index or compound index) is a B+ tree index built on two or more columns. The keys in the B+ tree are sorted first by the first column, then by the second column within ties of the first, and so on — exactly like a phone book sorted by last name, then first name.

The Leftmost Prefix Rule

This is the single most important concept for composite indexes. An index on (a, b, c) can efficiently satisfy queries that filter on:

  • (a) alone
  • (a, b)
  • (a, b, c)

But it cannot efficiently satisfy queries that filter only on:

  • (b) alone — the index is not sorted by b in isolation
  • (c) alone
  • (b, c) — same problem

The index can only be used when the query starts from the leftmost column and uses contiguous columns from the left.

Equality vs Range Predicates

Column order in a composite index is critical. Consider an index on (a, b, c):

  • WHERE a = 1 AND b = 5 AND c = 10 — all three columns used (equality on all).
  • WHERE a = 1 AND b > 5 — both columns used. a narrows to a = 1, then b > 5 uses the sorted order within a = 1.
  • WHERE a > 1 AND b = 5 — only a is used for the range scan. Once a is a range, the index cannot use b because within the range a > 1, the b values are interleaved across different a values.

Rule of thumb: put equality columns first, range columns last. Put the most selective equality column first to narrow down candidates quickly.

Column Order Design Strategy

When designing a composite index:

  1. Identify the columns in your WHERE clause.
  2. Separate them into equality predicates (=) and range predicates (>, <, BETWEEN, LIKE 'prefix%').
  3. Place equality columns first (in order of selectivity — most selective first).
  4. Place the range column last (only one range column can be used per index scan).
  5. If ORDER BY follows the same column order, the index also avoids a sort.

Index Skip Scan

Some databases (Oracle, MySQL 8.0.13+, PostgreSQL 13+) support index skip scan, which can use a composite index even when the leftmost column is not in the WHERE clause — but only if the leftmost column has very few distinct values. The optimizer effectively does a separate scan for each distinct value of the first column. This is an optimization, not a replacement for proper index design.

Real-Life: Designing the Right Composite Index

Real-World Example

A SaaS application has a logs table with millions of rows:

CREATE TABLE logs (
  id BIGINT PRIMARY KEY,
  tenant_id INT,
  severity VARCHAR(10),    -- 'INFO', 'WARN', 'ERROR'
  created_at TIMESTAMP,
  message TEXT
);

The most common query:

SELECT * FROM logs
WHERE tenant_id = ? AND severity = 'ERROR'
  AND created_at > NOW() - INTERVAL '1 hour'
ORDER BY created_at DESC
LIMIT 50;

Index design using the equality-first, range-last rule:

  • Equality columns: tenant_id, severity (both = comparisons)
  • Range column: created_at (> comparison)
  • ORDER BY: created_at DESC (matches index order if placed last)

Best index: CREATE INDEX idx_logs ON logs(tenant_id, severity, created_at DESC)

Why this works:

  1. B+ tree seeks to tenant_id = ? — skips all other tenants.
  2. Within that tenant, seeks to severity = 'ERROR' — skips INFO and WARN.
  3. Within ERROR for that tenant, scans created_at backward from now — pure sequential read.
  4. ORDER BY created_at DESC is already satisfied by the index order — no sort step.
  5. LIMIT 50 stops after 50 rows — very fast.

Wrong index: (created_at, tenant_id, severity) — the range column first means the index cannot efficiently narrow by tenant_id and severity.

Composite Index: Leftmost Prefix and Column Order

Composite Index on (a, b, c) — Sorted B+ Tree Leaf Order a=1 b=2 c=5 a=1 b=2 c=8 a=1 b=5 c=1 a=2 b=1 c=3 a=2 b=1 c=7 a=2 b=3 c=2 a=3 b=1 c=4 a=3 b=2 c=9 Leftmost Prefix Rule: WHERE a=1 uses (a) OK WHERE a=1 AND b=2 uses (a,b) OK WHERE a=1 AND b=2 AND c=5 uses (a,b,c) OK WHERE b=2 cannot use idx NO WHERE c=5 cannot use idx NO WHERE b=2 AND c=5 cannot use idx NO Equality First, Range Last: WHERE a=1 AND b>5 -> a (equality), then b (range) -> both cols used WHERE a>1 AND b=5 -> a (range) stops further use -> only a used, b ignored
Step 1 of 3