Back to DAG

Secondary Index

databases

What is a Secondary Index?

A secondary index is an additional B+ tree built on a non-primary-key column to speed up queries that filter or sort on that column. Unlike the clustered index (which stores the actual row data), a secondary index's leaf nodes contain the indexed column value and a pointer back to the row.

How the pointer works (InnoDB)

In InnoDB, the secondary index leaf does not store a physical disk address. Instead, it stores the primary key value of the row. To retrieve the full row, the database must perform a second lookup in the clustered index using that primary key. This is called a bookmark lookup or double lookup:

  1. Search the secondary index B+ tree for the value (e.g., email = 'alice@example.com').
  2. Get the primary key from the leaf node (e.g., id = 42).
  3. Search the clustered index B+ tree for id = 42 to retrieve the full row.

This design means secondary indexes remain valid even when pages in the clustered index are split or reorganized — the primary key value does not change.

In PostgreSQL (heap-based)

In PostgreSQL, where tables are stored as heap files, secondary indexes store a ctid (a physical tuple identifier: page number + offset). This avoids the double B+ tree lookup, but if the heap tuple moves (e.g., during an UPDATE with MVCC), the index entry becomes stale and must be updated or a redirect chain must be followed. PostgreSQL's HOT (Heap-Only Tuple) optimization avoids updating indexes when the indexed columns have not changed.

Index maintenance cost

Every secondary index must be maintained on writes. For each INSERT, the database must insert an entry into every secondary index on the table. For each DELETE, it must remove entries. For an UPDATE that modifies an indexed column, the old entry must be removed and a new one inserted. A table with 5 secondary indexes means every single INSERT does 6 B+ tree insertions (1 clustered + 5 secondary).

When to create secondary indexes

Create a secondary index when:

  • Queries frequently filter (WHERE), join (JOIN ON), or sort (ORDER BY) on a column.
  • The column has high selectivity — it narrows down to a small fraction of rows.

Do NOT create secondary indexes blindly — each one slows down writes, consumes storage, and requires buffer pool memory. A common anti-pattern is indexing every column "just in case."

Real-Life: Looking Up Users by Email

Real-World Example

Consider a users table clustered on id (auto-increment):

CREATE TABLE users (
  id INT AUTO_INCREMENT PRIMARY KEY,
  email VARCHAR(255),
  name VARCHAR(100),
  created_at DATETIME
);
CREATE INDEX idx_email ON users(email);

Query: SELECT * FROM users WHERE email = 'alice@example.com'

Without idx_email: full table scan — read every leaf page of the clustered index. O(n).

With idx_email:

  1. Search idx_email B+ tree for 'alice@example.com' — O(log n), perhaps 3 page reads.
  2. Find the stored primary key: id = 42.
  3. Search the clustered index for id = 42 — O(log n), perhaps 3 more page reads.
  4. Return the full row.

Total: ~6 random I/O operations instead of scanning millions of rows.

Write cost: Every INSERT into users must also insert into idx_email. If you have 5 indexes, each INSERT does 6 B+ tree modifications. On write-heavy tables, too many indexes can cut throughput by 50% or more.

Real-world systems:

  • E-commerce: index on orders(customer_id) for "show my orders"
  • Social media: index on posts(user_id, created_at) for "show feed"
  • SaaS: index on tenants(domain) for multi-tenant routing

Secondary Index: Double Lookup in InnoDB

Query: SELECT * FROM users WHERE email = 'alice@ex.com' Secondary Index (email) [ dave@ | kate@ ] alice@ -> PK 42 bob@ -> PK 17 carol@ -> PK 88 dave@ -> PK 5 eve@ -> PK 23 frank@ -> PK 91 1 Find 'alice@' in secondary index -> PK = 42 PK=42 Clustered Index (id) [ 30 | 60 ] id=5 Dave ... id=17 Bob ... id=42 Alice ... id=60 Eve ... id=88 Carol ... id=91 Frank ... 2 Look up PK=42 in clustered index -> full row Write Cost: INSERT into users (with 3 secondary indexes) 1. Insert row into clustered index B+ tree (id) 2. Insert into secondary index B+ tree (email) 3. Insert into secondary index B+ tree (name) 4. Insert into secondary index B+ tree (created_at) 4 B+ tree inserts!
Step 1 of 2