Back to DAG

Covering Index

databases

What is a Covering Index?

A covering index is an index that contains all the columns a query needs, allowing the database to answer the query entirely from the index without ever accessing the underlying table data (the heap or clustered index). When the query planner detects this, it performs an index-only scan — a major performance optimization.

Why it matters

Recall that a normal secondary index lookup in InnoDB requires a double lookup: search the secondary index to get the primary key, then search the clustered index to get the full row. This second lookup is called a bookmark lookup or table access by index rowid. If the query only needs columns that are already in the secondary index, the bookmark lookup is completely unnecessary.

For example, if you have an index on (email) and run SELECT email FROM users WHERE email = 'alice@example.com', the database can return email directly from the index leaf node without touching the clustered index at all.

The INCLUDE Clause (PostgreSQL)

PostgreSQL supports the INCLUDE clause, which adds non-key columns to the index leaf nodes without affecting the sort order:

CREATE INDEX idx_email_inc ON users(email) INCLUDE (name, created_at);

This index is sorted by email (the key column), and each leaf node also stores name and created_at as payload. A query like SELECT name, created_at FROM users WHERE email = 'alice@example.com' becomes an index-only scan. The included columns are NOT part of the sort key, so they do not affect the index's ordering or range-scan behavior.

How to identify covering index opportunities

In EXPLAIN output, look for:

  • PostgreSQL: "Index Only Scan" (covering) vs "Index Scan" (non-covering, requires heap fetch)
  • MySQL: "Using index" in the Extra column means covering
  • SQL Server: "Index Seek" with no "Key Lookup" operator

Trade-offs: Wide vs Narrow Indexes

Making an index wider (more columns) means:

  • Larger index size on disk and in the buffer pool
  • More maintenance cost on writes (every INSERT/UPDATE/DELETE touches more data)
  • Faster reads for queries that are covered

A narrow index (1-2 columns) is small and cheap to maintain but rarely covers queries. A wide index (many columns or with INCLUDE) is expensive to maintain but eliminates table lookups for specific queries. The decision depends on the workload: read-heavy systems benefit more from covering indexes.

Composite Index as Covering Index

A composite index on (a, b, c) naturally covers any query that only needs columns a, b, and/or c. This is a common pattern: designing a composite index that serves both as a search index AND a covering index for the most frequent queries.

Real-Life: Eliminating Bookmark Lookups

Real-World Example

An e-commerce dashboard runs this query thousands of times per minute:

SELECT order_id, total, status
FROM orders
WHERE customer_id = ?
ORDER BY created_at DESC
LIMIT 20;

Without a covering index: A secondary index on (customer_id, created_at) speeds up the lookup and ordering, but the database must do a bookmark lookup for each of the 20 rows to retrieve order_id, total, and status from the clustered index. That is 20 extra random I/O operations.

With a covering index:

CREATE INDEX idx_orders_covering
ON orders(customer_id, created_at DESC)
INCLUDE (order_id, total, status);

Now the index contains everything the query needs. The database:

  1. Seeks to customer_id = ? in the index.
  2. Reads the first 20 entries (already sorted by created_at DESC).
  3. Returns order_id, total, status directly from the index leaf.
  4. Zero bookmark lookups. All data comes from the index.

Impact: This reduced p99 query latency from 12ms to 0.8ms in a real production system — a 15x improvement — because it eliminated 20 random disk reads per query.

Covering Index: Index-Only Scan vs Bookmark Lookup

Non-Covering: SELECT name, email FROM users WHERE email = ? Index on (email) only Secondary Index email='alice@' PK=42 (no name column!) PK=42 Clustered Index id=42 Alice alice@... (fetch full row for name) 2 B+ tree lookups Covering: SELECT name, email FROM users WHERE email = ? Index on (email) INCLUDE (name) Covering Index email='alice@' PK=42 name='Alice' (all columns present!) 1 lookup, index-only! No table access needed PostgreSQL INCLUDE syntax: CREATE INDEX idx ON users(email) INCLUDE (name, created_at); Key columns: email (searchable, sorted) Included columns: name, created_at (payload only, not sorted) EXPLAIN shows: "Index Only Scan"
Step 1 of 2