What is a Bitmap Index?
A bitmap index creates one bit vector (bitmap) for every distinct value in a column. Each bitmap has N bits — one per row in the table. Bit i is set to 1 if row i has that value, and 0 otherwise.
Example
Consider a table with 8 rows and a column color with values {red, blue, green}:
| Row | color |
|---|---|
| 0 | red |
| 1 | blue |
| 2 | red |
| 3 | green |
| 4 | blue |
| 5 | red |
| 6 | green |
| 7 | blue |
Three bitmaps are created:
- red:
1 0 1 0 0 1 0 0 - blue:
0 1 0 0 1 0 0 1 - green:
0 0 0 1 0 0 1 0
Query evaluation with bitwise operations
Bitmap indexes shine for multi-predicate queries. To evaluate WHERE color = 'red' AND size = 'large':
- Fetch the
redbitmap from the color index. - Fetch the
largebitmap from the size index. - Compute bitwise AND: the result bitmap has 1s only where both conditions are true.
- Scan the result bitmap for set bits — those are the matching row IDs.
For OR predicates (WHERE color = 'red' OR color = 'blue'), compute bitwise OR of the two bitmaps.
The power of bitmaps is that these operations process 64 rows at a time on a 64-bit CPU (one machine word), making them extremely fast.
When bitmap indexes excel
- Low cardinality columns: columns with few distinct values (gender, status, country, color). Each bitmap is compact.
- Data warehouse queries: complex filters combining many predicates with AND/OR. Each predicate is a single bitmap lookup; combining them is trivial bitwise operations.
- Read-heavy workloads: bitmap indexes are fast to scan but expensive to update (inserting a row requires updating every bitmap in the index).
When NOT to use bitmap indexes
- High cardinality columns (e.g., email, UUID): one bitmap per value means millions of sparse bitmaps — wasteful.
- OLTP workloads: frequent INSERT/UPDATE/DELETE causes constant bitmap rebuilding. Row-level locking on bitmaps is impractical.
Bitmap compression
Raw bitmaps for large tables waste space (millions of bits, mostly zeros for rare values). Compression techniques:
- Run-Length Encoding (RLE): encode runs of consecutive 0s or 1s as (value, length) pairs. Highly effective for sorted data.
- Roaring Bitmaps: hybrid structure — uses arrays for sparse chunks, bitmaps for dense chunks, and RLE for runs. Used in Apache Lucene, Apache Druid, and Apache Spark.
- Word-Aligned Hybrid (WAH): compresses at word boundaries for fast bitwise operations without decompression.
Where bitmap indexes are used
- Oracle Database: explicit
CREATE BITMAP INDEXstatement for star schema queries. - Apache Druid: bitmap indexes on every dimension column by default for sub-second OLAP queries.
- Apache Lucene/Elasticsearch: posting lists stored as Roaring Bitmaps for inverted index intersection.
- ClickHouse: bitmap functions for set operations in analytics.
Real-Life: E-Commerce Product Filtering
When you filter products on an e-commerce site ("Brand: Nike AND Size: 10 AND Color: Black AND Price: Under $100"), the backend often uses bitmap indexes.
Without bitmaps: For each filter, scan the table or use a separate B-tree index, then intersect results in memory. With 4 filters, that could mean 4 index lookups and a complex merge.
With bitmaps:
- Fetch bitmap for
brand = 'Nike'→1 0 1 1 0 0 1 0 ... - Fetch bitmap for
size = 10→0 0 1 0 0 1 1 0 ... - Fetch bitmap for
color = 'Black'→1 0 1 0 0 0 1 0 ... - Fetch bitmap for
price < 100→1 1 1 1 0 0 0 0 ... - AND all four:
0 0 1 0 0 0 0 0 ...— only row 2 matches.
This takes microseconds because each AND is a single CPU instruction per 64 rows.
Apache Druid uses this exact approach for real-time analytics dashboards. A query like "count orders by region where status='shipped' and category='electronics'" resolves to bitmap intersections followed by a count of set bits (popcount), all in sub-second time even on billions of rows.