What are IVF and Product Quantization?
Inverted File Index (IVF) and Product Quantization (PQ) are complementary techniques for scaling approximate nearest neighbor search to billions of vectors. IVF reduces the number of candidates; PQ compresses vectors so distance computations fit in memory.
IVF: Inverted File Index
IVF partitions the vector space into clusters using k-means:
- Training phase: run k-means on a representative sample to produce K centroids (typically K = 256 to 65536).
- Index phase: assign each database vector to its nearest centroid. Store vectors in inverted lists — one list per centroid, containing all vectors assigned to that cluster.
- Query phase: compute the distance from the query to all K centroids. Select the nprobe closest centroids. Search only the inverted lists of those clusters.
The key parameter is nprobe — the number of clusters searched at query time. With K=1024 and nprobe=10, you search only ~1% of the data. Higher nprobe increases recall but slows the query.
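The three phases above can be sketched in a few lines of NumPy. This is a toy illustration, not Faiss's implementation — the function names (`build_ivf`, `ivf_search`) and the naive k-means loop are made up for clarity:

```python
import numpy as np

def build_ivf(data, K, iters=10, seed=0):
    """Toy IVF: k-means centroids plus one inverted list per centroid."""
    rng = np.random.default_rng(seed)
    centroids = data[rng.choice(len(data), K, replace=False)]
    for _ in range(iters):
        # assign each vector to its nearest centroid, then recompute means
        assign = np.argmin(((data[:, None] - centroids[None]) ** 2).sum(-1), axis=1)
        for c in range(K):
            members = data[assign == c]
            if len(members):
                centroids[c] = members.mean(axis=0)
    assign = np.argmin(((data[:, None] - centroids[None]) ** 2).sum(-1), axis=1)
    lists = {c: np.flatnonzero(assign == c) for c in range(K)}
    return centroids, lists

def ivf_search(query, data, centroids, lists, nprobe, k):
    """Search only the inverted lists of the nprobe closest centroids."""
    probe = np.argsort(((centroids - query) ** 2).sum(-1))[:nprobe]
    cand = np.concatenate([lists[c] for c in probe])
    d = ((data[cand] - query) ** 2).sum(-1)
    order = np.argsort(d)[:k]
    return cand[order], d[order]
```

Setting nprobe = K degenerates to exact brute-force search; the speedup comes from keeping nprobe small relative to K.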
Product Quantization (PQ)
PQ compresses high-dimensional vectors to dramatically reduce memory usage and speed up distance computation:
- Split each D-dimensional vector into M subvectors of dimension D/M. For example, a 128-dim vector split into M=8 subvectors of 16 dimensions each.
- Train a small codebook (typically 256 centroids) for each of the M subspaces using k-means.
- Encode each subvector as the index of its nearest centroid — a single byte per subvector. A 128-dim float32 vector (512 bytes) shrinks to M = 8 bytes, a 64x compression.
- Distance computation uses Asymmetric Distance Computation (ADC): precompute a distance table from the query subvectors to all codebook centroids, then sum up M table lookups per candidate.
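A minimal NumPy sketch of the full PQ pipeline — per-subspace codebook training, byte encoding, and ADC. The names (`train_pq`, `pq_encode`, `adc_search`) and the simplified k-means are illustrative, not a library API:

```python
import numpy as np

def train_pq(data, M, ksub=256, iters=10, seed=0):
    """Train one ksub-centroid codebook per subspace via per-subspace k-means."""
    n, D = data.shape
    dsub = D // M
    rng = np.random.default_rng(seed)
    codebooks = np.empty((M, ksub, dsub), dtype=np.float32)
    for m in range(M):
        sub = data[:, m * dsub:(m + 1) * dsub]
        cent = sub[rng.choice(n, ksub, replace=False)]
        for _ in range(iters):
            assign = np.argmin(((sub[:, None] - cent[None]) ** 2).sum(-1), axis=1)
            for c in range(ksub):
                pts = sub[assign == c]
                if len(pts):
                    cent[c] = pts.mean(axis=0)
        codebooks[m] = cent
    return codebooks

def pq_encode(data, codebooks):
    """One byte per subvector: the index of its nearest codebook centroid."""
    M, ksub, dsub = codebooks.shape
    codes = np.empty((len(data), M), dtype=np.uint8)
    for m in range(M):
        sub = data[:, m * dsub:(m + 1) * dsub]
        codes[:, m] = np.argmin(((sub[:, None] - codebooks[m][None]) ** 2).sum(-1), axis=1)
    return codes

def adc_search(query, codes, codebooks, k):
    """ADC: precompute an (M, ksub) distance table, then M lookups per candidate."""
    M, ksub, dsub = codebooks.shape
    table = np.stack([((codebooks[m] - query[m * dsub:(m + 1) * dsub]) ** 2).sum(-1)
                      for m in range(M)])
    dists = table[np.arange(M), codes].sum(axis=1)  # sum of M table lookups per code
    order = np.argsort(dists)[:k]
    return order, dists[order]
```

Note that the query itself is never quantized — that is the "asymmetric" in ADC, and it costs nothing extra because the table is built once per query.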
IVF + PQ combined
The standard pipeline for billion-scale search:
- Coarse quantizer (IVF): quickly identifies the nprobe most relevant clusters.
- Fine quantizer (PQ): within each cluster, vectors are PQ-encoded. Distance computation uses the precomputed distance table — just M byte lookups and additions per vector.
- The result is a system that searches billions of vectors in milliseconds using a few GB of RAM.
Residual encoding
In IVF+PQ, vectors are often encoded as residuals — the difference between the vector and its cluster centroid. This improves PQ accuracy because residuals have smaller variance and are easier to quantize.
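A small worked example of the idea, with a stand-in uniform quantizer in place of PQ (the step size 0.5 and the vectors are arbitrary illustrative values):

```python
import numpy as np

centroid = np.array([10.0, 10.0, 10.0, 10.0])  # coarse cluster center
x = np.array([10.3, 9.8, 10.1, 10.4])          # vector assigned to that cluster

# Encode the residual, not the raw vector: values are near zero,
# so a fixed quantizer budget covers them much more precisely.
residual = x - centroid
quantized = np.round(residual / 0.5) * 0.5     # toy 0.5-step quantizer (PQ in practice)
reconstructed = centroid + quantized           # add the centroid back at decode time
```

Quantizing `x` directly with the same budget would have to cover values around 10, wasting precision on the part already captured by the centroid.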
Complexity
| Operation | Time | Space per vector |
|---|---|---|
| Brute force | O(n * D) | D * 4 bytes (float32) |
| IVF only | O(K * D + nprobe/K * n * D) | D * 4 bytes |
| IVF + PQ | O(K * D + nprobe/K * n * M) | M bytes |
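Plugging the running billion-scale numbers into the space column makes the gap concrete (a quick arithmetic check, not benchmark data):

```python
n = 2_000_000_000   # vectors
D = 256             # dimensions
M = 32              # PQ bytes per vector

raw_bytes = n * D * 4   # float32: D * 4 bytes per vector
pq_bytes = n * M        # PQ: M bytes per vector

print(raw_bytes / 1e12, "TB raw")  # ~2 TB
print(pq_bytes / 1e9, "GB PQ")     # 64 GB
```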
Where IVF+PQ is used
- Faiss (Meta): the IVFPQ index is the workhorse for billion-scale search.
- Milvus: supports IVF_PQ index type for large-scale deployments.
- Spotify: open-sourced Annoy for approximate nearest neighbor music recommendation at scale (note that Annoy itself uses random-projection trees, not IVF+PQ; Faiss-style IVF+PQ serves the same role where memory is the constraint).
Real-Life: Billion-Scale Image Search
A reverse image search engine indexes 2 billion images, each represented as a 256-dimensional embedding vector.
Brute force is infeasible: 2 billion * 256 floats * 4 bytes ≈ 2 TB of raw vectors. Even if they fit in memory, scanning all 2 billion vectors per query takes seconds.
IVF+PQ solution (Faiss):
```python
import faiss  # pip install faiss-cpu (or faiss-gpu)

d = 256        # vector dimension
K = 65536      # number of IVF clusters
M = 32         # PQ subquantizers -> 32 bytes per vector
nprobe = 64    # clusters to search at query time

quantizer = faiss.IndexFlatL2(d)                 # coarse quantizer over centroids
index = faiss.IndexIVFPQ(quantizer, d, K, M, 8)  # 8 bits per PQ code

# training_vectors: float32 array of shape (n_train, d), a representative sample
index.train(training_vectors)
# all_vectors: float32 array of shape (n, d)
index.add(all_vectors)

# Query: Faiss expects a 2-D float32 array of shape (n_queries, d)
index.nprobe = nprobe
distances, indices = index.search(query_vectors, 10)  # top-10 neighbors per query
```
Memory savings: 2 billion vectors * 32 bytes = 64 GB (fits on one machine!) vs. 2 TB for raw floats.
Speed: With nprobe=64 and K=65536, only 0.1% of vectors are scanned. PQ makes each distance computation ~10x faster than float32. Result: < 10ms per query.
Recall tradeoff: At nprobe=64, recall@10 is typically 85-95%. Increasing to nprobe=256 pushes recall above 95% at ~4x latency cost.