
Vector Index (ANN)


Approximate Nearest Neighbor Search

Modern AI applications represent text, images, and audio as embedding vectors — dense arrays of floating-point numbers (typically 256 to 1536 dimensions). Finding the most similar items means finding the nearest neighbors in this high-dimensional space. The naive approach — computing the distance from the query vector to every stored vector — is O(n * d) where n is the number of vectors and d is the dimensionality. For a database of 10 million 1536-dimensional vectors, that is 15 billion floating-point operations per query. This is far too slow for real-time applications.
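The brute-force baseline described above can be sketched in a few lines; every query scans every stored vector, which is exactly the O(n * d) cost the index types below avoid:

```python
import math

def brute_force_knn(query, vectors, k):
    """Exact k-NN: computes the L2 distance from the query to every
    stored vector, then sorts. O(n * d) work per query."""
    scored = []
    for idx, v in enumerate(vectors):
        dist = math.sqrt(sum((q - x) ** 2 for q, x in zip(query, v)))
        scored.append((dist, idx))
    scored.sort()
    return [idx for _, idx in scored[:k]]

vectors = [[0.0, 0.0], [1.0, 1.0], [0.1, 0.0], [5.0, 5.0]]
print(brute_force_knn([0.0, 0.1], vectors, 2))  # -> [0, 2]
```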

ANN: Trading Accuracy for Speed

Approximate Nearest Neighbor (ANN) indexes solve this by accepting a small accuracy loss in exchange for orders-of-magnitude speedup. Instead of guaranteeing the exact top-k nearest vectors, they return a set that is correct with high probability (measured by recall@k — the fraction of true top-k results that appear in the ANN result).

Distance Metrics

The choice of distance metric depends on the embedding model:

Metric             | Formula                         | Use case
L2 (Euclidean)     | sqrt(sum((a_i - b_i)^2))        | General purpose; raw magnitude matters
Cosine similarity  | dot(a,b) / (norm(a) * norm(b))  | Text embeddings (direction matters, not magnitude)
Inner product      | dot(a,b)                        | Already-normalized vectors (equivalent to cosine)
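The three metrics translate directly from their formulas; a minimal sketch showing that cosine similarity ignores magnitude (scaling a vector leaves it unchanged):

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def l2(a, b):
    """Euclidean distance: sqrt(sum((a_i - b_i)^2))."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine(a, b):
    """Cosine similarity: dot(a,b) / (norm(a) * norm(b))."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

a = [1.0, 0.0]
# Scaling b by 10x changes dot and L2, but not cosine similarity:
assert abs(cosine(a, [0.5, 0.5]) - cosine(a, [5.0, 5.0])) < 1e-9
```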

HNSW (Hierarchical Navigable Small World)

HNSW builds a multi-layer graph where each layer is a "small world" network — most nodes have short-range connections, but a few have long-range connections that enable fast traversal. Search starts at the top (sparsest) layer, greedily navigates to the region nearest the query, then descends to denser layers for refinement.

  • Pros: best recall@k among ANN methods, no training phase, supports incremental inserts.
  • Cons: high memory usage (stores the graph in RAM), slow to build for very large datasets.
  • Typical performance: 95–99% recall@10 with sub-millisecond latency on millions of vectors.
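The greedy navigation at the heart of HNSW can be sketched on a single layer. This is a toy illustration, not a full implementation: real HNSW repeats this walk top-down across layers and widens the search to a beam (the ef parameter) at layer 0:

```python
import math

def dist(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def greedy_walk(adj, vectors, query, entry):
    """Greedy descent on one graph layer: hop to whichever neighbor
    is closer to the query; stop at a local minimum. Only the visited
    nodes' distances are computed, not the whole dataset's."""
    current = entry
    best = dist(vectors[current], query)
    improved = True
    while improved:
        improved = False
        for nb in adj[current]:
            d = dist(vectors[nb], query)
            if d < best:
                current, best, improved = nb, d, True
    return current

# A toy 1-D chain graph: the walk hops 0 -> 1 -> 2 -> 3 and stops.
vectors = [[0.0], [1.0], [2.0], [3.0], [4.0]]
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
print(greedy_walk(adj, vectors, [3.2], entry=0))  # -> 3
```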

IVF (Inverted File Index) + PQ (Product Quantization)

IVF partitions the vector space into clusters using k-means. At query time, only the nprobe nearest clusters are searched instead of the entire dataset.

Product Quantization compresses each vector by splitting it into subvectors and encoding each with a small codebook. A 1536-dim float32 vector (6 KB) can be compressed to ~48 bytes — a 128x reduction.

  • Pros: low memory (compressed vectors), tunable recall via nprobe.
  • Cons: requires a training phase on representative data, lower recall than HNSW at equal latency.
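The IVF idea reduces to two steps: assign vectors to their nearest centroid at build time, then scan only the nprobe nearest clusters at query time. A minimal sketch with hand-picked centroids standing in for the k-means training phase:

```python
import math

def dist(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def build_ivf(vectors, centroids):
    """Assign each vector to its nearest centroid (the inverted lists).
    Real IVF learns the centroids with k-means on training data."""
    lists = {i: [] for i in range(len(centroids))}
    for idx, v in enumerate(vectors):
        c = min(range(len(centroids)), key=lambda i: dist(v, centroids[i]))
        lists[c].append(idx)
    return lists

def ivf_search(query, vectors, centroids, lists, k, nprobe):
    """Scan only the nprobe clusters nearest the query."""
    probe = sorted(range(len(centroids)),
                   key=lambda i: dist(query, centroids[i]))[:nprobe]
    candidates = [idx for c in probe for idx in lists[c]]
    return sorted(candidates, key=lambda idx: dist(query, vectors[idx]))[:k]

centroids = [[0.0, 0.0], [10.0, 10.0]]
vectors = [[0.0, 1.0], [1.0, 0.0], [9.0, 10.0], [10.0, 9.0]]
lists = build_ivf(vectors, centroids)
# With nprobe=1, only the cluster around (0,0) is scanned:
print(ivf_search([0.2, 0.9], vectors, centroids, lists, k=1, nprobe=1))  # -> [0]
```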

Scalar Quantization

A simpler compression: convert each float32 component to int8 (or float16), reducing memory 4x (or 2x) with minimal recall loss. Often combined with HNSW.
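A toy int8 scalar quantizer, assuming a known per-dimension range [lo, hi] (real implementations learn the range from the data). Each float32 maps to one of 256 buckets, so the reconstruction error is at most one bucket width:

```python
def sq_encode(vec, lo, hi):
    """Map each float in [lo, hi] to an int8 bucket (4x smaller than float32)."""
    scale = 255.0 / (hi - lo)
    return [max(-128, min(127, round((x - lo) * scale) - 128)) for x in vec]

def sq_decode(codes, lo, hi):
    """Reconstruct approximate floats from the int8 codes."""
    scale = (hi - lo) / 255.0
    return [(c + 128) * scale + lo for c in codes]

v = [0.12, -0.5, 0.99]
codes = sq_encode(v, -1.0, 1.0)
approx = sq_decode(codes, -1.0, 1.0)
# Round-trip error is bounded by the bucket width (hi - lo) / 255:
assert max(abs(x - y) for x, y in zip(v, approx)) < 2.0 / 255
```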

pgvector for PostgreSQL

The pgvector extension adds vector storage and ANN search directly to PostgreSQL. It supports two index types:

  • ivfflat: IVF-based, requires training via CREATE INDEX ... USING ivfflat
  • hnsw: graph-based, no training needed, higher recall

This lets you combine vector search with standard SQL filters in a single query — for example, find the 10 most similar products that are also in stock and under $50.
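The in-stock-under-$50 example can be written as one query. This is a sketch assuming a hypothetical `products` table with an `embedding vector(1536)` column; `<=>` is pgvector's cosine-distance operator and `vector_cosine_ops` its matching HNSW operator class:

```sql
-- Hypothetical schema (illustration only):
--   products(id bigint, name text, price numeric,
--            in_stock boolean, embedding vector(1536))
CREATE INDEX ON products USING hnsw (embedding vector_cosine_ops);

-- 10 most similar products that are in stock and under $50
SELECT id, name
FROM products
WHERE in_stock AND price < 50
ORDER BY embedding <=> '[0.01, -0.02, ...]'  -- query embedding literal
LIMIT 10;
```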

Recall@k

The standard quality metric for ANN indexes: run the same query with exact kNN and with the ANN index, then compute the overlap. Recall@10 = 0.95 means that, averaged over queries, 9.5 of the 10 returned results are among the true 10 nearest neighbors.
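The metric is a set overlap, so it fits in one line:

```python
def recall_at_k(exact, approx, k):
    """Fraction of the true top-k (from exact kNN) that the ANN index
    returned in its own top-k."""
    return len(set(exact[:k]) & set(approx[:k])) / k

exact  = [3, 7, 1, 9, 4, 0, 8, 2, 6, 5]   # ground-truth top 10
approx = [3, 7, 1, 9, 4, 0, 8, 2, 11, 12]  # ANN result: 8 of 10 correct
print(recall_at_k(exact, approx, 10))  # -> 0.8
```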

Real-World Example: Semantic Search with Embeddings

A knowledge base has 2 million support articles. A user types: "my laptop won't turn on after a Windows update." Keyword search for "laptop won't turn on" might miss articles titled "Computer fails to boot post-update" because the exact words differ.

Semantic search with vector embeddings:

  1. Indexing: each article is passed through an embedding model (e.g., OpenAI text-embedding-3-small, 1536 dimensions). The resulting vector captures the article's semantic meaning. Store vectors in pgvector with an HNSW index.

  2. Query: the user's question is embedded with the same model. pgvector finds the 10 nearest vectors using the HNSW index — sub-millisecond even with 2 million vectors.

  3. Result: "Computer fails to boot post-update" ranks highly because its embedding vector is close to the query's vector in semantic space, even though they share zero keywords.

Hybrid search: combine BM25 keyword scores with vector similarity scores using Reciprocal Rank Fusion (RRF). This catches both exact keyword matches and semantic matches, giving better results than either alone.
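RRF scores each document by summing 1 / (k + rank) over the ranked lists it appears in; a minimal sketch (the constant k = 60 is the commonly used default from the original RRF paper):

```python
def rrf(rankings, k=60):
    """Reciprocal Rank Fusion: score(doc) = sum over ranked lists
    of 1 / (k + rank), with rank starting at 1."""
    scores = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

bm25_results   = ["a", "b", "c"]   # keyword (BM25) ranking
vector_results = ["c", "a", "d"]   # vector-similarity ranking
# "a" ranks first: it places highly in both lists.
print(rrf([bm25_results, vector_results]))  # -> ['a', 'c', 'b', 'd']
```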

Scale considerations: at 10M+ vectors, a single pgvector instance may struggle. Purpose-built vector databases like Pinecone, Weaviate, or Milvus shard vectors across nodes and optimize for this workload.

HNSW Multi-Layer Graph Search

[Diagram: HNSW multi-layer graph search, greedy traversal from sparse top layers to the dense bottom layer]

Search steps:
  1. Enter at the top (sparsest) layer
  2. Greedy walk to the node nearest the query
  3. Descend to the middle layer
  4. Refine among more nodes
  5. Descend to Layer 0 (all vectors)
  6. Collect the final nearest neighbors

Complexity: O(log n) hops on average vs O(n * d) brute force; top-k in under 1 ms. Each layer is a navigable small-world graph: top layers have few nodes with long-range links, while the bottom layer contains all vectors for fine-grained refinement. HNSW typically achieves 95–99% recall@10 with sub-millisecond latency on millions of vectors.