Translation Lookaside Buffer
Every memory access in a modern system uses virtual addresses. Before the CPU can read or write physical memory, the Memory Management Unit (MMU) must translate the virtual page number to a physical frame number by walking the page table -- a multi-level tree structure stored in main memory. On x86-64, this walk traverses four levels (PML4 -> PDPT -> PD -> PT), each requiring a separate memory read. At ~100 ns per read, a full page table walk costs ~400 ns -- far too expensive when the CPU issues billions of memory accesses per second.
The TLB: A Cache for Page Table Entries
The Translation Lookaside Buffer (TLB) is a small, fast hardware cache that stores recent virtual-to-physical page mappings. On a TLB hit, the translation completes in 1-2 cycles (sub-nanosecond), completely bypassing the page table walk.
TLB Structure
Small TLBs are often fully associative -- any entry can map any virtual page -- which maximizes hit rate. Larger levels (such as the L2 STLB) are usually set-associative instead, because a fully associative lookup is expensive at large scale. Typical sizes:
| TLB Level | Entries | Page Size | Coverage |
|---|---|---|---|
| L1 DTLB | 64 | 4 KB | 256 KB |
| L1 ITLB | 64 | 4 KB | 256 KB |
| L2 STLB (unified) | 512-2048 | 4 KB | 2-8 MB |
| Huge page TLB | 32 | 2 MB | 64 MB |
With huge pages (2 MB or 1 GB), a single TLB entry covers vastly more memory, dramatically reducing TLB miss rates for large working sets.
TLB Miss Penalty
On a TLB miss, the MMU performs a hardware page table walk. If the page table levels are themselves cached in L1/L2 (which is common), the penalty is 10-30 cycles. If the page table entries must be fetched from main memory, the penalty is 200-400+ cycles. The worst case occurs when the page itself is not resident -- the OS must handle a page fault, which can cost millions of cycles for a disk I/O.
TLB Flush on Context Switch
Each process has its own page table (its own virtual address space). When the OS switches processes, the TLB entries from the old process are invalid for the new one. The naive approach is to flush the entire TLB on every context switch, but this is costly -- the new process starts with a cold TLB.
Modern hardware solves this with ASIDs (Address Space Identifiers) or Intel's PCIDs (Process-Context Identifiers). Each TLB entry is tagged with the ASID of the process that created it. On a context switch, the OS simply sets a new ASID; old entries remain but are not matched. This allows TLB entries to survive context switches, dramatically reducing the warm-up cost.
TLB Shootdown in Multi-Core Systems
When one core updates a page table mapping (e.g., the OS unmaps a page), other cores may still have the stale mapping in their local TLBs. The OS must perform a TLB shootdown: it sends an Inter-Processor Interrupt (IPI) to all affected cores, forcing them to invalidate the stale TLB entry. This is one of the most expensive operations in a multi-core OS -- it stalls all targeted cores until the invalidation completes. Reducing TLB shootdowns is a key optimization in high-performance systems (e.g., using RCU for page table updates, or batching invalidations).
TLB in Practice: Database Buffer Pools
Why databases use huge pages:
A database buffer pool of 128 GB mapped with 4 KB pages requires 33 million page table entries. Even a 2048-entry L2 TLB can only cover 8 MB -- a tiny fraction. Every random page access likely misses the TLB, adding 200+ cycles of page table walk overhead.
Switching to 2 MB huge pages reduces the entries needed to just 65,536. The same TLB can now cover 4 GB, and the miss rate drops by orders of magnitude. PostgreSQL, Oracle, and MySQL all recommend enabling huge pages for production buffer pools.
Other TLB-aware optimizations:
- JVM large pages (-XX:+UseLargePages): Reduces GC pause-related TLB misses when scanning a multi-GB heap.
- Linux transparent huge pages (THP): Automatically promotes 4 KB pages to 2 MB when possible, though THP compaction can cause latency spikes.
- DPDK and network stacks: Pin packet buffers to huge pages so per-packet TLB misses do not bottleneck 100 Gbps processing.