Dirty Pages & Write-Back

Dirty Pages and Write Strategies

When a database modifies a page in the buffer pool, that in-memory copy now differs from the on-disk version. The page is marked dirty — meaning it contains changes that have not yet been persisted to disk. Managing dirty pages correctly is essential for both performance and durability.

The dirty flag

Each frame in the buffer pool has a dirty bit. It is set to true when any transaction modifies the page (INSERT, UPDATE, DELETE). It is cleared to false after the page is successfully written back to disk. The dirty flag must use OR semantics during unpin — once a page is dirty, a subsequent unpin with isDirty=false must not clear it, because another transaction's changes may still be unflushed.
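The OR semantics can be illustrated with a minimal sketch. The `Frame` class and the `unpin` and `mark_flushed` helpers are hypothetical names chosen for this example, not the API of any particular database:

```python
from dataclasses import dataclass


@dataclass
class Frame:
    page_id: int
    pin_count: int = 0
    dirty: bool = False


def unpin(frame: Frame, is_dirty: bool) -> None:
    """Release one pin; the dirty bit is OR-ed in, never overwritten."""
    if frame.pin_count <= 0:
        raise ValueError("frame is not pinned")
    frame.pin_count -= 1
    frame.dirty = frame.dirty or is_dirty


def mark_flushed(frame: Frame) -> None:
    """Clear the dirty bit; call only after a successful write to disk."""
    frame.dirty = False


frame = Frame(page_id=7, pin_count=2)
unpin(frame, is_dirty=True)    # a writer releases the page
unpin(frame, is_dirty=False)   # a reader releases the same page
print(frame.dirty)             # True: the reader's unpin did not clear the flag
```

Had `unpin` assigned `frame.dirty = is_dirty` instead, the reader's unpin would have silently discarded the writer's unflushed change.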

Write-back vs. write-through

Write-back (lazy writeback): modifications are made only in memory. The dirty page is written to disk later — either when it is evicted from the buffer pool or during a checkpoint. This is what most databases use because it batches I/O and avoids redundant writes when a page is modified multiple times between flushes.

Write-through: every modification is immediately written to disk. This is simpler and provides stronger durability guarantees per operation, but it is much slower because every write incurs a synchronous disk I/O. Some embedded databases and file systems use write-through for simplicity.
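To make the I/O difference concrete, here is a small sketch that counts disk writes under each strategy. `CountingDisk`, `WriteThroughCache`, and `WriteBackCache` are illustrative names, and the "disk" is just a dictionary:

```python
class CountingDisk:
    """Stand-in for a disk that counts how many page writes it receives."""

    def __init__(self):
        self.pages = {}
        self.writes = 0

    def write(self, page_id, data):
        self.pages[page_id] = data
        self.writes += 1


class WriteThroughCache:
    def __init__(self, disk):
        self.disk = disk
        self.cache = {}

    def update(self, page_id, data):
        self.cache[page_id] = data
        self.disk.write(page_id, data)   # every update reaches the disk


class WriteBackCache:
    def __init__(self, disk):
        self.disk = disk
        self.cache = {}
        self.dirty = set()

    def update(self, page_id, data):
        self.cache[page_id] = data       # memory only
        self.dirty.add(page_id)

    def flush(self):
        for page_id in sorted(self.dirty):
            self.disk.write(page_id, self.cache[page_id])
        self.dirty.clear()


wt_disk, wb_disk = CountingDisk(), CountingDisk()
wt, wb = WriteThroughCache(wt_disk), WriteBackCache(wb_disk)
for version in range(100):
    wt.update(7, f"v{version}")
    wb.update(7, f"v{version}")
wb.flush()
print(wt_disk.writes, wb_disk.writes)    # 100 1
```

Both disks end up holding the same final version of page 7, but the write-back cache got there with one write instead of one hundred. It was also the only one at risk of losing updates before the flush.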

Checkpoints

A checkpoint is a periodic operation in which the database flushes all (or some) dirty pages to disk and records that point in the log. After a checkpoint completes, the database knows which changes are guaranteed to be on disk; for a checkpoint that flushes every dirty page, that is everything logged before the checkpoint began. This bounds crash recovery time: the system only needs to replay the write-ahead log from the last checkpoint, not from the beginning of the log.
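As a rough sketch of how a checkpoint bounds recovery, assume the simplest case: a checkpoint that flushes every dirty page and leaves a marker in the log. The log format and the `records_to_replay` function are invented for illustration:

```python
def records_to_replay(log):
    """Return the log records that follow the most recent checkpoint marker."""
    start = 0
    for i, record in enumerate(log):
        if record == "CHECKPOINT":
            start = i + 1
    return log[start:]


log = ["upd p1", "upd p2", "CHECKPOINT", "upd p1", "upd p3"]
print(records_to_replay(log))   # ['upd p1', 'upd p3']
```

Without the marker, recovery would have to replay all four updates. With it, only the two after the checkpoint remain.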

Types of checkpoints:

  • Fuzzy checkpoint — flushes dirty pages in the background without stopping transactions. Most production databases use this.
  • Sharp checkpoint — pauses all activity, flushes everything, then resumes. Simpler but causes a pause.

Dirty page table

The dirty page table is a data structure (typically a hash table keyed by page ID, or a simple set or list) that tracks which pages in the buffer pool are currently dirty. Many designs also store, for each page, the log position of the first change that made it dirty. During recovery after a crash, the dirty page table (reconstructed from the WAL) tells the system which pages may be stale on disk and therefore need their changes redone.
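One way to picture the reconstruction is sketched below. It assumes, in the style of ARIES-like recovery, that each entry maps a page ID to the log sequence number (LSN) of the first change that dirtied it, and that the last checkpoint saved a copy of the table. The function name and log format are hypothetical:

```python
def rebuild_dirty_page_table(log, checkpoint_table):
    """Rebuild page_id -> first-dirtying LSN from the table saved at the last
    checkpoint plus the update records logged after it."""
    table = dict(checkpoint_table)
    for lsn, page_id in log:
        # Keep the earliest LSN: redo for this page must start there.
        table.setdefault(page_id, lsn)
    return table


checkpoint_table = {3: 40}                              # page 3 dirty since LSN 40
log_after_checkpoint = [(50, 7), (60, 3), (70, 7), (80, 9)]
dpt = rebuild_dirty_page_table(log_after_checkpoint, checkpoint_table)
redo_start = min(dpt.values())
print(dpt)          # {3: 40, 7: 50, 9: 80}
print(redo_start)   # 40
```

Under these assumptions, redo can begin at the smallest LSN in the table, since no page has unflushed changes older than that.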

WAL and dirty pages

The write-ahead log (WAL) protocol requires that a log record describing a change is written to disk before the dirty page itself is written. This ensures that even if the system crashes during a page flush, the log contains enough information to redo or undo the change. The combination of WAL + lazy writeback gives databases both high performance (batched I/O) and durability: as long as a transaction's log records are flushed when it commits, no committed change is lost in a crash.
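The ordering rule can be sketched as follows. The sketch assumes each page remembers the LSN of its latest change (`page_lsn`) and the log tracks how far it has been flushed (`flushed_lsn`). All class and function names are illustrative, and the "disk" is a dictionary:

```python
class Log:
    def __init__(self):
        self.records = []       # all appended records, durable or not
        self.flushed_lsn = 0    # highest LSN known to be on disk

    def append(self, description):
        self.records.append(description)
        return len(self.records)            # LSNs start at 1

    def flush(self, up_to_lsn):
        self.flushed_lsn = max(self.flushed_lsn, min(up_to_lsn, len(self.records)))


class Page:
    def __init__(self, page_id):
        self.page_id = page_id
        self.page_lsn = 0       # LSN of the latest change to this page
        self.dirty = False


def update(page, log, description):
    page.page_lsn = log.append(description)
    page.dirty = True


def flush_page(page, log, disk):
    # WAL rule: the log must be durable up to page_lsn before the page is written.
    if log.flushed_lsn < page.page_lsn:
        log.flush(page.page_lsn)
    disk[page.page_id] = page.page_lsn
    page.dirty = False


log, disk, page = Log(), {}, Page(7)
update(page, log, "set row 1")
update(page, log, "set row 2")
flush_page(page, log, disk)
print(log.flushed_lsn, disk)    # 2 {7: 2}
```

The check inside `flush_page` is the whole protocol in miniature: a page never reaches disk ahead of the log records that describe it.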

Strategy       Write latency     I/O volume          Recovery complexity
Write-back     Low (in-memory)   Low (batched)       Needs WAL + checkpoint
Write-through  High (sync I/O)   High (every write)  Simpler recovery

Real-Life: Checkpoint in PostgreSQL

PostgreSQL performs checkpoints at configurable intervals: by default every 5 minutes (checkpoint_timeout), or sooner when the volume of WAL written since the last checkpoint approaches max_wal_size. Here is what happens during a checkpoint:

  1. The checkpointer process scans the buffer pool for all dirty pages.
  2. It writes each dirty page to the corresponding file on disk, spreading the I/O over time to avoid sudden bursts (controlled by checkpoint_completion_target).
  3. Once all dirty pages are flushed, PostgreSQL writes a checkpoint record to the WAL.
  4. Old WAL segments (before the checkpoint) can now be recycled.
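The idea of spreading checkpoint I/O in step 2 can be approximated with a simple pacing calculation. This is a deliberately simplified model, not PostgreSQL's actual scheduling logic, and `write_delay_seconds` is a hypothetical helper:

```python
def write_delay_seconds(dirty_pages, checkpoint_interval_s, completion_target):
    """Evenly space page writes over a fraction of the checkpoint interval."""
    if dirty_pages == 0:
        return 0.0
    budget = checkpoint_interval_s * completion_target
    return budget / dirty_pages


delay = write_delay_seconds(dirty_pages=1000,
                            checkpoint_interval_s=300,
                            completion_target=0.9)
print(f"{delay:.2f} s between page writes")   # 0.27 s between page writes
```

In this model a completion target closer to 1.0 stretches the writes across more of the interval, which lowers the peak I/O rate at the cost of finishing the checkpoint later.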

Why this matters:

  • Between checkpoints, modified pages live only in shared buffers (and the WAL). If the server crashes, PostgreSQL replays the WAL from the last checkpoint to reconstruct any dirty pages that were lost.
  • A longer checkpoint interval means fewer disk writes (better performance) but longer crash recovery time.
  • A shorter interval means faster recovery but more I/O overhead.

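InnoDB (MySQL) also flushes dirty pages in the background using fuzzy checkpoints, and adds a doublewrite buffer: before writing dirty pages to their final locations in the data files, it first writes them to a dedicated area on disk. This protects against torn pages (partial page writes caused by a crash mid-flush), because an intact copy of the page can be recovered from the doublewrite area.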

Write-Back Lifecycle of a Dirty Page

Lifecycle: Read → Modify → Checkpoint → Flush

  1. Fetch: page 7 is read from disk into a frame. The frame is clean.
  2. UPDATE row: the frame holding page 7 becomes dirty and a WAL record is generated. The WAL record is written before the page is flushed.
  3. More UPDATEs: the frame stays dirty. No disk I/O for the page yet.
  4. Checkpoint: page 7 is written to disk and the dirty flag is cleared after the flush. Disk now matches memory.