What is WAL Checkpointing?

A checkpoint is a periodic operation that flushes dirty pages from the buffer pool to disk and records a marker in the WAL. Its purpose is to limit the amount of WAL that must be replayed during crash recovery, and to allow old WAL segments to be safely deleted.

The problem without checkpoints

Without checkpoints, crash recovery must replay the entire WAL from the beginning — every change ever made since the database was created. For a database that has been running for months, the WAL could be terabytes in size. Recovery would take hours or days.

How checkpoints work

A sharp (or consistent) checkpoint works as follows:

Pause all transactions momentarily.
Flush all dirty pages in the buffer pool to disk.
Write a checkpoint record to the WAL containing the current LSN.
Resume transactions.

After a crash, recovery starts from the checkpoint LSN, not the beginning of the WAL. Only the WAL records after the checkpoint need to be replayed. All changes before the checkpoint are guaranteed to be on disk already.
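
The steps above can be sketched with a toy in-memory model. This is a minimal illustration, not how any real engine is implemented: the ToyDatabase class, its fields, and the integer LSNs are all hypothetical, and pausing transactions is assumed rather than modeled.

```python
from dataclasses import dataclass, field


@dataclass
class ToyDatabase:
    """Hypothetical key-value store with a WAL, a buffer pool, and a 'disk'."""
    wal: list = field(default_factory=list)          # (lsn, kind, key, value)
    buffer_pool: dict = field(default_factory=dict)  # in-memory pages
    dirty: set = field(default_factory=set)          # keys changed since last flush
    disk: dict = field(default_factory=dict)         # durable pages
    next_lsn: int = 1

    def _append(self, kind, key=None, value=None):
        lsn = self.next_lsn
        self.wal.append((lsn, kind, key, value))
        self.next_lsn += 1
        return lsn

    def write(self, key, value):
        # Log first, then change the in-memory page (write-ahead rule).
        self._append("update", key, value)
        self.buffer_pool[key] = value
        self.dirty.add(key)

    def sharp_checkpoint(self):
        # Assumes transactions are paused while this method runs.
        for key in self.dirty:
            self.disk[key] = self.buffer_pool[key]
        self.dirty.clear()
        return self._append("checkpoint")

    def crash(self):
        # Memory is lost; the WAL and the disk survive.
        self.buffer_pool.clear()
        self.dirty.clear()

    def recover(self):
        # Locate the last checkpoint record, then replay only later updates.
        start = 0
        for lsn, kind, _, _ in self.wal:
            if kind == "checkpoint":
                start = lsn
        replayed = 0
        for lsn, kind, key, value in self.wal:
            if lsn > start and kind == "update":
                self.disk[key] = value
                replayed += 1
        return replayed
```

With two writes, a checkpoint, one more write, and then a crash, recover() replays a single record: the two earlier changes were already flushed by the checkpoint.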

Fuzzy checkpoints

A sharp checkpoint is expensive because it blocks all transactions during the flush. Modern databases use fuzzy checkpoints instead:

Record the checkpoint-begin marker with the current LSN and the set of dirty pages (with their oldest dirty LSN).
Continue accepting transactions normally — no pause.
Gradually flush dirty pages in the background.
Record the checkpoint-end marker.

During recovery, the database starts replaying from the oldest dirty page LSN recorded in the checkpoint, not from the checkpoint LSN itself. This is slightly more WAL to replay, but the checkpoint does not block transactions.
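
A short sketch of how the replay start point could be derived from a fuzzy checkpoint record. The record layout (a dict with checkpoint_lsn and dirty_pages) and both function names are assumptions made for illustration only.

```python
def fuzzy_checkpoint_begin(current_lsn, dirty_page_table):
    """Build a hypothetical checkpoint-begin record.

    dirty_page_table maps a page id to the LSN of the oldest change
    to that page that has not yet been flushed to disk.
    """
    return {"checkpoint_lsn": current_lsn, "dirty_pages": dict(dirty_page_table)}


def redo_start_lsn(checkpoint_record):
    """Return the LSN from which recovery must start replaying."""
    dirty = checkpoint_record["dirty_pages"]
    if not dirty:
        # Nothing was dirty, so the checkpoint LSN itself is sufficient.
        return checkpoint_record["checkpoint_lsn"]
    # Otherwise start at the oldest unflushed change.
    return min(min(dirty.values()), checkpoint_record["checkpoint_lsn"])
```

For a checkpoint begun at LSN 500 with dirty pages whose oldest LSNs are 420 and 480, replay starts at 420 rather than 500.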

Checkpoint frequency tradeoff

Too frequent: every checkpoint flushes dirty pages to disk, generating significant I/O. On a write-heavy workload, frequent checkpoints can bottleneck disk throughput.
Too rare: after a crash, recovery must replay all WAL written since the last checkpoint. If the last checkpoint was 30 minutes ago, recovery has 30 minutes' worth of WAL to replay; how long that takes depends on the write volume and the replay speed.
Typical settings: PostgreSQL defaults to checkpoints every 5 minutes or every 1GB of WAL, whichever comes first.
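
The tradeoff can be made concrete with back-of-the-envelope arithmetic. The sketch below assumes a constant WAL generation rate and a constant replay rate, which real workloads do not have; the function names and the example figures are hypothetical.

```python
def worst_case_replay_mb(wal_rate_mb_per_s, checkpoint_interval_s):
    """WAL volume accumulated if the crash happens just before the next checkpoint."""
    return wal_rate_mb_per_s * checkpoint_interval_s


def estimated_replay_seconds(wal_to_replay_mb, replay_rate_mb_per_s):
    """Rough replay time for a given WAL volume at a steady replay rate."""
    return wal_to_replay_mb / replay_rate_mb_per_s


# Hypothetical workload: 2 MB/s of WAL, replayed at 50 MB/s.
five_minute_wal = worst_case_replay_mb(2, 300)      # 600 MB
thirty_minute_wal = worst_case_replay_mb(2, 1800)   # 3600 MB
```

Under these assumed rates, a 5-minute interval leaves at most about 600 MB to replay (roughly 12 seconds), while a 30-minute interval leaves about 3600 MB (roughly 72 seconds).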

WAL truncation

Once a checkpoint is complete and confirmed, all WAL segments that lie entirely before its recovery start point (the checkpoint LSN for a sharp checkpoint, or the oldest dirty-page LSN for a fuzzy one) are no longer needed for crash recovery. They can be truncated (deleted or recycled), which prevents the WAL from growing indefinitely. However, if WAL-based replication is active, the database must also keep any WAL segments that a replica has not yet consumed.
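
A minimal sketch of the truncation rule, assuming fixed-size segments identified by their starting LSN and replicas that each report the LSN they have consumed up to. The function and parameter names are hypothetical.

```python
def removable_segments(segment_start_lsns, segment_size, redo_lsn, replica_lsns=()):
    """Return the start LSNs of segments that are safe to delete or recycle.

    A segment is removable only if it ends at or before the oldest LSN
    still needed, by crash recovery (redo_lsn) or by any replica.
    """
    keep_from = min([redo_lsn, *replica_lsns])
    return [
        start
        for start in sorted(segment_start_lsns)
        if start + segment_size <= keep_from
    ]
```

With segments starting at 0, 100, 200, and 300 (size 100) and a recovery start point of 250, the first two segments are removable. If a replica has only consumed up to LSN 150, only the first one is.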

PostgreSQL checkpoint details

PostgreSQL's checkpoint process is controlled by:

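checkpoint_timeout (default: 5 min): maximum time between automatic checkpoints.
max_wal_size (default: 1 GB): soft limit on how much WAL may accumulate between automatic checkpoints; approaching it triggers a checkpoint.
checkpoint_completion_target (default: 0.9 since PostgreSQL 14, 0.5 in earlier versions): fraction of the checkpoint interval over which the writes are spread; 0.9 spreads the I/O over 90% of the interval to avoid write spikes.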
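
How these three settings interact can be sketched as follows. This is a simplified model, not PostgreSQL's actual scheduling code: the CheckpointSettings class and both functions are hypothetical, and max_wal_size is treated as a hard threshold even though PostgreSQL treats it as a soft limit.

```python
from dataclasses import dataclass


@dataclass
class CheckpointSettings:
    checkpoint_timeout_s: float = 300.0        # 5 min
    max_wal_size_mb: float = 1024.0            # 1 GB
    checkpoint_completion_target: float = 0.9


def checkpoint_trigger(settings, seconds_since_last, wal_mb_since_last):
    """Return 'requested', 'timed', or None if no checkpoint is due yet."""
    if wal_mb_since_last >= settings.max_wal_size_mb:
        return "requested"   # WAL volume reached the limit first
    if seconds_since_last >= settings.checkpoint_timeout_s:
        return "timed"       # the timeout expired first
    return None


def write_window_seconds(settings):
    """Time over which the dirty-page writes of a timed checkpoint are spread."""
    return settings.checkpoint_timeout_s * settings.checkpoint_completion_target
```

With the defaults, write_window_seconds gives about 270 seconds, so the writes of a timed checkpoint are paced over roughly 4.5 of the 5 minutes.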

Real-Life: PostgreSQL Checkpoint Behavior

When a PostgreSQL server has been running under load, here is what happens during a checkpoint:

Triggered by time or WAL size:

Every 5 minutes (checkpoint_timeout), or when about 1 GB of WAL has been written since the last checkpoint (max_wal_size), the checkpointer process begins a new checkpoint.

The checkpoint process:

The checkpointer notes the current WAL position as the redo point, the location from which recovery would begin.
It scans the buffer pool for dirty pages and writes them to disk, spreading the writes over ~4.5 minutes (90% of the 5-minute interval) to avoid overwhelming the disk.
Once all dirty pages are flushed, it writes a checkpoint record to the WAL that contains the redo point.
It updates pg_control with the new checkpoint location.
Old WAL segment files before this checkpoint are recycled.

Monitoring checkpoints:

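SELECT * FROM pg_stat_bgwriter; shows checkpoints_timed (triggered by checkpoint_timeout) and checkpoints_req (requested, for example because WAL size approached max_wal_size or a manual CHECKPOINT was issued). In PostgreSQL 17 and later these counters are in pg_stat_checkpointer as num_timed and num_requested. If requested checkpoints dominate under normal load, increase max_wal_size.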
In the PostgreSQL log with log_checkpoints = on, you see messages like: "checkpoint complete: wrote 15432 buffers (12.3%); 0 WAL file(s) added, 3 removed."
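
Log lines like the one quoted above can be turned into numbers for trend monitoring. The sketch below matches only the fields shown in that sample message; real log lines carry additional fields after these, which the pattern ignores. The function name and the dictionary keys are hypothetical.

```python
import re

# Matches the leading portion of a "checkpoint complete" log message.
CHECKPOINT_RE = re.compile(
    r"checkpoint complete: wrote (?P<buffers>\d+) buffers \((?P<pct>[\d.]+)%\); "
    r"(?P<added>\d+) WAL file\(s\) added, (?P<removed>\d+) removed"
)


def parse_checkpoint_line(line):
    """Extract counters from a checkpoint log line, or return None if it does not match."""
    match = CHECKPOINT_RE.search(line)
    if match is None:
        return None
    return {
        "buffers": int(match["buffers"]),
        "percent": float(match["pct"]),
        "wal_added": int(match["added"]),
        "wal_removed": int(match["removed"]),
    }
```

Applied to the sample message, this yields 15432 buffers, 12.3 percent, 0 WAL files added, and 3 removed.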

Impact on recovery time: if the server crashes, PostgreSQL reads pg_control to find the last checkpoint record, then replays WAL starting from the redo point stored in that record. With 5-minute checkpoints, recovery typically takes seconds to a few minutes, which is orders of magnitude faster than replaying the full WAL.
