Copy-on-Write: Lazy Copying After fork()

Prerequisites(2)

Builds on to(1)

Copy-on-Write (COW) is an optimization technique that makes fork() nearly instant, regardless of how much memory the parent process uses. Without COW, fork() would need to copy the parent's entire address space (potentially gigabytes) for the child -- an enormously expensive operation. COW eliminates this cost by deferring the copy until it is actually needed.

How COW Works

When fork() is called:

The kernel does not copy the parent's physical pages. Instead, both parent and child page tables are updated to point to the same physical frames.
All shared pages are marked as read-only in both processes' page tables (even pages that were previously writable).
Both processes can read from these shared pages with zero overhead -- they are accessing the same physical memory.

When either process attempts to write to a shared page:

The CPU raises a page fault (because the page is marked read-only).
The kernel's page fault handler detects that this is a COW page (not a true permissions violation).
The kernel allocates a new physical frame, copies the contents of the original page into it, and updates the faulting process's page table to point to the new copy with write permissions.
The other process retains its mapping to the original page. If only one process has a reference left, the original can be made writable again without copying.
The write instruction is re-executed, this time succeeding on the private copy.

Why COW Matters

Scenario	Without COW	With COW
fork() a process using 1 GB RAM	Copy 1 GB (~250 ms)	Update page tables (~microseconds)
fork() + exec()	Copies 1 GB, then immediately discards it	Copies zero pages (exec replaces them all)
fork() + child reads only	Copies 1 GB unnecessarily	Shares all pages, copies nothing
fork() + child writes 10 pages	Copies entire 1 GB	Copies only 10 pages (~40 KB)

The fork+exec pattern benefits enormously from COW. When a shell forks and the child immediately calls exec(), the child's address space is replaced entirely by the new program. With COW, the fork() copied zero pages, and exec() unmaps the shared pages. Without COW, fork() would have copied gigabytes of memory only to throw it all away.

Reference Counting

The kernel maintains a reference count for each physical frame. When fork() maps a page as shared, the count increases. When a COW fault creates a private copy, the count for the original decreases. When the count drops to 1, the remaining process can write directly without triggering a fault. When it drops to 0, the frame is freed.

COW Beyond fork()

Copy-on-write is a general-purpose optimization used in many contexts:

Redis background saving: Redis forks to create a snapshot. The child writes the dataset to disk while the parent continues serving requests. Thanks to COW, the child shares the parent's memory and only pages that the parent modifies during the save need to be copied.
Filesystem snapshots: Btrfs and ZFS create snapshots using COW -- a snapshot initially shares all blocks with the live filesystem. Only blocks that are subsequently modified are copied.
Virtual machine live migration: VM memory pages are shared between source and destination, and only modified pages are re-transferred.

Real-Life: Redis Background Persistence

Real-World Example

Redis stores its entire dataset in memory, but needs to periodically save it to disk for durability. The BGSAVE command demonstrates COW perfectly:

Step 1: Redis calls fork() The Redis server (parent, say 10 GB RSS) forks. Thanks to COW, this takes microseconds -- no data is copied. The child shares all 10 GB of physical pages with the parent.

Step 2: Child writes the snapshot to disk The child iterates over the shared dataset and serializes it to an RDB file on disk. It only reads the shared pages, so no COW faults occur from the child's side.

Step 3: Parent continues serving writes While the child is saving, the parent handles new client commands. When a client modifies a key, the parent writes to a shared page, triggering a COW fault. The kernel copies that one page (~4 KB) so the parent gets a private writable copy. The child still sees the original, consistent snapshot.

Memory overhead: If clients modify 5% of keys during the save, only 5% of pages are copied (~500 MB for a 10 GB dataset). Without COW, Redis would need 20 GB of memory to fork safely.

Warning: Linux's /proc/sys/vm/overcommit_memory setting matters here. By default, Linux may refuse to fork if there is not enough virtual memory to theoretically back both processes. Setting it to 1 allows the fork, relying on COW to keep actual memory usage low. This is why the Redis documentation recommends configuring overcommit.

Another example: Docker containers. When you run multiple containers from the same image, the filesystem layers use COW (via overlay2 or similar). All containers share the same base image blocks; only files that a container modifies are copied into its writable layer.

Copy-on-Write: Fork and Write Fault

Step 1 of 2

Copy-on-Write (fork COW)