
Context Switch


What is a Context Switch?

A context switch is the mechanism by which the OS stops executing one process on a CPU core and starts executing a different process. It is the fundamental operation that enables multitasking -- the illusion that many processes run simultaneously on a machine with fewer cores than processes.

The Steps of a Context Switch

When the kernel decides to switch from Process A to Process B, the following sequence occurs:

  1. Save Process A's state -- The kernel copies all CPU register values (general-purpose registers, floating-point registers, the program counter, the stack pointer, the flags register) into Process A's PCB (process control block). This is the "context" being saved.
  2. Save Process A's memory context -- The base register of Process A's page table is recorded so the kernel can restore its address space later.
  3. Update Process A's state -- The PCB's state field is changed from Running to Ready (if preempted) or Blocked (if waiting on I/O).
  4. Select Process B -- The scheduler picks the next process to run from the ready queue.
  5. Load Process B's state -- The kernel loads all saved register values from Process B's PCB into the actual CPU registers.
  6. Restore Process B's memory context -- The page table base register is set to Process B's page table, so the CPU now sees Process B's virtual address space.
  7. Flush or tag the TLB -- Since the TLB caches virtual-to-physical mappings for the old process, it must be invalidated. Modern CPUs use ASID (Address Space Identifier) tagging to avoid full flushes -- each TLB entry is tagged with a process ID, so entries from other processes are ignored rather than evicted.
  8. Resume execution -- The CPU jumps to the instruction at Process B's saved program counter and resumes executing.

What Triggers a Context Switch

  • Timer interrupt -- A hardware timer fires periodically (e.g., every 1-10 ms), giving the scheduler a chance to preempt the running process.
  • Blocking syscall -- The process calls read(), sleep(), or wait(), voluntarily yielding the CPU.
  • I/O completion interrupt -- A higher-priority process becomes ready after its I/O completes, prompting the scheduler to preempt the current one.
  • Explicit yield -- The process calls sched_yield() to voluntarily give up its time slice.

The Cost of Context Switches

A context switch typically costs 1 to 10 microseconds of direct overhead (saving/loading registers). But the indirect costs are often much larger:

  • TLB flush penalty -- After a switch, the TLB is cold. The new process suffers TLB misses on nearly every memory access until the TLB is repopulated. This can add tens of microseconds of stall time.
  • Cache pollution -- The L1 and L2 caches are filled with data from the old process. The new process experiences cold cache misses as it loads its own working set, evicting the old data. When the old process resumes, it faces the same penalty in reverse.
  • Pipeline flush -- The CPU pipeline is flushed during the switch, discarding any in-flight speculative work.

Because of these costs, minimizing unnecessary context switches is critical for throughput. Techniques include using larger time quanta, CPU affinity (pinning a process to a specific core to preserve cache warmth), and batching I/O operations.

Real-Life: Context Switch Overhead in Servers

Real-World Example

High-performance servers are acutely sensitive to context switch overhead. Consider a web server handling 50,000 concurrent connections:

Thread-per-connection model (Apache prefork):

  • Each connection gets its own process/thread. With 50,000 connections, the OS constantly switches between thousands of processes.
  • Each switch flushes the TLB and pollutes the cache. At 5 microseconds per switch and thousands of switches per second, a significant fraction of CPU time is spent just switching rather than doing useful work.

Event-driven model (nginx, Node.js):

  • A single process handles many connections using non-blocking I/O and an event loop (epoll, kqueue).
  • Dramatically fewer context switches because one process handles everything. The TLB stays warm, caches stay populated.
  • This is a major reason why nginx can handle 10x more concurrent connections than Apache on the same hardware.

Measuring context switches: On Linux, you can observe context switches for a specific process:

grep ctxt /proc/[pid]/status
voluntary_ctxt_switches: 1234
nonvoluntary_ctxt_switches: 56

voluntary means the process blocked (e.g., I/O). nonvoluntary means the scheduler preempted it (time slice expired). A high nonvoluntary count suggests the process is compute-bound and fighting for CPU time.

Context Switch: Save and Restore

Diagram: context switch from Process A to Process B. (1) A's CPU registers (PC, SP, R0-R15, FLAGS) and page-table base register are saved into its PCB (state: Ready). (2) B's saved values are loaded from its PCB into the CPU (state: Running), and the TLB is flushed or the ASID switched so the old mappings are invalidated. Direct cost: ~1-10 us; indirect cost: TLB and cache misses.