Hardware Protection Rings
Modern CPUs enforce a hardware-level separation between trusted OS code and untrusted application code using protection rings. The x86 architecture defines four privilege levels (rings 0 through 3), though in practice only two are commonly used.
Ring 0: Kernel Mode
Ring 0 is the most privileged level. Code running at ring 0 has unrestricted access to all hardware:
- Execute any CPU instruction, including privileged instructions (`hlt`, `lgdt`, `mov` to control registers, `in`/`out` for port I/O).
- Access any physical memory address.
- Configure the MMU, page tables, and interrupt handlers.
- Enable/disable interrupts.
The OS kernel runs at ring 0. A bug here (e.g., a faulty device driver) can crash the entire system.
Ring 3: User Mode
Ring 3 is the least privileged level. Application code runs here with significant restrictions:
- Cannot execute privileged instructions. Attempting to do so triggers a general protection fault, an exception that the kernel typically handles by killing the offending process.
- Cannot directly access I/O ports or hardware devices.
- Can only access memory whose page-table entries mark it as user-accessible (the U/S, or user/supervisor, bit is set).
- Cannot modify its own page tables or interrupt descriptor table.
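The page-table restriction above can be sketched as a small permission check. This is a simplified illustration, not how any MMU is actually implemented: the helper `access_allowed` is hypothetical, it looks at a single page-table entry only (real hardware combines the bits from every level of the page-table hierarchy), and it assumes CR0.WP is set so that read-only pages are also enforced for ring 0. The three bit positions are the architectural x86 ones.

```python
# Architectural flag bits in an x86 page-table entry.
PTE_PRESENT = 1 << 0   # P: the page is mapped
PTE_WRITABLE = 1 << 1  # R/W: writes are allowed
PTE_USER = 1 << 2      # U/S: ring 3 may access the page


def access_allowed(pte: int, cpl: int, write: bool) -> bool:
    """Simplified permission check for one page-table entry (hypothetical helper)."""
    if not pte & PTE_PRESENT:
        return False  # unmapped page: the access faults at any privilege level
    if cpl == 3 and not pte & PTE_USER:
        return False  # supervisor-only page touched from user mode
    if write and not pte & PTE_WRITABLE:
        return False  # read-only page (assumes CR0.WP = 1, so ring 0 is checked too)
    return True


kernel_page = PTE_PRESENT | PTE_WRITABLE           # U/S clear: kernel only
user_page = PTE_PRESENT | PTE_WRITABLE | PTE_USER  # U/S set: user-accessible

print(access_allowed(kernel_page, cpl=3, write=False))  # False
print(access_allowed(kernel_page, cpl=0, write=True))   # True
print(access_allowed(user_page, cpl=3, write=True))     # True
```

On real hardware a denied access raises a page fault rather than returning a value; the boolean here only stands in for that outcome.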
The Mode Bit
The CPU's current privilege level (CPL) is stored in the lowest 2 bits of the CS (Code Segment) register. When CPL = 0, the CPU is in kernel mode. When CPL = 3, it is in user mode. The hardware checks the CPL on every instruction and memory access.
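As a small illustration, the CPL can be recovered from a CS selector value by masking its two lowest bits. The helper names below are made up for this sketch; the two selector values are the ones Linux commonly uses on x86-64 and serve only as sample inputs.

```python
def privilege_level(selector: int) -> int:
    """Return bits 0-1 of a segment selector; for CS this is the CPL."""
    return selector & 0b11


def mode_name(cpl: int) -> str:
    """Map a privilege level to the two modes used in practice."""
    if cpl == 0:
        return "kernel mode"
    if cpl == 3:
        return "user mode"
    return f"ring {cpl}"


# Sample inputs (assumption: typical Linux x86-64 selector values).
LINUX_KERNEL_CS = 0x10
LINUX_USER_CS = 0x33

for cs in (LINUX_KERNEL_CS, LINUX_USER_CS):
    cpl = privilege_level(cs)
    print(hex(cs), cpl, mode_name(cpl))
```

The remaining selector bits (the table indicator and the descriptor index) play no part in the privilege check, which is why a single mask is enough.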
Ring 1 and Ring 2
x86 technically supports rings 1 and 2 for intermediate privilege levels (originally intended for device drivers and OS services). In practice, virtually no modern OS uses them. Both Linux and Windows use only ring 0 and ring 3. The reasons:
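- Portability: other architectures do not offer comparable intermediate levels between applications and the kernel (ARM and RISC-V add levels above the kernel, for hypervisors and firmware, rather than between kernel and user), so relying on rings 1/2 would make the OS x86-specific.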
- Complexity: managing four rings adds complexity with minimal practical benefit.
- Virtualization: software-only hypervisors (early VMware, paravirtualized Xen on 32-bit x86) ran guest kernels in ring 1 (ring de-privileging), but hardware virtualization extensions (VT-x, AMD-V) made this unnecessary.
ARM Equivalent: Exception Levels
ARM processors use Exception Levels (EL0-EL3):
| Level | x86 Equivalent | Purpose |
|---|---|---|
| EL0 | Ring 3 | User applications |
| EL1 | Ring 0 | OS kernel |
| EL2 | (no direct equivalent) | Hypervisor |
| EL3 | (no direct equivalent) | Secure monitor (TrustZone) |
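ARM's EL2 and EL3 provide dedicated levels for virtualization and security. x86 covers similar ground with mechanisms outside the ring model: VT-x/AMD-V for virtualization and, loosely, System Management Mode (SMM) for firmware-level code.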
Mode Switch
Transitioning from user mode (ring 3) to kernel mode (ring 0) occurs via controlled entry points:
- Trap/syscall instruction: the user program deliberately requests a kernel service (e.g., `syscall` on x86-64, `svc` on ARM).
- Hardware interrupt: an external device (keyboard, disk, timer) signals the CPU.
- Exception/fault: the CPU encounters an error (division by zero, page fault, invalid opcode).
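In all cases, the CPU automatically switches to ring 0, saves the minimum user-mode state needed to resume (such as the instruction pointer and flags), and jumps to an entry point that the kernel configured in advance. Interrupts and exceptions are dispatched through the Interrupt Descriptor Table (IDT); on x86-64, the `syscall` instruction instead jumps to the address held in the IA32_LSTAR model-specific register. The kernel's entry code then saves the remaining registers. The reverse transition (`iret` or `sysret`) restores the user-mode state and drops back to ring 3.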
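The interrupt and exception path can be pictured with a toy model. Everything in it is an assumption made for illustration: `ToyCPU`, its fields, and the instruction-pointer value are invented, and a real CPU saves more state than this. The only architectural detail is that vector 14 is the x86 page-fault vector. The sketch shows the three steps: save the interrupted state, raise the privilege level, and restore on return.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class ToyCPU:
    """Toy model of privilege switching; not a real CPU emulator."""
    cpl: int = 3         # start in user mode
    rip: int = 0x401000  # made-up user instruction pointer
    idt: Dict[int, Callable[["ToyCPU"], str]] = field(default_factory=dict)
    saved: List[Tuple[int, int]] = field(default_factory=list)

    def deliver(self, vector: int) -> str:
        """Enter the kernel through the handler registered for `vector`."""
        handler = self.idt[vector]               # only kernel-registered entry points exist
        self.saved.append((self.cpl, self.rip))  # save the interrupted state
        self.cpl = 0                             # switch to ring 0
        try:
            return handler(self)
        finally:
            # Like iret: restore the saved state and drop back to ring 3.
            self.cpl, self.rip = self.saved.pop()


PAGE_FAULT = 14  # architectural x86 vector for page faults

cpu = ToyCPU()
cpu.idt[PAGE_FAULT] = lambda c: f"page fault handled at CPL {c.cpl}"
print(cpu.deliver(PAGE_FAULT))  # page fault handled at CPL 0
print(cpu.cpl)                  # 3
```

The key property is that user code never chooses the handler address; it can only name a vector, and the table contents are under kernel control.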
Why Ring 3 Restrictions Matter
Consider what would happen if applications ran at ring 0:
- A malicious program could read any process's memory (passwords, encryption keys), modify kernel data structures, or install rootkits.
- A buggy program with a buffer overflow could overwrite kernel memory, crashing the entire system instead of just the faulty process.
- A game could directly program the disk controller, bypassing the file system and corrupting data belonging to other programs.
With ring 3 enforcement:
- The buggy program can corrupt only its own memory. If it touches memory it is not allowed to access, it triggers a segmentation fault and the kernel kills only that process; everything else continues.
- The malicious program cannot read other processes' memory because the page table (controlled by the kernel at ring 0) marks those pages as inaccessible from ring 3.
- The game must ask the kernel (via system calls) to perform I/O, and the kernel enforces file permissions.
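The first point can be observed directly on a POSIX system. The sketch below assumes Linux or a similar Unix (it relies on `os.fork`) and uses a read from address 0 as a stand-in for any illegal memory access; `run_faulty_child` is a name invented here.

```python
import ctypes
import os
import signal


def run_faulty_child() -> int:
    """Fork a child that reads an unmapped address; return its wait status."""
    pid = os.fork()
    if pid == 0:
        # Child: reading a C string at address 0 touches an unmapped page.
        ctypes.string_at(0)
        os._exit(0)  # not reached if the fault is delivered
    _, status = os.waitpid(pid, 0)
    return status


status = run_faulty_child()
if os.WIFSIGNALED(status):
    print("child killed by", signal.Signals(os.WTERMSIG(status)).name)
print("parent still running, pid", os.getpid())
```

On Linux the child is expected to die from SIGSEGV while the parent carries on, which is the isolation described above: the fault is confined to the process that caused it.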
Real-world example: the Meltdown vulnerability (2018) was so severe because it allowed ring 3 code to read ring 0 memory by exploiting speculative execution. The fix, KPTI (Kernel Page-Table Isolation), unmaps most kernel pages from the page tables used while in user mode. This adds overhead to every mode switch, because the CPU must change page tables on each kernel entry and exit.
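On Linux, the kernel reports its Meltdown status through sysfs, which gives a quick way to see whether page-table isolation is active. The sketch below assumes a Linux kernel recent enough to expose that file and returns `None` anywhere else; the function name is made up for this example.

```python
from pathlib import Path
from typing import Optional

# sysfs file where Linux reports its Meltdown status.
MELTDOWN_STATUS = Path("/sys/devices/system/cpu/vulnerabilities/meltdown")


def meltdown_status(path: Path = MELTDOWN_STATUS) -> Optional[str]:
    """Return the kernel's Meltdown status line, or None if it is unavailable."""
    try:
        return path.read_text().strip()
    except OSError:
        return None  # not Linux, an older kernel, or sysfs not mounted


print(meltdown_status() or "status file not available on this system")
```

Typical values include `Mitigation: PTI` when KPTI is enabled, `Not affected` on CPUs without the flaw, and `Vulnerable` when the mitigation is off.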