Hardware Registers
A register is a small, fast storage element inside the CPU built from a group of D flip-flops that share a common clock signal and a load enable line. Registers store data that the CPU is actively working with -- operands, results, addresses, and control state.
Register Construction
An N-bit register is simply N D flip-flops wired in parallel:
- All flip-flops share the same clock (CLK) edge
- A load enable (LD) signal gates whether new data is written on the clock edge
- When LD=1 and a rising clock edge occurs, all N flip-flops capture their respective input bits simultaneously
- When LD=0, the register holds its current value regardless of clock edges
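The load/hold behavior described above can be sketched in software. This is a minimal behavioral model, not a hardware description: the class name `Register` and the method `clock_edge` are hypothetical names chosen for illustration, and each call to `clock_edge` is assumed to stand for one rising clock edge.

```python
class Register:
    """Behavioral model of an N-bit register: N D flip-flops sharing CLK and LD."""

    def __init__(self, width: int) -> None:
        self.width = width
        self.mask = (1 << width) - 1  # keeps only the low `width` bits
        self.q = 0                    # stored value (the flip-flop outputs)

    def clock_edge(self, d: int, ld: int) -> int:
        """Simulate one rising clock edge with data input d and load enable ld."""
        if ld:
            self.q = d & self.mask    # LD=1: all N bits are captured together
        return self.q                 # LD=0: the previous value is held


reg = Register(8)
reg.clock_edge(0xA5, ld=1)   # LD=1: the input is captured
reg.clock_edge(0xFF, ld=0)   # LD=0: the input is ignored
print(hex(reg.q))            # 0xa5
```

Masking the input to `width` bits mirrors the fact that an N-bit register has only N flip-flops, so any higher input bits have nowhere to be stored.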
Register Width
| Width | Typical Use |
|---|---|
| 1-bit | Status flags (Z, C, V, N) |
| 8-bit | Byte-sized I/O registers |
| 32-bit | General-purpose registers in 32-bit CPUs (x86, ARM32) |
| 64-bit | General-purpose registers in modern CPUs (x86-64, AArch64) |
| 128-bit | SIMD/vector registers (SSE, NEON) |
| 256-bit | AVX vector registers |
| 512-bit | AVX-512 vector registers |
Register File
A register file is an array of registers with read ports and write ports. A typical RISC CPU register file has:
- 32 registers (e.g., MIPS and RISC-V; AArch64 has 31 general-purpose registers, and 32-bit ARM has 16)
- 2 read ports (to read two source operands simultaneously for the ALU)
- 1 write port (to write back the result)
Each read port is a multiplexer: a 5-bit register address selects one of 32 registers (2^5 = 32), and the MUX routes that register's contents to the output. Each write port uses a 5-to-32 decoder that, when a write is enabled, asserts the load enable of exactly one register.
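A small behavioral sketch may help tie the read multiplexers and the write decoder together. It assumes a 32-entry, 64-bit register file with two read ports and one write port; the class name `RegisterFile` and the `write_enable` parameter are hypothetical, and details such as a hardwired zero register are deliberately left out.

```python
from typing import Tuple


class RegisterFile:
    """Behavioral model: 2 read ports (multiplexers), 1 write port (decoder)."""

    def __init__(self, count: int = 32, width: int = 64) -> None:
        self.regs = [0] * count
        self.mask = (1 << width) - 1

    def read(self, rs1: int, rs2: int) -> Tuple[int, int]:
        # Each read port acts as a MUX: the address selects one register.
        return self.regs[rs1], self.regs[rs2]

    def write(self, rd: int, data: int, write_enable: bool) -> None:
        # Decoder: a one-hot list of load enables, gated by write_enable
        # (write_enable is an assumed control signal for this sketch).
        load_enables = [write_enable and i == rd for i in range(len(self.regs))]
        for i, ld in enumerate(load_enables):
            if ld:
                self.regs[i] = data & self.mask


rf = RegisterFile()
rf.write(5, 42, write_enable=True)    # register 5 is loaded
rf.write(6, 99, write_enable=False)   # no load enable is asserted
print(rf.read(5, 6))                  # (42, 0)
```

Building the one-hot `load_enables` list is slower than indexing directly, but it mirrors the hardware: the decoder drives every register's load enable, and at most one of them is active.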
Named / Special-Purpose Registers
Beyond general-purpose registers, CPUs have dedicated registers:
- PC (Program Counter): Holds the address of the next instruction to fetch
- SP (Stack Pointer): Points to the top of the call stack
- LR (Link Register): Stores the return address for function calls (ARM)
- IR (Instruction Register): Holds the currently executing instruction (internal, not programmer-visible)
- MAR / MDR: Memory Address Register and Memory Data Register for bus communication
- FLAGS / PSW: The status register holding the Z (zero), C (carry), V (overflow), and N (negative) flags produced by the ALU
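To make the status flags concrete, here is a rough sketch of how an ALU addition could set Z, C, V, and N. It assumes an 8-bit two's-complement adder and the conventional meanings of the flags (zero, carry out, signed overflow, negative); the function name `add_with_flags` is hypothetical, and real CPUs differ in details such as how carry behaves on subtraction.

```python
from typing import Dict, Tuple


def add_with_flags(a: int, b: int, width: int = 8) -> Tuple[int, Dict[str, int]]:
    """Add two unsigned `width`-bit values and derive the Z, C, V, N flags."""
    mask = (1 << width) - 1
    sign = 1 << (width - 1)          # position of the sign bit
    a &= mask
    b &= mask
    full = a + b                     # sum before truncation to `width` bits
    result = full & mask
    flags = {
        "Z": int(result == 0),                              # result is zero
        "C": int(full > mask),                              # carry out of the top bit
        "V": int(bool((a ^ result) & (b ^ result) & sign)), # signed overflow
        "N": int(bool(result & sign)),                      # sign bit of the result
    }
    return result, flags


print(add_with_flags(0x7F, 0x01))  # (128, {'Z': 0, 'C': 0, 'V': 1, 'N': 1})
print(add_with_flags(0xFF, 0x01))  # (0, {'Z': 1, 'C': 1, 'V': 0, 'N': 0})
```

The two calls show why C and V are separate flags: 0x7F + 1 overflows as a signed value without producing a carry, while 0xFF + 1 produces a carry without signed overflow.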
Register vs. Memory
Registers are at the top of the memory hierarchy: they are the fastest and smallest storage. A 64-bit CPU with 32 registers has only 32 × 8 = 256 bytes of general-purpose register storage, but a register can be read within a single clock cycle, which is well under 1 nanosecond on a multi-GHz CPU. By contrast, L1 cache access takes roughly 1-2 ns (a few clock cycles), and main memory takes 50-100 ns. Compilers work hard to keep frequently used variables in registers (register allocation) to avoid slower memory accesses.
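The arithmetic behind these figures is simple enough to check directly. The sketch below uses the numbers quoted above; the helper name `register_storage_bytes` is hypothetical, and the latency values are the rough ranges from the text rather than measurements.

```python
def register_storage_bytes(count: int, width_bits: int) -> int:
    """Total storage of `count` registers, each `width_bits` wide, in bytes."""
    return count * width_bits // 8


print(register_storage_bytes(32, 64))   # 256

# Rough latencies in nanoseconds, taken from the ranges quoted in the text.
register_ns_upper = 1.0                 # "less than 1 ns"
main_memory_ns = (50.0, 100.0)          # "50-100 ns"

# Lower bound on how much slower main memory is than a register access.
min_slowdown = main_memory_ns[0] / register_ns_upper
print(min_slowdown)                     # 50.0
```

Since a register access takes less than 1 ns, the real ratio is larger than this lower bound.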
Real-Life: Whiteboard in a Meeting Room
Think of CPU registers as the whiteboards in a meeting room. There are only a few of them (like 32), but they are immediately visible and writable -- no need to walk to a filing cabinet. The "filing cabinet" is main memory: it holds vastly more information but takes much longer to access. When you need to work on a calculation, you copy the relevant numbers from the filing cabinet to the whiteboard, work with them there, and write the result back.
Other real-world analogies and uses:
- Clipboard on your phone: Your phone's clipboard is like a register -- it holds one piece of data (copied text) that's instantly accessible for pasting. Main storage (your notes app) holds more but takes taps to navigate.
- CPU context switch: When the OS switches between processes, it must save all register values of the outgoing process and restore the registers of the incoming process. This "register dump" is stored in the process control block. The more registers a CPU has, the more data must be saved/restored, increasing context switch cost.
- Compiler register allocation: Compilers often model register allocation as graph coloring (some use other strategies such as linear scan) to decide which variables live in registers. If more variables are live at the same time than there are registers, the compiler must "spill" some of them to the stack (memory), which is slower.
- RISC vs. CISC: RISC architectures tend to have larger register files: RISC-V and MIPS have 32 general-purpose registers and AArch64 has 31, although 32-bit ARM has only 16. x86-64 also has only 16 (expanded from x86's 8). More registers generally mean less memory traffic and better performance for register-intensive code.
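The graph-coloring view of register allocation mentioned above can be illustrated with a small sketch. This is a simple greedy heuristic, not the algorithm any particular compiler uses: variables are nodes, an edge means two variables are live at the same time, and colors stand for registers. The function name `allocate_registers` and the example graph are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def allocate_registers(
    interference: Dict[str, Set[str]], num_registers: int
) -> Tuple[Dict[str, int], List[str]]:
    """Greedily color an interference graph; uncolorable variables are spilled."""
    assignment: Dict[str, int] = {}
    spilled: List[str] = []
    # Visit the most-constrained variables first (highest degree, then by name).
    order = sorted(interference, key=lambda v: (-len(interference[v]), v))
    for var in order:
        # Registers already taken by neighbors that are live at the same time.
        used = {assignment[n] for n in interference[var] if n in assignment}
        free = [r for r in range(num_registers) if r not in used]
        if free:
            assignment[var] = free[0]
        else:
            spilled.append(var)       # no register left: keep it in memory
    return assignment, spilled


# Hypothetical interference graph: a, b, and c are all live simultaneously.
graph = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}

print(allocate_registers(graph, 3))  # ({'c': 0, 'a': 1, 'b': 2, 'd': 1}, [])
print(allocate_registers(graph, 2))  # ({'c': 0, 'a': 1, 'd': 1}, ['b'])
```

With three registers every variable fits, and `d` can reuse the register of `a` because the two are never live together. With only two registers, the triangle formed by `a`, `b`, and `c` cannot be colored, so one variable is spilled.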