Arithmetic Logic Unit

The Arithmetic Logic Unit (ALU) is the combinational circuit inside a CPU that performs integer arithmetic and bitwise logical operations. It combines an adder, a logic unit, and a multiplexer into a single functional block controlled by an opcode (operation code).

ALU Architecture

An ALU receives two N-bit operand inputs (A and B) and an opcode that selects the operation, and it produces:

  • An N-bit result
  • A set of status flags describing the result

Internally, the ALU computes multiple results in parallel (add, subtract, AND, OR, etc.) and uses a MUX controlled by the opcode to select which result to output.
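
The compute-everything-then-select structure can be sketched in software. This is a minimal illustration, not a hardware description: the 8-bit width and the names alu_select, N, and MASK are assumptions made for the example, and the opcode numbering follows the operations table in this article.

```python
N = 8                    # assumed word width for this sketch
MASK = (1 << N) - 1      # keeps every value to N bits


def alu_select(opcode: int, a: int, b: int) -> int:
    """Compute every candidate result, then pick one the way a MUX would."""
    a &= MASK
    b &= MASK
    candidates = [
        (a + b) & MASK,                 # 000 ADD
        (a + (~b & MASK) + 1) & MASK,   # 001 SUB via A + NOT(B) + 1
        a & b,                          # 010 AND
        a | b,                          # 011 OR
        a ^ b,                          # 100 XOR
        ~a & MASK,                      # 101 NOT
        (a << 1) & MASK,                # 110 SHL
        a >> 1,                         # 111 SHR (logical shift assumed)
    ]
    # The 3-bit opcode acts as the MUX select lines.
    return candidates[opcode & 0b111]
```

For instance, alu_select(0b001, 5, 3) returns 2. Hardware evaluates all candidates simultaneously; the Python list only mimics that by building every result before indexing.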

Common ALU Operations

Opcode  Operation  Description
000     ADD        A + B
001     SUB        A - B (via A + NOT(B) + 1)
010     AND        Bitwise A AND B
011     OR         Bitwise A OR B
100     XOR        Bitwise A XOR B
101     NOT        Bitwise NOT A
110     SHL        Shift A left by 1 (multiply by 2, modulo 2^N)
111     SHR        Shift A right by 1 (for unsigned values, divide by 2 rounding down)

Subtraction is implemented using the same adder hardware as addition: SUB = A + (~B) + 1. The ALU inverts B and sets the carry-in to 1, which adds the two's complement negation of B (that is, -B) to A.
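
A short sketch makes the identity concrete. The function name sub_via_adder and the default 8-bit width are assumptions for illustration; the only operation used is an addition with the inverted operand and a carry-in of 1.

```python
def sub_via_adder(a: int, b: int, n: int = 8) -> tuple:
    """Return (result, carry_out) of A - B computed as A + NOT(B) + 1."""
    mask = (1 << n) - 1
    not_b = ~b & mask                  # bitwise inversion limited to n bits
    total = (a & mask) + not_b + 1     # the trailing +1 is the carry-in
    return total & mask, total >> n    # low n bits, then the carry-out bit
```

Worked by hand for 5 - 3 at 8 bits: NOT(3) is 11111100 (252), and 5 + 252 + 1 = 258, which is 1 00000010 in binary. The low 8 bits give 2 and the carry-out is 1. For 3 - 5 the sum is 3 + 250 + 1 = 254 with carry-out 0, and 254 is the 8-bit two's complement encoding of -2.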

Status Flags

After every operation, the ALU sets condition flags that the control unit reads for branch decisions:

  • Zero (Z): Set when the result is all zeros. Used by conditional branches like BEQ (branch if equal).
  • Carry (C): Set when the addition produces a carry-out from the MSB. Indicates unsigned overflow.
  • Overflow (V): Set when a signed arithmetic operation produces a result too large or too small to represent. Detected when the carry into the MSB differs from the carry out of the MSB.
  • Negative (N): Set when the MSB of the result is 1, indicating a negative value in two's complement.
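
The four flag definitions above can be checked with a small model of an N-bit addition. This is a sketch under assumptions: the name add_with_flags, the dictionary layout, and the 8-bit default are illustrative choices. The overflow flag is derived exactly as described, by comparing the carry into the MSB with the carry out of it.

```python
def add_with_flags(a: int, b: int, carry_in: int = 0, n: int = 8) -> dict:
    """Add two n-bit values and derive the Z, C, V, and N flags."""
    mask = (1 << n) - 1
    msb = 1 << (n - 1)
    a &= mask
    b &= mask
    total = a + b + carry_in
    result = total & mask
    carry_out = total >> n
    # Adding only the low n-1 bits exposes the carry that enters the MSB.
    carry_into_msb = ((a & (msb - 1)) + (b & (msb - 1)) + carry_in) >> (n - 1)
    return {
        "result": result,
        "Z": int(result == 0),
        "C": carry_out,
        "V": carry_into_msb ^ carry_out,
        "N": int((result & msb) != 0),
    }
```

With 8-bit operands, 0x7F + 0x01 yields 0x80 with V = 1 and N = 1 (signed overflow, no carry), while 0xFF + 0x01 yields 0x00 with Z = 1 and C = 1 (unsigned overflow, no signed overflow).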

Bit-Slice Design

Large ALUs are often designed as bit-slices -- identical 1-bit or 4-bit ALU cells that are cascaded to form wider data paths. The classic 74181 was a 4-bit ALU slice; four of them could be chained (with carry-lookahead) to build a 16-bit ALU. Modern CPUs integrate the full ALU width on-chip, but the bit-slice concept persists in the design methodology.
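
The cascading idea can be sketched by building a 16-bit adder from four 4-bit slices. Note the simplification: this sketch ripples the carry from one slice to the next rather than modelling carry-lookahead, and the names slice4_add and add16_from_slices are hypothetical. It does not reproduce the 74181's actual function set.

```python
def slice4_add(a4: int, b4: int, carry_in: int) -> tuple:
    """One 4-bit slice: add two nibbles plus a carry-in."""
    total = (a4 & 0xF) + (b4 & 0xF) + carry_in
    return total & 0xF, total >> 4     # (4-bit sum, carry-out)


def add16_from_slices(a: int, b: int, carry_in: int = 0) -> tuple:
    """Chain four identical slices, least significant nibble first."""
    result = 0
    carry = carry_in
    for i in range(4):
        shift = 4 * i
        nibble, carry = slice4_add(a >> shift, b >> shift, carry)
        result |= nibble << shift
    return result, carry               # (16-bit sum, final carry-out)
```

Because every slice is identical, widening the data path only means adding more iterations. The cost of rippling is that each slice waits for the previous carry, which is the delay that carry-lookahead logic is meant to reduce.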

ALU in the CPU Pipeline

In a pipelined CPU, the ALU sits in the execute stage. The decode stage extracts the opcode and operands and feeds them to the ALU, which produces the result and flags in a single clock cycle (for simple integer operations). More complex operations like multiplication and division typically use separate functional units with multi-cycle latency.

Real-Life: Calculator and Beyond

Real-World Example

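A pocket calculator is essentially a user-facing ALU. When you press "5 + 3 =", the calculator's arithmetic circuitry receives the operands 5 and 3 and the opcode for addition, and outputs 8. In a binary two's complement design, the "+/-" button would flip the sign using the same NOT + add-1 trick as SUB in hardware; many real calculators instead work in decimal (BCD) and store the sign separately.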

Other real-world uses and analogies:

  • GPU shader cores: A GPU contains hundreds to thousands of small ALUs (shader cores) that run in parallel. Groups of cores perform the same operation on different data (SIMD). This is why GPUs excel at matrix math for graphics and AI.
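  • Comparison instructions: When a CPU executes CMP A, B, it performs SUB internally but discards the result -- it only keeps the flags. The Zero flag tells if A == B. For signed operands, A < B when the Negative and Overflow flags differ (N alone is wrong whenever the subtraction overflows). For unsigned operands, the Carry flag decides, but its polarity depends on the architecture: with the A + NOT(B) + 1 adder, the raw carry-out is 0 when A < B (the ARM convention), while x86 stores the inverted value as a borrow, so there C = 1 means A < B.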
  • Network routers: Some high-speed routers use ALU-like circuits to perform checksum calculations on packet headers at wire speed.
  • Branch prediction validation: After a branch prediction, the ALU computes the actual condition. If the prediction was wrong, the pipeline is flushed -- the flags from the ALU are what determine correctness.
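
The comparison case can be made concrete with a small model of CMP. This is a sketch under assumptions: it uses the A + NOT(B) + 1 subtraction from this article and keeps the adder's raw carry-out as the Carry flag, so C = 1 means no borrow. The function names and the 8-bit default are illustrative.

```python
def cmp_flags(a: int, b: int, n: int = 8) -> dict:
    """Model CMP A, B as A + NOT(B) + 1 and keep only the flags."""
    mask = (1 << n) - 1
    msb = 1 << (n - 1)
    a &= mask
    not_b = ~b & mask
    total = a + not_b + 1
    result = total & mask              # discarded by CMP; used for flags only
    carry_out = total >> n
    carry_into_msb = ((a & (msb - 1)) + (not_b & (msb - 1)) + 1) >> (n - 1)
    return {
        "Z": int(result == 0),
        "C": carry_out,                # raw carry-out: 1 means A >= B unsigned
        "V": carry_into_msb ^ carry_out,
        "N": int((result & msb) != 0),
    }


def is_equal(flags: dict) -> bool:
    return flags["Z"] == 1


def unsigned_less(flags: dict) -> bool:
    return flags["C"] == 0             # no carry-out means a borrow occurred


def signed_less(flags: dict) -> bool:
    return flags["N"] != flags["V"]    # N alone fails when V is set
```

Comparing 0x00 with 0x80 (0 against -128 when read as signed) shows why V matters: the result has N = 1, yet V = 1 as well, so signed_less correctly reports that 0 is not less than -128.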

ALU Block Diagram

[Figure: ALU block diagram. Inputs: A (N bits), B (N bits), Opcode. Outputs: Result (N bits) and the flags Z (Zero), C (Carry), V (Overflow), N (Negative). Internal block: Adder + Logic + MUX.]