Hard Disk Drive Internals

A Hard Disk Drive (HDD) stores data on spinning magnetic platters. Despite being largely replaced by SSDs for primary storage, HDDs remain dominant for bulk storage (data centers, backups, archival) due to their cost-per-gigabyte advantage. Understanding HDD mechanics is essential because many database and filesystem designs were shaped by the need to minimize mechanical movement.

Physical Structure

An HDD contains one or more circular platters coated with a thin magnetic layer, spinning at 5,400 to 15,000 RPM. Each platter surface has a read/write head mounted on an actuator arm that moves radially across the platter.

Track: A concentric ring on one platter surface. A typical platter has tens of thousands of tracks.
Sector: The smallest addressable unit on a track, typically 512 bytes or 4 KB (Advanced Format). A track contains hundreds to thousands of sectors.
Cylinder: The set of tracks at the same radial position across all platter surfaces. The heads move together on a shared actuator, so every track in a cylinder can be reached by switching heads rather than moving the arm.

The Three Components of Access Time

Every HDD read/write involves three sequential delays:

Seek Time (0.5-10 ms): The actuator arm moves the head to the correct track. This mechanical movement dominates random I/O latency. Average seek time for a 7,200 RPM drive is ~4-8 ms.
Rotational Latency (0-8.3 ms at 7,200 RPM): Once the head is on the correct track, it waits for the desired sector to rotate under the head. On average, this is half a rotation: at 7,200 RPM, one rotation takes 8.3 ms, so average rotational latency is ~4.2 ms.
Transfer Time: The time to read the actual data as sectors pass under the head. For sequential data, modern drives achieve 150-250 MB/s. A 4 KB sector transfer at 200 MB/s takes just ~0.02 ms -- negligible compared to seek and rotation.
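
The three components can be put into numbers with a short calculation. The sketch below is illustrative only: the function names are made up for this example, and the 6 ms seek value is an assumption picked from within the range quoted above, not a measurement of any particular drive.

```python
def rotation_time_ms(rpm: float) -> float:
    """Time for one full platter rotation, in milliseconds."""
    return 60_000.0 / rpm


def avg_rotational_latency_ms(rpm: float) -> float:
    """On average the head waits half a rotation for the target sector."""
    return rotation_time_ms(rpm) / 2


def transfer_time_ms(num_bytes: int, throughput_mb_s: float) -> float:
    """Time to stream num_bytes at a sequential rate given in MB/s (1 MB = 10**6 bytes)."""
    return num_bytes / (throughput_mb_s * 1_000_000) * 1000


def access_time_ms(seek_ms: float, rpm: float, num_bytes: int, throughput_mb_s: float) -> float:
    """Seek time + average rotational latency + transfer time, in milliseconds."""
    return (
        seek_ms
        + avg_rotational_latency_ms(rpm)
        + transfer_time_ms(num_bytes, throughput_mb_s)
    )


# Assumed example values: 6 ms seek, 7,200 RPM, one 4 KB sector, 200 MB/s.
print(f"full rotation:      {rotation_time_ms(7200):.2f} ms")
print(f"average rotation:   {avg_rotational_latency_ms(7200):.2f} ms")
print(f"4 KB transfer:      {transfer_time_ms(4096, 200):.3f} ms")
print(f"total access time:  {access_time_ms(6.0, 7200, 4096, 200):.2f} ms")
```

Changing the rpm argument shows why 15,000 RPM drives cut rotational latency roughly in half compared with 7,200 RPM drives, while the transfer term stays negligible for small reads.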

Sequential vs Random I/O

This is the single most important HDD characteristic for systems design. Sequential I/O (reading consecutive sectors on the same or adjacent tracks) pays the seek and rotational delay once, on the first access, and then streams data at 150-250 MB/s. Random I/O (reading scattered sectors) pays the full seek + rotation penalty on every access.

For a 4 KB random read: seek (4 ms) + rotation (4.2 ms) + transfer (0.02 ms) = ~8.2 ms per read, or only about 120 IOPS. At 4 KB per read that is roughly 0.5 MB/s, about 400x below a 200 MB/s sequential rate. Depending on the drive and the request size, the gap ranges from roughly 100x to 1000x.
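
The estimate above can be reproduced in a few lines. This is a sketch using the same illustrative figures (4 ms seek, 4.2 ms rotation, 0.02 ms transfer, 200 MB/s sequential); the function names are made up for the example. Without rounding the per-read total down to 8 ms, the result lands slightly above 120 IOPS.

```python
def random_iops(seek_ms: float, rotation_ms: float, transfer_ms: float) -> float:
    """Random reads per second when every read pays seek + rotation + transfer."""
    return 1000.0 / (seek_ms + rotation_ms + transfer_ms)


def throughput_mb_s(iops: float, io_bytes: int) -> float:
    """Effective data rate in MB/s (1 MB = 10**6 bytes) at a given IOPS and I/O size."""
    return iops * io_bytes / 1_000_000


SEQUENTIAL_MB_S = 200.0  # assumed sequential rate, from the range quoted in the text

iops = random_iops(seek_ms=4.0, rotation_ms=4.2, transfer_ms=0.02)
random_mb_s = throughput_mb_s(iops, io_bytes=4096)
gap = SEQUENTIAL_MB_S / random_mb_s

print(f"random 4 KB reads: {iops:.0f} IOPS, {random_mb_s:.2f} MB/s")
print(f"sequential is about {gap:.0f}x faster for the same data volume")
```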

Disk Scheduling Algorithms

The OS disk scheduler reorders I/O requests to minimize total seek distance:

SCAN (Elevator): The head sweeps in one direction (inner to outer tracks), servicing requests along the way, then reverses. Prevents starvation and reduces average seek distance.
C-SCAN (Circular SCAN): Like SCAN, but only services requests in one direction. When the head reaches the outermost track, it jumps back to the innermost without servicing, then sweeps again. Provides more uniform wait times.
LOOK / C-LOOK: Variants that only travel as far as the farthest pending request rather than going to the physical edge.
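
The four policies can be compared with a small simulation. This is a simplified sketch, not how any particular OS scheduler is implemented: it assumes a static queue, a head that initially sweeps toward higher track numbers, tracks numbered 0 to max_track, and a C-SCAN return stroke that counts as head movement. The request queue and all function names are hypothetical.

```python
from typing import List, Tuple


def _split(requests: List[int], head: int) -> Tuple[List[int], List[int]]:
    """Split pending tracks into those at or above the head and those below it."""
    upper = sorted(t for t in requests if t >= head)
    lower = sorted(t for t in requests if t < head)
    return upper, lower


def scan(requests: List[int], head: int, max_track: int) -> List[int]:
    """Sweep up to the physical edge, then reverse and sweep down."""
    upper, lower = _split(requests, head)
    path = list(upper)
    if lower:
        if not path or path[-1] != max_track:
            path.append(max_track)  # travel to the edge before reversing
        path.extend(reversed(lower))
    return path


def c_scan(requests: List[int], head: int, max_track: int) -> List[int]:
    """Sweep up to the edge, jump back to track 0, then sweep up again."""
    upper, lower = _split(requests, head)
    path = list(upper)
    if lower:
        if not path or path[-1] != max_track:
            path.append(max_track)
        if lower[0] != 0:
            path.append(0)  # return stroke; nothing is serviced on the way
        path.extend(lower)
    return path


def look(requests: List[int], head: int) -> List[int]:
    """Like SCAN, but reverse at the farthest pending request."""
    upper, lower = _split(requests, head)
    return upper + list(reversed(lower))


def c_look(requests: List[int], head: int) -> List[int]:
    """Like C-SCAN, but jump from the farthest request to the lowest one."""
    upper, lower = _split(requests, head)
    return upper + lower


def total_movement(head: int, path: List[int]) -> int:
    """Total number of tracks the head crosses while following the path."""
    distance = 0
    for position in path:
        distance += abs(position - head)
        head = position
    return distance


queue = [98, 183, 37, 122, 14, 124, 65, 67]  # hypothetical pending requests
start, last_track = 53, 199

for name, path in [
    ("SCAN", scan(queue, start, last_track)),
    ("C-SCAN", c_scan(queue, start, last_track)),
    ("LOOK", look(queue, start)),
    ("C-LOOK", c_look(queue, start)),
]:
    print(f"{name:7s} {total_movement(start, path):4d} tracks  {path}")
```

With this queue the sketch reports 331 tracks for SCAN, 382 for C-SCAN, 299 for LOOK, and 322 for C-LOOK. The returned paths include edge positions where nothing is serviced, which is exactly the travel that the LOOK variants save.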

Why Databases Minimize Random I/O

The 100x+ gap between sequential and random HDD performance shaped decades of database design:

B+ trees store keys in sorted order within large pages (e.g., 8 KB), so a range scan reads a run of neighboring leaf pages instead of chasing individual keys across the disk.
LSM trees batch random writes into an in-memory buffer, then flush as a large sequential write.
Clustered indexes physically order table rows by index key, turning range queries into sequential scans.
Write-ahead logs append entries sequentially rather than updating pages in place.
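
The LSM idea of turning many small random writes into one large sequential write can be shown with a toy model. This is a minimal sketch under heavy assumptions: the class name TinyLSM and its memtable_limit parameter are hypothetical, in-memory lists stand in for on-disk files, and it omits everything a real engine needs for durability and read efficiency.

```python
from typing import Dict, List, Optional, Tuple


class TinyLSM:
    """Toy model: buffer writes in memory, then flush them as one sorted run."""

    def __init__(self, memtable_limit: int = 4) -> None:
        self.memtable_limit = memtable_limit
        self.memtable: Dict[str, str] = {}
        # Each run stands in for one file written front to back in a single pass.
        self.runs: List[List[Tuple[str, str]]] = []

    def put(self, key: str, value: str) -> None:
        """Writes in any key order only touch memory until the buffer fills."""
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self) -> None:
        """Sort the buffered entries and emit them as one sequential write."""
        if not self.memtable:
            return
        self.runs.append(sorted(self.memtable.items()))
        self.memtable = {}

    def get(self, key: str) -> Optional[str]:
        """Check the buffer first, then runs from newest to oldest."""
        if key in self.memtable:
            return self.memtable[key]
        for run in reversed(self.runs):
            for k, v in run:
                if k == key:
                    return v
        return None


db = TinyLSM(memtable_limit=4)
for key in ["d", "a", "c", "b"]:  # arrival order is random with respect to key order
    db.put(key, f"v1-{key}")
db.put("a", "v2-a")  # newer value stays in memory until the next flush

print(db.runs)      # one run, sorted by key
print(db.get("a"))  # newest value wins
print(db.get("d"))  # found in the flushed run
```

The point of the design is visible in flush: no matter how scattered the incoming keys are, the disk only ever sees one ordered pass over the data.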

SSD Comparison

SSDs use flash memory with no moving parts, eliminating seek time and rotational latency entirely. A random 4 KB read takes ~10-100 µs (vs 8 ms for HDD) -- an 80-800x improvement. However, SSDs have their own challenges: write amplification (writing a 4 KB page may require erasing and rewriting a 256 KB block), limited write endurance, and higher cost per gigabyte. The sequential-vs-random gap also still exists on SSDs (about 10x, not 100x), so sequential-friendly designs remain beneficial.

Real-World: PostgreSQL and Sequential Scan Decisions

Why PostgreSQL sometimes prefers a sequential scan over an index scan:

Consider a query SELECT * FROM orders WHERE status = 'shipped' on a table with 10 million rows where 40% match.

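Index scan: Look up the B-tree index for 'shipped', then fetch 4 million rows from scattered heap pages. On an HDD, each random page fetch costs ~8 ms, so in the worst case (one uncached random read per row) that is about 32,000 seconds, nearly 9 hours. Caching and rows that share a page reduce this, but the number of random reads remains very large.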
Sequential scan: Read the entire table from start to finish. The table might be 10 GB, and at 200 MB/s sequential throughput, this takes ~50 seconds -- but it is still faster than millions of random I/O operations.
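
A back-of-the-envelope comparison of the two plans, using the figures from this example. This is a rough sketch, not PostgreSQL's cost model: the function names are hypothetical, and the index-scan estimate is a deliberate upper bound that assumes one uncached random read per matching row.

```python
def index_scan_seconds(matching_rows: int, random_read_ms: float) -> float:
    """Upper bound: every matching row costs one uncached random page read."""
    return matching_rows * random_read_ms / 1000


def seq_scan_seconds(table_bytes: int, throughput_mb_s: float) -> float:
    """Read the whole table once at the sequential rate (1 MB = 10**6 bytes)."""
    return table_bytes / (throughput_mb_s * 1_000_000)


total_rows = 10_000_000
match_percent = 40
matching_rows = total_rows * match_percent // 100

index_s = index_scan_seconds(matching_rows, random_read_ms=8.0)
seq_s = seq_scan_seconds(table_bytes=10 * 10**9, throughput_mb_s=200)

print(f"index scan (upper bound): {index_s:,.0f} s ({index_s / 3600:.1f} h)")
print(f"sequential scan:          {seq_s:,.0f} s")
print(f"ratio:                    {index_s / seq_s:,.0f}x")
```

Lowering match_percent shows where the balance tips: once only a small fraction of rows match, the index scan's handful of random reads beats reading the full table.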

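PostgreSQL's query planner weighs random_page_cost (default 4.0) against seq_page_cost (default 1.0) when choosing between these plans. The 4x default does not mean a random read on an HDD is only 4x slower than a sequential one: the raw mechanical gap is far larger, and the PostgreSQL documentation describes the default as modeling random access as 40 times slower than sequential while expecting 90% of random reads to be served from cache. In practice DBAs more often lower random_page_cost (for example, to around 1.1) on SSDs or when the data fits in memory; raising it is reasonable mainly when an HDD-backed working set is much larger than the cache.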

Other HDD-aware designs:

Kafka: Achieves millions of messages/sec by appending to sequential log files. The broker's throughput is limited by sequential disk bandwidth, not IOPS.
HDFS: Stores data in large blocks (128 MB default) so that each seek is amortized over a long sequential transfer.
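
The amortization argument behind large blocks can be quantified as the share of each request spent positioning the head rather than transferring data. The sketch below assumes 8.2 ms of positioning time (seek plus average rotation) and a 200 MB/s sequential rate, both illustrative figures in line with the earlier sections; the function name is hypothetical.

```python
def positioning_overhead(block_bytes: int, positioning_ms: float, throughput_mb_s: float) -> float:
    """Fraction of total request time spent on seek + rotation instead of transfer."""
    transfer_ms = block_bytes / (throughput_mb_s * 1_000_000) * 1000
    return positioning_ms / (positioning_ms + transfer_ms)


POSITIONING_MS = 8.2      # assumed seek + average rotational latency
SEQUENTIAL_MB_S = 200.0   # assumed sequential transfer rate

for label, size in [
    ("4 KB", 4 * 1024),
    ("1 MB", 1024 * 1024),
    ("128 MB", 128 * 1024 * 1024),
]:
    share = positioning_overhead(size, POSITIONING_MS, SEQUENTIAL_MB_S)
    print(f"{label:>6}: {share:6.1%} of the time is spent positioning the head")
```

Under these assumptions a 4 KB request spends over 99% of its time on positioning, while a 128 MB block spends under 2%, which is why large blocks let the drive run close to its sequential bandwidth.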

[Figure: HDD structure and seek operation]