What is Row-Oriented Storage?
Row-oriented storage (also called the N-ary Storage Model or NSM) stores all columns of a single row contiguously on a disk page. When the database reads a page, it gets every column of every row on that page.
Page layout
A typical page (4 KB or 8 KB) in row storage contains:
- Page header: page ID, number of tuples, free space pointer, checksum.
- Tuple directory (slot array): an array of (offset, length) pairs pointing to each tuple on the page. This allows tuples to vary in size and be rearranged without changing external references.
- Tuples: stored from the end of the page growing backward. Each tuple contains a header (null bitmap, transaction metadata) followed by the column values packed sequentially.
This layout is sometimes called a slotted page. The tuple directory grows forward while tuples grow backward; they meet in the middle when the page is full.
Why row storage is great for OLTP
Online Transaction Processing (OLTP) workloads are characterized by:
- Point lookups:
SELECT * FROM users WHERE id = 42. The database reads one page, finds the tuple, and returns all columns. One I/O operation fetches the complete row. - Single-row inserts:
INSERT INTO orders (...) VALUES (...). Append one tuple to the end of the last page. Very fast — one page write. - Single-row updates:
UPDATE users SET email = 'new@example.com' WHERE id = 42. Read one page, modify the tuple in place (if it fits), write it back. - Row-level locking: since an entire row lives on one page, locking a row for a transaction is straightforward and does not conflict with operations on other rows.
Why row storage is bad for analytics
Analytical queries (OLAP) often read a small subset of columns across many rows:
SELECT AVG(salary) FROM employees WHERE department = 'Engineering';
This query needs only the salary and department columns, but row storage forces the database to read every column of every row — name, address, phone, hire_date, etc. For a table with 50 columns, 98% of the data read from disk is wasted I/O.
Tuple identifier (TID)
Each row is identified by a Tuple ID — typically (page_number, slot_number). Indexes store TIDs as pointers to rows. To fetch a row, the database reads the page and looks up the slot in the tuple directory.
N-ary model in practice
| Database | Page size | Storage model |
|---|---|---|
| PostgreSQL | 8 KB | Row (NSM) |
| MySQL/InnoDB | 16 KB | Row (NSM) |
| Oracle | 2-32 KB | Row (NSM) |
| SQL Server | 8 KB | Row (NSM) |
All major OLTP databases default to row storage because the overwhelming majority of OLTP operations work with individual rows.
Write-ahead logging (WAL)
Row storage pairs naturally with WAL for crash recovery. When a row is modified, the database writes the before/after image of the tuple to the WAL before modifying the page. Since the entire row is contiguous, logging is efficient — one log record captures the complete change.
Real-Life: PostgreSQL Heap Pages
PostgreSQL stores table data in heap files composed of 8 KB pages using the slotted-page layout.
Inserting a row:
INSERT INTO users (name, email, age) VALUES ('Alice', 'alice@example.com', 30);
- PostgreSQL finds a page with enough free space (using the Free Space Map).
- The tuple (header + 'Alice' + 'alice@example.com' + 30) is written at the end of the page's free space.
- A new slot is added to the page's line pointer array pointing to this tuple.
- The tuple's TID is (page 5, slot 3) — this is what indexes store.
Point query:
SELECT * FROM users WHERE id = 42;
- The primary key index maps id=42 to TID (page 5, slot 3).
- The buffer pool fetches page 5 (one I/O if not cached).
- Slot 3 in the line pointer array gives the offset within the page.
- All columns of that row are immediately available — no additional I/O.
Why analytics are slow: Running SELECT AVG(age) FROM users on a table with 20 columns requires a sequential scan of every page. Each 8 KB page contains perhaps 50 rows, but only the 4-byte age field is needed (200 bytes useful out of 8,192 bytes read). That is ~97% wasted I/O.