What is Row-Oriented Storage?

Prerequisites(2)

Slotted Page Layout Row Format & NULL bitmap

Builds on to(1)

Row-oriented storage (also called the N-ary Storage Model or NSM) stores all columns of a single row contiguously on a disk page. When the database reads a page, it gets every column of every row on that page.

Page layout

A typical page (4 KB or 8 KB) in row storage contains:

Page header: page ID, number of tuples, free space pointer, checksum.
Tuple directory (slot array): an array of (offset, length) pairs pointing to each tuple on the page. This allows tuples to vary in size and be rearranged without changing external references.
Tuples: stored from the end of the page growing backward. Each tuple contains a header (null bitmap, transaction metadata) followed by the column values packed sequentially.

This layout is sometimes called a slotted page. The tuple directory grows forward while tuples grow backward; they meet in the middle when the page is full.

Why row storage is great for OLTP

Online Transaction Processing (OLTP) workloads are characterized by:

Point lookups: SELECT * FROM users WHERE id = 42. The database reads one page, finds the tuple, and returns all columns. One I/O operation fetches the complete row.
Single-row inserts: INSERT INTO orders (...) VALUES (...). Append one tuple to the end of the last page. Very fast — one page write.
Single-row updates: UPDATE users SET email = 'new@example.com' WHERE id = 42. Read one page, modify the tuple in place (if it fits), write it back.
Row-level locking: since an entire row lives on one page, locking a row for a transaction is straightforward and does not conflict with operations on other rows.

Why row storage is bad for analytics

Analytical queries (OLAP) often read a small subset of columns across many rows:

SELECT AVG(salary) FROM employees WHERE department = 'Engineering';

This query needs only the salary and department columns, but row storage forces the database to read every column of every row — name, address, phone, hire_date, etc. For a table with 50 columns, 98% of the data read from disk is wasted I/O.

Tuple identifier (TID)

Each row is identified by a Tuple ID — typically (page_number, slot_number). Indexes store TIDs as pointers to rows. To fetch a row, the database reads the page and looks up the slot in the tuple directory.

N-ary model in practice

Database	Page size	Storage model
PostgreSQL	8 KB	Row (NSM)
MySQL/InnoDB	16 KB	Row (NSM)
Oracle	2-32 KB	Row (NSM)
SQL Server	8 KB	Row (NSM)

All major OLTP databases default to row storage because the overwhelming majority of OLTP operations work with individual rows.

Write-ahead logging (WAL)

Row storage pairs naturally with WAL for crash recovery. When a row is modified, the database writes the before/after image of the tuple to the WAL before modifying the page. Since the entire row is contiguous, logging is efficient — one log record captures the complete change.

**Row-oriented storage** (also called the **N-ary Storage Model** or NSM) stores all columns of a single row **contiguously** on a disk page. When the database reads a page, it gets every column of every row on that page.

### Page layout

A typical page (4 KB or 8 KB) in row storage contains:

1. **Page header**: page ID, number of tuples, free space pointer, checksum.
2. **Tuple directory** (slot array): an array of (offset, length) pairs pointing to each tuple on the page. This allows tuples to vary in size and be rearranged without changing external references.
3. **Tuples**: stored from the end of the page growing backward. Each tuple contains a header (null bitmap, transaction metadata) followed by the column values packed sequentially.

This layout is sometimes called a **slotted page**. The tuple directory grows forward while tuples grow backward; they meet in the middle when the page is full.

### Why row storage is great for OLTP

**Online Transaction Processing (OLTP)** workloads are characterized by:

- **Point lookups**: `SELECT * FROM users WHERE id = 42`. The database reads one page, finds the tuple, and returns all columns. One I/O operation fetches the complete row.
- **Single-row inserts**: `INSERT INTO orders (...) VALUES (...)`. Append one tuple to the end of the last page. Very fast — one page write.
- **Single-row updates**: `UPDATE users SET email = 'new@example.com' WHERE id = 42`. Read one page, modify the tuple in place (if it fits), write it back.
- **Row-level locking**: since an entire row lives on one page, locking a row for a transaction is straightforward and does not conflict with operations on other rows.

### Why row storage is bad for analytics

**Analytical queries** (OLAP) often read a small subset of columns across many rows:

```sql
SELECT AVG(salary) FROM employees WHERE department = 'Engineering';
```

This query needs only the `salary` and `department` columns, but row storage forces the database to read **every column** of every row — name, address, phone, hire_date, etc. For a table with 50 columns, 98% of the data read from disk is wasted I/O.

### Tuple identifier (TID)

Each row is identified by a **Tuple ID** — typically (page_number, slot_number). Indexes store TIDs as pointers to rows. To fetch a row, the database reads the page and looks up the slot in the tuple directory.

### N-ary model in practice

| Database    | Page size | Storage model |
|------------|-----------|---------------|
| PostgreSQL | 8 KB      | Row (NSM)     |
| MySQL/InnoDB | 16 KB   | Row (NSM)     |
| Oracle     | 2-32 KB   | Row (NSM)     |
| SQL Server | 8 KB      | Row (NSM)     |

All major OLTP databases default to row storage because the overwhelming majority of OLTP operations work with individual rows.

### Write-ahead logging (WAL)

Row storage pairs naturally with **WAL** for crash recovery. When a row is modified, the database writes the before/after image of the tuple to the WAL before modifying the page. Since the entire row is contiguous, logging is efficient — one log record captures the complete change.

Real-Life: PostgreSQL Heap Pages

Real-World Example

PostgreSQL stores table data in heap files composed of 8 KB pages using the slotted-page layout.

Inserting a row:

INSERT INTO users (name, email, age) VALUES ('Alice', 'alice@example.com', 30);

PostgreSQL finds a page with enough free space (using the Free Space Map).
The tuple (header + 'Alice' + 'alice@example.com' + 30) is written at the end of the page's free space.
A new slot is added to the page's line pointer array pointing to this tuple.
The tuple's TID is (page 5, slot 3) — this is what indexes store.

Point query:

SELECT * FROM users WHERE id = 42;

The primary key index maps id=42 to TID (page 5, slot 3).
The buffer pool fetches page 5 (one I/O if not cached).
Slot 3 in the line pointer array gives the offset within the page.
All columns of that row are immediately available — no additional I/O.

Why analytics are slow: Running SELECT AVG(age) FROM users on a table with 20 columns requires a sequential scan of every page. Each 8 KB page contains perhaps 50 rows, but only the 4-byte age field is needed (200 bytes useful out of 8,192 bytes read). That is ~97% wasted I/O.

Slotted Page Layout (Row Storage)

Step 1 of 2