How a Row is Stored on Disk
When a database writes a row (tuple) to a page, it must serialize the column values into a compact byte sequence. The row format defines exactly how fields are laid out, how NULLs are represented, and how oversized values are handled.
Row header
Every row begins with a header containing system metadata:
- Transaction ID (xmin/xmax) — which transaction created/deleted this row (used by MVCC).
- Command ID — ordering within a transaction.
- Infomask bits — flags indicating whether the row has NULLs, variable-length fields, or TOAST pointers.
- Offset to user data — where the actual column values start.
Fixed-length fields
Columns with fixed sizes — int (4 bytes), bigint (8 bytes), float (4 bytes), char(n) (n bytes) — are stored inline at a known offset. The database can jump directly to byte offset X to read column Y because every preceding column has a predictable size.
Variable-length fields
Columns like varchar, text, or bytea have unpredictable sizes. They are stored using a (offset, length) or a length-prefixed scheme. PostgreSQL uses a 1-byte or 4-byte varlena header: the first byte(s) encode the total length, followed by the data bytes.
NULL bitmap
Rather than storing a sentinel value for NULL, databases use a NULL bitmap: one bit per nullable column. If bit i is 1, column i is NULL and no bytes are stored for it. This is vastly more efficient than storing a placeholder value, especially for wide tables with many nullable columns.
For fixed-length fields, the bitmap lets the engine skip the field entirely. For variable-length fields, a NULL means zero bytes — no length prefix, no data. The bitmap itself costs only ceil(num_columns / 8) bytes per row.
TOAST — The Oversized-Attribute Storage Technique
When a single column value exceeds roughly 2 KB (PostgreSQL's threshold is ~2000 bytes), it cannot fit inline in an 8 KB page alongside other rows. TOAST handles this by:
- Compressing the value (using pglz or lz4).
- If still too large, storing it out-of-line in a separate TOAST table, replacing the inline value with an 18-byte TOAST pointer (chunk ID + size).
- Large values are split into chunks (typically ~2000 bytes each) stored across multiple TOAST table rows.
TOAST is transparent: queries see the full value, but the storage engine fetches it in chunks only when needed.
Alignment and padding
Columns are typically aligned to their natural boundary (4-byte ints at 4-byte offsets, 8-byte bigints at 8-byte offsets). This may introduce padding bytes between columns. Smart column ordering (largest alignment first) can reduce wasted space.
Real-Life: PostgreSQL HeapTupleHeader
In PostgreSQL, every heap tuple starts with a 23-byte header (HeapTupleHeaderData):
- t_xmin (4 bytes) — inserting transaction ID
- t_xmax (4 bytes) — deleting transaction ID (0 if not deleted)
- t_cid (4 bytes) — command ID within the transaction
- t_ctid (6 bytes) — current tuple ID (page, offset)
- t_infomask2 (2 bytes) — number of attributes + flags
- t_infomask (2 bytes) — NULL bitmap present? Has varlen fields? Has TOAST?
- t_hoff (1 byte) — offset to user data
Immediately after the header comes the NULL bitmap (if the infomask says NULLs exist). Then the actual column data begins at byte offset t_hoff.
Example row for CREATE TABLE users (id int, name varchar, email varchar, age int): | Header (23B) | NULL bitmap (1B: 0b0000) | id: 42 (4B) | name: "Alice" (6B) | email: "a@b.com" (8B) | age: 25 (4B) |
If email were NULL, the bitmap would be 0b0100, and the email bytes would be completely absent — saving 8 bytes.