What is a Heap File?
A heap file is the simplest way a database stores records on disk: an unordered collection of pages. New records are placed wherever there is free space — there is no sorting, no indexing, no structure beyond "here are the pages."
Record Identifier (RID)
Every record in a heap file is identified by a Record ID = (page_id, slot_number). The page_id tells the storage engine which disk page to fetch, and the slot_number tells it where within that page the record lives. This two-part address is stable — it does not change unless the record is physically moved (e.g., during compaction).
Finding records
Because records are unordered, the only way to find a specific record without an index is a full table scan: read every page, examine every record, check if it matches. This is O(n) where n is the number of records — terrible for point queries, but perfectly acceptable for analytics that must read everything anyway.
Free space tracking
When a record is deleted, the page has a hole. New inserts should reuse that space. Databases track free space using one of two strategies:
-
Free Space Map (FSM) — a separate structure that records, for each page, how much free space it has. PostgreSQL uses a one-byte-per-page FSM where the value approximates the largest contiguous free chunk. To insert a 200-byte record, the engine scans the FSM for a page with at least 200 bytes free.
-
Page directory — a set of header pages that point to data pages, each entry recording the page's free space. This is a more centralized approach.
Page organization strategies
- Linked list: pages form two linked lists — one for pages with free space, one for full pages. Simple but requires traversal to find a page.
- Page directory: a few special pages act as a directory, listing all data pages and their free capacity. More random-access friendly.
When heap files shine
Heap files are ideal for bulk inserts (append to any page with space — no ordering to maintain) and sequential scans (read every page in order). They are the default storage for tables in PostgreSQL and MySQL (InnoDB's clustered index is technically a B+Tree heap, but secondary heap files exist). The weakness is point queries: without an index, every lookup is a full scan.
Real-Life: PostgreSQL Table Storage
When you create a table in PostgreSQL and insert rows, they are stored in a heap file on disk. Each table maps to one or more OS files, and each file is divided into 8 KB pages.
Inserting a row:
- PostgreSQL checks the Free Space Map to find a page with enough room.
- It writes the row into that page at the next available slot.
- The row's physical location is called a ctid — a (page, offset) pair, which is PostgreSQL's Record ID.
SELECT * FROM users WHERE id = 42 (no index):
- PostgreSQL performs a sequential scan — it reads every page of the heap file.
- For each page, it examines each row tuple and checks if id = 42.
- This is why adding an index (B-tree on id) is critical: it maps id=42 directly to a ctid, skipping the scan.
Deleting a row:
- The row is not physically removed. Instead, it is marked as dead (invisible to new transactions).
- The space is reclaimed later by VACUUM, which updates the Free Space Map so future inserts can reuse the slot.
Heap files are also used in MySQL's MyISAM engine, SQLite's table storage, and Oracle's heap-organized tables.