What is a Hash Table?
A hash table is a data structure that maps keys to values using a hash function. The hash function converts a key into an index in an underlying array (called buckets), enabling average-case O(1) lookup, insertion, and deletion.
Core idea
- Hash function — takes a key and produces an integer. A good hash function distributes keys uniformly across buckets.
- Bucket array — a fixed-size array where each slot holds zero or more entries.
- Collision resolution — when two keys hash to the same bucket, we need a strategy. The two most common are:
- Chaining: each bucket holds a linked list of entries.
- Open addressing: probe for the next available slot (linear probing, quadratic probing, double hashing).
Complexity
| Operation | Average | Worst case (all keys collide) |
|---|---|---|
| Get | O(1) | O(n) |
| Put | O(1) | O(n) |
| Remove | O(1) | O(n) |
The worst case happens when every key hashes to the same bucket, degenerating the table into a linked list.
Load factor and resizing
The load factor = (number of entries) / (number of buckets). When it exceeds a threshold (commonly 0.75), the table resizes — typically doubling the bucket array and rehashing all entries. This keeps the average chain length short.
Real-Life: Phone Book Lookup
Think of a phone book organized by the first letter of each last name. The "hash function" is taking the first letter: all names starting with "S" go into the S section. Within that section, you scan linearly — that's chaining.
If most people in your town have last names starting with "S", the S section becomes huge while other sections are nearly empty — that's a bad hash function causing uneven distribution.
In databases, hash tables power:
- Hash indexes — PostgreSQL hash indexes for equality lookups
- Hash joins — the database builds a hash table on the smaller relation, then probes it with the larger relation
- In-memory caches — Redis is essentially a networked hash table