Startup Recovery

Startup recovery is the moment Redis turns durable records back into live memory.

Before the server can safely accept ordinary client traffic, it has to answer a question:

what should the in-memory keyspace be right now?

The answer depends on configuration, available files, and sometimes replication topology.

Choosing The Source Of Truth

Redis may have several possible recovery inputs:

RDB snapshot
AOF command log
hybrid AOF with base plus incremental tail
replication from a primary
empty dataset

Recovery is not "load everything you find." It is a configured decision about which source represents the dataset.

If AOF is enabled (appendonly yes), Redis loads the AOF at startup instead of the RDB file, because the AOF may contain more recent writes than the last RDB snapshot. With AOF disabled, the RDB file is the recovery source. A replica might instead start by synchronizing from its primary.
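The decision can be sketched as a small pure function. This is a simplified model, not Redis's actual startup code: the names RecoverySource and choose_recovery_source are hypothetical, replication is left out, and the sketch assumes that configuration wins over whatever files happen to be on disk.

```python
from enum import Enum


class RecoverySource(Enum):
    AOF = "aof"
    RDB = "rdb"
    EMPTY = "empty"


def choose_recovery_source(aof_enabled: bool, aof_exists: bool, rdb_exists: bool) -> RecoverySource:
    """Pick exactly one source; never merge several (toy model)."""
    if aof_enabled:
        # Assumption of this sketch: with AOF enabled, the RDB file is not
        # consulted even when the AOF is missing.
        return RecoverySource.AOF if aof_exists else RecoverySource.EMPTY
    if rdb_exists:
        return RecoverySource.RDB
    return RecoverySource.EMPTY
```

The point of the shape is that the function returns a single source. Finding both files on disk does not mean both get loaded.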

The Startup Sequence

A clean startup has a deliberate order:

parse configuration
initialize server structures
load dataset from the chosen source
discard data that is already expired
rebuild in-memory indexes and metadata
enter the main event loop
begin accepting normal commands

Networking should not expose a half-loaded keyspace as if it were ready. Redis opens its listening sockets before loading finishes, so clients can connect during recovery, but most commands are answered with a -LOADING error until recovery has reached a coherent point.
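A minimal sketch of that gate, assuming a toy in-process server: ToyServer and its methods are hypothetical names, and only the -LOADING reply text is taken from what Redis sends.

```python
class ToyServer:
    """Toy model of a server that refuses commands while loading."""

    def __init__(self):
        self.loading = True  # set before any client can be served
        self.db = {}

    def finish_loading(self, dataset):
        # Publish the dataset and flip the flag in one step, so clients
        # never observe a partially filled keyspace.
        self.db = dict(dataset)
        self.loading = False

    def handle(self, command, *args):
        if self.loading:
            return "-LOADING Redis is loading the dataset in memory"
        if command == "GET":
            return self.db.get(args[0])
        return "-ERR unknown command"
```

A client that connects early gets a clear, retryable error rather than a wrong answer such as a missing key that simply has not been loaded yet.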

RDB Recovery

RDB recovery deserializes a point-in-time dataset. It is usually direct: read key records, reconstruct objects, restore expiration timestamps, and skip anything already expired.

The strength of RDB recovery is speed and compactness. The weakness is that the snapshot may not include the most recent writes before a crash.
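To illustrate the shape of snapshot loading, here is a sketch over a toy JSON snapshot. The real RDB format is binary and far more compact; the record layout (key, value, expire_at_ms) and the function name load_snapshot are assumptions made for this example.

```python
import json


def load_snapshot(snapshot_text, now_ms):
    """Rebuild (db, expires) from a toy JSON snapshot, skipping expired keys."""
    db, expires = {}, {}
    for record in json.loads(snapshot_text):
        deadline = record.get("expire_at_ms")  # absolute time, or None
        if deadline is not None and deadline <= now_ms:
            continue  # expired while the server was down: do not restore
        db[record["key"]] = record["value"]
        if deadline is not None:
            expires[record["key"]] = deadline
    return db, expires
```

Each record is handled independently, which is what makes this path direct: there is no command history to interpret, only objects to reconstruct.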

AOF Recovery

AOF recovery replays commands in order.

empty db
  -> SET a 1
  -> HSET user:7 name Ada
  -> EXPIRE a ...
  -> final reconstructed state

This path can use much of the normal command execution machinery, but it should not behave exactly like a live client. It should rebuild state without sending network replies, publishing to nonexistent subscribers, or accidentally triggering side effects that belong only to active traffic.

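AOF replay also has to handle file damage carefully. A truncated tail may be tolerated or repaired depending on policy: with aof-load-truncated set to yes, Redis loads the valid prefix and starts, and redis-check-aof --fix can trim a damaged file. Corruption in the middle of the file is different. Redis refuses to start rather than silently load a damaged dataset.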
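The replay loop and the truncated-tail policy can be sketched as follows. The AOF stores commands in the RESP wire format, which this example parses in a minimal way. Everything else is an assumption for illustration: the helper names, the three supported commands, and the allow_truncated_tail flag standing in for the configured policy.

```python
class TruncatedLog(Exception):
    """The log ends in the middle of a command."""


def encode(*args):
    """Encode one command as a RESP array of bulk strings."""
    out = b"*%d\r\n" % len(args)
    for arg in args:
        data = arg.encode()
        out += b"$%d\r\n%s\r\n" % (len(data), data)
    return out


def _read_line(buf, pos):
    end = buf.find(b"\r\n", pos)
    if end == -1:
        raise TruncatedLog
    return buf[pos:end], end + 2


def parse_commands(buf):
    """Yield the argument list of each complete command in the log."""
    pos = 0
    while pos < len(buf):
        header, pos = _read_line(buf, pos)
        if not header.startswith(b"*"):
            raise ValueError("corrupt log: expected array header")
        args = []
        for _ in range(int(header[1:])):
            size_line, pos = _read_line(buf, pos)
            if not size_line.startswith(b"$"):
                raise ValueError("corrupt log: expected bulk string")
            size = int(size_line[1:])
            if pos + size + 2 > len(buf):
                raise TruncatedLog
            if buf[pos + size:pos + size + 2] != b"\r\n":
                raise ValueError("corrupt log: bad bulk terminator")
            args.append(buf[pos:pos + size].decode())
            pos += size + 2
        yield args


def apply_command(db, args):
    """Mutate state only: no replies, no pub/sub, no client side effects."""
    if not args:
        raise ValueError("corrupt log: empty command")
    name = args[0].upper()
    if name == "SET":
        db[args[1]] = args[2]
    elif name == "HSET":
        db.setdefault(args[1], {})[args[2]] = args[3]
    elif name == "DEL":
        for key in args[1:]:
            db.pop(key, None)
    else:
        raise ValueError(f"unsupported command in log: {name}")


def replay(buf, allow_truncated_tail=True):
    """Rebuild a dataset from an empty db by applying commands in order."""
    db = {}
    try:
        for args in parse_commands(buf):
            apply_command(db, args)
    except TruncatedLog:
        if not allow_truncated_tail:
            raise  # policy: refuse to start on an incomplete log
    return db
```

Note the asymmetry: a truncated tail is a policy decision, while a malformed record always raises ValueError, so corruption can never be skipped silently.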

Expired Data Must Stay Dead

Recovery has to respect time. If a key expired while Redis was offline, loading should not bring it back.

That is why persistence formats store absolute expiration timestamps. On recovery:

deadline <= now -> do not restore
deadline > now  -> restore key and deadline

This rule keeps expiration consistent across restarts.
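A short worked example shows why the stored value has to be an absolute deadline rather than a remaining TTL. The function names and the millisecond clock values are hypothetical.

```python
def to_deadline_ms(written_at_ms, ttl_ms):
    """Convert a relative TTL into an absolute deadline at write time."""
    return written_at_ms + ttl_ms


def should_restore(deadline_ms, now_ms):
    """Recovery rule: restore only keys whose deadline is still in the future."""
    return deadline_ms > now_ms


# A key written at t=1000 with a 5000 ms TTL expires at t=6000,
# no matter how long the server is offline in between.
deadline = to_deadline_ms(written_at_ms=1000, ttl_ms=5000)
```

With this deadline, a restart at t=4000 restores the key, while a restart at t=6000 or t=9000 does not. Had the file stored "5000 ms remaining" instead, every restart would have restarted the countdown and revived the key.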

Recovery Rebuilds More Than Values

The obvious goal is to restore keys and values. The less obvious goal is to restore the server's internal ability to operate efficiently.

That can include expiration indexes, memory accounting, object encodings, replication offsets, function libraries, and persistence metadata. Some of this is loaded directly. Some is reconstructed from the dataset.

Startup recovery is complete only when Redis has not merely read the data, but rebuilt a coherent server around it.
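As a rough sketch of "reconstructed from the dataset", the function below derives two pieces of metadata from loaded keys: an expiration index and an approximate memory figure. The min-heap and the byte estimate are illustrative choices for this example, not a description of Redis's internal structures, and rebuild_metadata is a hypothetical name.

```python
import heapq


def rebuild_metadata(db, expires):
    """Derive an expiry index and a rough size estimate from loaded data."""
    # Index only deadlines whose key actually exists in the dataset.
    expiry_heap = [(deadline, key) for key, deadline in expires.items() if key in db]
    heapq.heapify(expiry_heap)  # soonest deadline ends up at index 0
    # Crude accounting: key length plus the length of the value's text form.
    approx_bytes = sum(len(key) + len(str(value)) for key, value in db.items())
    return expiry_heap, approx_bytes
```

Neither result is stored in the persistence file in this model. Both are recomputed, which is why reading the data is only part of recovery.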

Recovery Is A Trust Boundary

Persistence files are promises from the past. Startup decides how much to trust them, how to validate them, and what to do when they are incomplete.

That makes recovery one of the most important correctness paths in the system. A fast database that cannot reliably reconstruct its own memory after a restart is not durable. Redis' recovery path is where persistence becomes real.
