Back to DAG

Incremental View Maintenance

databases

Applying Deltas Instead of Full Recomputation

A standard materialized view refresh re-executes the entire defining query from scratch. For a view built on a billion-row table, this means re-scanning and re-aggregating a billion rows even if only 100 rows changed since the last refresh. Incremental View Maintenance (IVM) solves this by applying only the deltas (changes) to the stored result, avoiding full recomputation.

The Delta Table Concept

When the base table changes, the changes are captured in a delta table (sometimes called a changelog or diff). Insertions are recorded as positive deltas (+), deletions as negative deltas (-), and updates as a delete followed by an insert. The IVM algorithm then determines how these deltas propagate through the view's query to produce deltas on the view result.

IVM for Simple Aggregations

For simple aggregation queries, incremental maintenance is straightforward:

  • COUNT: If 5 new rows are inserted and 2 are deleted, add (5 - 2) = 3 to the stored count. No need to re-count the entire table.
  • SUM: If a new row with value 100 is inserted, add 100 to the stored sum. If a row with value 40 is deleted, subtract 40.
  • AVG: Maintain both the sum and count incrementally, then recompute avg = sum / count.
  • MIN/MAX: These are harder — deleting the current minimum requires scanning to find the new minimum. Some systems maintain auxiliary structures (heaps or sorted indexes) to handle this efficiently.

IVM for Joins

For a view defined as SELECT * FROM A JOIN B ON A.id = B.aid:

  • When new rows are inserted into A, only join the new A rows with all of B. Append matching results to the view.
  • When new rows are inserted into B, only join all of A with the new B rows.
  • Deletions require removing the corresponding join results from the view.

This is far cheaper than re-joining all of A with all of B.

Challenges

IVM is not universally applicable. DELETE handling is complex: removing a row from a SUM is easy (subtract), but removing a row from a DISTINCT count requires knowing whether other rows still contribute that value. Complex queries involving subqueries, window functions, or HAVING clauses make delta propagation mathematically involved. Consistency is another challenge: the view must reflect a consistent snapshot, not a partially-applied delta.

Real-World Systems

Materialize is a streaming database that continuously maintains materialized views as data arrives — it is IVM taken to its logical extreme. dbt incremental models approximate IVM by processing only new/changed rows in batch ETL pipelines. Oracle's fast refresh uses materialized view logs (delta tables) to incrementally maintain views. PostgreSQL does not natively support IVM, but the pg_ivm extension is an ongoing effort to add it.

Incremental Maintenance of a Revenue Summary

Real-World Example

Consider a materialized view:

CREATE MATERIALIZED VIEW revenue_by_region AS
SELECT region, SUM(amount) AS total, COUNT(*) AS num_orders
FROM orders
GROUP BY region;

The orders table has 100 million rows. The view stores one row per region (say, 50 regions).

Full refresh re-scans 100M rows every time: ~30 seconds.

Incremental maintenance when 1000 new orders arrive:

  1. Capture deltas: The 1000 new orders form the delta table.
  2. Group the deltas: Group the 1000 new orders by region:
    region=US: SUM(amount)=50000, COUNT=400
    region=EU: SUM(amount)=30000, COUNT=350
    region=APAC: SUM(amount)=20000, COUNT=250
    
  3. Apply to view:
    UPDATE revenue_by_region
    SET total = total + 50000, num_orders = num_orders + 400
    WHERE region = 'US';
    -- repeat for EU, APAC
    
  4. Time: ~5 milliseconds (process 1000 rows instead of 100 million).

DELETE handling: If an order with amount=200 in region=US is deleted:

UPDATE revenue_by_region
SET total = total - 200, num_orders = num_orders - 1
WHERE region = 'US';

When IVM breaks down: If the view used COUNT(DISTINCT customer_id), deleting an order does not tell us whether to decrement the count — other orders from the same customer may exist. The system would need to query the base table to check, partially defeating the purpose of IVM.

Materialize example: In Materialize, you create a source (e.g., a Kafka topic) and a materialized view. As events arrive on the topic, the view updates in real-time with sub-second latency — no manual refresh needed.

Full Refresh vs. Incremental View Maintenance

Full Refresh Base Table (100M rows) scan ALL Re-aggregate 100M rows Materialized View ~30 seconds Incremental (IVM) Base Table (100M rows) Delta (1000 rows) apply delta Group + merge delta Materialized View ~5 ms IVM Delta Rules: SUM += delta_sum COUNT += delta_count JOIN: new_A JOIN all_B DELETE from SUM: subtract the deleted value. DELETE from COUNT DISTINCT: must check if other rows remain.
Step 1 of 2