Back to DAG

OLTP vs OLAP

databases

Two Fundamentally Different Workloads

Databases serve two fundamentally different workload patterns: OLTP (Online Transaction Processing) and OLAP (Online Analytical Processing). Understanding the difference is critical because the optimal storage format, indexing strategy, and hardware configuration are completely different for each.

OLTP: Transaction Processing

OLTP systems handle the operational workload of an application: inserting an order, updating a user profile, checking inventory. Characteristics:

  • Short transactions: Each operation touches a small number of rows (usually 1–10).
  • Point queries: Look up a specific row by primary key or a small result set by indexed column.
  • High concurrency: Thousands of concurrent users, each running small, fast transactions.
  • Row-oriented storage: Data is stored row-by-row on disk. Reading one complete row (all its columns) is a single sequential read.
  • Normalized schema: 3NF or higher, minimizing data duplication, with foreign keys between tables.
  • Latency-optimized: Queries must return in milliseconds. Indexes (B-trees) enable fast point lookups.

Examples: PostgreSQL, MySQL, Oracle for banking, e-commerce, social media.

OLAP: Analytical Processing

OLAP systems answer analytical questions: "what was the total revenue by product category per quarter?" Characteristics:

  • Complex queries: Aggregate millions or billions of rows. GROUP BY, SUM, AVG, COUNT over large datasets.
  • Full table scans: Queries read many rows but typically only a few columns per row.
  • Columnar storage: Data is stored column-by-column. A query that only needs revenue and category reads only those two columns, skipping all others. Compression is highly effective because values in a column are similar.
  • Denormalized/star schema: Fact table + dimension tables, optimized for JOIN-free or simple-JOIN analytics.
  • Throughput-optimized: A single query may take seconds or minutes but processes enormous amounts of data.

Examples: Snowflake, BigQuery, Redshift, ClickHouse.

ETL Bridges the Two

OLTP systems are not suitable for analytics (a complex scan would block transactions). OLAP systems are not suitable for transactions (columnar storage makes single-row updates expensive). ETL (Extract, Transform, Load) copies data from OLTP to OLAP systems, bridging the gap.

HTAP: Hybrid Transactional/Analytical Processing

HTAP systems attempt to serve both workloads in a single system. They use a dual storage engine: row-store for transactions, columnar store for analytics, with real-time data synchronization between the two. Examples:

  • TiDB: Combines TiKV (row-based) with TiFlash (columnar) in a single cluster.
  • CockroachDB: Supports analytical queries on its distributed row store.
  • AlloyDB (Google): PostgreSQL-compatible with a columnar cache for analytics.
  • SingleStore (MemSQL): In-memory row store + columnar disk store.

Real-Life: Why You Need Both

Real-World Example

E-commerce platform example:

OLTP (PostgreSQL):

  • User places an order: INSERT INTO orders VALUES (...) — latency: 5ms
  • Check inventory: SELECT stock FROM products WHERE id = 42 — latency: 2ms
  • Update shipping status: UPDATE orders SET status = 'shipped' WHERE id = 1001 — latency: 3ms
  • Concurrency: 10,000 transactions/second

OLAP (Snowflake):

  • Revenue by category by quarter: scans 500M rows, takes 8 seconds, returns 200 rows.
  • Customer retention analysis: complex window functions across 2 years of data, 45 seconds.
  • Product recommendation training: full scan of user-product interactions, 3 minutes.

If you tried analytics on the OLTP database:

  • The revenue query would lock rows, blocking orders.
  • Full table scans on row-oriented storage would read every column, even if you only need two.
  • No columnar compression: the scan reads 10x more data than necessary.

TiDB HTAP in practice:

  • Shopee (Southeast Asian e-commerce) uses TiDB to serve both their transactional workload (orders, payments) and real-time analytics (fraud detection, inventory dashboards) from a single cluster. TiFlash maintains a columnar replica that is updated within seconds of OLTP writes.

OLTP vs OLAP: Storage and Query Patterns

OLTP (Row-Oriented) Row 1: id=1 | name=Alice | email=... | age=30 Row 2: id=2 | name=Bob | email=... | age=25 Row 3: id=3 | name=Carol | email=... | age=28 SELECT * WHERE id=2 Read 1 row, all columns Short txns, point queries High concurrency, low latency Normalized schema, B-tree indexes OLAP (Column-Oriented) id col 1 2 3 name col Alice Bob Carol age col 30 25 28 SELECT AVG(age) Read 1 column only! Complex aggregations, full scans High throughput, columnar compression Star schema, denormalized ETL Extract Transform HTAP Row + Column in one system TiDB, AlloyDB
Step 1 of 2