Two Fundamentally Different Workloads
Databases serve two fundamentally different workload patterns: OLTP (Online Transaction Processing) and OLAP (Online Analytical Processing). Understanding the difference is critical because the optimal storage format, indexing strategy, and hardware configuration are completely different for each.
OLTP: Transaction Processing
OLTP systems handle the operational workload of an application: inserting an order, updating a user profile, checking inventory. Characteristics:
- Short transactions: Each operation touches a small number of rows (usually 1–10).
- Point queries: Look up a specific row by primary key or a small result set by indexed column.
- High concurrency: Thousands of concurrent users, each running small, fast transactions.
- Row-oriented storage: Data is stored row-by-row on disk. Reading one complete row (all its columns) is a single sequential read.
- Normalized schema: 3NF or higher, minimizing data duplication, with foreign keys between tables.
- Latency-optimized: Queries must return in milliseconds. Indexes (B-trees) enable fast point lookups.
Examples: PostgreSQL, MySQL, Oracle for banking, e-commerce, social media.
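The point-lookup pattern above can be sketched with SQLite's in-process row store (the table and data here are illustrative, not from the source):

```python
import sqlite3

# In-memory database standing in for an OLTP row store (schema is illustrative).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT, stock INTEGER)")
conn.execute("INSERT INTO products VALUES (42, 'widget', 17)")
conn.commit()

# Point query: SQLite resolves the primary key through a B-tree, touching
# one row instead of scanning the table.
stock = conn.execute("SELECT stock FROM products WHERE id = 42").fetchone()[0]
print(stock)  # 17
```

Because the row is stored contiguously, fetching all of its columns is a single small read, which is exactly what row-oriented storage optimizes for.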
OLAP: Analytical Processing
OLAP systems answer analytical questions: "what was the total revenue by product category per quarter?" Characteristics:
- Complex queries: Aggregate millions or billions of rows. GROUP BY, SUM, AVG, COUNT over large datasets.
- Full table scans: Queries read many rows but typically only a few columns per row.
- Columnar storage: Data is stored column-by-column. A query that only needs revenue and category reads only those two columns, skipping all others. Compression is highly effective because values in a column are similar.
- Denormalized/star schema: Fact table + dimension tables, optimized for JOIN-free or simple-JOIN analytics.
- Throughput-optimized: A single query may take seconds or minutes but processes enormous amounts of data.
Examples: Snowflake, BigQuery, Redshift, ClickHouse.
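The columnar advantage can be shown with a toy layout in plain Python (the data is made up for illustration):

```python
# Toy illustration of row-wise vs. column-wise layout (data is made up).
rows = [
    {"order_id": 1, "category": "books", "revenue": 20.0},
    {"order_id": 2, "category": "toys",  "revenue": 35.0},
    {"order_id": 3, "category": "books", "revenue": 15.0},
]

# The same table stored columnar: one contiguous list per column.
columns = {
    "order_id": [1, 2, 3],
    "category": ["books", "toys", "books"],
    "revenue":  [20.0, 35.0, 15.0],
}

# Row store: summing revenue still visits every row with all its columns.
total_row = sum(r["revenue"] for r in rows)

# Column store: only the revenue list is read; order_id and category are
# never touched, and a repetitive column like category compresses well
# (e.g. with run-length or dictionary encoding).
total_col = sum(columns["revenue"])

assert total_row == total_col == 70.0
```

Both layouts produce the same answer; the difference is how many bytes had to be read to get it.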
ETL Bridges the Two
OLTP systems are not suitable for analytics (a long analytical scan competes with transactions for CPU, I/O, and buffer cache, degrading latency). OLAP systems are not suitable for transactions (columnar storage makes single-row updates expensive). ETL (Extract, Transform, Load) pipelines copy data from OLTP to OLAP systems, bridging the gap.
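A minimal ETL sketch, using two SQLite databases to stand in for the OLTP source and the warehouse (schema, data, and table names are illustrative):

```python
import sqlite3

# Extract: read orders from the OLTP store (schema and data are illustrative).
oltp = sqlite3.connect(":memory:")
oltp.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, category TEXT, amount REAL)")
oltp.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                 [(1, "books", 20.0), (2, "toys", 35.0), (3, "books", 15.0)])

# Transform: pre-aggregate into the shape the warehouse serves.
totals = {}
for category, amount in oltp.execute("SELECT category, amount FROM orders"):
    totals[category] = totals.get(category, 0.0) + amount

# Load: write the aggregates into the analytical store (a second SQLite
# database stands in for the warehouse here).
olap = sqlite3.connect(":memory:")
olap.execute("CREATE TABLE revenue_by_category (category TEXT, total REAL)")
olap.executemany("INSERT INTO revenue_by_category VALUES (?, ?)", totals.items())
```

Production pipelines add incremental extraction, scheduling, and error handling, but the three phases are the same.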
HTAP: Hybrid Transactional/Analytical Processing
HTAP systems attempt to serve both workloads in a single system. A common design is a dual storage engine: a row store for transactions and a columnar store for analytics, with near-real-time synchronization between the two. Examples:
- TiDB: Combines TiKV (row-based) with TiFlash (columnar) in a single cluster.
- CockroachDB: Supports analytical queries on its distributed row store.
- AlloyDB (Google): PostgreSQL-compatible with a columnar cache for analytics.
- SingleStore (MemSQL): In-memory row store + columnar disk store.
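A toy sketch of the dual-engine idea (illustrative only; no real engine works exactly this way): every write lands in a row store and is mirrored into per-column lists, so point reads and aggregates each hit the layout that suits them.

```python
# Toy HTAP store: a row store for transactions plus a columnar replica
# for analytics, kept in sync on every write (illustrative design).
class HTAPStore:
    def __init__(self, columns):
        self.rows = {}                        # row store: pk -> row dict
        self.cols = {c: [] for c in columns}  # columnar replica

    def insert(self, pk, row):
        self.rows[pk] = row                   # OLTP write path
        for name, values in self.cols.items():
            values.append(row[name])          # keep the replica in sync

    def get(self, pk):                        # OLTP point read (row store)
        return self.rows[pk]

    def total(self, column):                  # OLAP aggregate (column store)
        return sum(self.cols[column])

store = HTAPStore(["category", "amount"])
store.insert(1, {"category": "books", "amount": 20.0})
store.insert(2, {"category": "toys", "amount": 35.0})
print(store.get(1)["amount"], store.total("amount"))  # 20.0 55.0
```

Real engines replicate asynchronously (TiFlash, per the source, catches up within seconds of OLTP writes); the synchronous mirror here is purely for brevity.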
Real-Life: Why You Need Both
E-commerce platform example:
OLTP (PostgreSQL):
- User places an order: INSERT INTO orders VALUES (...) — latency: 5ms
- Check inventory: SELECT stock FROM products WHERE id = 42 — latency: 2ms
- Update shipping status: UPDATE orders SET status = 'shipped' WHERE id = 1001 — latency: 3ms
- Concurrency: 10,000 transactions/second
OLAP (Snowflake):
- Revenue by category by quarter: scans 500M rows, takes 8 seconds, returns 200 rows.
- Customer retention analysis: complex window functions across 2 years of data, 45 seconds.
- Product recommendation training: full scan of user-product interactions, 3 minutes.
If you tried analytics on the OLTP database:
- The revenue query would saturate CPU, I/O, and buffer cache, starving the transactional queries and inflating order latency.
- Full table scans on row-oriented storage would read every column, even if you only need two.
- No columnar compression: the scan reads 10x more data than necessary.
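The 10x figure follows from simple arithmetic under an illustrative assumption about the schema: with 20 equally sized columns and a query that needs only 2 of them, a row-oriented scan reads 20/2 = 10x the necessary bytes, before any compression savings.

```python
# Illustrative read-amplification arithmetic (the column count and equal
# widths are assumptions, not figures from the source).
total_columns = 20     # columns in the orders table
needed_columns = 2     # e.g. category and revenue
amplification = total_columns / needed_columns
print(amplification)   # 10.0
```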
TiDB HTAP in practice:
- Shopee (Southeast Asian e-commerce) uses TiDB to serve both their transactional workload (orders, payments) and real-time analytics (fraud detection, inventory dashboards) from a single cluster. TiFlash maintains a columnar replica that is updated within seconds of OLTP writes.