Two Fundamentally Different Workloads
Databases serve two fundamentally different workload patterns: OLTP (Online Transaction Processing) and OLAP (Online Analytical Processing). Understanding the difference is critical because the optimal storage format, indexing strategy, and hardware configuration are completely different for each.
OLTP: Transaction Processing
OLTP systems handle the operational workload of an application: inserting an order, updating a user profile, checking inventory. Characteristics:
- Short transactions: Each operation touches a small number of rows (usually 1–10).
- Point queries: Look up a specific row by primary key or a small result set by indexed column.
- High concurrency: Thousands of concurrent users, each running small, fast transactions.
- Row-oriented storage: Data is stored row-by-row on disk. Reading one complete row (all its columns) is a single sequential read.
- Normalized schema: 3NF or higher, minimizing data duplication, with foreign keys between tables.
- Latency-optimized: Queries must return in milliseconds. Indexes (B-trees) enable fast point lookups.
Examples: PostgreSQL, MySQL, Oracle for banking, e-commerce, social media.
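The point-lookup pattern above can be sketched with SQLite's in-process row store (the table and data here are illustrative, not from the source):

```python
import sqlite3

# In-memory database standing in for an OLTP row store (schema is illustrative).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT, stock INTEGER)")
conn.execute("INSERT INTO products VALUES (42, 'widget', 17)")
conn.commit()

# Point query: SQLite resolves the primary key through a B-tree, touching
# one row instead of scanning the table.
stock = conn.execute("SELECT stock FROM products WHERE id = 42").fetchone()[0]
print(stock)  # 17
```

Because the row is stored contiguously, fetching all of its columns is a single small read, which is exactly what row-oriented storage optimizes for.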
OLAP: Analytical Processing
OLAP systems answer analytical questions: "what was the total revenue by product category per quarter?" Characteristics:
- Complex queries: Aggregate millions or billions of rows. GROUP BY, SUM, AVG, COUNT over large datasets.
- Full table scans: Queries read many rows but typically only a few columns per row.
- Columnar storage: Data is stored column-by-column. A query that only needs revenue and category reads only those two columns, skipping all others. Compression is highly effective because values in a column are similar.
- Denormalized/star schema: Fact table + dimension tables, optimized for JOIN-free or simple-JOIN analytics.
- Throughput-optimized: A single query may take seconds or minutes but processes enormous amounts of data.
Examples: Snowflake, BigQuery, Redshift, ClickHouse.
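The columnar advantage can be shown with a toy layout in plain Python (the data is made up for illustration):

```python
# Toy illustration of row-wise vs. column-wise layout (data is made up).
rows = [
    {"order_id": 1, "category": "books", "revenue": 20.0},
    {"order_id": 2, "category": "toys",  "revenue": 35.0},
    {"order_id": 3, "category": "books", "revenue": 15.0},
]

# The same table stored columnar: one contiguous list per column.
columns = {
    "order_id": [1, 2, 3],
    "category": ["books", "toys", "books"],
    "revenue":  [20.0, 35.0, 15.0],
}

# Row store: summing revenue still visits every row with all its columns.
total_row = sum(r["revenue"] for r in rows)

# Column store: only the revenue list is read; order_id and category are
# never touched, and a repetitive column like category compresses well
# (e.g. with run-length or dictionary encoding).
total_col = sum(columns["revenue"])

assert total_row == total_col == 70.0
```

Both layouts produce the same answer; the difference is how many bytes had to be read to get it.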
ETL Bridges the Two
OLTP systems are not suitable for analytics (a long analytical scan competes with transactions for CPU, I/O, and buffer cache, degrading latency). OLAP systems are not suitable for transactions (columnar storage makes single-row updates expensive). ETL (Extract, Transform, Load) pipelines copy data from OLTP to OLAP systems, bridging the gap.
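A minimal ETL sketch, using two SQLite databases to stand in for the OLTP source and the warehouse (schema, data, and table names are illustrative):

```python
import sqlite3

# Extract: read orders from the OLTP store (schema and data are illustrative).
oltp = sqlite3.connect(":memory:")
oltp.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, category TEXT, amount REAL)")
oltp.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                 [(1, "books", 20.0), (2, "toys", 35.0), (3, "books", 15.0)])

# Transform: pre-aggregate into the shape the warehouse serves.
totals = {}
for category, amount in oltp.execute("SELECT category, amount FROM orders"):
    totals[category] = totals.get(category, 0.0) + amount

# Load: write the aggregates into the analytical store (a second SQLite
# database stands in for the warehouse here).
olap = sqlite3.connect(":memory:")
olap.execute("CREATE TABLE revenue_by_category (category TEXT, total REAL)")
olap.executemany("INSERT INTO revenue_by_category VALUES (?, ?)", totals.items())
```

Production pipelines add incremental extraction, scheduling, and error handling, but the three phases are the same.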
HTAP: Hybrid Transactional/Analytical Processing
HTAP systems attempt to serve both workloads in a single system. A common design is a dual storage engine: a row store for transactions and a columnar store for analytics, with near-real-time synchronization between the two. Examples:
- TiDB: Combines TiKV (row-based) with TiFlash (columnar) in a single cluster.
- CockroachDB: Supports analytical queries on its distributed row store.
- AlloyDB (Google): PostgreSQL-compatible with a columnar cache for analytics.
- SingleStore (MemSQL): In-memory row store + columnar disk store.
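A toy sketch of the dual-engine idea (illustrative only; no real engine works exactly this way): every write lands in a row store and is mirrored into per-column lists, so point reads and aggregates each hit the layout that suits them.

```python
# Toy HTAP store: a row store for transactions plus a columnar replica
# for analytics, kept in sync on every write (illustrative design).
class HTAPStore:
    def __init__(self, columns):
        self.rows = {}                        # row store: pk -> row dict
        self.cols = {c: [] for c in columns}  # columnar replica

    def insert(self, pk, row):
        self.rows[pk] = row                   # OLTP write path
        for name, values in self.cols.items():
            values.append(row[name])          # keep the replica in sync

    def get(self, pk):                        # OLTP point read (row store)
        return self.rows[pk]

    def total(self, column):                  # OLAP aggregate (column store)
        return sum(self.cols[column])

store = HTAPStore(["category", "amount"])
store.insert(1, {"category": "books", "amount": 20.0})
store.insert(2, {"category": "toys", "amount": 35.0})
print(store.get(1)["amount"], store.total("amount"))  # 20.0 55.0
```

Real engines replicate asynchronously (TiFlash, per the source, catches up within seconds of OLTP writes); the synchronous mirror here is purely for brevity.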
Real-Life: Why You Need Both
E-commerce platform example:
OLTP (PostgreSQL):
- User places an order: INSERT INTO orders VALUES (...) — latency: 5ms
- Check inventory: SELECT stock FROM products WHERE id = 42 — latency: 2ms
- Update shipping status: UPDATE orders SET status = 'shipped' WHERE id = 1001 — latency: 3ms
- Concurrency: 10,000 transactions/second
OLAP (Snowflake):
- Revenue by category by quarter: scans 500M rows, takes 8 seconds, returns 200 rows.
- Customer retention analysis: complex window functions across 2 years of data, 45 seconds.
- Product recommendation training: full scan of user-product interactions, 3 minutes.
If you tried analytics on the OLTP database:
- The revenue query would saturate CPU, I/O, and buffer cache, starving the transactional queries and inflating order latency.
- Full table scans on row-oriented storage would read every column, even if you only need two.
- No columnar compression: the scan reads 10x more data than necessary.
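The 10x figure follows from simple arithmetic under an illustrative assumption about the schema: with 20 equally sized columns and a query that needs only 2 of them, a row-oriented scan reads 20/2 = 10x the necessary bytes, before any compression savings.

```python
# Illustrative read-amplification arithmetic (the column count and equal
# widths are assumptions, not figures from the source).
total_columns = 20     # columns in the orders table
needed_columns = 2     # e.g. category and revenue
amplification = total_columns / needed_columns
print(amplification)   # 10.0
```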
TiDB HTAP in practice:
- Shopee (Southeast Asian e-commerce) uses TiDB to serve both their transactional workload (orders, payments) and real-time analytics (fraud detection, inventory dashboards) from a single cluster. TiFlash maintains a columnar replica that is updated within seconds of OLTP writes.