Back to DAG

Relational Model (Codd)

databases

The Relational Model

The relational model was introduced by Edgar F. Codd in his landmark 1970 paper "A Relational Model of Data for Large Shared Data Banks." It revolutionized database design by separating the logical view of data (tables, rows, columns) from its physical storage (files, pages, indexes). Before Codd, programmers had to navigate physical data structures (network or hierarchical models) directly. The relational model freed them to think in terms of mathematical relations.

Core terminology

  • Relation = a table. Formally, a relation is a set of tuples (no duplicates, no ordering).
  • Tuple = a row. A single record in the table.
  • Attribute = a column. A named property of the relation.
  • Domain = the data type of an attribute (e.g., INTEGER, VARCHAR(255), DATE).
  • Schema = the structure definition: the relation name, its attributes, and their domains.
  • Instance = the actual data in the table at a given point in time.

Keys

  • Candidate key: a minimal set of attributes that uniquely identifies every tuple. A relation can have multiple candidate keys.
  • Primary key: the candidate key chosen as the main identifier. Must be non-null.
  • Foreign key: an attribute in one relation that references the primary key of another relation. This is how tables are linked together.
  • Superkey: any set of attributes that uniquely identifies a tuple (not necessarily minimal).

Normalization: eliminating redundancy

Normalization organizes tables to reduce data duplication and prevent update anomalies:

  • 1NF (First Normal Form): every attribute contains only atomic (indivisible) values. No arrays, no nested tables.
  • 2NF (Second Normal Form): in 1NF, and every non-key attribute depends on the entire primary key (no partial dependencies). This matters when the primary key is composite.
  • 3NF (Third Normal Form): in 2NF, and no non-key attribute depends on another non-key attribute (no transitive dependencies). Every non-key attribute depends only on the primary key.

Denormalization: in practice, highly normalized schemas can require many expensive JOINs. Databases often selectively denormalize — duplicating some data into fewer tables — to improve read performance at the cost of more complex write logic.

Relational algebra

Codd defined a set of operations on relations that form a complete query language:

  • Select (σ): filter rows that satisfy a predicate. σ_{age > 25}(Employees)
  • Project (π): choose a subset of columns. π_{name, salary}(Employees)
  • Join (⋈): combine rows from two relations where a condition is met. Employees ⋈ Departments
  • Union (∪): combine rows from two relations with the same schema.
  • Difference (−): rows in one relation but not another.
  • Cartesian Product (×): all combinations of rows from two relations.

SQL is the practical implementation of relational algebra. Every SQL query is internally translated into a tree of relational algebra operations by the query optimizer.

Real-Life: Designing a Schema

Real-World Example

Consider designing a database for an online bookstore:

Unnormalized (one big table):

OrderIDCustomerNameCustomerEmailBookTitleBookAuthorPriceQty
1Alicealice@mail.comDB SystemsRamakrishnan89.992
2Alicealice@mail.comAlgorithmsSedgewick69.991

Problems: Alice's name and email are duplicated. If she changes her email, every row must be updated (update anomaly). If we delete her last order, we lose her contact info (delete anomaly).

Normalized (3NF):

  • Customers(customer_id PK, name, email)
  • Books(book_id PK, title, author, price)
  • Orders(order_id PK, customer_id FK, book_id FK, quantity)

Now each fact is stored once. Changes to customer email update one row in Customers. The Orders table references Customers and Books via foreign keys.

Denormalization example: if your application constantly needs customer_name with every order, you might add a customer_name column to Orders to avoid the JOIN. This is a conscious tradeoff: faster reads but now you must keep the denormalized copy in sync.

Relational Model: Tables, Keys, and Operations

Relational Model: Three Normalized Tables Customers id PK name email 1 Alice a@m.com 2 Bob b@m.com 3 Carol c@m.com Books id PK title author price 1 DB Systems Ramak. 89.99 2 Algorithms Sedgew. 69.99 Orders id PK cust FK book FK qty date 1 1 1 2 2024-01 2 1 2 1 2024-02 3 2 1 1 2024-02 FK: cust_id FK: book_id Relational Algebra Operations: Select (sigma): sigma_{price > 80}(Books) = rows where price > 80 Project (pi): pi_{name,email}(Customers) = only name and email columns Join: Orders JOIN Customers ON cust_id = id = combined rows Every SQL query is translated into a tree of these operations by the query optimizer.
Step 1 of 2