Foundations

Databases — SQL Fundamentals: Relational Models, ACID, Indexes, and Joins

Master relational database fundamentals: tables, normalization, ACID properties, B-tree indexes, and JOIN operations.

14 min readSQLrelationalACIDindexesnormalizationJOINs

Why Databases First?

Before you can design a system that scales, you need to understand how data is stored and retrieved. Databases are the foundation of almost every system you'll design. If you get the database layer wrong, no amount of caching or load balancing will save you.


The Relational Model

A relational database organizes data into tables (relations), where each table has rows (tuples) and columns (attributes).

Core Concepts

  • Table — A collection of related data (e.g., users, orders)
  • Row — A single record in a table
  • Column — An attribute of the table (e.g., id, name, email)
  • Primary Key (PK) — A column (or set of columns) that uniquely identifies each row
  • Foreign Key (FK) — A column that references the primary key of another table

Example: Users and Orders

sql
CREATE TABLE users (
    id INT PRIMARY KEY,
    name VARCHAR(100) NOT NULL,
    email VARCHAR(255) UNIQUE NOT NULL,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

CREATE TABLE orders (
    id INT PRIMARY KEY,
    user_id INT NOT NULL,
    total DECIMAL(10, 2),
    status VARCHAR(20) DEFAULT 'pending',
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    FOREIGN KEY (user_id) REFERENCES users(id)
);

The orders.user_id is a foreign key pointing to users.id. This establishes the relationship: each order belongs to one user.


Normalization

Normalization is the process of organizing data to minimize redundancy and dependency.

Normal Forms

FormRuleExample Violation
1NFAtomic values (no repeating groups)A phone_numbers column containing "555-0101,555-0102"
2NF1NF + no partial dependency on composite PKorder_item table storing product_name (depends on product_id, not the full PK)
3NF2NF + no transitive dependencyusers table storing city and state when state depends on city

When to Denormalize

Normalization is great for write-heavy systems (reduces update anomalies). But for read-heavy systems, denormalization can improve performance by reducing JOINs.

⚠️

Rule of thumb: Normalize by default. Denormalize only when you can prove that specific queries are slow due to excessive JOINs. Always denormalize with a clear understanding of the trade-offs: faster reads, slower writes, risk of data inconsistency.


ACID Properties

ACID guarantees that database transactions are processed reliably. Every production database you use should be ACID-compliant.

PropertyWhat It MeansExample
AtomicityAll or nothing — the entire transaction succeeds or fails togetherTransferring $100: debit account A AND credit account B. If one fails, both roll back.
ConsistencyData remains valid before and after the transactionA CHECK constraint that balance >= 0 is never violated
IsolationConcurrent transactions don't interfere with each otherTwo users booking the last seat — only one succeeds
DurabilityOnce committed, data survives crashesAfter COMMIT, data is on disk and survives power loss

The Bank Transfer Example

sql
BEGIN TRANSACTION;

-- Atomicity: both must succeed or both fail
UPDATE accounts SET balance = balance - 100 WHERE id = 1;  -- Debit
UPDATE accounts SET balance = balance + 100 WHERE id = 2;  -- Credit

COMMIT;

If the second UPDATE fails (e.g., account 2 doesn't exist), the first UPDATE is rolled back. No money is lost.

Interview question: "How does a database ensure durability?" — The answer involves the Write-Ahead Log (WAL). Before modifying data pages, the database writes the change to a sequential log on disk. On crash recovery, it replays the WAL.


Indexes — The Performance Multiplier

Without an index, the database must scan every row to find matching data (full table scan). With an index, it jumps directly to the relevant rows.

How B-Tree Indexes Work

Most relational databases use B-tree (balanced tree) indexes. Here's how they work:

Searching for "Smith":

  1. Root → go right (T-Z? No, N-S)
  2. N-S node → go right (Q-S)
  3. Leaf node → find "Smith"

This is O(log n) instead of O(n).

When to Create Indexes

  • Columns used in WHERE clauses frequently
  • Columns used in JOIN conditions
  • Columns used in ORDER BY or GROUP BY
  • Columns with high selectivity (many distinct values)

When NOT to Create Indexes

  • Small tables (full scan is fast enough)
  • Columns with low selectivity (e.g., gender with only 2 values)
  • Write-heavy tables (every INSERT/UPDATE/DELETE must update indexes)
⚠️

Common mistake: Adding indexes to every column. Each index slows down writes (the database must maintain the index on every change). Start with indexes on foreign keys and frequently queried columns. Add more based on slow query analysis.


JOINs combine rows from two or more tables based on a related column.

Types of JOINs

sql
-- INNER JOIN: Only matching rows from both tables
SELECT u.name, o.total
FROM users u
INNER JOIN orders o ON u.id = o.user_id;

-- LEFT JOIN: All rows from left table, matching rows from right (NULL if no match)
SELECT u.name, o.total
FROM users u
LEFT JOIN orders o ON u.id = o.user_id;

-- RIGHT JOIN: All rows from right table, matching rows from left
SELECT u.name, o.total
FROM users u
RIGHT JOIN orders o ON u.id = o.user_id;

-- FULL OUTER JOIN: All rows from both tables
SELECT u.name, o.total
FROM users u
FULL OUTER JOIN orders o ON u.id = o.user_id;

Visual Guide

INNER JOIN: [A ∩ B] Only rows where both tables match LEFT JOIN: [A ∪ (A ∩ B)] All left rows + matching right rows RIGHT JOIN: [B ∪ (A ∩ B)] All right rows + matching left rows FULL OUTER: [A ∪ B] All rows from both tables

JOIN Performance

  • Always index the JOIN columns (usually foreign keys)
  • JOIN order matters — start with the most selective table
  • Avoid JOINing huge tables without proper indexes
  • Consider denormalization if a specific JOIN is a persistent bottleneck

Transaction Isolation Levels

Isolation levels control how visible uncommitted changes are to other transactions. Each level trades off consistency for performance.

LevelDirty ReadNon-Repeatable ReadPhantom ReadPerformance
Read UncommittedPossiblePossiblePossibleFastest
Read Committed❌ NoPossiblePossibleFast
Repeatable Read❌ No❌ NoPossibleModerate
Serializable❌ No❌ No❌ NoSlowest
  • Dirty Read: Reading uncommitted data from another transaction
  • Non-Repeatable Read: Getting different results when reading the same row twice
  • Phantom Read: New rows appearing between two identical range queries
💡

Default levels: PostgreSQL defaults to Read Committed. MySQL InnoDB defaults to Repeatable Read. Most applications are fine with the defaults.


What to Remember for Interviews

  1. ACID — Atomicity, Consistency, Isolation, Durability — know each with an example
  2. Normalization — 1NF, 2NF, 3NF — know when to denormalize
  3. B-Tree indexes — O(log n) lookup, when to create and when not to
  4. JOINs — INNER, LEFT, RIGHT, FULL OUTER — know the difference
  5. Isolation levels — Read Committed vs Repeatable Read vs Serializable