Databases — SQL Fundamentals: Relational Models, ACID, Indexes, and Joins

Master relational database fundamentals: tables, normalization, ACID properties, B-tree indexes, and JOIN operations.

SQLrelationalACIDindexesnormalizationJOINs

Why Databases First?

Before you can design a system that scales, you need to understand how data is stored and retrieved. Databases are the foundation of almost every system you'll design. If you get the database layer wrong, no amount of caching or load balancing will save you.

The Relational Model

A relational database organizes data into tables (relations), where each table has rows (tuples) and columns (attributes).

Core Concepts

Table — A collection of related data (e.g., users, orders)
Row — A single record in a table
Column — An attribute of the table (e.g., id, name, email)
Primary Key (PK) — A column (or set of columns) that uniquely identifies each row
Foreign Key (FK) — A column that references the primary key of another table

Example: Users and Orders

sql

CREATE TABLE users (
    id INT PRIMARY KEY,
    name VARCHAR(100) NOT NULL,
    email VARCHAR(255) UNIQUE NOT NULL,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

CREATE TABLE orders (
    id INT PRIMARY KEY,
    user_id INT NOT NULL,
    total DECIMAL(10, 2),
    status VARCHAR(20) DEFAULT 'pending',
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    FOREIGN KEY (user_id) REFERENCES users(id)
);

The orders.user_id is a foreign key pointing to users.id. This establishes the relationship: each order belongs to one user.

Normalization

Normalization is the process of organizing data to minimize redundancy and dependency.

Normal Forms

Form	Rule	Example Violation
1NF	Atomic values (no repeating groups)	A `phone_numbers` column containing "555-0101,555-0102"
2NF	1NF + no partial dependency on composite PK	`order_item` table storing `product_name` (depends on `product_id`, not the full PK)
3NF	2NF + no transitive dependency	`users` table storing `city` and `state` when `state` depends on `city`

When to Denormalize

Normalization is great for write-heavy systems (reduces update anomalies). But for read-heavy systems, denormalization can improve performance by reducing JOINs.

⚠️

Rule of thumb: Normalize by default. Denormalize only when you can prove that specific queries are slow due to excessive JOINs. Always denormalize with a clear understanding of the trade-offs: faster reads, slower writes, risk of data inconsistency.

ACID Properties

ACID guarantees that database transactions are processed reliably. Every production database you use should be ACID-compliant.

Property	What It Means	Example
Atomicity	All or nothing — the entire transaction succeeds or fails together	Transferring $100: debit account A AND credit account B. If one fails, both roll back.
Consistency	Data remains valid before and after the transaction	A CHECK constraint that `balance >= 0` is never violated
Isolation	Concurrent transactions don't interfere with each other	Two users booking the last seat — only one succeeds
Durability	Once committed, data survives crashes	After `COMMIT`, data is on disk and survives power loss

The Bank Transfer Example

sql

BEGIN TRANSACTION;

-- Atomicity: both must succeed or both fail
UPDATE accounts SET balance = balance - 100 WHERE id = 1;  -- Debit
UPDATE accounts SET balance = balance + 100 WHERE id = 2;  -- Credit

COMMIT;

If the second UPDATE fails (e.g., account 2 doesn't exist), the first UPDATE is rolled back. No money is lost.

✅

Interview question: "How does a database ensure durability?" — The answer involves the Write-Ahead Log (WAL). Before modifying data pages, the database writes the change to a sequential log on disk. On crash recovery, it replays the WAL.

Indexes — The Performance Multiplier

Without an index, the database must scan every row to find matching data (full table scan). With an index, it jumps directly to the relevant rows.

How B-Tree Indexes Work

Most relational databases use B-tree (balanced tree) indexes. Here's how they work:

Searching for "Smith":

Root → go right (T-Z? No, N-S)
N-S node → go right (Q-S)
Leaf node → find "Smith"

This is O(log n) instead of O(n).

When to Create Indexes

Columns used in WHERE clauses frequently
Columns used in JOIN conditions
Columns used in ORDER BY or GROUP BY
Columns with high selectivity (many distinct values)

When NOT to Create Indexes

Small tables (full scan is fast enough)
Columns with low selectivity (e.g., gender with only 2 values)
Write-heavy tables (every INSERT/UPDATE/DELETE must update indexes)

⚠️

Common mistake: Adding indexes to every column. Each index slows down writes (the database must maintain the index on every change). Start with indexes on foreign keys and frequently queried columns. Add more based on slow query analysis.

JOINs combine rows from two or more tables based on a related column.

Types of JOINs

sql

-- INNER JOIN: Only matching rows from both tables
SELECT u.name, o.total
FROM users u
INNER JOIN orders o ON u.id = o.user_id;

-- LEFT JOIN: All rows from left table, matching rows from right (NULL if no match)
SELECT u.name, o.total
FROM users u
LEFT JOIN orders o ON u.id = o.user_id;

-- RIGHT JOIN: All rows from right table, matching rows from left
SELECT u.name, o.total
FROM users u
RIGHT JOIN orders o ON u.id = o.user_id;

-- FULL OUTER JOIN: All rows from both tables
SELECT u.name, o.total
FROM users u
FULL OUTER JOIN orders o ON u.id = o.user_id;

Visual Guide

text

INNER JOIN:     [A ∩ B]    Only rows where both tables match
LEFT JOIN:      [A ∪ (A ∩ B)]  All left rows + matching right rows
RIGHT JOIN:     [B ∪ (A ∩ B)]  All right rows + matching left rows
FULL OUTER:     [A ∪ B]    All rows from both tables

JOIN Performance

Always index the JOIN columns (usually foreign keys)
JOIN order matters — start with the most selective table
Avoid JOINing huge tables without proper indexes
Consider denormalization if a specific JOIN is a persistent bottleneck

Transaction Isolation Levels

Isolation levels control how visible uncommitted changes are to other transactions. Each level trades off consistency for performance.

Level	Dirty Read	Non-Repeatable Read	Phantom Read	Performance
Read Uncommitted	Possible	Possible	Possible	Fastest
Read Committed	❌ No	Possible	Possible	Fast
Repeatable Read	❌ No	❌ No	Possible	Moderate
Serializable	❌ No	❌ No	❌ No	Slowest

Dirty Read: Reading uncommitted data from another transaction
Non-Repeatable Read: Getting different results when reading the same row twice
Phantom Read: New rows appearing between two identical range queries

💡

Default levels: PostgreSQL defaults to Read Committed. MySQL InnoDB defaults to Repeatable Read. Most applications are fine with the defaults.

What to Remember for Interviews

ACID — Atomicity, Consistency, Isolation, Durability — know each with an example
Normalization — 1NF, 2NF, 3NF — know when to denormalize
B-Tree indexes — O(log n) lookup, when to create and when not to
JOINs — INNER, LEFT, RIGHT, FULL OUTER — know the difference
Isolation levels — Read Committed vs Repeatable Read vs Serializable

Client-Server Architecture: REST, Statelessness, and HTTP Methods

Databases — NoSQL Overview: CAP Theorem, Eventual Consistency, and When to Use What