Core Algorithms

Data Structures: Arrays, Trees, Graphs, Hash Tables, and Their Trade-offs

Master essential data structures and understand when to use each one. Learn about arrays, linked lists, trees, graphs, and hash tables with their complexity trade-offs.

22 min readdata structuresarraystreesgraphshash tableslinked lists

Why Data Structures Matter

The choice of data structure affects every aspect of your system's performance: how fast you can insert, delete, search, and iterate. Understanding trade-offs helps you design efficient systems.

The golden rule: There's no "best" data structure. The right choice depends on your access patterns, data size, and performance requirements.


Arrays and Dynamic Arrays

Arrays (Fixed Size)

  • Access: O(1) — Direct index access
  • Insert/Delete: O(n) — Must shift elements
  • Memory: Contiguous block

Dynamic Arrays (Vector, ArrayList)

  • Grow by doubling (usually 2x)
  • Access: O(1)
  • Append: O(1) amortized
  • Insert at middle: O(n)
python
# Python list is a dynamic array
arr = [1, 2, 3, 4, 5]
arr.append(6)  # O(1) amortized
arr.insert(0, 0)  # O(n) - must shift all elements

Linked Lists

Singly Linked List

  • Access: O(n) — Must traverse from head
  • Insert at head: O(1)
  • Insert at tail: O(1) if you have a tail pointer, else O(n)
  • Memory: Distributed nodes

Doubly Linked List

  • Each node has prev and next pointers
  • Insert/Delete: O(1) if you have the node
  • Good for LRU caches, undo/redo stacks
💡

When to use linked lists:

  • Frequent insertions/deletions at known positions
  • When memory is fragmented
  • When you don't need random access

Trees

Binary Search Trees (BST)

  • Search: O(log n) average, O(n) worst (skewed tree)
  • Insert/Delete: O(log n) average
  • In-order traversal: Gives sorted sequence

Balanced Trees (AVL, Red-Black)

  • Self-balancing BSTs
  • All operations: O(log n) guaranteed
  • Used in databases, file systems

B-Trees and B+ Trees

  • Used in: Database indexes, file systems
  • B+ Tree: All data in leaves, internal nodes only store keys
  • Optimized for disk access (fewer disk reads)

Hash Tables

  • Insert/Delete/Lookup: O(1) average, O(n) worst (collisions)
  • No ordering: Elements are not stored sorted
python
# Python dict uses a hash table
hash_table = {
    "apple": 5,
    "banana": 3,
    "cherry": 8
}
print(hash_table["apple"])  # O(1)

Handling Collisions

Collision resolution:

  1. Chaining: Store multiple items per bucket (linked list)
  2. Open addressing: Find next empty slot (linear probing, quadratic probing, double hashing)
⚠️

Hash collision attack: Malicious inputs can cause worst-case O(n) behavior. Use cryptographic hashing in security-critical applications.


Graphs

Representations

python
# Adjacency List (sparse graphs)
graph = {
    'A': ['B', 'C'],
    'B': ['A', 'D'],
    'C': ['A', 'D'],
    'D': ['B', 'C']
}

# Adjacency Matrix (dense graphs)
#  A B C D
# A 0 1 1 0
# B 1 0 0 1
# C 1 0 0 1
# D 0 1 1 0

Graph Traversal

AlgorithmUse CaseTimeSpace
BFSShortest path (unweighted)O(V+E)O(V)
DFSPath finding, cyclesO(V+E)O(V)
DijkstraShortest path (weighted)O(V²) or O(E log V)O(V)

Choosing the Right Data Structure

Access PatternBest Data Structure
Fast lookup by keyHash Table
Sorted data + range queriesBST, B-Tree
Frequent insertions at endsDeque, Linked List
LRU cacheHash Table + Doubly Linked List
Routing tablesTrie
Dependency resolutionDAG

System design tip: Redis combines hash tables (fast lookup) with sorted sets (range queries) and lists (queues). Understanding these trade-offs helps you design better caching and storage layers.


What to Remember for Interviews

  1. Hash tables: O(1) average, but collisions hurt performance
  2. Trees: O(log n) guaranteed with balanced trees
  3. Arrays: O(1) access, O(n) insertion at arbitrary positions
  4. Linked lists: O(1) insertion/deletion if you have the node, O(n) access
  5. Choose based on access patterns: Don't use a hash table if you need sorted data

Practice: For any system you design, ask: "What data structures will I use, and why?" Consider the access patterns, scale, and performance requirements.