Data Structures: Arrays, Trees, Graphs, Hash Tables, and Their Trade-offs
Master essential data structures and understand when to use each one. Learn about arrays, linked lists, trees, graphs, and hash tables with their complexity trade-offs.
Why Data Structures Matter
The choice of data structure affects every aspect of your system's performance: how fast you can insert, delete, search, and iterate. Understanding trade-offs helps you design efficient systems.
The golden rule: There's no "best" data structure. The right choice depends on your access patterns, data size, and performance requirements.
Arrays and Dynamic Arrays
Arrays (Fixed Size)
- Access: O(1) — Direct index access
- Insert/Delete: O(n) — Must shift elements
- Memory: Contiguous block
Dynamic Arrays (Vector, ArrayList)
- Grow by a constant factor when full (typically 1.5x–2x)
- Access: O(1)
- Append: O(1) amortized
- Insert at middle: O(n)
# Python list is a dynamic array
arr = [1, 2, 3, 4, 5]
arr.append(6) # O(1) amortized
arr.insert(0, 0) # O(n) - must shift all elements
Linked Lists
Singly Linked List
- Access: O(n) — Must traverse from head
- Insert at head: O(1)
- Insert at tail: O(1) if you have a tail pointer, else O(n)
- Memory: Distributed nodes
Doubly Linked List
- Each node has prev and next pointers
- Insert/Delete: O(1) if you have the node
- Good for LRU caches, undo/redo stacks
When to use linked lists:
- Frequent insertions/deletions at known positions
- When you can't allocate one large contiguous block of memory
- When you don't need random access
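To make the O(1)-at-the-head versus O(n)-traversal trade-off concrete, here is a minimal singly linked list sketch (the `Node`/`LinkedList` names are illustrative, not from the text):

```python
class Node:
    def __init__(self, value):
        self.value = value
        self.next = None

class LinkedList:
    def __init__(self):
        self.head = None

    def push_front(self, value):
        # O(1): no shifting, just relink the head pointer
        node = Node(value)
        node.next = self.head
        self.head = node

    def to_list(self):
        # O(n): must traverse from the head, no random access
        out, cur = [], self.head
        while cur:
            out.append(cur.value)
            cur = cur.next
        return out

lst = LinkedList()
for v in [3, 2, 1]:
    lst.push_front(v)
print(lst.to_list())  # [1, 2, 3]
```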
Trees
Binary Search Trees (BST)
- Search: O(log n) average, O(n) worst (skewed tree)
- Insert/Delete: O(log n) average
- In-order traversal: Gives sorted sequence
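A bare-bones BST sketch showing why in-order traversal yields a sorted sequence (helper names are illustrative; no balancing, so a sorted insertion order degrades it to O(n)):

```python
class BSTNode:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None

def insert(root, key):
    # O(log n) average; O(n) worst case on a skewed tree
    if root is None:
        return BSTNode(key)
    if key < root.key:
        root.left = insert(root.left, key)
    else:
        root.right = insert(root.right, key)
    return root

def inorder(root):
    # Left subtree, node, right subtree: yields keys in sorted order
    if root:
        yield from inorder(root.left)
        yield root.key
        yield from inorder(root.right)

root = None
for k in [5, 2, 8, 1, 3]:
    root = insert(root, k)
print(list(inorder(root)))  # [1, 2, 3, 5, 8]
```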
Balanced Trees (AVL, Red-Black)
- Self-balancing BSTs
- All operations: O(log n) guaranteed
- Used in databases, file systems
B-Trees and B+ Trees
- Used in: Database indexes, file systems
- B+ Tree: All data in leaves, internal nodes only store keys
- Optimized for disk access (fewer disk reads)
Hash Tables
- Insert/Delete/Lookup: O(1) average, O(n) worst (collisions)
- No ordering: Elements are not stored sorted
# Python dict uses a hash table
hash_table = {
    "apple": 5,
    "banana": 3,
    "cherry": 8
}
print(hash_table["apple"]) # O(1)
Handling Collisions
Collision resolution:
- Chaining: Store multiple items per bucket (linked list)
- Open addressing: Find next empty slot (linear probing, quadratic probing, double hashing)
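Chaining can be sketched in a few lines: each bucket holds a list of (key, value) pairs, and colliding keys simply share a bucket. This is a toy with a fixed bucket count (a real table would also resize as the load factor grows):

```python
class ChainedHashTable:
    def __init__(self, n_buckets=8):
        self.buckets = [[] for _ in range(n_buckets)]

    def _bucket(self, key):
        # Hash maps the key to one of the buckets
        return self.buckets[hash(key) % len(self.buckets)]

    def put(self, key, value):
        bucket = self._bucket(key)
        for i, (k, _) in enumerate(bucket):
            if k == key:                    # key exists: overwrite
                bucket[i] = (key, value)
                return
        bucket.append((key, value))         # collision: extend the chain

    def get(self, key):
        # O(1) average; O(n) if everything collides into one chain
        for k, v in self._bucket(key):
            if k == key:
                return v
        raise KeyError(key)

t = ChainedHashTable()
t.put("apple", 5)
t.put("banana", 3)
print(t.get("apple"))  # 5
```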
Hash collision attack: Maliciously crafted keys can force worst-case O(n) behavior (hash flooding). When hashing untrusted input, use a randomized or keyed hash function (e.g., SipHash, which Python uses for strings by default).
Graphs
Representations
# Adjacency List (sparse graphs)
graph = {
    'A': ['B', 'C'],
    'B': ['A', 'D'],
    'C': ['A', 'D'],
    'D': ['B', 'C']
}
# Adjacency Matrix (dense graphs)
# A B C D
# A 0 1 1 0
# B 1 0 0 1
# C 1 0 0 1
# D 0 1 1 0
Graph Traversal
| Algorithm | Use Case | Time | Space |
|---|---|---|---|
| BFS | Shortest path (unweighted) | O(V+E) | O(V) |
| DFS | Path finding, cycles | O(V+E) | O(V) |
| Dijkstra | Shortest path (weighted, non-negative) | O(V²) array, O((V+E) log V) heap | O(V) |
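BFS from the table above can be sketched against the adjacency-list graph shown earlier; it finds a shortest path in an unweighted graph in O(V+E):

```python
from collections import deque

def bfs_shortest_path(graph, start, goal):
    """Shortest path in an unweighted graph via BFS."""
    queue = deque([[start]])   # queue of paths, each ending at the frontier
    visited = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for neighbor in graph[node]:
            if neighbor not in visited:   # O(V) extra space for visited set
                visited.add(neighbor)
                queue.append(path + [neighbor])
    return None  # goal unreachable

graph = {'A': ['B', 'C'], 'B': ['A', 'D'], 'C': ['A', 'D'], 'D': ['B', 'C']}
print(bfs_shortest_path(graph, 'A', 'D'))  # ['A', 'B', 'D']
```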
Choosing the Right Data Structure
| Access Pattern | Best Data Structure |
|---|---|
| Fast lookup by key | Hash Table |
| Sorted data + range queries | BST, B-Tree |
| Frequent insertions at ends | Deque, Linked List |
| LRU cache | Hash Table + Doubly Linked List |
| Routing tables | Trie |
| Dependency resolution | DAG + topological sort |
System design tip: Redis combines hash tables (fast lookup) with sorted sets (range queries) and lists (queues). Understanding these trade-offs helps you design better caching and storage layers.
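The "hash table + doubly linked list" LRU combination from the table can be sketched with Python's `OrderedDict`, which pairs exactly those two structures internally so lookup, reordering, and eviction are all O(1):

```python
from collections import OrderedDict

class LRUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.data = OrderedDict()

    def get(self, key):
        if key not in self.data:
            return None
        self.data.move_to_end(key)        # mark as most recently used
        return self.data[key]

    def put(self, key, value):
        if key in self.data:
            self.data.move_to_end(key)
        self.data[key] = value
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)  # evict least recently used

cache = LRUCache(2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")         # "a" becomes most recently used
cache.put("c", 3)      # capacity exceeded: evicts "b"
print(cache.get("b"))  # None
print(cache.get("a"))  # 1
```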
What to Remember for Interviews
- Hash tables: O(1) average, but collisions hurt performance
- Trees: O(log n) guaranteed with balanced trees
- Arrays: O(1) access, O(n) insertion at arbitrary positions
- Linked lists: O(1) insertion/deletion if you have the node, O(n) access
- Choose based on access patterns: Don't use a hash table if you need sorted data
Practice: For any system you design, ask: "What data structures will I use, and why?" Consider the access patterns, scale, and performance requirements.