Performance Optimization: Profiling, Caching, and Latency Reduction
Learn techniques to optimize system performance including caching strategies, database optimization, CDN usage, and profiling tools.
Measuring What Matters
Before optimizing, measure. You can't improve what you don't measure.
Key Metrics
| Metric | Definition | Target |
|---|---|---|
| Latency | Time for a single request | P50 < 100ms |
| Throughput | Requests per second | Meet peak demand |
| Error rate | Failed requests % | < 0.1% |
| Availability | Uptime percentage | > 99.9% |
✅ P99 matters more than P50: A fast P50 with a slow P99 means some users have a terrible experience. Monitor both.
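To make the P50/P99 distinction concrete, here is a minimal sketch that computes both from raw samples using the nearest-rank method. The latency values are made up for illustration, and a real system would usually read percentiles from its metrics backend rather than compute them by hand.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample with at least p% of samples at or below it."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[max(rank, 1) - 1]


# Hypothetical latencies in milliseconds: 98 fast requests and 2 slow ones.
latencies_ms = [20] * 98 + [900, 1200]

p50 = percentile(latencies_ms, 50)
p99 = percentile(latencies_ms, 99)
print(f"P50={p50} ms, P99={p99} ms")  # prints "P50=20 ms, P99=900 ms"
```

The median looks healthy at 20 ms, yet the P99 shows that the slowest requests take close to a second.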
Latency Breakdown
Where Time Goes
Typical Latencies
| Operation | Latency | Notes |
|---|---|---|
| L1 cache reference | 1 ns | CPU cache |
| L2 cache reference | 4 ns | CPU cache |
| Main memory access | 100 ns | RAM |
| SSD read | 100 μs | NVMe SSD |
| HDD seek | 10 ms | Disk |
| Network: Same datacenter | 1 ms | LAN |
| Network: Cross-continent | 100 ms | Internet |
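The numbers in this table are most useful for back-of-the-envelope estimates. The sketch below is one way to do that arithmetic; the dictionary simply restates the table, and the "101 sequential queries" scenario is a hypothetical example, not a measurement.

```python
# Approximate latencies from the table above, in nanoseconds.
LATENCY_NS = {
    "l1_cache": 1,
    "l2_cache": 4,
    "main_memory": 100,
    "ssd_read": 100_000,
    "hdd_seek": 10_000_000,
    "same_datacenter": 1_000_000,
    "cross_continent": 100_000_000,
}


def estimate_ms(operations):
    """Sum the cost of sequential operations; `operations` maps name -> count."""
    total_ns = sum(LATENCY_NS[name] * count for name, count in operations.items())
    return total_ns / 1_000_000


# Hypothetical request handlers: one issues 101 sequential queries, one issues a single query.
chatty_ms = estimate_ms({"same_datacenter": 101})
batched_ms = estimate_ms({"same_datacenter": 1})
print(f"chatty: {chatty_ms} ms, batched: {batched_ms} ms")  # prints "chatty: 101.0 ms, batched: 1.0 ms"
```

Even with a fast network, round trips dominate once they are made one after another, which is why the number of sequential calls is often the first thing to check.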
Caching Strategies
The Cache Hierarchy
Cache Patterns
| Pattern | Write | Read | Consistency | Use Case |
|---|---|---|---|---|
| Cache-Aside | DB only | Cache on miss | Eventual | General |
| Read-Through | DB only | Cache on miss | Eventual | Simplified code |
| Write-Through | DB + Cache | From cache | Strong | Critical data |
| Write-Behind | Cache only | From cache | Eventual | High write |
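The cache-aside row can be sketched in a few lines. This is an illustrative sketch only: plain dictionaries stand in for the database and for a cache such as Redis, and the `CacheAside` class and its method names are hypothetical.

```python
class CacheAside:
    """Cache-aside: the application checks the cache, falls back to the store, then fills the cache."""

    def __init__(self, store):
        self.store = store      # stand-in for the database (a dict here)
        self.cache = {}         # stand-in for an external cache
        self.store_reads = 0    # counts how often the store is hit

    def get(self, key):
        if key in self.cache:               # cache hit
            return self.cache[key]
        self.store_reads += 1               # cache miss: read the source of truth
        value = self.store.get(key)
        if value is not None:
            self.cache[key] = value         # populate the cache for later reads
        return value

    def put(self, key, value):
        self.store[key] = value             # writes go to the store only
        self.cache.pop(key, None)           # drop the stale entry; the next read refills it


repo = CacheAside({"user:1": "Ada"})
repo.get("user:1")          # miss, reads the store
repo.get("user:1")          # hit, served from the cache
repo.put("user:1", "Grace")
print(repo.get("user:1"), repo.store_reads)  # prints "Grace 2"
```

Invalidating on write instead of updating the cache keeps the write path simple, at the cost of one extra miss after each write.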
Cache Invalidation
⚠️ Cache invalidation is hard: As the saying commonly attributed to Phil Karlton goes, "There are only two hard things in computer science: cache invalidation and naming things." Choose an invalidation strategy based on your consistency requirements.
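Two common invalidation strategies are time-based expiry (TTL) and explicit invalidation on write. The sketch below shows both in one small class. `TTLCache` is a hypothetical name, and the injectable `clock` parameter is an assumption added so the expiry behavior can be demonstrated without sleeping.

```python
import time


class TTLCache:
    """Entries expire after ttl_seconds, or earlier if invalidated explicitly."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._entries = {}  # key -> (value, expires_at)

    def set(self, key, value):
        self._entries[key] = (value, self.clock() + self.ttl)

    def get(self, key, default=None):
        entry = self._entries.get(key)
        if entry is None:
            return default
        value, expires_at = entry
        if self.clock() >= expires_at:   # expired: treat as a miss and clean up
            del self._entries[key]
            return default
        return value

    def invalidate(self, key):
        """Explicit invalidation, for example right after the underlying row changes."""
        self._entries.pop(key, None)


# Demonstration with a fake clock so no real time has to pass.
now = [0.0]
cache = TTLCache(ttl_seconds=30, clock=lambda: now[0])
cache.set("price:42", 19.99)
fresh = cache.get("price:42")     # 19.99, still within the TTL
now[0] = 31.0
expired = cache.get("price:42")   # None, the entry has expired
```

A short TTL bounds staleness without any coordination, while explicit invalidation gives fresher reads but requires every write path to remember the cache.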
Database Optimization
Indexing Strategies
Query Optimization
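```sql
-- Bad: SELECT * fetches every column, including ones the caller never reads
SELECT * FROM orders WHERE user_id = 123;

-- Better: select only the needed columns, filter early, and bound the result
SELECT id, total, status, created_at
FROM orders
WHERE user_id = 123
  AND status = 'completed'
ORDER BY created_at DESC  -- without ORDER BY, LIMIT returns an arbitrary 10 rows
LIMIT 10;
```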
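Whether a query like the one above is fast depends largely on indexing. As a rough sketch, the following uses Python's built-in `sqlite3` module to compare the query plan before and after adding an index. The table layout and the index name `idx_orders_user_status` are assumptions for illustration, and the exact plan text differs between SQLite versions and between database engines.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders ("
    "id INTEGER PRIMARY KEY, user_id INTEGER, status TEXT, total REAL, created_at TEXT)"
)
conn.executemany(
    "INSERT INTO orders (user_id, status, total, created_at) VALUES (?, ?, ?, ?)",
    [(i % 50, "completed", 9.99, "2024-01-01") for i in range(1000)],
)

query = "SELECT id, total FROM orders WHERE user_id = ? AND status = 'completed'"


def plan(sql):
    """Return SQLite's query plan as one string (the last column holds the description)."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, (123,)).fetchall()
    return " | ".join(row[-1] for row in rows)


plan_before = plan(query)  # typically a full table scan
conn.execute("CREATE INDEX idx_orders_user_status ON orders (user_id, status)")
plan_after = plan(query)   # typically a search using the new index

print("before:", plan_before)
print("after: ", plan_after)
```

Checking the plan before and after a change is a cheap way to confirm that an index is actually used rather than assuming it is.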
Denormalization Trade-offs
| Normalized | Denormalized |
|---|---|
| Write efficiency | Read efficiency |
| No data duplication | Duplicated data |
| Complex joins | Simpler queries |
| Consistency guaranteed | Consistency burden |
Network Optimization
Connection Pooling
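The idea behind pooling is to pay the connection setup cost once and then reuse a fixed set of open connections across requests. Below is a minimal sketch of that idea, not a production pool: `ConnectionPool` is a hypothetical class, and in-memory SQLite connections stand in for real network connections.

```python
import queue
import sqlite3
from contextlib import contextmanager


class ConnectionPool:
    """Fixed-size pool: connections are created once and handed out for reuse."""

    def __init__(self, factory, size):
        self._idle = queue.Queue(maxsize=size)
        self.created = 0
        for _ in range(size):
            self._idle.put(factory())
            self.created += 1

    @contextmanager
    def connection(self, timeout=5.0):
        conn = self._idle.get(timeout=timeout)  # waits if every connection is in use
        try:
            yield conn
        finally:
            self._idle.put(conn)                # always return the connection to the pool


pool = ConnectionPool(lambda: sqlite3.connect(":memory:"), size=2)

# Ten "requests" share the same two connections instead of opening ten.
for _ in range(10):
    with pool.connection() as conn:
        conn.execute("SELECT 1").fetchone()

print(pool.created)  # prints 2
```

Real pools also handle broken connections, health checks, and idle timeouts, which this sketch deliberately leaves out.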
HTTP/2 and HTTP/3 Benefits
| Feature | HTTP/1.1 | HTTP/2 | HTTP/3 |
|---|---|---|---|
| Multiplexing | ❌ | ✅ | ✅ |
| Header compression | ❌ | ✅ | ✅ |
| Parallel requests | Multiple connections | Single connection | Single connection |
| QUIC (UDP) | ❌ | ❌ | ✅ |
Compression
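Compression trades CPU time for fewer bytes on the wire, and it tends to pay off for repetitive text formats such as JSON. A small sketch using Python's standard `gzip` module follows; the payload is synthetic, so the ratio it prints is only indicative.

```python
import gzip
import json

# Synthetic, repetitive JSON payload similar in shape to an API list response.
payload = json.dumps(
    [{"id": i, "status": "completed", "total": 9.99} for i in range(500)]
).encode("utf-8")

compressed = gzip.compress(payload)
restored = gzip.decompress(compressed)

ratio = len(compressed) / len(payload)
print(f"{len(payload)} bytes -> {len(compressed)} bytes ({ratio:.0%} of original)")
```

In practice this is usually enabled in the web server or framework rather than in application code, and already-compressed content such as images gains little from it.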
Code-Level Optimization
Algorithm Complexity
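```python
# `data` and `process` are placeholders for your own collection and work function.

# O(n²): visits every pair, bad for large inputs
for i in data:
    for j in data:
        process(i, j)

# O(n log n): sort once, then make a single pass.
# This only applies when the problem can be solved by looking at items in sorted order.
sorted_data = sorted(data)
for i in sorted_data:
    process(i)
```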
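As a concrete instance of the same idea, consider checking a list for duplicates. The two functions below are an illustrative sketch with hypothetical names: the first compares every pair, the second sorts and compares neighbors only.

```python
def has_duplicate_quadratic(items):
    """O(n²): compare every pair of elements."""
    for i in range(len(items)):
        for j in range(i + 1, len(items)):
            if items[i] == items[j]:
                return True
    return False


def has_duplicate_sorted(items):
    """O(n log n): after sorting, equal elements sit next to each other."""
    ordered = sorted(items)
    return any(a == b for a, b in zip(ordered, ordered[1:]))


print(has_duplicate_quadratic([3, 1, 4, 1, 5]))  # prints True
print(has_duplicate_sorted([3, 1, 4, 1, 5]))     # prints True
print(has_duplicate_sorted([3, 1, 4, 5]))        # prints False
```

Both return the same answers, but only the second stays practical as the input grows into the millions.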
Avoiding N+1 Queries
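```python
# `db` is a placeholder client; a query(sql, params) signature is assumed here.

# Bad: N+1 query problem (1 query for the users, then 1 more per user)
users = db.query("SELECT * FROM users LIMIT 100")
for user in users:
    # Bind the value as a parameter instead of formatting it into the SQL string
    posts = db.query("SELECT * FROM posts WHERE user_id = ?", (user.id,))
    user.posts = posts

# Better: a single join returns the users and their posts in one round trip
rows = db.query("""
    SELECT u.*, p.* FROM users u
    LEFT JOIN posts p ON u.id = p.user_id
    WHERE u.id IN (SELECT id FROM users LIMIT 100)
""")
```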
Async I/O
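```python
import asyncio

import aiohttp
import requests

# `urls` is a placeholder list of URLs.

# Blocking: each request waits for the previous one to finish
results = []
for url in urls:
    results.append(requests.get(url))  # Sequential

# Non-blocking: all requests are in flight concurrently on one event loop
async def fetch(session, url):
    async with session.get(url) as response:
        return await response.text()  # read the body before the session closes

async def fetch_all(urls):
    async with aiohttp.ClientSession() as session:
        tasks = [fetch(session, url) for url in urls]
        return await asyncio.gather(*tasks)
```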
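To see the effect without any network access, the sketch below replaces the HTTP call with `asyncio.sleep` as a stand-in for I/O wait. The function names and the 0.1 s delay are assumptions for illustration, and the timings will vary slightly from run to run.

```python
import asyncio
import time


async def fake_request(name, delay=0.1):
    await asyncio.sleep(delay)  # stand-in for waiting on the network
    return f"{name}: ok"


async def run_sequential(names):
    # Each await finishes before the next request starts.
    return [await fake_request(name) for name in names]


async def run_concurrent(names):
    # All requests wait at the same time; gather preserves input order.
    return await asyncio.gather(*(fake_request(name) for name in names))


def timed(coro):
    start = time.perf_counter()
    result = asyncio.run(coro)
    return result, time.perf_counter() - start


names = ["a", "b", "c", "d", "e"]
seq_result, seq_time = timed(run_sequential(names))
con_result, con_time = timed(run_concurrent(names))
print(f"sequential: {seq_time:.2f}s, concurrent: {con_time:.2f}s")
```

The sequential version takes roughly the sum of the delays, while the concurrent version takes roughly the longest single delay. This helps with I/O-bound work only; CPU-bound work still runs one step at a time on the event loop.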
Monitoring and Profiling
Application Performance Monitoring (APM)
Popular Tools
| Category | Tools |
|---|---|
| APM | New Relic, Datadog, AWS X-Ray |
| Profiling | Pyroscope, async-profiler, Chrome DevTools |
| Logging | ELK Stack, Loki, CloudWatch |
| Metrics | Prometheus + Grafana |
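The tools above target production systems. For a quick local look at where time goes, Python's built-in `cProfile` is often enough. The sketch below profiles a deliberately slow, made-up `handler` function and captures the report as text.

```python
import cProfile
import io
import pstats


def slow_sum_of_squares(n):
    total = 0
    for i in range(n):
        total += i * i
    return total


def handler():
    # Hypothetical request handler that does the same expensive work 20 times.
    return [slow_sum_of_squares(20_000) for _ in range(20)]


profiler = cProfile.Profile()
profiler.enable()
handler()
profiler.disable()

# Sort by cumulative time so the most expensive call paths appear first.
buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer).sort_stats("cumulative")
stats.print_stats(5)
report = buffer.getvalue()
print(report)
```

The report lists call counts and time per function, which points at `slow_sum_of_squares` as the place to optimize or cache.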
What to Remember for Interviews
- Measure first: Optimize based on data, not assumptions
- Cache aggressively: Serving a result from memory is usually far cheaper than recomputing or refetching it, but plan for invalidation
- Database tuning: Index wisely, avoid N+1, consider denormalization
- Network efficiency: Use HTTP/2+, compress, keep connections alive
- P99 latency: A page load often issues many requests, so far more than 1% of users hit at least one slow request
✅ Practice: Profile your own web app. What's the P99 latency? Where are the bottlenecks? What's the cache hit rate? Start measuring before optimizing.