Scalability & Performance

Performance Optimization: Profiling, Caching, and Latency Reduction

Learn techniques to optimize system performance including caching strategies, database optimization, CDN usage, and profiling tools.

18 min read · Tags: performance, optimization, latency, caching, profiling, database tuning

Measuring What Matters

Before optimizing, measure. You can't improve what you don't measure.

Key Metrics

| Metric | Definition | Target |
| --- | --- | --- |
| Latency | Time for a single request | P50 < 100 ms |
| Throughput | Requests per second | Meet peak demand |
| Error rate | Failed requests (%) | < 0.1% |
| Availability | Uptime percentage | > 99.9% |

P99 matters more than P50: a fast P50 with a slow P99 means 1 in 100 requests is terrible, and because a single page view often fans out to many backend requests, most users end up hitting that slow tail. Monitor both!
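Given a list of measured latencies, these percentiles can be computed with nothing but the standard library. The simulated samples below are illustrative only:

```python
import random
import statistics

# Simulated latency samples in ms (assumed data, for illustration only)
random.seed(0)
latencies = [random.expovariate(1 / 50) for _ in range(10_000)]

# statistics.quantiles with n=100 returns the 1st through 99th percentiles
cuts = statistics.quantiles(latencies, n=100)
p50, p99 = cuts[49], cuts[98]
```

With a long-tailed distribution like this exponential one, P99 lands far above P50, which is exactly the gap the callout above warns about.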


Latency Breakdown

Where Time Goes

Typical Latencies

| Operation | Latency | Notes |
| --- | --- | --- |
| L1 cache reference | 1 ns | CPU cache |
| L2 cache reference | 4 ns | CPU cache |
| Main memory access | 100 ns | RAM |
| SSD read | 100 μs | NVMe SSD |
| HDD seek | 10 ms | Disk |
| Network: same datacenter | 1 ms | LAN |
| Network: cross-continent | 100 ms | Internet |
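A quick back-of-envelope calculation with the numbers above shows why network hops, especially cross-continent ones, dominate everything else:

```python
# Latencies from the table above, expressed in nanoseconds
NS = {
    "main_memory": 100,
    "ssd_read": 100_000,             # 100 µs
    "same_dc_network": 1_000_000,    # 1 ms
    "cross_continent": 100_000_000,  # 100 ms
}

# One cross-continent round trip costs as much as ~100 intra-datacenter
# round trips, or ~1,000,000 main-memory reads
dc_hops = NS["cross_continent"] // NS["same_dc_network"]
mem_reads = NS["cross_continent"] // NS["main_memory"]
```

This is why a design that makes several sequential cross-region calls can never be fast, no matter how optimized the code at each end is.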

Caching Strategies

The Cache Hierarchy

Cache Patterns

| Pattern | Write | Read | Consistency | Use Case |
| --- | --- | --- | --- | --- |
| Cache-Aside | DB only | Cache on miss | Eventual | General purpose |
| Read-Through | DB only | Cache on miss | Eventual | Simplified code |
| Write-Through | DB + cache | From cache | Strong | Critical data |
| Write-Behind | Cache only | From cache | Eventual | High write volume |

Cache Invalidation

⚠️ Cache invalidation is hard: as the saying goes, there are only two hard things in computer science: cache invalidation and naming things. Choose an invalidation strategy based on your consistency requirements.
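As a concrete reference point, here is a minimal cache-aside sketch combining TTL expiry with invalidate-on-write; the dict-backed "database" and names like get_user are illustrative stand-ins, not a real API:

```python
import time

cache = {}  # key -> (value, expires_at)
TTL_SECONDS = 60

def get_user(user_id, db):
    # Cache-aside: check the cache first, fall back to the DB on a miss
    entry = cache.get(user_id)
    if entry and entry[1] > time.monotonic():
        return entry[0]
    value = db[user_id]  # stand-in for a real database query
    cache[user_id] = (value, time.monotonic() + TTL_SECONDS)
    return value

def update_user(user_id, value, db):
    db[user_id] = value
    cache.pop(user_id, None)  # invalidate on write so the next read is fresh
```

Note the trade-off: without the explicit invalidation in update_user, readers could see stale data for up to TTL_SECONDS, which is the "eventual consistency" cell in the table above.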


Database Optimization

Indexing Strategies
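To see an index pay off, SQLite's EXPLAIN QUERY PLAN (available via Python's standard library) shows the access path switching from a full table scan to an index search; the table and index names below are made up for the example:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER, status TEXT)"
)

def plan(sql, params=()):
    # The last column of each EXPLAIN QUERY PLAN row is the human-readable detail
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " ".join(row[-1] for row in rows)

query = "SELECT id FROM orders WHERE user_id = ?"
before = plan(query, (123,))  # full table scan: "SCAN orders"
conn.execute("CREATE INDEX idx_orders_user_id ON orders (user_id)")
after = plan(query, (123,))   # now searches via idx_orders_user_id
```

The same technique (EXPLAIN / EXPLAIN ANALYZE) applies in PostgreSQL and MySQL; checking the plan is how you confirm an index is actually being used.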

Query Optimization

```sql
-- Bad: SELECT * fetches every column, including ones you don't need
SELECT * FROM orders WHERE user_id = 123;

-- Good: select only the columns you use, filter early, and bound the result
SELECT id, total, status, created_at
FROM orders
WHERE user_id = 123
  AND status = 'completed'
LIMIT 10;
```

Denormalization Trade-offs

| Normalized | Denormalized |
| --- | --- |
| Write efficiency | Read efficiency |
| No data duplication | Duplicated data |
| Complex joins | Simpler queries |
| Consistency guaranteed | Consistency burden on the application |

Network Optimization

Connection Pooling
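A pool keeps a fixed set of connections open and hands them out on demand, avoiding a fresh TCP (and TLS) handshake per request. A minimal queue-backed sketch, with SQLite standing in for a real database driver (ConnectionPool is an illustrative name, not a library class):

```python
import queue
import sqlite3

class ConnectionPool:
    """Minimal fixed-size pool: connections are created once and reused."""

    def __init__(self, factory, size=5):
        self._pool = queue.Queue(maxsize=size)
        for _ in range(size):
            self._pool.put(factory())

    def acquire(self, timeout=None):
        # Blocks when the pool is exhausted, providing natural backpressure
        return self._pool.get(timeout=timeout)

    def release(self, conn):
        self._pool.put(conn)

# Usage: borrow a connection, run a query, return it
pool = ConnectionPool(lambda: sqlite3.connect(":memory:"), size=2)
conn = pool.acquire()
conn.execute("SELECT 1")
pool.release(conn)
```

Production drivers (HikariCP, SQLAlchemy's pool, pgbouncer) add health checks, timeouts, and reconnection on top of this basic borrow/return cycle.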

HTTP/2 and HTTP/3 Benefits

| Feature | HTTP/1.1 | HTTP/2 | HTTP/3 |
| --- | --- | --- | --- |
| Multiplexing | ✗ | ✓ | ✓ |
| Header compression | ✗ | ✓ (HPACK) | ✓ (QPACK) |
| Parallel requests | Multiple connections | Single connection | Single connection |
| QUIC (UDP) | ✗ | ✗ | ✓ |

Compression
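Compressing responses trades a little CPU for a lot of bandwidth; text formats like JSON and HTML typically shrink dramatically. A quick illustration with stdlib gzip and a made-up payload:

```python
import gzip
import json

# A repetitive JSON payload, typical of API responses (assumed example data)
payload = json.dumps(
    [{"id": i, "status": "completed", "total": 99.0} for i in range(1_000)]
).encode()

compressed = gzip.compress(payload)
ratio = len(compressed) / len(payload)
# Repetitive text compresses very well; already-compressed data (images,
# video) does not, so compress selectively by content type
```

In practice this is handled at the edge (nginx, a CDN, or middleware) via Content-Encoding negotiation rather than in application code.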


Code-Level Optimization

Algorithm Complexity

```python
# O(n^2): compares every pair - slow for large inputs
def has_duplicates_quadratic(data):
    for i in range(len(data)):
        for j in range(i + 1, len(data)):
            if data[i] == data[j]:
                return True
    return False

# O(n log n): sort first, then any duplicates are adjacent
def has_duplicates_sorted(data):
    ordered = sorted(data)
    return any(a == b for a, b in zip(ordered, ordered[1:]))
```

Avoiding N+1 Queries

```python
# Bad: N+1 query problem - one query for users, then one more per user
users = db.query("SELECT * FROM users LIMIT 100")
for user in users:
    # Also an injection risk: never interpolate values into SQL directly
    user.posts = db.query(f"SELECT * FROM posts WHERE user_id = {user.id}")

# Better: a single query with a join
users = db.query("""
    SELECT u.*, p.* FROM users u
    LEFT JOIN posts p ON u.id = p.user_id
    WHERE u.id IN (SELECT id FROM users LIMIT 100)
""")
```

Async I/O

```python
# Blocking: each request waits for the previous one to finish
import requests

results = [requests.get(url) for url in urls]

# Non-blocking: issue all requests concurrently
import asyncio
import aiohttp

async def fetch_all(urls):
    async with aiohttp.ClientSession() as session:
        async def fetch(url):
            async with session.get(url) as response:
                return await response.text()
        return await asyncio.gather(*(fetch(u) for u in urls))

# results = asyncio.run(fetch_all(urls))
```

Monitoring and Profiling

Application Performance Monitoring (APM)

| Category | Tools |
| --- | --- |
| APM | New Relic, Datadog, AWS X-Ray |
| Profiling | Pyroscope, async-profiler, Chrome DevTools |
| Logging | ELK Stack, Loki, CloudWatch |
| Metrics | Prometheus + Grafana |
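Before reaching for an APM product, Python's built-in cProfile already answers "where does the time go?" for a single process; slow_sum is a throwaway example function:

```python
import cProfile
import io
import pstats

def slow_sum(n):
    # Deliberately CPU-bound work to show up in the profile
    return sum(i * i for i in range(n))

profiler = cProfile.Profile()
profiler.enable()
slow_sum(100_000)
profiler.disable()

# Print the top 5 functions by cumulative time
out = io.StringIO()
pstats.Stats(profiler, stream=out).sort_stats("cumulative").print_stats(5)
report = out.getvalue()
```

Continuous profilers like Pyroscope apply the same idea in production, sampling stacks with low enough overhead to run all the time.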

What to Remember for Interviews

  1. Measure first: Optimize based on data, not assumptions
  2. Cache aggressively: Memory is cheaper than compute
  3. Database tuning: Index wisely, avoid N+1, consider denormalization
  4. Network efficiency: Use HTTP/2+, compress, keep connections alive
  5. P99 latency: tail latency matters because fan-out amplifies it, so a few slow requests affect many users

Practice: Profile your own web app. What's the P99 latency? Where are the bottlenecks? What's the cache hit rate? Start measuring before optimizing.