Designing Netflix: Video Streaming, CDN, and Recommendations at Scale
Analyze how to design a video streaming service like Netflix. Cover video encoding, CDN architecture, recommendation systems, and handling millions of concurrent viewers.
Why Study Netflix?
Netflix serves 230+ million subscribers in 190+ countries with a catalog of 17,000+ titles. Members stream billions of hours each month, averaging roughly 2 hours per day. Studying its architecture shows how to deliver media at massive scale.
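Key lesson: Netflix doesn't build data centers - it pioneered cloud-native architecture on AWS for its control plane (APIs, recommendations, encoding), while the video bytes themselves are served from its own CDN. At peak, Netflix has been measured at roughly 35% of downstream internet traffic in North America.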
Requirements Analysis
Functional Requirements
- Video Playback: Stream video with adaptive quality
- Catalog Management: Browse and search titles
- Recommendations: Personalized content suggestions
- User Profiles: Multiple profiles per account
- Watchlist: Save titles for later
- Playback History: Resume from where you left off
Non-Functional Requirements
| Requirement | Target |
|---|---|
| Latency | < 2 seconds to start playback |
| Quality | Up to 4K HDR with Dolby Atmos |
| Reliability | 99.99% playback uptime |
| Concurrency | Millions of simultaneous streams |
| Adaptive | Quality adjusts to network conditions |
Video Processing Pipeline
The Encoding Workflow
Adaptive Bitrate Streaming
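Netflix uses DASH (Dynamic Adaptive Streaming over HTTP). Each title is encoded at several bitrates and split into short segments; the player requests segments over plain HTTP and picks the rendition for each one based on measured throughput and buffer level.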
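A minimal sketch of the per-segment choice a DASH player has to make. The ladder reuses the rungs from the quality matrix in this section; the `safety` margin, the `low_buffer_seconds` threshold, and the function name are assumptions for illustration, not Netflix's actual player logic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rendition:
    name: str
    bitrate_mbps: float


# Illustrative ladder, ordered from lowest to highest bitrate.
LADDER = [
    Rendition("480p", 1.5),
    Rendition("720p", 3.0),
    Rendition("1080p", 5.0),
    Rendition("4K HDR", 15.0),
]


def pick_rendition(throughput_mbps, buffer_seconds, ladder=LADDER,
                   safety=0.8, low_buffer_seconds=10.0):
    """Pick the rendition for the next segment.

    Hypothetical policy: if the buffer is nearly empty, drop to the lowest
    rung to avoid a rebuffer; otherwise take the highest rung whose bitrate
    fits within a safety fraction of the measured throughput.
    """
    if buffer_seconds < low_buffer_seconds:
        return ladder[0]
    budget = throughput_mbps * safety
    chosen = ladder[0]  # fall back to the lowest rung if nothing fits
    for rendition in ladder:
        if rendition.bitrate_mbps <= budget:
            chosen = rendition
    return chosen


print(pick_rendition(throughput_mbps=8.0, buffer_seconds=30.0).name)  # 1080p
print(pick_rendition(throughput_mbps=8.0, buffer_seconds=3.0).name)   # 480p
```

Dropping quality when the buffer runs low reflects the buffering-versus-quality trade-off: a brief dip in resolution is usually less disruptive than a stalled stream.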
Video Quality Matrix
| Resolution | Bitrate | Codec | Use Case |
|---|---|---|---|
| 480p | 1.5 Mbps | AVC | Mobile/Slow |
| 720p | 3 Mbps | HEVC | Average |
| 1080p | 5 Mbps | HEVC | HD |
| 4K HDR | 15-25 Mbps | HEVC | Premium |
Why HEVC (H.265)? It can achieve comparable quality to H.264 (AVC) at roughly half the bitrate, depending on content and encoder settings. At Netflix's scale, that translates into massive bandwidth savings.
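To make the saving concrete, here is a small worked calculation. It takes the 5 Mbps HEVC figure for 1080p from the table and assumes, per the half-the-bitrate rule of thumb, that an equivalent H.264 stream would need about 10 Mbps; that H.264 number is an assumption, not a measured value.

```python
def gigabytes_per_hour(bitrate_mbps: float) -> float:
    """Convert a constant stream bitrate in Mbps to decimal gigabytes per hour."""
    bits_per_hour = bitrate_mbps * 1_000_000 * 3600
    return bits_per_hour / 8 / 1_000_000_000


hevc_mbps = 5.0             # 1080p HEVC, from the quality matrix
h264_mbps = hevc_mbps * 2   # assumed equivalent-quality H.264 bitrate

saved = gigabytes_per_hour(h264_mbps) - gigabytes_per_hour(hevc_mbps)
print(f"{saved:.2f} GB saved per viewing hour")  # 2.25 GB saved per viewing hour
```

Multiplied across billions of viewing hours per month, even a saving of a couple of gigabytes per hour becomes a very large amount of traffic that never has to be delivered.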
CDN Architecture
Open Connect
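Netflix built its own CDN, Open Connect, which places caching servers called Open Connect Appliances (OCAs) inside or near ISP networks so that video is delivered from as close to the viewer as possible.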
How Open Connect Works
CDN Cache Strategy
| Content Type | Cache Location | TTL |
|---|---|---|
| Popular titles | OCA (ISP) | Days |
| New releases | OCA | Hours |
| Catalog metadata | AWS (API) | Minutes |
| User data | AWS (DynamoDB) | None (read live) |
Why build your own CDN? At Netflix's scale, the bandwidth savings justify building specialized hardware. Open Connect appliances are deployed inside 100+ ISP networks worldwide and serve 97%+ of traffic locally, without it ever crossing the wider internet.
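The cache strategy table can be sketched as a small TTL cache that tracks its own hit rate. The class, the concrete TTL values, and the injectable clock are assumptions chosen for illustration; a real appliance caches video files on disk and is filled proactively rather than through a dictionary like this.

```python
import time

# Hypothetical TTLs matching the "Days / Hours / Minutes" rows of the table.
TTL_SECONDS = {
    "popular_title": 3 * 24 * 3600,
    "new_release": 6 * 3600,
    "catalog_metadata": 5 * 60,
}


class TTLCache:
    """Tiny in-memory cache whose entries expire after a per-entry TTL."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock    # injectable so expiry can be tested
        self._entries = {}     # key -> (value, expires_at)
        self.hits = 0
        self.misses = 0

    def put(self, key, value, content_type):
        expires_at = self._clock() + TTL_SECONDS[content_type]
        self._entries[key] = (value, expires_at)

    def get(self, key):
        entry = self._entries.get(key)
        if entry is not None and self._clock() < entry[1]:
            self.hits += 1
            return entry[0]
        self._entries.pop(key, None)  # drop the expired entry, if any
        self.misses += 1
        return None                   # caller would fetch from origin

    def hit_rate(self):
        total = self.hits + self.misses
        return self.hits / total if total else 0.0
```

A miss here stands in for a request that has to travel upstream; the higher the hit rate, the less traffic leaves the ISP's network.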
Recommendation System
The Netflix Recommendation Challenge
- 230M+ subscribers
- 17,000+ titles
- Each user sees a personalized experience
- Goal: Maximize watch time and satisfaction
Recommendation Architecture
Two-Tower Model for Recommendations
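In a two-tower model, one network embeds the user and another embeds the title into the same vector space, and the match score is a similarity between the two embeddings. The sketch below shows only the scoring and ranking step, with each tower reduced to a single hand-set linear layer. In practice the weights are learned from viewing data; all names, features, and values here are hypothetical.

```python
import math


def l2_normalize(vec):
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


def tower(features, weights):
    """One-layer 'tower': project raw features into the shared embedding space."""
    projected = [sum(w * f for w, f in zip(row, features)) for row in weights]
    return l2_normalize(projected)


def score(user_embedding, item_embedding):
    """Dot product of unit vectors, i.e. cosine similarity."""
    return sum(u * i for u, i in zip(user_embedding, item_embedding))


def recommend(user_features, items, user_weights, item_weights, k=2):
    user_emb = tower(user_features, user_weights)
    scored = [(score(user_emb, tower(feats, item_weights)), title)
              for title, feats in items.items()]
    scored.sort(reverse=True)
    return [title for _, title in scored[:k]]


# Hand-set identity weights stand in for trained parameters.
IDENTITY = [[1.0, 0.0], [0.0, 1.0]]
ITEMS = {  # hypothetical features: [sci-fi affinity, romance affinity]
    "Space Drama": [0.9, 0.1],
    "Rom-Com": [0.1, 0.9],
    "Mixed": [0.5, 0.5],
}

print(recommend([1.0, 0.0], ITEMS, IDENTITY, IDENTITY))  # ['Space Drama', 'Mixed']
```

Because the item tower does not depend on the user, item embeddings can be precomputed and indexed, leaving only the user embedding and a nearest-neighbor lookup at request time.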
Recommendation Types
| Type | Algorithm | Example |
|---|---|---|
| Continue Watching | User-state based | Resume paused video |
| Because You Watched | Item similarity | Similar genre/director |
| Top Picks | Collaborative filtering | Users like you watched |
| Trending | Aggregation + time decay | Popular now |
| New Releases | Freshness ranking | Recently added |
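As one example from the table, the "aggregation + time decay" approach behind a Trending row can be sketched as a sum of views weighted by how recent they are. The exponential form and the 24-hour half-life are assumptions for illustration.

```python
import math


def trending_score(view_ages_hours, half_life_hours=24.0):
    """Sum of views, each weighted by exponential decay of its age.

    A view that is one half-life old counts half as much as a view now.
    """
    decay = math.log(2) / half_life_hours
    return sum(math.exp(-decay * age) for age in view_ages_hours)


fresh_title = trending_score([1.0, 2.0, 3.0])   # three views in the last few hours
stale_title = trending_score([72.0] * 5)        # five views, all three days old
print(fresh_title > stale_title)                # True
```

The decay keeps the row responsive: a title with fewer but recent views can outrank one with more views that have aged out.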
Database Architecture
What Netflix Actually Uses
Data Storage Choices
| Data Type | Storage | Reason |
|---|---|---|
| User events | Cassandra | Write-heavy, scalable |
| Watch history | Cassandra | Time-series, append-only |
| Profiles | DynamoDB | Low latency reads |
| Catalog | Elasticsearch | Full-text search |
| Transactions | PostgreSQL | ACID requirements |
API Architecture
Backend for Frontend (BFF)
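Netflix uses different APIs for different devices, so each client (TV, mobile, web) receives responses shaped for its screen, codecs, and network constraints.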
Device Profiles
{
  "deviceProfile": {
    "type": "tv-4k",
    "manufacturer": "samsung",
    "model": "QN65Q80B",
    "os": "Tizen 6.5",
    "supportedCodecs": ["hevc", "av1"],
    "maxResolution": "3840x2160",
    "maxFramerate": 60,
    "digitalRights": ["widevine", "playready"]
  }
}
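A device-specific backend can use a profile like this to decide which streams to offer. The sketch below filters a hypothetical stream list (mirroring the quality matrix) by the profile's `supportedCodecs` and `maxResolution`; the function name and data layout are assumptions, not Netflix's API.

```python
RESOLUTION_HEIGHT = {"480p": 480, "720p": 720, "1080p": 1080, "4K HDR": 2160}

# Hypothetical streams available for one title.
CATALOG_STREAMS = [
    {"resolution": "480p", "codec": "avc"},
    {"resolution": "720p", "codec": "hevc"},
    {"resolution": "1080p", "codec": "hevc"},
    {"resolution": "4K HDR", "codec": "hevc"},
]


def playable_streams(device_profile, streams=CATALOG_STREAMS):
    """Keep only streams the device can decode and display."""
    max_height = int(device_profile["maxResolution"].split("x")[1])
    codecs = set(device_profile["supportedCodecs"])
    return [
        s for s in streams
        if s["codec"] in codecs and RESOLUTION_HEIGHT[s["resolution"]] <= max_height
    ]


tv = {"supportedCodecs": ["hevc", "av1"], "maxResolution": "3840x2160"}
phone = {"supportedCodecs": ["avc"], "maxResolution": "1280x720"}  # hypothetical

print([s["resolution"] for s in playable_streams(tv)])     # ['720p', '1080p', '4K HDR']
print([s["resolution"] for s in playable_streams(phone)])  # ['480p']
```

Doing this filtering server-side keeps the client simple: the player only ever sees renditions it can actually play.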
Resiliency Patterns
Chaos Engineering at Netflix
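Netflix pioneered chaos engineering with Chaos Monkey, which randomly terminates production instances, and its siblings in the Simian Army, such as Latency Monkey (injects artificial delays) and Chaos Gorilla (simulates the loss of an entire availability zone).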
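The core idea behind Chaos Monkey fits in a few lines: on each run, pick a random instance from each service group as a termination candidate. This sketch only selects victims and calls no cloud API; the function, its parameters, and the instance IDs are hypothetical and not taken from the real tool.

```python
import random


def pick_victims(instances_by_group, rng, probability=1.0):
    """Return {group: instance_id} for groups selected for termination this run.

    Each non-empty group is selected with the given probability, and one of
    its instances is then chosen uniformly at random.
    """
    victims = {}
    for group, instances in sorted(instances_by_group.items()):
        if instances and rng.random() < probability:
            victims[group] = rng.choice(instances)
    return victims


groups = {"api": ["i-1", "i-2"], "playback": ["i-3"], "idle": []}
print(pick_victims(groups, random.Random(42)))
```

Passing in a seeded `random.Random` makes a run reproducible, which helps when an experiment needs to be replayed after an incident.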
Netflix's "Nothing Fails" Culture
| Pattern | Implementation |
|---|---|
| Circuit Breaker | Hystrix (now in maintenance mode; Resilience4j is the recommended successor) |
| Bulkhead | Separate thread pools per dependency |
| Fallback | Show cached content if API fails |
| Retry | Exponential backoff with jitter |
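Three of these patterns can be sketched together: a circuit breaker that fails fast to a fallback, and retry delays using exponential backoff with full jitter. This is a simplified illustration with assumed thresholds and names; it does not reproduce the Hystrix or Resilience4j APIs.

```python
import random
import time


def backoff_delays(attempts, base=0.1, cap=5.0, rng=random):
    """Full-jitter delays: uniform in [0, min(cap, base * 2**n)] for retry n."""
    return [rng.uniform(0, min(cap, base * 2 ** n)) for n in range(attempts)]


class CircuitBreaker:
    """Opens after repeated failures, then serves the fallback until a cooldown passes."""

    def __init__(self, failure_threshold=3, reset_after=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_after = reset_after
        self._clock = clock          # injectable for testing
        self._failures = 0
        self._opened_at = None

    def call(self, func, fallback):
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self.reset_after:
                return fallback()    # open: fail fast without calling func
            self._opened_at = None   # half-open: allow one trial call
        try:
            result = func()
        except Exception:
            self._failures += 1
            if self._failures >= self.failure_threshold:
                self._opened_at = self._clock()
            return fallback()        # e.g. show cached content
        self._failures = 0           # success closes the circuit
        return result
```

Failing fast matters as much as the fallback itself: while the circuit is open, a struggling dependency gets no new traffic and has room to recover.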
Architecture Diagram
End-to-End Flow
Key Numbers
| Metric | Value |
|---|---|
| Peak bandwidth | Tens of Tbps in aggregate (100 Gbps is closer to a single appliance's output) |
| Peak concurrent streams | 15M+ |
| CDN cache hit rate | 97%+ |
| Open Connect locations | 100+ ISPs |
| Encoding profiles | 100+ |
| AWS instances | 100,000+ |
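A back-of-the-envelope check shows how stream count and bitrate combine into aggregate bandwidth. It reads the 15M+ figure as concurrent streams and assumes an average of 5 Mbps per stream (the 1080p rung); both are assumptions for the estimate, not reported measurements.

```python
def aggregate_tbps(concurrent_streams: int, avg_bitrate_mbps: float) -> float:
    """Total egress in Tbps for a number of streams at an average bitrate."""
    return concurrent_streams * avg_bitrate_mbps / 1_000_000


print(aggregate_tbps(15_000_000, 5.0))  # 75.0
```

Estimates like this are useful in interviews for sizing the CDN: at tens of terabits per second, serving from inside ISP networks is the only practical option.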
Key Takeaways
- Build your own CDN (at sufficient scale): Open Connect cuts bandwidth costs substantially by serving video from inside ISP networks
- Adaptive streaming: Dynamic quality adjustment ensures playback across network conditions
- Personalization everywhere: ML-driven recommendations increase engagement
- Device-specific APIs: BFF pattern lets each device optimize its experience
- Chaos engineering: Break things on purpose to build resilience
Interview tip: When designing streaming systems, focus on the "buffering vs quality" trade-off. Users generally prefer smooth playback over the highest quality, which is why Netflix prioritizes avoiding rebuffers.
Follow-Up Questions
- How would you handle live streaming (sports, events)?
- How would you prevent account sharing?
- How would you design the download-for-offline feature?
- How would you handle content localization?
- How would you detect and prevent piracy?
Real Netflix trivia: Netflix's famous "culture deck" emphasizes freedom and responsibility. Its engineers can deploy to production at any time, on any day of the week. Its data pipeline has been reported to process on the order of hundreds of billions of events per day, feeding systems such as the recommendation engine, and the company A/B tests constantly to optimize the viewer experience.