Expert Case Studies

Designing Netflix: Video Streaming, CDN, and Recommendations at Scale

Analyze how to design a video streaming service like Netflix. Cover video encoding, CDN architecture, recommendation systems, and handling millions of concurrent viewers.


Why Study Netflix?

Netflix serves 230+ million subscribers in 190+ countries from a catalog of 17,000+ titles. Members stream billions of hours each month, roughly 2 hours per user per day. Studying its design shows how to deliver media at massive scale.

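Key lesson: Netflix does not run its own data centers for its control plane. It pioneered cloud-native architecture on AWS for sign-up, browsing, recommendations, and playback control, while the video bytes themselves are served from its own Open Connect CDN. Netflix has been estimated to account for roughly 35% of peak downstream internet traffic in North America.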


Requirements Analysis

Functional Requirements

  1. Video Playback: Stream video with adaptive quality
  2. Catalog Management: Browse and search titles
  3. Recommendations: Personalized content suggestions
  4. User Profiles: Multiple profiles per account
  5. Watchlist: Save titles for later
  6. Playback History: Resume from where you left off

Non-Functional Requirements

| Requirement | Target |
| --- | --- |
| Latency | < 2 seconds to start playback |
| Quality | Up to 4K HDR with Dolby Atmos |
| Reliability | 99.99% playback uptime |
| Concurrency | Millions of simultaneous streams |
| Adaptive | Quality adjusts to network conditions |

Video Processing Pipeline

The Encoding Workflow
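
A minimal sketch of how an encoding workflow of this kind can fan out work, assuming the common approach of splitting a source file into fixed-length chunks and encoding every chunk once per rendition in parallel. The chunk length, the `Rendition` and `EncodeJob` types, and the `plan_encode_jobs` helper are illustrative assumptions, not Netflix's actual pipeline. The ladder values mirror the quality matrix in this article, with the 4K entry using the lower bound of its range.

```python
from dataclasses import dataclass
from itertools import product
from math import ceil


@dataclass(frozen=True)
class Rendition:
    name: str          # e.g. "1080p"
    bitrate_kbps: int  # target video bitrate
    codec: str         # "avc" or "hevc"


@dataclass(frozen=True)
class EncodeJob:
    title_id: str
    chunk_index: int
    start_s: float
    end_s: float
    rendition: Rendition


# Hypothetical ladder, mirroring the quality matrix in this article.
LADDER = [
    Rendition("480p", 1_500, "avc"),
    Rendition("720p", 3_000, "hevc"),
    Rendition("1080p", 5_000, "hevc"),
    Rendition("2160p", 15_000, "hevc"),
]


def plan_encode_jobs(title_id, duration_s, chunk_s=30.0, ladder=LADDER):
    """Return one independent job per (chunk, rendition) pair.

    Independent jobs can run on separate workers and be retried
    individually; the outputs are later stitched per rendition.
    """
    n_chunks = ceil(duration_s / chunk_s)
    jobs = []
    for i, rendition in product(range(n_chunks), ladder):
        start = i * chunk_s
        end = min(start + chunk_s, duration_s)  # last chunk may be shorter
        jobs.append(EncodeJob(title_id, i, start, end, rendition))
    return jobs


jobs = plan_encode_jobs("title-123", duration_s=95.0)
print(len(jobs))  # 4 chunks x 4 renditions = 16 jobs
```

Planning per chunk rather than per title keeps each unit of work small, so a failed worker costs seconds of re-encoding instead of hours.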

Adaptive Bitrate Streaming

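Netflix uses DASH (Dynamic Adaptive Streaming over HTTP): each title is encoded at several bitrates and split into short segments, and the player picks a rendition segment by segment based on measured network conditions.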

Video Quality Matrix

| Resolution | Bitrate | Codec | Use Case |
| --- | --- | --- | --- |
| 480p | 1.5 Mbps | AVC | Mobile/Slow |
| 720p | 3 Mbps | HEVC | Average |
| 1080p | 5 Mbps | HEVC | HD |
| 4K HDR | 15-25 Mbps | HEVC | Premium |
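
The matrix above is the ladder an adaptive player chooses from. Below is a minimal sketch of throughput-based rendition selection with a simple buffer guard. The `safety` factor, the `low_buffer_s` threshold, and the `choose_rendition` helper are assumptions made for illustration; production players use considerably more elaborate logic.

```python
# Bitrate ladder in kbps, taken from the quality matrix above
# (the 4K entry uses the lower bound of the 15-25 Mbps range).
LADDER_KBPS = {"480p": 1_500, "720p": 3_000, "1080p": 5_000, "4K HDR": 15_000}


def choose_rendition(throughput_kbps, buffer_s,
                     safety=0.8, low_buffer_s=10.0):
    """Pick the highest rendition whose bitrate fits the usable bandwidth.

    safety        fraction of measured throughput we dare to use (assumed)
    low_buffer_s  below this buffer level, halve the budget to avoid a rebuffer
    """
    budget = throughput_kbps * safety
    if buffer_s < low_buffer_s:
        budget *= 0.5
    # Walk the ladder from highest to lowest bitrate.
    for name, kbps in sorted(LADDER_KBPS.items(), key=lambda kv: -kv[1]):
        if kbps <= budget:
            return name
    # Nothing fits: fall back to the lowest rung rather than stalling.
    return min(LADDER_KBPS, key=LADDER_KBPS.get)


print(choose_rendition(8_000, buffer_s=30))  # 1080p
print(choose_rendition(8_000, buffer_s=4))   # 720p
print(choose_rendition(1_000, buffer_s=30))  # 480p
```

The second call shows the trade-off in action: with the same measured throughput but a nearly empty buffer, the player steps down a rung to protect against a stall.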

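Why HEVC (H.265)? It can deliver visual quality comparable to AVC (H.264) at roughly half the bitrate, although the actual saving depends on the content and encoder settings. At Netflix's scale, that translates into a large reduction in bandwidth cost.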


CDN Architecture

Open Connect

Netflix built their own CDN called Open Connect to deliver video efficiently.

How Open Connect Works

CDN Cache Strategy

| Content Type | Cache Location | TTL |
| --- | --- | --- |
| Popular titles | OCA (ISP) | Days |
| New releases | OCA | Hours |
| Catalog metadata | AWS (API) | Minutes |
| User data | AWS (DynamoDB) | Real-time |

Why build your own CDN? At Netflix's scale, the bandwidth savings justify building specialized hardware. Open Connect Appliances (OCAs) are deployed inside 100+ ISP networks worldwide, so 97%+ of traffic is served locally, close to the viewer.
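
Serving traffic locally implies a steering step that decides which cache a player should fetch from. The following is a minimal sketch of such a decision, assuming that appliances inside the client's own ISP are preferred, that unhealthy or saturated appliances are skipped, and that an origin host is the last resort. The `Appliance` type, the `steer` function, the host names, and the load threshold are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Appliance:
    host: str
    isp: str                 # network the appliance is embedded in, or "ix"
    healthy: bool = True
    load: float = 0.0        # 0.0 (idle) .. 1.0 (saturated)
    titles: set = field(default_factory=set)


ORIGIN = "origin.example.com"  # hypothetical fallback host


def steer(client_isp, title_id, appliances, max_load=0.9):
    """Return an ordered list of hosts the player should try."""
    usable = [a for a in appliances
              if a.healthy and a.load < max_load and title_id in a.titles]
    # Appliances inside the client's own ISP sort first, then by load.
    usable.sort(key=lambda a: (a.isp != client_isp, a.load))
    return [a.host for a in usable] + [ORIGIN]


fleet = [
    Appliance("oca1.isp-a.example", "isp-a", load=0.7, titles={"t1", "t2"}),
    Appliance("oca2.isp-a.example", "isp-a", load=0.2, titles={"t1"}),
    Appliance("oca3.ix.example", "ix", load=0.1, titles={"t1", "t2"}),
    Appliance("oca4.isp-a.example", "isp-a", healthy=False, titles={"t1"}),
]
print(steer("isp-a", "t1", fleet))
# ['oca2.isp-a.example', 'oca1.isp-a.example', 'oca3.ix.example', 'origin.example.com']
```

Returning a ranked list instead of a single host lets the player fail over on its own if the first choice becomes unreachable mid-stream.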


Recommendation System

The Netflix Recommendation Challenge

  • 230M+ subscribers
  • 17,000+ titles
  • Each user sees a personalized experience
  • Goal: Maximize watch time and satisfaction

Recommendation Architecture

Two-Tower Model for Recommendations
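
In a two-tower model, one network embeds the user and another embeds each title into the same vector space, and relevance is scored with a dot product between the two embeddings. The sketch below illustrates only the scoring and retrieval step. The hand-written vectors stand in for learned tower outputs, and `user_tower`, `ITEM_EMBEDDINGS`, and `recommend` are hypothetical names.

```python
import heapq


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def user_tower(features):
    """Stand-in for a learned network: maps user features to an embedding.

    Here the "features" already are a 3-d taste vector
    (action, comedy, documentary); a real tower would be a trained model.
    """
    return features


# Item tower output, normally precomputed offline for the whole catalog.
ITEM_EMBEDDINGS = {
    "heist-thriller": [0.9, 0.1, 0.0],
    "standup-special": [0.0, 1.0, 0.1],
    "nature-series": [0.1, 0.0, 0.9],
    "action-comedy": [0.6, 0.6, 0.0],
}


def recommend(user_features, k=2, already_watched=()):
    """Return the k unwatched titles with the highest dot-product score."""
    user_vec = user_tower(user_features)
    scored = ((dot(user_vec, vec), title)
              for title, vec in ITEM_EMBEDDINGS.items()
              if title not in already_watched)
    return [title for _, title in heapq.nlargest(k, scored)]


print(recommend([0.8, 0.3, 0.1]))
# ['heist-thriller', 'action-comedy']
```

Because item embeddings do not depend on the user, they can be computed once and indexed, which leaves only the user embedding and a nearest-neighbor lookup to do at request time.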

Recommendation Types

| Type | Algorithm | Example |
| --- | --- | --- |
| Continue Watching | User-state based | Resume paused video |
| Because You Watched | Item similarity | Similar genre/director |
| Top Picks | Collaborative filtering | Users like you watched |
| Trending | Aggregation + time decay | Popular now |
| New Releases | Freshness ranking | Recently added |
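
As an illustration of the "aggregation + time decay" approach behind a Trending row, here is a minimal sketch that weights each play event by an exponential decay on its age. The six-hour half-life, the event format, and the `trending` function are assumptions chosen for the example.

```python
import math
from collections import defaultdict


def trending(events, now_s, half_life_s=6 * 3600, k=3):
    """Rank titles by exponentially decayed play counts.

    events: iterable of (title_id, timestamp_s) play events.
    A play that is one half-life old counts half as much as a play now.
    """
    decay = math.log(2) / half_life_s
    scores = defaultdict(float)
    for title_id, ts in events:
        age = max(0.0, now_s - ts)
        scores[title_id] += math.exp(-decay * age)
    return sorted(scores, key=scores.get, reverse=True)[:k]


HOUR = 3600
now = 100 * HOUR
events = (
    [("old-hit", now - 24 * HOUR)] * 10   # many plays, a day ago
    + [("new-show", now - 1 * HOUR)] * 3  # few plays, very recent
)
print(trending(events, now))  # ['new-show', 'old-hit']
```

Ten plays that are four half-lives old contribute 10/16 of a point in total, so three fresh plays outrank them.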

Database Architecture

What Netflix Actually Uses

Data Storage Choices

| Data Type | Storage | Reason |
| --- | --- | --- |
| User events | Cassandra | Write-heavy, scalable |
| Watch history | Cassandra | Time-series, append-only |
| Profiles | DynamoDB | Low latency reads |
| Catalog | Elasticsearch | Full-text search |
| Transactions | PostgreSQL | ACID requirements |
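
To make the "time-series, append-only" reasoning for watch history concrete, here is a small in-memory model of how such a table might be keyed. It is a sketch under stated assumptions, not a Cassandra client: the partition key of profile plus calendar month, the `WatchHistory` class, and its methods are all hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timezone


def partition_key(profile_id, ts):
    """Bucket by profile and calendar month to bound partition size."""
    return (profile_id, ts.strftime("%Y-%m"))


class WatchHistory:
    """In-memory stand-in for a wide-column table.

    Rows are only ever appended; the newest row per title wins on read.
    """

    def __init__(self):
        self._partitions = defaultdict(list)

    def record(self, profile_id, title_id, position_s, ts):
        key = partition_key(profile_id, ts)
        self._partitions[key].append((ts, title_id, position_s))

    def resume_position(self, profile_id, title_id, month):
        """Latest recorded position for a title within one month bucket."""
        rows = self._partitions.get((profile_id, month), [])
        matches = [r for r in rows if r[1] == title_id]
        if not matches:
            return None
        return max(matches)[2]  # newest timestamp wins


h = WatchHistory()
t1 = datetime(2024, 3, 1, 20, 0, tzinfo=timezone.utc)
t2 = datetime(2024, 3, 1, 20, 30, tzinfo=timezone.utc)
h.record("p1", "title-9", 600, t1)
h.record("p1", "title-9", 2400, t2)
print(h.resume_position("p1", "title-9", "2024-03"))  # 2400
```

Writes never update a row in place, which suits a write-heavy store. The cost is that reads must pick the newest row, and a real lookup would also need to consult earlier month buckets.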

API Architecture

Backend for Frontend (BFF)

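Netflix exposes device-specific API layers, so TVs, phones, and browsers each receive responses shaped for their screen, codecs, and network constraints.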

Device Profiles

```json
{
  "deviceProfile": {
    "type": "tv-4k",
    "manufacturer": "samsung",
    "model": "QN65Q80B",
    "os": "Tizen 6.5",
    "supportedCodecs": ["hevc", "av1"],
    "maxResolution": "3840x2160",
    "maxFramerate": 60,
    "digitalRights": ["widevine", "playready"]
  }
}
```
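
A device-specific backend can use a profile like the one above to filter the streams it offers. The sketch below assumes a hypothetical `STREAMS` list for one title and a hypothetical `playable_streams` helper. It reads only the `supportedCodecs` and `maxResolution` fields shown in the example profile.

```python
import json

# Hypothetical catalog of available streams for one title.
STREAMS = [
    {"codec": "avc", "height": 480, "kbps": 1_500},
    {"codec": "hevc", "height": 720, "kbps": 3_000},
    {"codec": "hevc", "height": 1080, "kbps": 5_000},
    {"codec": "hevc", "height": 2160, "kbps": 15_000},
]


def playable_streams(profile_json, streams=STREAMS):
    """Filter streams down to what the device reports it can decode."""
    profile = json.loads(profile_json)["deviceProfile"]
    codecs = set(profile["supportedCodecs"])
    # "3840x2160" -> 2160
    max_height = int(profile["maxResolution"].split("x")[1])
    return [s for s in streams
            if s["codec"] in codecs and s["height"] <= max_height]


tv = json.dumps({"deviceProfile": {
    "type": "tv-4k",
    "supportedCodecs": ["hevc", "av1"],
    "maxResolution": "3840x2160",
}})
print([s["height"] for s in playable_streams(tv)])  # [720, 1080, 2160]
```

The 480p AVC stream is dropped here only because the example profile does not list AVC among its codecs.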

Resiliency Patterns

Chaos Engineering at Netflix

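Netflix pioneered chaos engineering with Chaos Monkey, which randomly terminates production instances, and its siblings in the Simian Army, such as Latency Monkey (injects artificial delays), Chaos Gorilla (simulates an Availability Zone outage), and Chaos Kong (simulates a whole-region outage).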
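
The core idea of random instance termination can be sketched in a few lines. This is an illustration under assumptions, not Chaos Monkey's actual code: the `pick_victims` and `terminate` functions, the per-group probability, the opt-out list, and the rule of skipping single-instance groups are all hypothetical.

```python
import random


def pick_victims(groups, probability=0.2, opted_out=(), rng=random):
    """Choose at most one instance per group to terminate.

    groups: mapping of group name -> list of instance ids.
    Groups with a single instance are skipped, since killing it would be
    an outage rather than an experiment.
    """
    victims = {}
    for name, instances in groups.items():
        if name in opted_out or len(instances) < 2:
            continue
        if rng.random() < probability:
            victims[name] = rng.choice(instances)
    return victims


def terminate(instance_id):
    # Placeholder: a real tool would call the cloud provider's API here.
    print(f"terminating {instance_id}")


groups = {
    "api": ["i-a1", "i-a2", "i-a3"],
    "playback": ["i-p1", "i-p2"],
    "billing": ["i-b1"],  # too small, never chosen
}
for victim in pick_victims(groups, probability=1.0,
                           rng=random.Random(42)).values():
    terminate(victim)
```

Passing a seeded `random.Random` makes a run reproducible, which helps when a termination needs to be correlated with an incident afterwards.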

Netflix's "Nothing Fails" Culture

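| Pattern | Implementation |
| --- | --- |
| Circuit Breaker | Hystrix (now in maintenance mode; Resilience4j is a commonly recommended alternative) |
| Bulkhead | Separate thread pools per dependency |
| Fallback | Show cached content if API fails |
| Retry | Exponential backoff with jitter |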
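
The circuit breaker and fallback patterns combine naturally. Here is a minimal, single-threaded sketch of that combination. It does not reproduce the Hystrix or Resilience4j API: the `CircuitBreaker` class, its thresholds, and the example functions are hypothetical.

```python
import time


class CircuitBreaker:
    """Minimal circuit breaker: closed -> open -> half-open -> closed."""

    def __init__(self, max_failures=3, reset_after_s=30.0,
                 clock=time.monotonic):
        self.max_failures = max_failures
        self.reset_after_s = reset_after_s
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def call(self, func, fallback):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after_s:
                return fallback()  # open: fail fast, skip the dependency
            # Cool-down elapsed: half-open, let one trial call through.
        try:
            result = func()
        except Exception:
            self.failures += 1
            if (self.opened_at is not None
                    or self.failures >= self.max_failures):
                self.opened_at = self.clock()  # (re)open the circuit
            return fallback()
        self.failures = 0
        self.opened_at = None  # success closes the circuit
        return result


def flaky_recommendations():
    raise TimeoutError("recommendation service timed out")


def cached_row():
    return ["popular-title-1", "popular-title-2"]


breaker = CircuitBreaker(max_failures=2, reset_after_s=30.0)
for _ in range(3):
    print(breaker.call(flaky_recommendations, cached_row))
print(breaker.opened_at is not None)  # True: the circuit is open
```

Once the circuit is open, the third call returns the cached row without touching the failing service, which is what keeps one slow dependency from exhausting threads elsewhere.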

Architecture Diagram

End-to-End Flow


Key Numbers

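| Metric | Value |
| --- | --- |
| Peak bandwidth | Tens of Tbps in aggregate (a single appliance serves on the order of 100 Gbps) |
| Concurrent streams at peak | 15M+ |
| CDN cache hit rate | 97%+ |
| Open Connect locations | 100+ ISPs |
| Encoding profiles | 100+ |
| AWS instances | 100,000+ |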
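
A back-of-envelope check helps put the bandwidth figures in proportion. The sketch assumes 15 million concurrent streams at an average of 5 Mbps (the 1080p rung of the quality matrix in this article), appliances that each serve on the order of 100 Gbps, and a 70% utilization target. The last two values and both helper functions are assumptions for illustration.

```python
import math


def aggregate_tbps(concurrent_streams, avg_mbps):
    """Aggregate egress in Tbps for a given audience and average bitrate."""
    return concurrent_streams * avg_mbps / 1_000_000  # Mbps -> Tbps


def appliances_needed(total_tbps, per_appliance_gbps=100, headroom=0.7):
    """Appliances required if each runs at `headroom` of its capacity."""
    usable_gbps = per_appliance_gbps * headroom
    return math.ceil(total_tbps * 1_000 / usable_gbps)


total = aggregate_tbps(15_000_000, avg_mbps=5)
print(total)                     # 75.0
print(appliances_needed(total))  # 1072
```

Under these assumptions the aggregate is about 75 Tbps, so a figure of 100 Gbps is plausible for one appliance but not for the whole service.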

Key Takeaways

  1. Build your own CDN when scale justifies it: Open Connect serves 97%+ of traffic from inside ISP networks, which sharply reduces bandwidth costs
  2. Adaptive streaming: Dynamic quality adjustment ensures playback across network conditions
  3. Personalization everywhere: ML-driven recommendations increase engagement
  4. Device-specific APIs: BFF pattern lets each device optimize its experience
  5. Chaos engineering: Break things on purpose to build resilience

Interview tip: When designing streaming systems, focus on the "buffering vs quality" trade-off. Users prefer smooth playback over highest quality. This is why Netflix prioritizes avoiding rebuffers.


Follow-Up Questions

  1. How would you handle live streaming (sports, events)?
  2. How would you prevent account sharing?
  3. How would you design the download-for-offline feature?
  4. How would you handle content localization?
  5. How would you detect and prevent piracy?

Real Netflix trivia: Netflix's well-known "culture deck" emphasizes freedom and responsibility, and its engineers can deploy to production at any time on any day of the week. Its data pipeline has been reported to process on the order of 500 billion events per day, which feed recommendations and analytics, and the company A/B tests constantly to optimize the viewer experience.