
Designing Instagram: A Deep Dive into Photo Sharing at Scale

Analyze how to design a photo-sharing platform like Instagram. Cover storage, CDN, feed ranking, and how to scale to billions of photos.


Why Study Instagram?

Instagram reportedly serves around 2 billion monthly active users and 500 million daily active users, with over 95 million photos and videos uploaded every day. Understanding how it is built shows how to design social platforms at massive scale.

Approach: We'll design a simplified version. In interviews, focus on identifying key scaling challenges and proposing reasonable solutions rather than implementing full Instagram.


Requirements Analysis

Functional Requirements

  1. User Management: Register, login, profile management
  2. Photo Upload: Upload photos with filters and captions
  3. Feed: See photos from followed users (ordered by recency or ranked algorithmically)
  4. Social Graph: Follow/unfollow users
  5. Engagement: Like, comment, share
  6. Search: Find users and hashtags

Non-Functional Requirements

| Requirement | Target |
|---|---|
| Scale | ~100M photo/video uploads per day, ~2B monthly active users |
| Storage | Petabytes of photo storage |
| Latency | Feed loads in < 500 ms |
| Availability | 99.9% uptime |
| Durability | Photos never lost |

High-Level Architecture


Photo Upload Flow

Step-by-Step

Key Design Decisions

| Decision | Choice | Rationale |
|---|---|---|
| Upload method | Direct to S3 | Reduces server load |
| Image storage | S3 + CloudFront | Durable, CDN-backed |
| Image processing | Async workers | Doesn't block upload |
| Metadata | PostgreSQL | Relational, ACID |

Why direct upload to S3? The API server becomes a bottleneck for uploads. By using presigned URLs, clients upload directly to storage, and the API just coordinates.
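The coordination step can be sketched with the standard library alone. In this minimal sketch the API server signs a short-lived upload URL with HMAC, and the storage side checks the signature and expiry before accepting bytes. It is a stand-in to show the idea, not S3's actual signing scheme (real presigned URLs are generated by the AWS SDK). `SECRET_KEY`, `make_upload_url`, `verify_upload_url`, and the `uploads.example.com` host are all hypothetical.

```python
import hashlib
import hmac
import time
from typing import Optional
from urllib.parse import parse_qs, urlencode, urlparse

# Hypothetical secret shared by the API server and the storage layer.
SECRET_KEY = b"demo-secret"


def _sign(object_key: str, expires: int) -> str:
    """HMAC over the object key and expiry, so neither can be altered."""
    message = f"{object_key}:{expires}".encode()
    return hmac.new(SECRET_KEY, message, hashlib.sha256).hexdigest()


def make_upload_url(object_key: str, ttl_seconds: int = 300,
                    now: Optional[float] = None) -> str:
    """API server: hand the client a short-lived URL for one object key."""
    current = time.time() if now is None else now
    expires = int(current + ttl_seconds)
    query = urlencode({"expires": expires, "signature": _sign(object_key, expires)})
    return f"https://uploads.example.com/{object_key}?{query}"


def verify_upload_url(url: str, now: Optional[float] = None) -> bool:
    """Storage side: accept only unexpired, untampered URLs."""
    parsed = urlparse(url)
    params = parse_qs(parsed.query)
    try:
        expires = int(params["expires"][0])
        signature = params["signature"][0]
    except (KeyError, ValueError):
        return False
    current = time.time() if now is None else now
    if current > expires:
        return False
    object_key = parsed.path.lstrip("/")
    return hmac.compare_digest(signature, _sign(object_key, expires))
```

The API server never touches the image bytes: it only issues the URL and, once the client reports success, records the metadata row and queues the image-processing job.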


Feed Design

The Fan-Out Problem

When you post a photo, it needs to appear in your followers' feeds. If you have 1 million followers, that's 1 million entries to write.

Push vs Pull Model

| Model | Description | Pros | Cons |
|---|---|---|---|
| Push (fan-out on write) | Write to all followers' feeds on post | Fast reads | Expensive for popular users |
| Pull (fan-out on read) | Compute feed when requested | Cheaper storage | Slow reads |
| Hybrid | Push for small accounts, pull for large | Balanced | Complex |
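To make the trade-off concrete, here is a small in-memory sketch of both models. The dictionaries stand in for real storage, and all names (`follow`, `publish`, `feed_push`, `feed_pull`) are illustrative. `publish` returns the number of inbox writes, which is the cost the push model pays per post.

```python
from collections import defaultdict

followers = defaultdict(set)          # author -> users who follow them
following = defaultdict(set)          # user -> authors they follow
posts_by_author = defaultdict(list)   # author -> [(timestamp, post_id)]
inbox = defaultdict(list)             # user -> [(timestamp, post_id)], push model only


def follow(user, author):
    followers[author].add(user)
    following[user].add(author)


def publish(author, post_id, timestamp):
    """Store the post once, then push a reference into every follower's inbox.

    Returns the number of inbox writes (the fan-out cost of this post).
    """
    posts_by_author[author].append((timestamp, post_id))
    for user in followers[author]:
        inbox[user].append((timestamp, post_id))
    return len(followers[author])


def feed_push(user, limit=30):
    """Push model read: one lookup in a precomputed inbox."""
    return [pid for _, pid in sorted(inbox[user], reverse=True)[:limit]]


def feed_pull(user, limit=30):
    """Pull model read: gather posts from every followed author at request time."""
    candidates = []
    for author in following[user]:
        candidates.extend(posts_by_author[author])
    return [pid for _, pid in sorted(candidates, reverse=True)[:limit]]
```

Both reads return the same feed. The difference is where the work happens: `publish` does one write per follower in the push model, while `feed_pull` does one lookup per followed account on every read.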

How Instagram Actually Does It

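Instagram is commonly described as using a hybrid approach. The follower threshold below is illustrative, not a published figure:

  1. Celebrity accounts (e.g. >10K followers): Don't fan out writes. Pull their posts on read.
  2. Regular accounts: Fan out writes to followers' feeds.
  3. Cached feeds: Most users are served from cache rather than having their feed computed fresh.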
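```python
import json

import redis

cache = redis.Redis()    # assumes a reachable Redis instance
FEED_TTL_SECONDS = 60    # illustrative TTL; tune to how fresh feeds must be

# get_celebrity_follows, get_recent_posts, and get_cached_followee_posts are
# data-access helpers defined elsewhere. Posts are assumed to be dicts with a
# numeric "timestamp" so they can be JSON-serialized and sorted.

def get_feed(user_id, limit=30):
    key = f"feed:{user_id}"

    # Check cache first; Redis returns bytes, so decode before returning
    cached_feed = cache.get(key)
    if cached_feed is not None:
        return json.loads(cached_feed)[:limit]

    posts = []

    # Pull path: celebrity posts are fetched at read time
    for celeb in get_celebrity_follows(user_id):
        posts.extend(get_recent_posts(celeb, limit=10))

    # Push path: regular followee posts were fanned out at write time
    posts.extend(get_cached_followee_posts(user_id))

    # Sort newest first and paginate
    posts.sort(key=lambda post: post["timestamp"], reverse=True)
    feed = posts[:limit]

    # Populate the cache so the next request is a hit
    cache.setex(key, FEED_TTL_SECONDS, json.dumps(feed))
    return feed
```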
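`get_feed` covers the read side of the hybrid model. The write side might look like the following sketch, which decides per post whether to fan out. The function name, the in-memory `feeds` and `celebrity_timelines` stores, and the 10,000-follower cutoff are assumptions for illustration.

```python
from collections import defaultdict

CELEBRITY_THRESHOLD = 10_000  # illustrative cutoff, not a published figure


def publish_post(author_id, post_id, follower_ids, feeds, celebrity_timelines,
                 threshold=CELEBRITY_THRESHOLD):
    """Route a new post down the push or pull path; returns the path taken."""
    if len(follower_ids) > threshold:
        # Celebrity: a single write. Followers merge this timeline in at read time.
        celebrity_timelines[author_id].insert(0, post_id)
        return "pull"

    # Regular account: one write per follower (fan-out on write).
    for follower_id in follower_ids:
        feeds[follower_id].insert(0, post_id)
    return "push"
```

In a real system the per-follower loop would normally run in a background worker fed by a queue, so the upload request returns before the fan-out finishes.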

Database Schema

User Table

```sql
CREATE TABLE users (
    id BIGINT PRIMARY KEY,
    username VARCHAR(30) UNIQUE NOT NULL,
    email VARCHAR(255) UNIQUE NOT NULL,
    password_hash VARCHAR(255) NOT NULL,
    profile_photo_url VARCHAR(500),
    bio TEXT,
    follower_count INT DEFAULT 0,   -- denormalized; updated on follow/unfollow
    following_count INT DEFAULT 0,  -- denormalized; updated on follow/unfollow
    created_at TIMESTAMP DEFAULT NOW()
);

-- No separate index on username: the UNIQUE constraint already creates one.
CREATE INDEX idx_follower_count ON users(follower_count);
```

Post Table

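```sql
CREATE TABLE posts (
    id BIGINT PRIMARY KEY,
    user_id BIGINT NOT NULL REFERENCES users(id),
    image_url VARCHAR(500) NOT NULL,
    caption TEXT,
    like_count INT DEFAULT 0,     -- denormalized counter
    comment_count INT DEFAULT 0,  -- denormalized counter
    created_at TIMESTAMP DEFAULT NOW()
);

-- Global recency scans (e.g. newest posts overall)
CREATE INDEX idx_created_at ON posts(created_at DESC);
-- Profile pages and pull-on-read: a user's posts, newest first.
-- This also serves lookups by user_id alone, so a separate index on
-- posts(user_id) would be redundant.
CREATE INDEX idx_user_created ON posts(user_id, created_at DESC);
```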

Follow Table

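```sql
CREATE TABLE follows (
    follower_id BIGINT NOT NULL REFERENCES users(id),   -- the user who follows
    following_id BIGINT NOT NULL REFERENCES users(id),  -- the user being followed
    created_at TIMESTAMP DEFAULT NOW(),
    PRIMARY KEY (follower_id, following_id)  -- serves "who does X follow?"
);

-- Serves the reverse lookup, "who follows X?", which fan-out on write needs
CREATE INDEX idx_following ON follows(following_id);
```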
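The two access patterns behind this table can be tried out with Python's built-in `sqlite3` module. This is a sketch that uses SQLite as a stand-in for PostgreSQL and omits the foreign keys and timestamp column. The helper names `get_following` and `get_followers` are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE follows (
        follower_id  INTEGER NOT NULL,
        following_id INTEGER NOT NULL,
        PRIMARY KEY (follower_id, following_id)
    );
    CREATE INDEX idx_following ON follows(following_id);
""")
conn.executemany(
    "INSERT INTO follows (follower_id, following_id) VALUES (?, ?)",
    [(1, 2), (1, 3), (4, 2)],
)


def get_following(user_id):
    """Accounts this user follows: served by the primary key (leading column)."""
    rows = conn.execute(
        "SELECT following_id FROM follows WHERE follower_id = ? ORDER BY following_id",
        (user_id,),
    )
    return [row[0] for row in rows]


def get_followers(user_id):
    """Accounts that follow this user: served by the index on following_id."""
    rows = conn.execute(
        "SELECT follower_id FROM follows WHERE following_id = ? ORDER BY follower_id",
        (user_id,),
    )
    return [row[0] for row in rows]
```

`get_following` is what a pull-based feed read needs, while `get_followers` is what a fan-out on write needs. Without the secondary index, the second query would have to scan the whole table.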

Storage Architecture

Photo Storage at Scale

Storage Estimates

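| Content | Size | Daily uploads | Storage per year |
|---|---|---|---|
| Photo (compressed) | 200 KB | 50M | ~3.65 PB |
| Thumbnail | 10 KB | 50M | ~183 TB |
| Video (avg) | 3 MB | 5M | ~5.5 PB |
| Metadata | 1 KB | 55M | ~20 TB |

Total daily storage growth: ~25.6 TB/day (photos alone account for ~10 TB/day)
After 1 year: ~9.3 PB (photos alone: ~3.65 PB), before replication, deduplication, or further compression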
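Estimates like these are easy to get wrong by a factor of 1,000, so it is worth multiplying them out. The short script below does so using the per-item sizes and daily counts from the table, assuming decimal units (1 TB = 10^12 bytes) and 365 days per year.

```python
KB, MB, GB, TB, PB = 10**3, 10**6, 10**9, 10**12, 10**15  # decimal units

# (size in bytes, items per day), using the sizes and counts from the table
workload = {
    "photo": (200 * KB, 50_000_000),
    "thumbnail": (10 * KB, 50_000_000),
    "video": (3 * MB, 5_000_000),
    "metadata": (1 * KB, 55_000_000),
}


def daily_bytes(size, per_day):
    """Bytes added per day for one content type."""
    return size * per_day


daily = {name: daily_bytes(*spec) for name, spec in workload.items()}
total_daily = sum(daily.values())
total_yearly = total_daily * 365

for name, nbytes in daily.items():
    print(f"{name:<10} {nbytes / TB:8.3f} TB/day  {nbytes * 365 / PB:6.3f} PB/year")
print(f"{'total':<10} {total_daily / TB:8.3f} TB/day  {total_yearly / PB:6.3f} PB/year")
```

These are raw totals. Replication across regions multiplies them, while deduplication and tiering to cold storage reduce the cost of the older portion.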


Caching Strategy

Cache Hierarchy

Cache Keys

```python
# Key templates; fill in with str.format, e.g. FEED_KEY.format(user_id=42)

# Feed cache - user's personalized feed
FEED_KEY = "feed:{user_id}"          # List of post IDs

# Profile cache - user info
PROFILE_KEY = "profile:{user_id}"    # User object

# Post cache - individual post
POST_KEY = "post:{post_id}"          # Post object

# Timeline cache - celebrity posts
TIMELINE_KEY = "timeline:{user_id}"  # List of celebrity post IDs
```
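A common way to use keys like `post:{post_id}` is the cache-aside pattern: read from the cache, fall back to the database on a miss, and invalidate on write. The sketch below assumes that pattern and uses a small in-memory `DictCache` in place of Redis. `DictCache`, `database`, `stats`, and the function names are all hypothetical.

```python
import json


class DictCache:
    """Tiny in-memory stand-in for Redis: just get, set, and delete."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data.get(key)

    def set(self, key, value):
        self._data[key] = value

    def delete(self, key):
        self._data.pop(key, None)


cache = DictCache()
database = {101: {"id": 101, "caption": "sunset", "like_count": 3}}  # fake posts table
stats = {"db_reads": 0}  # counts how often the database is hit


def load_post_from_db(post_id):
    stats["db_reads"] += 1
    return database.get(post_id)


def get_post(post_id):
    """Cache-aside read: try the cache, fall back to the database, then fill the cache."""
    key = f"post:{post_id}"
    cached = cache.get(key)
    if cached is not None:
        return json.loads(cached)
    post = load_post_from_db(post_id)
    if post is not None:
        cache.set(key, json.dumps(post))
    return post


def update_caption(post_id, caption):
    """Write to the database first, then invalidate so the next read refills the cache."""
    database[post_id]["caption"] = caption
    cache.delete(f"post:{post_id}")
```

Deleting the key on write, rather than updating it in place, keeps the write path simple at the cost of one extra database read afterwards.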

Search Architecture

Search Features

| Feature | Implementation |
|---|---|
| User search | Prefix matching on username |
| Hashtag search | Full-text on hashtags |
| Location search | Geospatial queries |
| Trending | Aggregation + time decay |
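Two of these rows are easy to illustrate in a few lines. The sketch below shows prefix matching over a sorted in-memory list and a trending score with exponential time decay. It is meant to convey the ideas only: a production system would delegate both to a search engine, and the one-hour half-life is an arbitrary assumption.

```python
import bisect
import math


def prefix_search(sorted_usernames, prefix, limit=10):
    """Return up to `limit` usernames that start with `prefix`.

    Relies on the list being sorted: all matches sit in one contiguous run
    beginning at the first entry that is >= prefix.
    """
    start = bisect.bisect_left(sorted_usernames, prefix)
    matches = []
    for name in sorted_usernames[start:start + limit]:
        if not name.startswith(prefix):
            break
        matches.append(name)
    return matches


def trending_score(event_times, now, half_life_seconds=3600.0):
    """Sum of per-use weights, where each weight halves every half-life."""
    decay = math.log(2) / half_life_seconds
    return sum(math.exp(-decay * (now - t)) for t in event_times if t <= now)
```

With time decay, a hashtag used three times in the last few minutes outranks one used five times ten hours ago, which is usually what "trending" is meant to capture.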

Scaling Challenges & Solutions

| Challenge | Solution |
|---|---|
| Photo upload bottleneck | Direct upload to S3 via presigned URLs |
| Feed computation | Hybrid push/pull, heavy caching |
| Celebrity accounts | Don't fan out; pull on read |
| Image resizing | Async processing with queues |
| Hot storage costs | Tier to cold storage after 90 days |
| Search performance | Elasticsearch for full-text search |
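The storage-tiering row boils down to an age check. Here is a minimal sketch, assuming the 90 days are measured from upload time. The function names and the `{key: uploaded_at}` input shape are illustrative.

```python
from datetime import datetime, timedelta, timezone

HOT_RETENTION = timedelta(days=90)  # cutoff taken from the table above


def storage_tier(uploaded_at, now=None):
    """Return "hot" or "cold" based on how long ago the object was uploaded."""
    now = now or datetime.now(timezone.utc)
    return "cold" if now - uploaded_at > HOT_RETENTION else "hot"


def objects_to_demote(objects, now=None):
    """Keys of objects past the retention window; input maps key -> uploaded_at."""
    return sorted(key for key, ts in objects.items() if storage_tier(ts, now) == "cold")
```

In a managed object store this rule is usually expressed as a lifecycle policy on the bucket rather than as application code.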

Key Takeaways

  1. Keep uploads off the API servers: Use presigned URLs for direct S3 upload
  2. Balance push vs pull: Hybrid model handles both small and large accounts
  3. Cache aggressively: Feed, profile, and post caches dramatically reduce DB load
  4. Async everything: Image processing, notifications, analytics - use queues
  5. Tier storage: Not all data needs to be hot; move old content to cold storage

Interview tip: When designing a social platform, always consider the "write amplification" problem. A single post might need to appear in thousands of feeds. Address this with fan-out control and caching.


Follow-Up Questions to Consider

  1. How would you handle video uploads (much larger files)?
  2. How would you implement the Explore page (algorithmic discovery)?
  3. How would you prevent spam and fake accounts?
  4. How would you design direct messaging?
  5. How would you handle real-time notifications?

Real Instagram trivia: Instagram's backend has been built on Python and Django since its early days and is often cited as one of the largest Django deployments. Parts of the mobile apps use React Native alongside native code. Feed ranking is powered by ML models that consider engagement likelihood, not just recency.