Designing Instagram: A Deep Dive into Photo Sharing at Scale
Analyze how to design a photo-sharing platform like Instagram. Cover storage, CDN, feed ranking, and how to scale to billions of photos.
Why Study Instagram?
Instagram reportedly serves about 2 billion monthly active users and 500 million daily active users, with more than 95 million photos and videos uploaded per day. Understanding how it's built reveals how to design massive-scale social platforms.
Approach: We'll design a simplified version. In interviews, focus on identifying the key scaling challenges and proposing reasonable solutions rather than reproducing all of Instagram.
Requirements Analysis
Functional Requirements
- User Management: Register, login, profile management
- Photo Upload: Upload photos with filters and captions
- Feed: See photos from followed users (ordered by recency or ranked algorithmically)
- Social Graph: Follow/unfollow users
- Engagement: Like, comment, share
- Search: Find users and hashtags
Non-Functional Requirements
| Requirement | Target |
|---|---|
| Scale | ~100M photo/video uploads per day, with headroom for peaks |
| Storage | Petabytes of photo storage |
| Latency | Feed loads in < 500ms |
| Availability | 99.9% uptime |
| Durability | Photos never lost |
High-Level Architecture
Photo Upload Flow
Step-by-Step
Key Design Decisions
| Decision | Choice | Rationale |
|---|---|---|
| Upload method | Direct to S3 | Reduces server load |
| Image storage | S3 + CloudFront | Durable, CDN-backed |
| Image processing | Async workers | Doesn't block upload |
| Metadata | PostgreSQL | Relational, ACID |
Why direct upload to S3? If every photo passes through the API servers, they become a bandwidth bottleneck. With presigned URLs, clients upload directly to storage, and the API only coordinates.
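To make the presigned-URL idea concrete, here is a minimal sketch of the mechanism using only the standard library. It is an illustration, not S3's actual scheme: real S3 presigned URLs use AWS Signature Version 4 and are generated by the AWS SDK. The host `storage.example.com`, the `SECRET_KEY`, and both function names are assumptions made for this example.

```python
import hashlib
import hmac
import time
from urllib.parse import parse_qs, urlencode, urlparse

SECRET_KEY = b"demo-secret"  # hypothetical key shared by the API and the storage service


def _sign(method, object_key, expires_at):
    # Bind the signature to the HTTP method, the object key, and the expiry time.
    message = f"{method}\n{object_key}\n{expires_at}".encode()
    return hmac.new(SECRET_KEY, message, hashlib.sha256).hexdigest()


def create_upload_url(object_key, ttl_seconds=300, now=None):
    """API server: hand the client a short-lived URL that allows one PUT."""
    issued_at = int(now if now is not None else time.time())
    expires_at = issued_at + ttl_seconds
    query = urlencode({
        "expires": expires_at,
        "signature": _sign("PUT", object_key, expires_at),
    })
    return f"https://storage.example.com/{object_key}?{query}"


def verify_upload_url(method, url, now=None):
    """Storage service: accept the request only if the signature is valid and unexpired."""
    parsed = urlparse(url)
    params = parse_qs(parsed.query)
    try:
        expires_at = int(params["expires"][0])
        signature = params["signature"][0]
    except (KeyError, ValueError):
        return False
    current = now if now is not None else time.time()
    if current > expires_at:
        return False
    expected = _sign(method, parsed.path.lstrip("/"), expires_at)
    return hmac.compare_digest(signature, expected)
```

The API server never touches the photo bytes: it only signs a URL, and the storage side can verify the request without calling back to the API.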
Feed Design
The Fan-Out Problem
When you post a photo, it needs to appear in your followers' feeds. If you have 1 million followers, that's 1 million entries to write.
Push vs Pull Model
| Model | Description | Pros | Cons |
|---|---|---|---|
| Push (Fan-out on write) | Write to all followers' feeds on post | Fast reads | Expensive for popular users |
| Pull (Fan-out on read) | Compute feed when requested | Cheaper storage | Slow reads |
| Hybrid | Push for small accounts, pull for large | Balanced | Complex |
How Instagram Actually Does It
Instagram is widely described as using a hybrid approach:
- Celebrity accounts (for example, >10K followers; the cutoff is a tuning parameter): Don't fan out writes. Pull on read.
- Regular accounts: Fan out writes to followers' feeds.
- Cached feeds: Most users are served from cache rather than having their feed computed fresh.
def get_feed(user_id, limit=30):
    # Check cache first; a hit returns the stored feed as-is
    cached_feed = redis.get(f"feed:{user_id}")
    if cached_feed:
        return cached_feed

    posts = []

    # Pull recent posts from followed celebrity accounts at read time
    celebrities = get_celebrity_follows(user_id)
    for celeb in celebrities:
        posts.extend(get_recent_posts(celeb, limit=10))

    # Add posts from regular followees, already fanned out to this user's feed
    regular = get_cached_followee_posts(user_id)
    posts.extend(regular)

    # Merge newest first and return the first page
    posts.sort(key=lambda x: x.timestamp, reverse=True)
    return posts[:limit]
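The read path above has a write-side counterpart. The following is a minimal in-memory sketch of it, assuming a follower-count cutoff of 10,000 and a capped feed length; `publish_post`, `FEED_MAX_LEN`, and the dictionaries standing in for the feed store are hypothetical names chosen for illustration.

```python
from collections import defaultdict, deque

CELEBRITY_THRESHOLD = 10_000  # follower cutoff; a tuning parameter in practice
FEED_MAX_LEN = 500            # assumed cap on each precomputed feed

followers = defaultdict(set)                             # author_id -> follower ids
feeds = defaultdict(lambda: deque(maxlen=FEED_MAX_LEN))  # user_id -> post ids, newest first
celebrity_posts = defaultdict(list)                      # author_id -> post ids pulled at read time


def publish_post(author_id, post_id):
    """Record a new post and return how many follower feeds were written."""
    audience = followers[author_id]
    if len(audience) > CELEBRITY_THRESHOLD:
        # Large account: store the post once and let readers pull it.
        celebrity_posts[author_id].append(post_id)
        return 0
    for follower_id in audience:
        # Small account: push the post id to the front of every follower's feed.
        feeds[follower_id].appendleft(post_id)
    return len(audience)
```

The return value makes the write amplification visible: a regular post costs one write per follower, while a celebrity post costs a single write regardless of audience size.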
Database Schema
User Table
CREATE TABLE users (
    id BIGINT PRIMARY KEY,
    username VARCHAR(30) UNIQUE NOT NULL,
    email VARCHAR(255) UNIQUE NOT NULL,
    password_hash VARCHAR(255) NOT NULL,
    profile_photo_url VARCHAR(500),
    bio TEXT,
    follower_count INT DEFAULT 0,
    following_count INT DEFAULT 0,
    created_at TIMESTAMP DEFAULT NOW()
);
-- The UNIQUE constraint on username already creates an index, so no separate one is needed.
CREATE INDEX idx_follower_count ON users(follower_count);
Post Table
CREATE TABLE posts (
    id BIGINT PRIMARY KEY,
    user_id BIGINT NOT NULL REFERENCES users(id),
    image_url VARCHAR(500) NOT NULL,
    caption TEXT,
    like_count INT DEFAULT 0,
    comment_count INT DEFAULT 0,
    created_at TIMESTAMP DEFAULT NOW()
);
-- The composite index below also serves lookups by user_id alone.
CREATE INDEX idx_created_at ON posts(created_at DESC);
CREATE INDEX idx_user_created ON posts(user_id, created_at DESC);
Follow Table
CREATE TABLE follows (
    follower_id BIGINT NOT NULL REFERENCES users(id),
    following_id BIGINT NOT NULL REFERENCES users(id),
    created_at TIMESTAMP DEFAULT NOW(),
    PRIMARY KEY (follower_id, following_id)
);
CREATE INDEX idx_following ON follows(following_id);
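The follows table has to answer two questions: whom does a user follow, and who follows a user. A small sketch shows both lookups, using Python's built-in `sqlite3` as a stand-in for PostgreSQL and omitting the foreign keys to `users` for brevity. The helper names `following_of` and `followers_of` are assumptions for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE follows (
        follower_id  INTEGER NOT NULL,
        following_id INTEGER NOT NULL,
        PRIMARY KEY (follower_id, following_id)
    );
    CREATE INDEX idx_following ON follows(following_id);
""")
conn.executemany(
    "INSERT INTO follows (follower_id, following_id) VALUES (?, ?)",
    [(1, 2), (1, 3), (2, 3), (4, 3)],
)


def following_of(user_id):
    """Accounts that user_id follows; served by the primary key (follower_id first)."""
    rows = conn.execute(
        "SELECT following_id FROM follows WHERE follower_id = ? ORDER BY following_id",
        (user_id,),
    )
    return [row[0] for row in rows]


def followers_of(user_id):
    """Accounts that follow user_id; this is the lookup idx_following exists for."""
    rows = conn.execute(
        "SELECT follower_id FROM follows WHERE following_id = ? ORDER BY follower_id",
        (user_id,),
    )
    return [row[0] for row in rows]
```

The primary key covers the first query because `follower_id` is its leading column. Without the second index, finding a user's followers (the query fan-out on write depends on) would require a full scan.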
Storage Architecture
Photo Storage at Scale
Storage Estimates
| Content | Avg Size | Items per Day | Storage per Year |
|---|---|---|---|
| Photo (compressed) | 200 KB | 50M | ~3.65 PB |
| Thumbnail | 10 KB | 50M | ~183 TB |
| Video (avg) | 3 MB | 5M | ~5.5 PB |
| Metadata | 1 KB | 55M | ~20 TB |
Total daily storage growth: ~25.6 TB/day (photos alone account for ~10 TB/day)
After 1 year: ~9.3 PB (without considering deduplication and compression)
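Each growth figure is the average item size multiplied by the daily volume, then by 365 for the yearly total. The short script below reproduces that arithmetic from the size and daily-volume columns, assuming decimal units (1 TB = 10^12 bytes); the variable names are chosen for this example.

```python
KB, MB, TB, PB = 10**3, 10**6, 10**12, 10**15

# content type -> (average size in bytes, items per day), taken from the table
content = {
    "photo": (200 * KB, 50_000_000),
    "thumbnail": (10 * KB, 50_000_000),
    "video": (3 * MB, 5_000_000),
    "metadata": (1 * KB, 55_000_000),
}

# Daily growth per content type, in bytes
daily = {name: size * count for name, (size, count) in content.items()}
total_per_day = sum(daily.values())
total_per_year = total_per_day * 365

for name, nbytes in daily.items():
    print(f"{name:<10} {nbytes / TB:8.3f} TB/day")
print(f"total      {total_per_day / TB:8.1f} TB/day")
print(f"per year   {total_per_year / PB:8.2f} PB")
```

Photos contribute 10 TB/day and video 15 TB/day, so the total comes to roughly 25.6 TB/day, or about 9.3 PB per year before any replication overhead.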
Caching Strategy
Cache Hierarchy
Cache Keys
# Feed cache - user's personalized feed
feed:{user_id} # List of post IDs
# Profile cache - user info
profile:{user_id} # User object
# Post cache - individual post
post:{post_id} # Post object
# Timeline cache - celebrity posts
timeline:{user_id} # List of celebrity post IDs
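These keys are typically used with a cache-aside pattern: read the cache, fall back to the database on a miss, then populate the cache with a TTL. Below is a minimal sketch for the `post:{post_id}` key. The `TTLCache` class is an in-memory stand-in for Redis, and `get_post`, `load_from_db`, and the 300-second TTL are assumptions for illustration.

```python
import time


class TTLCache:
    """In-memory stand-in for a Redis-like cache with per-key expiry."""

    def __init__(self, clock=time.monotonic):
        self._data = {}
        self._clock = clock

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            # Expired entries behave like misses and are dropped lazily.
            del self._data[key]
            return None
        return value

    def set(self, key, value, ttl_seconds):
        self._data[key] = (value, self._clock() + ttl_seconds)

    def delete(self, key):
        self._data.pop(key, None)


def get_post(post_id, cache, load_from_db, ttl_seconds=300):
    """Cache-aside read: try the cache, fall back to the database, then fill the cache."""
    key = f"post:{post_id}"
    post = cache.get(key)
    if post is None:
        post = load_from_db(post_id)
        if post is not None:
            cache.set(key, post, ttl_seconds)
    return post
```

On writes (for example, an edited caption), calling `cache.delete(f"post:{post_id}")` is the simplest way to keep readers from seeing stale data until the TTL expires.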
Search Architecture
Search Features
| Feature | Implementation |
|---|---|
| User search | Prefix matching on username |
| Hashtag search | Full-text on hashtags |
| Location search | Geospatial queries |
| Trending | Aggregation + time decay |
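As an illustration of "aggregation + time decay", the sketch below scores each hashtag by summing its uses, with every use weighted by an exponential decay of its age. The one-hour half-life and the `trending` function are assumptions for this example; a production system would more likely maintain these scores incrementally over a stream.

```python
import math
from collections import defaultdict

HALF_LIFE_SECONDS = 3600.0  # assumed: a use counts half as much after one hour


def trending(events, now, top_n=3, half_life=HALF_LIFE_SECONDS):
    """Rank hashtags from (hashtag, unix_timestamp) events, weighting recent uses higher."""
    decay_rate = math.log(2) / half_life
    scores = defaultdict(float)
    for hashtag, timestamp in events:
        age = max(0.0, now - timestamp)
        # Each use contributes exp(-rate * age): 1.0 when fresh, 0.5 after one half-life.
        scores[hashtag] += math.exp(-decay_rate * age)
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return ranked[:top_n]
```

With this weighting, a hashtag used twice in the last minute outranks one used three times two hours ago, which is the behavior a trending list usually wants.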
Scaling Challenges & Solutions
| Challenge | Solution |
|---|---|
| Photo upload bottleneck | Direct upload to S3 via presigned URLs |
| Feed computation | Hybrid push/pull, heavy caching |
| Celebrity accounts | Don't fan out; pull on read |
| Image resizing | Async processing with queues |
| Hot storage costs | Tier to cold storage after 90 days |
| Search performance | Elasticsearch for full-text search |
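The storage-tiering row can be sketched as a simple age check. This example assumes the 90-day rule is measured from upload time (last-access time is another reasonable choice), and `storage_tier` and `objects_to_archive` are hypothetical helper names. In practice, object stores such as S3 can apply this kind of rule through lifecycle policies rather than application code.

```python
from datetime import datetime, timedelta, timezone

COLD_AFTER = timedelta(days=90)  # threshold from the table above


def storage_tier(uploaded_at, now):
    """Return "cold" for objects older than the threshold, otherwise "hot"."""
    return "cold" if now - uploaded_at > COLD_AFTER else "hot"


def objects_to_archive(objects, now):
    """Select keys of (object_key, uploaded_at) pairs that are due to move to cold storage."""
    return [key for key, uploaded_at in objects if storage_tier(uploaded_at, now) == "cold"]


# Example run with fixed timestamps so the result is reproducible.
now = datetime(2024, 6, 1, tzinfo=timezone.utc)
catalog = [
    ("photos/a.jpg", now - timedelta(days=120)),
    ("photos/b.jpg", now - timedelta(days=10)),
]
due = objects_to_archive(catalog, now)
```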
Key Takeaways
- Keep uploads off the API servers: Use presigned URLs so clients upload directly to S3
- Balance push vs pull: Hybrid model handles both small and large accounts
- Cache aggressively: Feed, profile, and post caches dramatically reduce DB load
- Async everything: Image processing, notifications, analytics - use queues
- Tier storage: Not all data needs to be hot; move old content to cold storage
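The "async everything" takeaway can be illustrated with a small in-process queue: the upload handler enqueues a job and returns immediately, while a background worker produces the resized variants. This is a sketch using `queue.Queue` and a thread as stand-ins for a real message queue and worker fleet. The sizes in `THUMBNAIL_SIZES` and the names `make_variants`, `worker`, and `handle_upload` are assumptions.

```python
import queue
import threading

THUMBNAIL_SIZES = (150, 640, 1080)  # assumed output widths in pixels


def make_variants(photo_id):
    # Placeholder for real image resizing: only the output names are computed here.
    return [f"{photo_id}_{size}.jpg" for size in THUMBNAIL_SIZES]


def worker(jobs, results):
    """Consume photo ids until a None sentinel arrives."""
    while True:
        photo_id = jobs.get()
        try:
            if photo_id is None:
                return
            results[photo_id] = make_variants(photo_id)
        finally:
            jobs.task_done()


def handle_upload(photo_id, jobs):
    """Upload handler: enqueue processing and respond without waiting for it."""
    jobs.put(photo_id)
    return {"photo_id": photo_id, "status": "processing"}


jobs = queue.Queue()
results = {}
threading.Thread(target=worker, args=(jobs, results), daemon=True).start()

response = handle_upload("photo42", jobs)
jobs.join()  # demo only: wait for the worker so the results can be inspected
```

The handler's response does not depend on resizing time, so a slow or failing image pipeline degrades thumbnail freshness rather than upload latency.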
Interview tip: When designing a social platform, always consider the "write amplification" problem. A single post might need to appear in thousands of feeds. Address this with fan-out control and caching.
Follow-Up Questions to Consider
- How would you handle video uploads (much larger files)?
- How would you implement the Explore page (algorithmic discovery)?
- How would you prevent spam and fake accounts?
- How would you design direct messaging?
- How would you handle real-time notifications?
Real Instagram trivia: Instagram's backend has been built on Python and Django since launch. Its mobile apps are primarily native, with React Native adopted for selected screens. Feed ranking is powered by ML models that consider engagement likelihood, not just recency.