How does a URL shortener generate globally unique short codes at 1000+ URLs/sec, ensuring no collisions across distributed nodes while keeping codes short (7 chars) and supporting custom aliases?
Core challenge: 7 chars (base62) = 627 = 3.5 trillion possible codes. But generating them without collisions across multiple servers · without a central coordinator bottleneck · requires careful ID generation strategy.
1000+
URLs shortened / sec
100K+
redirects / sec
7 chars
base62 (3.5T codes)
<10ms
redirect P99
Architecture
ID generation: Pre-allocate counter ranges from a central sequence (ZooKeeper/DB). Each node gets a range (e.g., 1M IDs). Encode counter as base62 ? 7 chars. No coordination during normal operation · only when range exhausted.
Redirect path: GET /abc1234 ? Redis lookup (1ms) ? if miss, DB lookup ? 301 redirect. 301 (permanent) for SEO, 302 (temporary) if you need analytics on every click.
Anti-patterns:Random hash + check-for-collision · race conditions under load. Auto-increment single DB · bottleneck. No cache for redirects · DB melts at 100K req/sec.
Scale Estimation
Step
Derivation
Result
Design Impact
1
New URLs: 1000/sec · 86400
~86M URLs/day
Write-light · DB handles easily
2
Redirects: 100:1 read/write ratio
100K redirects/sec
Read-heavy · must cache aggressively
3
Storage: 86M/day · 365 · 5 years · 500 bytes
~7.8 TB total
Fits in single sharded DB cluster
4
Cache: top 20% URLs = 80% traffic (Pareto)
~170 GB Redis
Hot URLs cached, 99% hit rate
5
ID space: 7 chars base62 = 627
3.5 trillion codes
Won't exhaust for centuries at 86M/day
6
Bandwidth: 100K req/sec · 500 bytes response
~50 MB/sec outbound
Trivial for modern infrastructure
Resilience & Edge Cases
Failure
Impact
Recovery
Cache miss storm (viral link)
1M req/sec to DB for same key
Request coalescing: only 1 DB query, others wait for result. Or: cache-aside with SETNX lock.
Counter range exhausted
Node can't generate new IDs
Pre-fetch next range before current exhausts (watermark at 80%). Fallback: request new range from coordinator.
Custom alias collision
Two users want same alias
UNIQUE constraint in DB. First writer wins. Return 409 Conflict to second.
Malicious URLs
Phishing/malware behind short link
Scan URLs against Google Safe Browsing API on creation. Flag + interstitial warning page.
Link expiry
Expired link still cached
TTL on cache entries. Check expiry on redirect. Return 410 Gone for expired links.
Interview Cheat Sheet
1.Base62 encoding of counter · 7 chars = 3.5T unique codes, collision-free 2.Pre-allocated counter ranges · no coordination during normal operation 3.Read-heavy (100:1) · cache redirects aggressively (Redis, 99% hit rate) 4.301 vs 302 · permanent (SEO, browser caches) vs temporary (track every click) 5.Custom aliases · check uniqueness, store separately, higher priority in lookup 6.Analytics async · log click event to Kafka, don't block redirect path