System Design Concepts

No fluff — visual, concise, interview-ready

1 · FOUNDATIONS

System Design Framework

The structured approach to tackle any design interview (45-60 min)

1. REQUIREMENTS (5 min)  — Clarify scope. Ask questions. Define FR + NFR.
2. ESTIMATION (5 min)    — Users, QPS, storage, bandwidth (back-of-envelope).
3. HIGH-LEVEL DESIGN (10 min) — Draw: clients → LB → services → DB → cache.
4. DETAILED DESIGN (20 min)   — Deep dive 2-3 critical components.
5. TRADE-OFFS (5 min)    — Alternatives, bottlenecks, failure modes.
6. SCALING (5 min)       — How to handle 10x, 100x growth.
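Step 2 is mostly arithmetic. A minimal sketch of the back-of-envelope math, using the "Design Twitter" figures from the walkthrough that follows (500M users, 2 tweets/day, 200 B per tweet, 100:1 read/write). The function and parameter names are made up for illustration, and the results are daily averages, not peaks:

```python
SECONDS_PER_DAY = 86_400  # often rounded to 100K in interviews

def estimate(users, writes_per_user_per_day, bytes_per_write, read_write_ratio):
    """Return (average write QPS, average read QPS, storage in GB per day)."""
    writes_per_day = users * writes_per_user_per_day
    write_qps = writes_per_day / SECONDS_PER_DAY
    read_qps = write_qps * read_write_ratio
    storage_gb_per_day = writes_per_day * bytes_per_write / 1e9
    return write_qps, read_qps, storage_gb_per_day

write_qps, read_qps, storage_gb = estimate(
    users=500_000_000,
    writes_per_user_per_day=2,
    bytes_per_write=200,
    read_write_ratio=100,
)
print(f"writes/sec ~{write_qps:,.0f}")
print(f"reads/sec  ~{read_qps:,.0f}")
print(f"storage    ~{storage_gb:,.0f} GB/day")
```

Dividing by the exact 86,400 gives roughly 11.6K writes/sec, which is why the rounded "~10K" figure is good enough for sizing.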
Interview Walkthrough (45-min example, "Design Twitter"):
Requirements (5 min): "Users post tweets, follow others, view home timeline. NFR: 500M users, 10K tweets/sec, p99 <200ms, 99.99% availability."
Estimation (5 min): "500M users × 2 tweets/day = 1B tweets/day ÷ ~100K sec/day (86,400 rounded up) = ~10K writes/sec. Read-heavy: 100:1 read/write. Storage: 1B × 200B = 200GB/day."
High-Level (10 min): "Client → LB → Tweet Service → DB + Cache. Timeline Service reads from fan-out cache. Media → S3 + CDN."
Deep Dive (20 min): "Fan-out on write (push to follower timelines in Redis) vs fan-out on read (pull at read time). Hybrid: push for normal users, pull for celebrities (>1M followers). Sharding tweets by user_id. Timeline cache in Redis sorted sets."
Trade-offs (5 min): "Push = fast reads, expensive writes for celebrities. Pull = cheap writes, slow reads. Hybrid balances both."
Scaling (5 min): "Shard DB by user_id, Redis Cluster for timelines, CDN for media, Kafka for async fan-out."
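The hybrid fan-out from the deep dive can be sketched in memory as follows. This is an illustration under stated assumptions, not a production design: plain dicts stand in for Redis and the tweet store, the `HybridTimeline` class and its methods are hypothetical names, and an author's celebrity status is assumed not to change between posting and reading.

```python
from collections import defaultdict

CELEBRITY_THRESHOLD = 1_000_000  # from the walkthrough: >1M followers means "pull"

class HybridTimeline:
    def __init__(self, threshold=CELEBRITY_THRESHOLD):
        self.threshold = threshold
        self.followers = defaultdict(set)  # author -> users following them
        self.following = defaultdict(set)  # user -> authors they follow
        self.tweets = defaultdict(list)    # author -> [(timestamp, author, text)]
        self.pushed = defaultdict(list)    # user -> precomputed timeline entries

    def follow(self, user, author):
        self.followers[author].add(user)
        self.following[user].add(author)

    def is_celebrity(self, author):
        return len(self.followers[author]) > self.threshold

    def post(self, author, timestamp, text):
        entry = (timestamp, author, text)
        self.tweets[author].append(entry)
        if not self.is_celebrity(author):
            # Fan-out on write: copy the tweet into every follower's timeline now.
            for user in self.followers[author]:
                self.pushed[user].append(entry)

    def home_timeline(self, user, limit=10):
        entries = list(self.pushed[user])
        # Fan-out on read: merge celebrity tweets in at request time.
        for author in self.following[user]:
            if self.is_celebrity(author):
                entries.extend(self.tweets[author])
        return sorted(entries, reverse=True)[:limit]

# Tiny threshold so the demo has a "celebrity" with only three followers.
tl = HybridTimeline(threshold=2)
tl.follow("bob", "alice")
for fan in ("bob", "carol", "dave"):
    tl.follow(fan, "star")
tl.post("alice", 1, "hi")
tl.post("star", 2, "hello")
print(tl.home_timeline("bob"))
```

The celebrity's post costs one write instead of one per follower, and the read path pays for it with a merge.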

Functional vs Non-Functional Requirements

What the system does vs how well it does it — with visual sketches showing the concept

Functional (What) | Non-Functional (How Well)
User can view a homepage with posts, feed, and navigation | Latency: homepage loads in <1.5 seconds (P99)
System stores customer data (profile, preferences, history) | Security: data encrypted at rest (AES-256) and in transit (TLS 1.3)
Users can log into their accounts (auth, sessions) | Scalability: support 5,000 concurrent logged-in users per server
System is always available to customers 24/7 | Availability: 99.7% uptime (26 hours max downtime/year)
Customers can access on mobile phones and tablets | Compatibility: works on iOS 14+ and Android 10+ browsers
User can search products by name, category, filters | Throughput: handle 10K searches/sec with <100ms response
User can make payments (checkout, refunds) | Consistency: strong consistency for payments (no double-charge)
User can chat with AI assistant (ask questions, get answers) | Latency (TTFT): first token in <500ms, stream at 50+ tokens/sec
System answers from company docs (RAG knowledge base) | Accuracy: <5% hallucination rate, grounded in retrieved sources
User can generate images from text (AI image generation) | GPU Throughput: generate image in <10s, serve 1K concurrent users per GPU
Core Challenges: Too many users → horizontal scaling, LB, caching. Too much data → sharding, tiered storage. Low latency → caching, CDN, geo-distribution. High availability → replication, multi-region, graceful degradation.
Interview tip: Always pair each FR with its NFR constraint. "Users can post tweets" → "at 10K tweets/sec with P99 <200ms." This shows you think about both what the system does and how well it must do it.

NFR Metrics & SLOs

How non-functional requirements are measured — pick targets, then design to them

NFR | What it means | Metric | Typical Target | Levers
Latency | Time per request (user-perceived speed) | p50 / p95 / p99 / p99.9 ms | p99 < 200 ms (web) · < 50 ms (internal RPC) | Cache, CDN, async, geo-PoP, fewer hops
Throughput | Work served per unit time | RPS / QPS / TPS / msgs·s⁻¹ | 10K–1M RPS per service | Horizontal scale, batching, sharding
Availability | % of time system is up & serving | "Nines" uptime | 99.9 % (8.8 h/yr) · 99.99 % (52 min) · 99.999 % (5 min) | Redundancy, multi-AZ/region, failover, health checks
Durability | % chance data survives (no loss, ever) | Nines of durability | 11×9 (S3) · 99.999999999 % | Replication (3×), erasure coding, cross-region backup, WAL
Reliability | Correctness over time (MTBF / MTTR) | Error rate, MTBF, MTTR | Error budget < 0.1 % · MTTR < 5 min | Retries, circuit breakers, idempotency, runbooks
Scalability | Ability to grow with load (linear, ideally) | Cost / RPS, scale factor | Linear up to 10×–100× | Stateless services, sharding, autoscale
Bandwidth | Data moved over network per second | MB/s · Gbps ingress/egress | Stay within VPC/CDN egress budget | Compression, CDN, delta sync, batching
Storage | How much data is kept & for how long | GB / TB / PB, retention | Right-size; tier hot→warm→cold (S3 IA/Glacier) | TTL, compression, tiering, dedup
Consistency | How fresh / agreed data is across replicas | Strong / RYW / Eventual, replica lag | Strong for $ · eventual for likes | Quorum (R+W>N), Raft/Paxos, CRDTs
Security / Privacy | AuthN/AuthZ, encryption, audit | CVE count, % encrypted, audit pass | 0 critical CVEs · TLS everywhere · PII encrypted at rest | OAuth2, mTLS, KMS, WAF, RBAC
Cost | $ per request / GB / user | $ / 1M req, $/GB-month | Within unit-economics envelope | Spot, reserved, autoscale-down, caching
Why percentiles, not averages, for latency: averages hide tail pain. With a ~100 ms average you can still have 2 % of users waiting 2 s while everyone else sees about 60 ms. p50 = median (typical user), p95 / p99 = the bad days, p99.9 = the angry tweets. SLO is usually written on p99 for user-facing, p99.9 for infra.
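A small sketch of how an average can look healthy while the tail does not. The sample data is invented (980 requests at 60 ms, 20 at 2 s), and `percentile` is a simple nearest-rank helper written for this example, not a library function:

```python
import math
import statistics

def percentile(samples, p):
    """Nearest-rank percentile: smallest sample with at least p% of values at or below it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]

# Hypothetical latencies in milliseconds: 98% fast, 2% very slow.
latencies_ms = [60] * 980 + [2000] * 20

mean_ms = statistics.mean(latencies_ms)
print(f"mean = {mean_ms:.1f} ms")
for p in (50, 95, 99):
    print(f"p{p}  = {percentile(latencies_ms, p)} ms")
```

The mean lands just under 100 ms and p50/p95 stay at 60 ms, while p99 jumps to 2000 ms. Only the high percentiles expose the slow requests.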
"Nines" cheatsheet, downtime per year: 99 % = 3.65 days · 99.9 % = 8.77 h · 99.99 % = 52.6 min · 99.999 % = 5.26 min · 99.9999 % = 31.5 s. Each extra 9 ≈ 10× cost & complexity (more replicas, multi-region, chaos testing). Durability uses the same scale but for data loss probability: S3 advertises 11×9, i.e. on average about 1 object lost per 100 billion objects stored per year.
SLI · SLO · SLA: SLI = the measurement ("p99 latency over 5-min window"). SLO = your internal target ("p99 < 200 ms, 99.9 % of the time"). SLA = the contractual promise to customers (with refund/credit if missed). Always keep the SLA looser than the SLO, and the SLO looser than actual performance (for example SLA 99.9 %, SLO 99.95 %, measured 99.97 %). Whatever the SLO leaves short of 100 % is the error budget.
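The nines cheatsheet and the error budget are the same calculation: whatever the target leaves short of 100 % is what you may spend on downtime or failed requests. A minimal sketch, assuming a 365.25-day year (which matches the 8.77 h figure) and hypothetical function names:

```python
MINUTES_PER_YEAR = 365.25 * 24 * 60  # 525,960

def allowed_downtime_minutes(target_pct, window_minutes=MINUTES_PER_YEAR):
    """Downtime that a target such as 99.9 (%) permits over the window."""
    return (1 - target_pct / 100) * window_minutes

def budget_remaining(target_pct, total_requests, failed_requests):
    """Fraction of the error budget still unspent (negative means the SLO is missed)."""
    allowed_failures = (1 - target_pct / 100) * total_requests
    return 1 - failed_requests / allowed_failures

for target in (99, 99.9, 99.99, 99.999):
    print(f"{target}% -> {allowed_downtime_minutes(target):,.1f} min/year")

# Hypothetical month: 10M requests, 4,000 failures, SLO of 99.9 %.
print(f"{budget_remaining(99.9, 10_000_000, 4_000):.0%} of the error budget left")
```

At 99.9 % the month above allows about 10,000 failures, so 4,000 failures leave roughly 60 % of the budget for deploys and experiments.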
Trade-offs (CAP/PACELC reminder): you cannot maximize all NFRs at once. More 9's of availability → weaker consistency or higher cost. Lower latency → larger cache footprint / more PoPs / weaker durability (e.g. async fsync). State the SLO numerically in interviews — "p99 < 200 ms, 99.99 % availability, 11×9 durability" — then derive the architecture from it.
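The quorum rule from the Consistency row (R + W > N) is one place where this trade-off becomes a number you can tune. The sketch below brute-forces every possible read and write quorum to confirm that they always share at least one replica exactly when R + W > N. The function name is illustrative only:

```python
from itertools import combinations

def quorums_always_overlap(n, r, w):
    """True if every set of r replicas shares a member with every set of w replicas."""
    replicas = range(n)
    return all(
        set(read_set) & set(write_set)
        for read_set in combinations(replicas, r)
        for write_set in combinations(replicas, w)
    )

for n, r, w in [(3, 2, 2), (3, 1, 1), (5, 3, 3), (5, 1, 5), (5, 2, 3)]:
    overlap = quorums_always_overlap(n, r, w)
    assert overlap == (r + w > n)
    print(f"N={n} R={r} W={w}: overlap guaranteed = {overlap}")
```

Lowering R or W makes reads or writes faster and more available, but once R + W ≤ N a read can miss the latest write.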

Scaling Basics

Vertical vs Horizontal — the fundamental scaling decision

[Figure: Vertical Scaling (Scale Up), a small 4 CPU / 16 GB machine replaced by a big 64 CPU / 512 GB NVMe machine that eventually hits a hardware ceiling, next to Horizontal Scaling (Scale Out), Srv 1 to Srv 5 and more behind a load balancer. Callouts beyond the table below: vertical avoids distributed complexity but is expensive at the top tier (cost climbs steeply, $$ → $$$$); horizontal adds consistency challenges.]
Vertical (Scale Up) | Horizontal (Scale Out)
Bigger machine (more CPU/RAM/IOPS) | More machines behind a load balancer
Simple: no code changes | Unlimited: add nodes as needed
Has ceiling: biggest machine has limits | Complex: distributed state, consistency
Single point of failure | Fault tolerant (node dies → others serve)
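Rule of thumb: for stateless, shared-nothing workloads, horizontal scaling gives close to linear throughput growth. Doubling nodes roughly doubles capacity, because each node handles an independent subset of traffic. It is not a guarantee: shared bottlenecks (a single database, locks, cross-node coordination) bend the curve below linear.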
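One rough way to see how far "doubling nodes doubles capacity" holds is Amdahl's law, used here as an illustrative model that the section itself does not name. It assumes a fixed fraction of each request's work is serialized on something shared (for example one database), and the 5 % figure is an arbitrary assumption:

```python
def speedup(nodes, serial_fraction):
    """Amdahl's law: the serialized share of the work gains nothing from extra nodes."""
    return 1 / (serial_fraction + (1 - serial_fraction) / nodes)

print("nodes  independent  5% shared")
for nodes in (1, 2, 4, 8, 16):
    print(f"{nodes:>5}  {speedup(nodes, 0.0):>11.2f}  {speedup(nodes, 0.05):>9.2f}")
```

With fully independent traffic 16 nodes give 16× capacity. With 5 % of the work serialized they give a little over 9×, which is why removing shared state matters more than adding machines.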

Stateless vs Stateful

Stateless — no memory between requests, every request is a stranger · Stateful — server remembers past interactions

[Figure: STATELESS, any server handles any request: Users A, B, C → Load Balancer (Round Robin) → Servers 1-3, all backed by a shared session store (Redis / DB). A server dying loses no state, and you scale by adding servers. STATEFUL, each user pinned to a specific server: Users A, B, C → Load Balancer (Sticky / IP Hash) → Server 1 holds A's session, Server 2 holds B's, Server 3 holds C's. When Server 2 dies, User B's session is lost, and users cannot be moved freely.]
Aspect | Stateless | Stateful
Memory | No memory: every request self-contained | Remembers: keeps session, knows past actions
Scaling | Just add servers: any instance handles any request | Sticky sessions + coordination: same user → same server
Failure | Any server takes over: no state lost on crash | State lost on crash: needs replication or persistence
State lives in | Redis / DB / JWT: externalized, not on server | Server memory: tied to specific instance
Examples | REST APIs · HTTP services · Auth via JWT · App servers with Redis | WebSocket (chat) · Game servers · DB connections · Login sessions in memory
Load Balancer | Round Robin: any server, no special routing | IP Hash / Cookie: must route to same server
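The two load-balancer rows can be sketched side by side. Both classes are hypothetical illustrations, not any real load balancer's API, and the IP address is a documentation-range placeholder:

```python
import hashlib
from itertools import cycle

class RoundRobinBalancer:
    """Fits stateless services: each request goes to the next server in turn."""
    def __init__(self, servers):
        self._next_server = cycle(servers)

    def route(self, client_ip):
        return next(self._next_server)  # client identity is ignored

class IpHashBalancer:
    """Sticky routing: the same client IP always maps to the same server."""
    def __init__(self, servers):
        self.servers = list(servers)

    def route(self, client_ip):
        digest = hashlib.sha256(client_ip.encode()).digest()
        index = int.from_bytes(digest[:8], "big") % len(self.servers)
        return self.servers[index]

servers = ["server-1", "server-2", "server-3"]
round_robin, sticky = RoundRobinBalancer(servers), IpHashBalancer(servers)
for _ in range(3):
    print(round_robin.route("203.0.113.7"), sticky.route("203.0.113.7"))
```

Round robin spreads one client's requests across all three servers, which only works if none of them holds that client's session. The hash route keeps the client on one server, so that server's crash takes the session with it.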
Stateless → Stateful conversion: Move session to Redis (shared store) → all servers read same session → service becomes stateless. Move auth to JWT (token carries user context) → no server-side session needed. This is how companies horizontally scale without sticky sessions.
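A sketch of the "token carries user context" idea using only the standard library. This is a simplified HMAC-signed token, not a real JWT (no header, no expiry, no standard claims), and the secret and function names are assumptions for illustration:

```python
import base64
import hashlib
import hmac
import json

# Assumption: every server instance is configured with the same secret.
SECRET = b"demo-secret-shared-by-all-servers"

def issue_token(claims):
    """Encode the claims and append an HMAC so any server can verify them later."""
    payload = base64.urlsafe_b64encode(json.dumps(claims, sort_keys=True).encode())
    signature = hmac.new(SECRET, payload, hashlib.sha256).hexdigest().encode()
    return payload + b"." + signature

def verify_token(token):
    """Return the claims if the signature checks out, otherwise None."""
    payload, _, signature = token.rpartition(b".")
    expected = hmac.new(SECRET, payload, hashlib.sha256).hexdigest().encode()
    if not hmac.compare_digest(signature, expected):
        return None
    return json.loads(base64.urlsafe_b64decode(payload))

token = issue_token({"user_id": 42, "role": "member"})  # issued by server 1
print(verify_token(token))         # verified by any other server, no session lookup
print(verify_token(b"x" + token))  # tampered payload is rejected
```

No server stores anything between requests, so any instance can handle any request. The trade-off is that a signed token cannot be revoked without extra state such as a deny list or short expiry.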
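Real-world patterns: Streaming/API backends (Netflix-style): stateless API servers with session state in an external cache. Chat (WhatsApp-style): stateful persistent connections pinned to a server, with presence tracked in a shared store. Kubernetes Pods: treated as disposable and can be killed or restarted at any time, so state belongs in external stores (or StatefulSets with persistent volumes). Game servers: stateful, entire match state in memory → hard to migrate mid-game.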
One-liner: Stateless = no memory, infinite scale (REST API) · Stateful = memory, scaling pain (game server, sessions) → best practice: externalize state, make services stateless.

Serialization & Deserialization

Converting objects in memory to bytes/JSON/Binary (serialize) and back (deserialize) for storage/transmission

[Figure: Serialization turns an object into a stream of bytes that can go to a file, a database, memory, or the network; deserialization turns the stream of bytes back into an object.]
Format | Type | Size | Speed | Use Case
JSON | Text | Large | Slow | REST APIs, config files, human-readable
Protocol Buffers | Binary | Small | Fast | gRPC, internal services (Google)
Avro | Binary | Small | Fast | Kafka events, Hadoop (schema in header)
MessagePack | Binary | Small | Fast | Redis, embedded systems
XML | Text | Very large | Slow | SOAP, legacy enterprise
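Schema evolution: Binary formats such as Protobuf and Avro support backward/forward-compatible schema changes, so old consumers can read new messages and vice versa, as long as changes follow the format's rules (add fields as optional or with defaults, never reuse a removed field's number or change a field's type). Protobuf identifies fields by number, not name, so unknown numbers are skipped. Avro matches fields by name and resolves the writer's schema against the reader's schema, filling gaps from defaults. This is critical for zero-downtime deployments.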
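Why numbered fields make mixed-version deployments safe can be shown with a toy tag-length-value encoding. This is deliberately not the Protobuf wire format: one byte for the field number, one byte for the length, string values only, and all names here are hypothetical.

```python
def encode(fields):
    """fields: {field_number: bytes}. Each value must be shorter than 256 bytes."""
    out = bytearray()
    for number, value in fields.items():
        out += bytes([number, len(value)]) + value
    return bytes(out)

def decode(data, schema):
    """schema: {field_number: (name, default)}. Unknown field numbers are skipped."""
    record = {name: default for name, default in schema.values()}
    i = 0
    while i < len(data):
        number, length = data[i], data[i + 1]
        value = data[i + 2 : i + 2 + length]
        if number in schema:
            record[schema[number][0]] = value.decode()
        i += 2 + length  # advance past this field whether or not we understood it
    return record

SCHEMA_V1 = {1: ("name", "")}                    # old consumer
SCHEMA_V2 = {1: ("name", ""), 2: ("email", "")}  # new consumer

new_message = encode({1: b"Ada", 2: b"ada@example.com"})
old_message = encode({1: b"Ada"})

print(decode(new_message, SCHEMA_V1))  # old reader ignores field 2
print(decode(old_message, SCHEMA_V2))  # new reader falls back to the default
```

Because the length travels with each field, a reader can step over data it does not understand, and a missing field simply keeps its default.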
Schema Registry (Confluent) — central store for Avro/Protobuf schemas. Enforces compatibility rules. Used with Kafka to prevent breaking changes.
Why needed: Network — send objects over HTTP/TCP/Kafka (network only understands bytes). Persistence — store to DB/Redis/disk. Cross-language — Java → Python via Protobuf (no language barrier). Caching — serialize to Redis, deserialize on hit. Distributed Computing / RPC — gRPC serializes to Protobuf binary — faster & smaller than JSON for internal service calls.
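The size gap between text and binary encodings is easy to see with the standard library alone. Here `struct` stands in for a schema-driven binary format such as Protobuf (it is not Protobuf), and the sensor record is invented for the example:

```python
import json
import struct

reading = {"sensor_id": 1234, "temperature": 21.5, "ok": True}

# Text: field names and punctuation travel with every message.
as_json = json.dumps(reading).encode("utf-8")

# Binary: both sides agree on the layout up front, so only values are sent.
# "<Id?" = little-endian, no padding: uint32, float64, bool.
LAYOUT = struct.Struct("<Id?")
as_binary = LAYOUT.pack(reading["sensor_id"], reading["temperature"], reading["ok"])

print(len(as_json), "bytes as JSON")
print(len(as_binary), "bytes as packed binary")

# Deserialize both back into Python values.
sensor_id, temperature, ok = LAYOUT.unpack(as_binary)
assert json.loads(as_json) == reading
assert (sensor_id, temperature, ok) == (1234, 21.5, True)
```

The binary form is a fraction of the JSON size because the schema lives in code rather than in each message, which is the same reason it is unreadable without that schema.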