System Design Concepts

No fluff — visual, concise, interview-ready

🔌 4 · APIs & COMMUNICATION

REST API

Stateless HTTP-based — universal, cacheable. The default choice for public APIs

Anatomy of a REST URL
GET https://api.example.com/v1/users?age=25&gender=male&page=2&limit=10

METHOD: GET (also POST/PUT/DELETE)
PROTOCOL: https:// (always HTTPS)
SUBDOMAIN: api.example.com
VERSION: /v1 (backward compat)
ENDPOINT: /users (nouns, not verbs)
FILTERING: age=25&gender=male (narrow results)
PAGINATION: page=2&limit=10 (page + limit)

✓ Best Practices:
Use nouns for resources (/users not /getUsers)
Plural names (/users not /user)
Cursor pagination > offset (for large datasets)
Idempotency keys for POST (prevent dupes)
Version in URL path

GET    /api/v1/products/123     → Fetch (cacheable, idempotent)
POST   /api/v1/orders           → Create (use idempotency key to prevent dupes)
PUT    /api/v1/orders/456       → Replace (idempotent)
DELETE /api/v1/orders/456       → Cancel (idempotent)

Pagination: ?cursor=abc123 (preferred) or ?page=2&limit=10
Caching: Cache-Control: max-age=3600 · ETag: "abc123"
Rate Limit: X-RateLimit-Limit: 1000 · X-RateLimit-Remaining: 847
Guarantees: Statelessness — no server-side session, any instance handles any request. Idempotency of GET/PUT/DELETE — safe to retry on failure. Cacheability — HTTP caching (CDN, browser) reduces load.
Real-world: Stripe API — gold standard (idempotency keys, versioning, pagination). GitHub API v3 — REST. Twilio — REST for SMS/voice.
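
The cursor pagination preferred above can be sketched as follows. This is a minimal illustration, assuming an in-memory USERS list in place of a database and a cursor that is simply the base64-encoded id of the last row returned; the names list_users, encode_cursor and decode_cursor are hypothetical.

```python
import base64
import json
from typing import Optional

# Hypothetical in-memory "table"; a real service would query a database.
USERS = [{"id": i, "name": f"user{i}"} for i in range(1, 26)]


def encode_cursor(last_id: int) -> str:
    """Wrap the last-seen id in an opaque, URL-safe token."""
    raw = json.dumps({"last_id": last_id}).encode()
    return base64.urlsafe_b64encode(raw).decode()


def decode_cursor(cursor: str) -> int:
    return json.loads(base64.urlsafe_b64decode(cursor.encode()))["last_id"]


def list_users(cursor: Optional[str] = None, limit: int = 10) -> dict:
    """Return one page plus the cursor for the next page (None at the end)."""
    last_id = decode_cursor(cursor) if cursor else 0
    # Roughly: SELECT ... WHERE id > :last_id ORDER BY id LIMIT :limit + 1
    rows = [u for u in USERS if u["id"] > last_id][: limit + 1]
    page, has_more = rows[:limit], len(rows) > limit
    next_cursor = encode_cursor(page[-1]["id"]) if has_more else None
    return {"data": page, "next_cursor": next_cursor}
```

Fetching one extra row is a cheap way to learn whether another page exists. Unlike ?page=2&limit=10, the cursor keeps working when rows are inserted or deleted between requests, because it anchors on the last id seen rather than on an offset.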

gRPC

HTTP/2 + Protobuf. Often quoted as up to ~10x faster than REST with JSON, though the gain depends on payload size and workload. 4 call types: unary, server-stream, client-stream, bidirectional

4 gRPC Streaming Modes
Unary: 1 request → 1 response. Simple RPC, e.g. GetUser()
Server Stream: 1 request → N responses. Server pushes N msgs, e.g. ListPrices()
Client Stream: N requests → 1 response. Client sends N msgs, e.g. UploadFile()
Bidirectional: N requests ↔ N responses. Both sides send N msgs, e.g. Chat()

Mode | Use Case | Example
Unary | Simple request/response | GetUser, CreateOrder
Server Stream | Server pushes multiple results | Stock ticker, log tailing
Client Stream | Client sends batch | File upload, telemetry
Bidirectional | Real-time two-way | Chat, multiplayer game
Guarantees: Type safety — .proto schema + codegen catches incompatibility at compile time. Deadline propagation — timeout flows through entire call chain. Multiplexing — multiple concurrent calls on single HTTP/2 connection.
Real-world: Google internal comms. Netflix/Uber microservice-to-microservice. Best for: internal APIs, 10K+ RPS, bidirectional streaming. Not for browsers (use gRPC-Web proxy).
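
Deadline propagation can be illustrated without any gRPC code. The sketch below is plain Python, not the gRPC API: it assumes each hop receives the caller's absolute deadline and checks the remaining budget before doing work. The service names (call_orders, call_inventory) are made up for the example.

```python
import time


class DeadlineExceeded(Exception):
    pass


def remaining(deadline: float) -> float:
    """Seconds left before the absolute deadline; raises once it has passed."""
    left = deadline - time.monotonic()
    if left <= 0:
        raise DeadlineExceeded("deadline already passed")
    return left


def call_inventory(deadline: float) -> str:
    remaining(deadline)  # fail fast instead of doing work nobody will wait for
    return "in stock"


def call_orders(deadline: float) -> str:
    # The downstream call inherits the caller's deadline, not a fresh timeout.
    remaining(deadline)
    return "order ok, " + call_inventory(deadline)


def handle_request(timeout_s: float) -> str:
    deadline = time.monotonic() + timeout_s  # set once, at the edge
    return call_orders(deadline)
```

Passing one absolute deadline down the chain means a slow first hop shrinks the budget of every later hop, so the whole request still finishes (or fails) within the time the client asked for.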

GraphQL

Client specifies exactly which fields — single endpoint, no versioning, strongly typed schema

Guarantees: No over-fetching — client gets only requested fields. Schema contract — server validates queries against schema before execution. Introspection — clients can discover available types/fields.
Risks: N+1 problem (fix with DataLoader batching). Deep query DoS (fix with depth limiting + cost analysis). Caching hard (each query unique). GitHub API v4, Shopify Storefront use GraphQL.
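
Depth limiting, one of the fixes named above, can be sketched like this. The example assumes the query has already been parsed into nested dicts (leaf fields map to None); that representation and the function names are illustrative, not the API of any GraphQL library.

```python
def query_depth(selection: dict) -> int:
    """Depth of a selection set given as nested dicts; leaf fields map to None."""
    if not selection:
        return 0
    return 1 + max(
        query_depth(sub) if isinstance(sub, dict) else 0
        for sub in selection.values()
    )


def check_depth(selection: dict, max_depth: int = 5) -> None:
    """Reject a query before execution if it nests deeper than max_depth."""
    depth = query_depth(selection)
    if depth > max_depth:
        raise ValueError(f"query depth {depth} exceeds limit {max_depth}")


# Stands in for: { user { name friends { name } } }
example_query = {"user": {"name": None, "friends": {"name": None}}}
```

The check runs on the parsed query, before any resolver executes, which is what makes it useful against deliberately deep queries.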

Async APIs

For long-running tasks — accept immediately, process in background, client polls for result

When to use: Image/video processing, report generation, ML inference, bulk imports — any operation that takes seconds to minutes. Don't make the client wait. Accept the request, queue the work, return a status URL.
Async Request Lifecycle (Image Processing Example)
Participants: 💻 Client · 🌐 API · 📋 Queue · ⚙️ Worker · 🗄️ DB

Phase 1: Submit Request
Client → API: POST /api/images
API → DB: save original image + create job record
API → Queue: queue processing job
API → Client: 202 Accepted, Location: /api/images/{id}/status

Phase 2: Client Polls (loop)
GET /api/images/{id}/status → 200 OK {status: "processing"}
GET /api/images/{id}/status → 200 OK {status: "complete", url: "..."}

Phase 3: Background Processing
Worker dequeues job, status → "processing"
⚙️ Worker processes (resize, compress, etc.)
Status → "complete", save processed image URL

Alternatives to Polling
🔔 Webhook: server sends POST /your-webhook when done (receiver answers 200 OK) · HMAC signed. Best for: server-to-server (Stripe, GitHub)
⚡ WebSocket: persistent conn, server pushes {status: "complete"}. Bidirectional: client can cancel, get progress %. Best for: real-time UIs, live progress bars

Submit (202) → get result via: Poll (200) · Webhook (POST) · WebSocket (push)
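
The submit / poll / process lifecycle can be sketched in a few lines. This is a simplified, single-process illustration: a dict stands in for the DB, a deque for the queue, and the "queued" starting status and the cdn.example.com URL are assumptions made for the example.

```python
import uuid
from collections import deque

JOBS = {}        # job_id -> {"status": ..., "url": ...}; stands in for the DB
QUEUE = deque()  # stands in for the message queue


def submit_image(image_bytes: bytes):
    """POST /api/images: record the job, enqueue it, answer 202 right away."""
    job_id = str(uuid.uuid4())
    JOBS[job_id] = {"status": "queued", "url": None}
    QUEUE.append((job_id, image_bytes))
    return 202, {"Location": f"/api/images/{job_id}/status"}, job_id


def get_status(job_id: str):
    """GET /api/images/{id}/status: the endpoint the client polls."""
    job = JOBS.get(job_id)
    if job is None:
        return 404, {}
    return 200, dict(job)


def run_worker_once() -> bool:
    """Dequeue and process one job; returns False when the queue is empty."""
    if not QUEUE:
        return False
    job_id, _image_bytes = QUEUE.popleft()
    JOBS[job_id]["status"] = "processing"
    # Placeholder for resize/compress; the URL is made up for this sketch.
    JOBS[job_id].update(status="complete", url=f"https://cdn.example.com/{job_id}.jpg")
    return True
```

The request handler never touches the image work itself: it only writes a job record and enqueues, which is why it can answer in milliseconds regardless of how long processing takes.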

Polling

Client calls GET /status periodically
Simple — no infra needed
Add Retry-After: 5 header
Wasteful for long jobs
Use: Short jobs, browser apps

Webhook

Server POSTs result to client URL
No wasted requests — push
HMAC signature for security
Client needs public endpoint
Use: Server-to-server, Stripe

WebSocket

Persistent conn, server pushes
Instant notification — no delay
Bidirectional (cancel jobs too)
Connection management overhead
Use: Real-time UIs, live progress
Real-world: Stripe: payment intents (create, then a webhook reports completion). AWS S3: multipart upload (initiate → upload parts → complete). GitHub Actions: trigger a workflow, then poll or use a webhook for the result. Vercel: create a deployment, then poll build status.

Idempotent APIs

Same request N times = same effect as once. Critical for payments, orders, any operation that must not duplicate

Why Retries Are Dangerous
User transfers $100. Network glitch → client retries automatically. Without idempotency, the server deducts $100 twice. The user loses $200. This is the duplicate processing problem — and it happens in production.
3 Failure Scenarios During an API Call
Path: Client → Network → Server
① Request fails before reaching server. Server has not started processing. ✓ Safe to retry: server never saw it
② Request reaches server, processing interrupted (⚙️ partial). ⚠ UNSAFE: did $100 deduct or not?
③ Server processes fully (✓ done), response lost. ⚠ UNSAFE: retry = double charge!
Scenarios ② and ③ need idempotency keys to make retries safe
Solution: Idempotency Key Flow (Stripe Pattern)
Participants: Client (App) · Payment Server · Redis (Key Store)
① Client generates idempotency key (UUID), key: "abc-123-def"
② Client sends POST /transfer {$100, to: Bob} with header Idempotency-Key: abc-123-def
③ Server checks Redis: key "abc-123-def" exists? NO, first time
④ Server processes $100, stores key + result (TTL: 24h). ✗ Response lost!
⑤ Client retries (same key!): POST /transfer {$100, to: Bob}, key: abc-123-def
⑥ Server checks Redis: key exists? YES, already processed!
⑦ Server returns cached result (no re-processing)
✓ $100 deducted exactly once, user safe
Failure Scenario | Retry Safe? | With Idempotency Key
Request fails before reaching server | ✓ Safe | Key not consumed: retry works normally
Server processing interrupted | ⚠ Unsafe | ✓ Safe: key marks partial, server resumes or rejects
Response lost in transit | ⚠ Unsafe | ✓ Safe: key already processed, returns cached result
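Implementation: Store keys in Redis with a TTL (24h). Key = UUID, Value = {status, result}. On request: check key → if it exists, return the cached result → if not, process + store. Make the check-and-store atomic (e.g. Redis SET with NX) so two concurrent retries cannot both process. Redis expires the key on its own once the TTL elapses. Stripe accepts an Idempotency-Key header on POST requests.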
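
A minimal sketch of the check → process → store flow, assuming an in-memory dict with expiry timestamps in place of Redis. IdempotencyStore, transfer and the balances dict are hypothetical names for illustration only.

```python
import time
import uuid


class IdempotencyStore:
    """In-memory stand-in for Redis: key -> (expires_at, result)."""

    def __init__(self, ttl_s: float = 24 * 3600):
        self._ttl = ttl_s
        self._data = {}

    def get(self, key):
        entry = self._data.get(key)
        if entry and entry[0] > time.monotonic():
            return entry[1]
        self._data.pop(key, None)  # expired or never stored
        return None

    def put(self, key, result):
        self._data[key] = (time.monotonic() + self._ttl, result)


balances = {"alice": 500}
store = IdempotencyStore()


def transfer(idempotency_key: str, account: str, amount: int) -> dict:
    cached = store.get(idempotency_key)
    if cached is not None:
        return cached                # retry: replay the stored result
    balances[account] -= amount      # first time: do the work once
    result = {"status": "ok", "balance": balances[account]}
    store.put(idempotency_key, result)
    return result
```

Usage: the client generates one key per logical operation, for example str(uuid.uuid4()), and reuses it on every retry of that operation. This single-threaded sketch skips the concurrency problem; with several server instances, the check and the store need to be one atomic step.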

SOAP

XML over HTTP — enterprise legacy contract style

Aspect | SOAP
Format | XML envelope (header + body)
Contract | WSDL: strict, machine-readable
Security | WS-Security (signed/encrypted parts)
Use today | Banking, telco, government, legacy ERP

Envelope layout:
  Header: auth, routing, WS-Security
  Body: operation + parameters

SOAP

Protocol: Strict XML envelope (Header + Body)
Contract: WSDL (machine-generated clients)
Transport: HTTP, SMTP, JMS (transport-agnostic)
Security: WS-Security (message-level encryption)
State: Can be stateful (WS-ReliableMessaging)
Verbose: payloads are often many times larger than the equivalent REST/JSON (envelope, namespaces, and schemas add up)

REST (comparison)

Protocol: HTTP methods (GET/POST/PUT/DELETE)
Contract: OpenAPI (optional, human-friendly)
Transport: HTTP only
Security: TLS + OAuth2 (transport-level)
State: Stateless by design
Lightweight: JSON, minimal overhead
When SOAP still wins: Banking/finance (WS-Security for signed transactions), Government (strict contracts, audit trails), Legacy integration (SAP, Oracle ERP). If you're building new: use REST or gRPC. If integrating with enterprise: expect SOAP.
Why heavier than REST: XML parsing, verbose envelope, mandatory schemas, stateful sessions.

CORS

Browser-enforced cross-origin policy

Flow: Browser (page from app.com) ↔ Server (api.other.com)
1. Browser sends OPTIONS preflight
2. Server responds with Access-Control-Allow-Origin
3. Browser sends the real GET / POST

Header | Direction | Purpose | Example
Origin | Request → | Browser sends the requesting origin | Origin: https://app.com
Access-Control-Allow-Origin | ← Response | Server declares which origins are allowed | * or https://app.com
Access-Control-Allow-Methods | ← Response | Allowed HTTP methods | GET, POST, PUT, DELETE
Access-Control-Allow-Headers | ← Response | Allowed custom headers | Authorization, Content-Type
Access-Control-Max-Age | ← Response | Cache preflight result (seconds) | 86400 (24 hours)
Access-Control-Allow-Credentials | ← Response | Allow cookies/auth headers | true (cannot use with * origin)
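Simple vs Preflight: Simple requests (GET, HEAD, or POST with only safelisted headers and a Content-Type of application/x-www-form-urlencoded, multipart/form-data, or text/plain) go directly: the browser adds Origin and checks the response. Anything else (PUT/DELETE, custom headers such as Authorization, or a Content-Type like application/json) triggers an OPTIONS preflight first. The server must respond with the allowed methods/headers before the browser sends the real request.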
Common CORS mistakes: Using * with credentials (browsers reject this). Forgetting OPTIONS handler (preflight fails → request blocked). Not caching preflight (Max-Age=0 → OPTIONS on every request = 2× latency). Reflecting Origin without validation (security vulnerability — allows any site).
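
To avoid the "reflect any Origin" mistake, the server can compare the Origin header against an explicit allow-list before echoing it. A minimal sketch, assuming a hard-coded ALLOWED_ORIGINS set and a hypothetical cors_headers helper; a real framework would normally provide this through middleware.

```python
from typing import Optional

# Hypothetical allow-list; in practice this comes from configuration.
ALLOWED_ORIGINS = {"https://app.com", "https://admin.app.com"}


def cors_headers(request_origin: Optional[str], is_preflight: bool = False) -> dict:
    """Build CORS response headers; an empty dict means the origin is not allowed."""
    if request_origin not in ALLOWED_ORIGINS:
        # No Access-Control-Allow-Origin header, so the browser blocks the read.
        return {}
    headers = {
        # Echo the exact origin: "*" cannot be combined with credentials.
        "Access-Control-Allow-Origin": request_origin,
        "Access-Control-Allow-Credentials": "true",
        # The response differs per Origin, so caches must key on it.
        "Vary": "Origin",
    }
    if is_preflight:
        headers.update({
            "Access-Control-Allow-Methods": "GET, POST, PUT, DELETE",
            "Access-Control-Allow-Headers": "Authorization, Content-Type",
            "Access-Control-Max-Age": "86400",  # cache the preflight for 24 hours
        })
    return headers
```

The same function serves both the OPTIONS handler (is_preflight=True) and normal responses, which keeps the two from drifting apart.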
Header | Purpose
Access-Control-Allow-Origin | Which origins may read the response
Access-Control-Allow-Methods | Allowed verbs (GET, POST, …)
Access-Control-Allow-Headers | Custom headers permitted
Access-Control-Max-Age | Preflight cache TTL (sec)
Simple requests (GET/POST with safe headers) skip preflight. Anything else → OPTIONS first.

OpenAPI / Swagger

Machine-readable spec for REST APIs

Get | From the spec
Interactive docs | Swagger UI, Redoc
Client SDKs | openapi-generator (Java/Go/TS…)
Mock server | Prism, Stoplight
Contract tests | Dredd, Schemathesis
Gateway config | Kong, AWS API Gateway import
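paths:
  /users/{id}:
    get:
      parameters:
        - in: path
          name: id
          required: true   # path parameters must be marked required
          schema: { type: string }
      responses:
        '200':
          description: The requested user
          content:
            application/json:
              schema: { $ref: '#/components/schemas/User' }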
OpenAPI Development Workflow
📝 Design (write YAML spec) → 🔍 Lint (Spectral rules) → ⚙️ Generate (SDKs + docs + mocks) → ✓ Validate (runtime checks)
Single source of truth: spec drives docs, SDKs, mocks, and runtime validation
Workflow: design → lint (Spectral) → commit YAML → CI generates SDKs + docs → server validates against same spec.

API Versioning

Three places you can put a version

Style | Example | Pros | Cons
URL path (★ recommended) | /v1/users, /v2/users | Cacheable, obvious, browseable | URL churns on bumps
Header | Accept: application/vnd.api.v2+json | Clean URLs, content-negotiation native | Hidden, harder to test in browser
Query param | /users?version=2 | Quick to try in browser | Pollutes cache keys
Default to URL path (Stripe serves its API under /v1; GitHub's REST API selects the version with a request header instead). Bump the major version only on breaking changes; add fields backward-compatibly otherwise.

gRPC Streaming Modes

Four interaction patterns over one HTTP/2 connection

Unary (1 → 1): simple request/response
Server stream (1 → N): stock ticker, log tail
Client stream (N → 1): file upload, batched metrics
Bidirectional (N ↔ N): chat, collab editor, RPC sessions
Each call runs on its own HTTP/2 stream, and many calls share one connection: multiplexed, header-compressed, binary framed.

Real-time Communication

Technologies for pushing data from server to client — choose based on direction + latency needs

Short Polling

Timeline (Browser → Server):
GET → empty response, wait ⏱ 5s
GET → empty response, wait ⏱ 5s
GET → data! 200 OK
Client asks every N sec, and most responses are empty (wasteful). Here: 2 wasted requests before getting data

Long Polling

Timeline (Browser → Server):
Request → server waits (holds the connection) → data! → response
Next request → server waits → data! → response
Server holds until data ready

WebSocket

Phase 1: HTTP Upgrade Handshake
Browser → Server: GET /chat HTTP/1.1 + Upgrade: websocket, Sec-WebSocket-Key: dGhlIHNhbXBsZSBub25jZQ==
Server → Browser: 101 Switching Protocols, Sec-WebSocket-Accept: s3pPLMBiTxaQ9kYGzzhZRbK+xOo=
✓ Full-Duplex Connection Established

Phase 2: Bidirectional Real-Time Communication
Server: Price update
Client: User action
Server: Instant response

Phase 3: Keep-Alive Heartbeat (every 30s)
Ping → Pong

✦ Binary Frames: FIN(1b) | RSV(3b) | Opcode(4b) | Mask(1b) | Length(7-64b) | Payload
~2-14 bytes overhead vs 400+ for HTTP · Text & Binary support
One persistent connection · 100K+ per server · Sub-ms latency
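
The Sec-WebSocket-Accept value in the handshake is not arbitrary: per RFC 6455 the server appends a fixed GUID to the client's Sec-WebSocket-Key, takes the SHA-1 hash, and base64-encodes it. A short sketch (the function name websocket_accept is our own):

```python
import base64
import hashlib

# Fixed GUID defined by RFC 6455; every WebSocket server uses the same value.
WS_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"


def websocket_accept(sec_websocket_key: str) -> str:
    """Compute the Sec-WebSocket-Accept header for a given Sec-WebSocket-Key."""
    digest = hashlib.sha1((sec_websocket_key + WS_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")
```

With the RFC's sample key dGhlIHNhbXBsZSBub25jZQ==, this produces s3pPLMBiTxaQ9kYGzzhZRbK+xOo=, the accept value shown in the handshake. The check proves the server understood the upgrade request; it is not a security mechanism.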

SSE (Server-Sent Events)

Phase 1: Initial Connection
Browser → Server: GET /events (Accept: text/event-stream)
Server → Browser: HTTP 200 OK, connection stays open

Phase 2: Server Streams Events
data: Price update $150.25
data: Price update $150.30
data: Price update $150.28
(Minutes pass...)
data: Price update $151.00

Connection drops (network glitch)
Auto-reconnect: GET /events (Last-Event-ID: 42)
Resumes from event 42 (no data loss!)
Server-only push · built-in browser auto-reconnect · replay via event IDs

WebRTC (Peer-to-Peer)

👤 Peer A ↔ 👤 Peer B
Signal Server: setup only (SDP/ICE)
🔒 Direct P2P: encrypted media (audio / video / screen share)
✓ No server bandwidth for media
Zoom · Google Meet · Discord voice

Webhook (Server → Your Server)

Participants: Provider (Stripe, GitHub) → Your Server (/webhooks endpoint) → Your App
⚡ Event fires at the provider
Provider → Your Server: HTTP POST + JSON payload + HMAC signature header
Your Server: validate, reply 200 OK (within ~5s), then queue → process async
If timeout / 5xx → provider retries (1s → 2s → 4s → backoff)
🔒 Validate HMAC · Use event_id for idempotency · Whitelist IPs
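
Validating the HMAC and deduplicating on event_id might look like the sketch below. It assumes the provider sends a hex-encoded HMAC-SHA256 of the raw request body; real providers each define their own header name and signing scheme (some include a timestamp), so treat the format here as an assumption and check the provider's documentation.

```python
import hashlib
import hmac


def sign(secret: bytes, body: bytes) -> str:
    """Hex HMAC-SHA256 of the raw body (assumed signing scheme)."""
    return hmac.new(secret, body, hashlib.sha256).hexdigest()


def verify_webhook(secret: bytes, body: bytes, signature_header: str) -> bool:
    expected = sign(secret, body)
    # compare_digest avoids leaking how many leading characters matched.
    return hmac.compare_digest(expected.encode(), signature_header.encode())


seen_event_ids = set()  # stands in for a persistent dedupe store


def handle_webhook(secret: bytes, body: bytes, signature_header: str, event_id: str) -> int:
    """Return the HTTP status to send back to the provider."""
    if not verify_webhook(secret, body, signature_header):
        return 401
    if event_id in seen_event_ids:
        return 200  # duplicate delivery: acknowledge, do not reprocess
    seen_event_ids.add(event_id)
    # Enqueue the event for async processing here, then answer quickly.
    return 200
```

Sign and verify over the raw bytes as received. Re-serializing the parsed JSON can change whitespace or key order and break the signature.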
Tech | Direction | Latency | Best For | Guarantee
Short Polling | Client → Server (repeated) | N sec | Dashboard refresh, legacy status checks, simple health monitors | Simple but 99% requests empty (wasteful)
Long Polling | Server holds connection | ~sec | Chat (pre-WS era), low-frequency notifications, JIRA-style updates | Near real-time but 1 conn/client held open
WebSocket | Full duplex | ~ms | Live stock prices (Robinhood, Binance), chat (Slack), collaborative editing (Figma), gaming (Chess.com) | Persistent bidirectional: server pushes instantly. ~100K conn/server.
SSE | Server → Client | ~ms | AI token streaming (ChatGPT, GitHub Copilot), live news tickers, CI/CD build logs | Auto-reconnect built into browser. Event ID for resuming. Text-only.
WebRTC | Peer-to-peer | Ultra-low | Video/audio calls (Zoom, Google Meet), screen sharing, Discord voice | Direct P2P: no server bandwidth for media. Browser-enforced encryption.
Webhook | Server → Your Server | ~sec | Payment events (Stripe), CI/CD triggers (GitHub Actions), order updates (Shopify) | Event-driven HTTP POST, fire-and-forget. Retries on failure. No persistent connection.
Real-world: Slack — WebSocket for messaging. Figma — WebSocket for collab editing. Zoom — WebRTC for video. Robinhood — WebSocket for live stock prices. Socket.IO — auto-fallback to polling. ChatGPT — SSE for token streaming. Stripe — Webhook for payment events.

WebSocket

Persistent, bidirectional communication. Perfect for real-time apps that need instant two-way data flow.

Quick Summary
✓ Strengths

Bidirectional instant (~ms)
100K+ connections/server
Binary + text
Persistent connection

✗ Challenges

Stateful (track clients)
Needs reconnect logic
Load balance complexity
No auto-replay on disconnect

⚙ Best Practices

Use wss:// (TLS)
Exponential backoff
Redis Pub/Sub for scaling
Validate messages server-side
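
Since WebSocket has no built-in reconnect, the client has to schedule its own retries. One common shape is exponential backoff with jitter, sketched here; the base of 1s and cap of 30s are assumed values, not recommendations from any particular library.

```python
import random


def backoff_delays(attempts: int, base_s: float = 1.0, cap_s: float = 30.0, rng=random.random):
    """Yield one delay per attempt: full jitter over an exponentially growing window."""
    for attempt in range(attempts):
        window = min(cap_s, base_s * (2 ** attempt))  # 1, 2, 4, ... capped at cap_s
        yield rng() * window                          # random point inside the window
```

A reconnect loop would sleep for each yielded delay before the next attempt. The jitter matters after a server restart: without it, every disconnected client retries at the same instants and the reconnect wave arrives all at once.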
Scaling with Redis Pub/Sub
Problem: Multiple servers → events isolated per server → clients on Server B miss updates from Server A.
Solution: All servers subscribe to Redis channels → event fans out to ALL servers → all clients see updates instantly.
WS Server Cluster: Server 1, Server 2, Server 3 (each with its own connected clients)
Redis Pub/Sub: event broker shared by all servers
Event Producer (Service / Worker): broadcasts event to all servers, each pushes to its local clients

✦ Multi-Server Flow
1. Client on Server A sends message
2. Server A publishes to Redis channel
3. Redis fans-out to ALL subscribed servers
4. Servers B & C push to their clients instantly
Result: All clients see the message regardless of which server they're connected to
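
The fan-out flow can be modeled in a few lines. This sketch replaces Redis with an in-memory Broker class and replaces sockets with per-client lists, so it only shows the message routing; Broker and WsServer are illustrative names, not a real client library.

```python
from collections import defaultdict


class Broker:
    """In-memory stand-in for Redis Pub/Sub: channel -> subscriber callbacks."""

    def __init__(self):
        self._subs = defaultdict(list)

    def subscribe(self, channel, callback):
        self._subs[channel].append(callback)

    def publish(self, channel, message):
        for callback in self._subs[channel]:
            callback(message)


class WsServer:
    def __init__(self, name, broker, channel="chat"):
        self.name, self.broker, self.channel = name, broker, channel
        self.clients = {}  # client_id -> delivered messages (stands in for sockets)
        broker.subscribe(channel, self._on_broker_message)

    def connect(self, client_id):
        self.clients[client_id] = []

    def on_client_message(self, message):
        # Publish instead of pushing locally; the broker fans out to every
        # subscribed server, including this one.
        self.broker.publish(self.channel, message)

    def _on_broker_message(self, message):
        for inbox in self.clients.values():
            inbox.append(message)
```

Each server only ever writes to its own local connections; the broker is the single place where cross-server delivery happens.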
Load Balancing: Sticky sessions (pin client to server) vs connection migration (client reconnects) vs shared Redis store (state survives server change).
Real-world: Slack (millions of connections), Figma (collaborative editing), Binance (market data). Typical: 100K–500K connections/server, sub-ms latency.

Server-Sent Events (SSE) — Deep Dive

One-way server-to-client push over HTTP. Built-in auto-reconnect, event IDs for replay, automatic browser handling.

Middle ground: Polling is wasteful (99% empty requests) → WebSockets overkill if unidirectional → SSE perfect for server-only push with auto-reconnect built-in.
SSE Event Stream Format
✦ SSE Stream Format (Text-based)
  event: priceUpdate                                  ← Custom event type
  id: 42                                              ← Unique ID for replay
  retry: 5000                                         ← Reconnect delay (ms)
  data: {"symbol": "AAPL", "price": 150.25}           ← Actual payload (JSON)
                                                      (blank line ends event)
  event: notification
  id: 43
  data: {"message": "Market closing in 5 minutes"}

🔗 Single HTTP/1.1 connection stays open: just HTTP headers, no protocol upgrade needed
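
On the server, producing this wire format is plain string building. A small sketch with a hypothetical format_sse helper; it assumes the payload is JSON, which never contains raw newlines, so a single data: line suffices.

```python
import json
from typing import Optional


def format_sse(data: dict, event: Optional[str] = None,
               event_id: Optional[int] = None, retry_ms: Optional[int] = None) -> str:
    """Serialize one event in the text/event-stream format."""
    lines = []
    if event is not None:
        lines.append(f"event: {event}")
    if event_id is not None:
        lines.append(f"id: {event_id}")
    if retry_ms is not None:
        lines.append(f"retry: {retry_ms}")
    # json.dumps escapes newlines, so the payload fits on one data: line.
    lines.append(f"data: {json.dumps(data)}")
    return "\n".join(lines) + "\n\n"  # the blank line terminates the event
```

For example, format_sse({"symbol": "AAPL", "price": 150.25}, event="priceUpdate", event_id=42, retry_ms=5000) yields the first event of the stream format above. Plain-text payloads with embedded newlines would need one data: line per line of text.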
Auto-Reconnection with Event Replay
① Connected, receiving events (event id=1,2,3...)
② Connection drops (network error): ❌ lost!
③ Browser auto-reconnects (3s default): GET /events + Last-Event-ID: 3
④ Server replays missed events (event 4,5,6...)
✓ Zero data loss: browser handles reconnection + server replays missed events using Last-Event-ID
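
Replay only works if the server keeps recent events around. A minimal sketch of that server-side piece, assuming sequential integer ids and a bounded in-memory buffer; EventLog is a hypothetical name, and a production system would more likely use a durable log.

```python
from collections import deque
from typing import Optional


class EventLog:
    """Bounded buffer of recent events so reconnecting clients can catch up."""

    def __init__(self, max_events: int = 1000):
        self._events = deque(maxlen=max_events)  # oldest events fall off the front
        self._next_id = 1

    def append(self, data: str) -> int:
        event_id = self._next_id
        self._events.append((event_id, data))
        self._next_id += 1
        return event_id

    def since(self, last_event_id: Optional[str]) -> list:
        """Events after the id sent in Last-Event-ID (everything buffered if absent)."""
        last = int(last_event_id) if last_event_id else 0
        return [(i, d) for i, d in self._events if i > last]
```

On reconnect, the handler reads the Last-Event-ID request header, sends everything from since(...), and then continues with live events. If the client was away longer than the buffer covers, the oldest events are gone, so "zero data loss" holds only within the retention window.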
Quick Summary
✓ Strengths

Auto-reconnect built-in
Event replay (Last-Event-ID)
Standard HTTP
10K+ connections/server

✗ Limitations

Server-only (unidirectional)
Text-only (no binary)
6 conn/domain (HTTP/1.1)
Proxy buffering issues
No IE support

⚙ Common Fixes

Proxy buffering: proxy_buffering off
Connection limits: Use HTTP/2
Idle timeout: Heartbeat every 30s
Storms: retry: 5000ms
Use Cases
Use | Example
Live prices / market data | Robinhood, Finnhub, Binance
AI token streaming | ChatGPT, GitHub Copilot
Build logs, CI/CD output | GitHub Actions, Jenkins, CircleCI
Live notifications | Gmail, Slack, email
Performance: 10K–100K connections/server, 2–5KB memory/connection, ~10 bytes overhead vs 400+ for HTTP.

WebSocket vs SSE — Design Choices

When to pick each technology based on application requirements

Feature Comparison Matrix
Feature | WebSocket | SSE | Long Polling
Communication | Bidirectional ✓ | Server only | Client asks repeatedly
Protocol Overhead | 2-14 bytes/msg | 10-50 bytes/msg | 400+ bytes/msg
Browser Support | All modern (IE10+) | All modern (no IE) | Universal
Binary Support | ✓ Yes | ✗ Text only | ✓ Yes
Auto-Reconnect | Manual required | Built-in browser | Built-in (polling loop)
Message Replay | Manual required | Built-in (Last-Event-ID) | No standard
HTTP/2 Multiplexing | No (separate connection) | ✓ Yes (single connection) | ✓ Yes (http requests)
Stateful | Very (per-client state) | Mostly (stream state) | Stateless
Proxy Friendly | Sometimes blocked | ✓ Standard HTTP | ✓ Standard HTTP
Connections/Server | 100K–500K | 10K–100K | 1K–10K
Latency | ~1-50ms | ~100-200ms | ~0.5-5s
Memory/Connection | 5-20KB | 2-5KB | Minimal
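
The 2-14 bytes/msg figure for WebSocket follows from the frame layout in RFC 6455: 2 fixed bytes, an extended length field for larger payloads, and a 4-byte masking key on client-to-server frames. A small worked calculation (ws_header_bytes is our own helper name):

```python
def ws_header_bytes(payload_len: int, masked: bool) -> int:
    """WebSocket frame header size: 2 base bytes + extended length + optional mask key."""
    if payload_len < 0:
        raise ValueError("payload length cannot be negative")
    size = 2                  # FIN/RSV/opcode byte + mask bit/7-bit length byte
    if payload_len > 0xFFFF:
        size += 8             # 64-bit extended length
    elif payload_len > 125:
        size += 2             # 16-bit extended length
    if masked:
        size += 4             # client-to-server frames carry a masking key
    return size
```

So a small server-to-client message costs 2 bytes of framing, and the 14-byte maximum only applies to masked frames with payloads above 65,535 bytes.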
Decision Matrix: Which to Use?

✓ Use WebSocket When:

Client ↔ Server messaging needed
High frequency updates (100s/sec)
Low latency critical (<10ms)
Binary data needed
Multiplayer games, trading apps
Real-time collaboration (Figma)
Chat apps (Slack, Discord)
Live stock/crypto prices

✓ Use SSE When:

Server → Client only (no client send)
Auto-reconnect needed (free feature)
Event replay on disconnect
Simple browser API (EventSource)
AI token streaming (ChatGPT)
Build logs (GitHub Actions)
Live notifications / dashboards
Text/JSON data only

✓ Use Long Polling When:

Serverless environment (timeouts)
IE support required
WebSocket blocked by proxy/firewall
Simple infrequent updates OK
Existing polling infrastructure
Cost sensitive (minimal server state)
Doesn't need real-time urgency
Stateless is a hard requirement
Hybrid Approaches
SSE + HTTP POST: Use SSE for server-to-client push, regular POST for client commands (e.g., Twitch chat, YouTube comments)
WebSocket + REST fallback: Try WS first, fallback to long polling if blocked (Socket.IO does this)
WebSocket + Redis: For scale — WS per client, Redis Pub/Sub for multi-server broadcast (Slack, Figma pattern)
WebSocket + Kafka: For event sourcing — all events stored in Kafka, clients subscribe via WS (high-scale trading systems)
Common Failure Scenarios
Scenario | WebSocket Impact | SSE Impact
Network disconnection | Connection drops, client must reconnect + resync state | Browser auto-reconnects, replays events via Last-Event-ID ✓
Server restart | All clients lose connection, must reconnect | Clients reconnect, get missed events if stored ✓
Proxy timeouts (>60s idle) | Connection dies, must detect + reconnect | Heartbeat prevents timeout ✓
High load spike | 100K+ connections: high memory, CPU consumed | Fewer connections, easier to scale with multi-server ✓
Message ordering | Not guaranteed across reconnects | Event IDs allow ordering verification ✓
Browser refresh | Connection lost, full state resync needed | Can optionally restore via session storage + server replay ✓
Key Insight: SSE excels at resilience (auto-reconnect, event replay), WebSocket excels at latency & bidirectionality. Most real-time apps benefit from a hybrid approach: SSE for notifications, WebSocket for interactive features.