4. APIs & Communication — System Design Concepts

🔌 4 · APIs & COMMUNICATION

REST API

Stateless HTTP-based — universal, cacheable. The default choice for public APIs

▸ Anatomy of a REST URL

GET    /api/v1/products/123     → Fetch (cacheable, idempotent)
POST   /api/v1/orders           → Create (use idempotency key to prevent dupes)
PUT    /api/v1/orders/456       → Replace (idempotent)
DELETE /api/v1/orders/456       → Cancel (idempotent)

Pagination: ?cursor=abc123 (preferred) or ?page=2&limit=10
Caching: Cache-Control: max-age=3600 · ETag: "abc123"
Rate Limit: X-RateLimit-Limit: 1000 · X-RateLimit-Remaining: 847

Guarantees: Statelessness — no server-side session, any instance handles any request. Idempotency of GET/PUT/DELETE — safe to retry on failure. Cacheability — HTTP caching (CDN, browser) reduces load.
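The cacheability guarantee can be sketched with a conditional GET. This is a minimal illustration, not a framework API: `make_etag` and `conditional_get` are hypothetical helper names, the ETag is assumed to be a hash of the body, and only a single tag in If-None-Match is handled.

```python
import hashlib
from typing import Optional

def make_etag(body: bytes) -> str:
    """Derive a quoted ETag from the response body (assumed scheme: SHA-256 prefix)."""
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'

def conditional_get(body: bytes, if_none_match: Optional[str]):
    """Return (status, headers, payload) for a cacheable GET."""
    etag = make_etag(body)
    headers = {"ETag": etag, "Cache-Control": "max-age=3600"}
    if if_none_match is not None and if_none_match == etag:
        # Client copy is still fresh: send no body, just revalidate.
        return 304, headers, b""
    return 200, headers, body
```

A client that stores the ETag from the first 200 response and sends it back as If-None-Match receives an empty 304 until the resource changes.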

Real-world: Stripe API — gold standard (idempotency keys, versioning, pagination). GitHub API v3 — REST. Twilio — REST for SMS/voice.

gRPC

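HTTP/2 + Protobuf: binary framing and compact encoding, usually faster than REST + JSON (the often-quoted 10x depends on payload size and workload). 4 call types: unary, server-stream, client-stream, bidirectional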

▸ 4 gRPC Streaming Modes

Mode	Use Case	Example
Unary	Simple request/response	GetUser, CreateOrder
Server Stream	Server pushes multiple results	Stock ticker, log tailing
Client Stream	Client sends batch	File upload, telemetry
Bidirectional	Real-time two-way	Chat, multiplayer game

Guarantees: Type safety — .proto schema + codegen catches incompatibility at compile time. Deadline propagation — timeout flows through entire call chain. Multiplexing — multiple concurrent calls on single HTTP/2 connection.
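Deadline propagation can be illustrated without gRPC itself. The sketch below is a toy model under stated assumptions: `Deadline`, `FakeClock`, and `call_service` are hypothetical names, and a fake clock stands in for real time so the arithmetic is visible. In real gRPC code the same budget is typically passed as a call timeout and read on the server from the request context.

```python
import time

class Deadline:
    """Absolute expiry time shared by every hop in a call chain."""
    def __init__(self, timeout_s, clock=time.monotonic):
        self._clock = clock
        self._expires_at = clock() + timeout_s

    def remaining(self):
        return max(0.0, self._expires_at - self._clock())

class FakeClock:
    """Deterministic clock so the example does not need to sleep."""
    def __init__(self):
        self.now = 0.0
    def __call__(self):
        return self.now
    def advance(self, seconds):
        self.now += seconds

def call_service(name, work_s, deadline, clock):
    """Pretend RPC: succeeds only if its work fits in the remaining budget."""
    budget = deadline.remaining()
    if work_s > budget:
        clock.advance(budget)          # the caller gives up when the budget runs out
        return f"{name}: DEADLINE_EXCEEDED"
    clock.advance(work_s)
    return f"{name}: OK"

clock = FakeClock()
deadline = Deadline(1.0, clock=clock)  # one 1-second budget for the whole chain
results = [
    call_service("auth", 0.25, deadline, clock),
    call_service("inventory", 0.5, deadline, clock),
    call_service("pricing", 0.5, deadline, clock),  # only 0.25s left by now
]
```

The third call fails because the first two already consumed most of the shared budget, which is the point of propagating one deadline instead of giving each hop a fresh timeout.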

Real-world: Google internal comms. Netflix/Uber microservice-to-microservice. Best for: internal APIs, 10K+ RPS, bidirectional streaming. Not for browsers (use gRPC-Web proxy).

GraphQL

Client specifies exactly which fields — single endpoint, no versioning, strongly typed schema

Guarantees: No over-fetching — client gets only requested fields. Schema contract — server validates queries against schema before execution. Introspection — clients can discover available types/fields.

Risks: N+1 problem (fix with DataLoader batching). Deep query DoS (fix with depth limiting + cost analysis). HTTP caching is hard (queries usually go to a single endpoint and differ per client).

Real-world: GitHub API v4, Shopify Storefront use GraphQL.
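The N+1 fix can be sketched as a tiny batching loader. This is a simplified, synchronous illustration of the DataLoader idea, not the DataLoader library's API: `BatchLoader`, `fetch_authors`, and the sample posts are all hypothetical.

```python
class BatchLoader:
    """Collects keys, then resolves them with one batch call instead of one call per key."""
    def __init__(self, batch_fn):
        self._batch_fn = batch_fn   # takes a list of keys, returns values in the same order
        self._pending = []          # keys requested since the last dispatch
        self._cache = {}

    def load(self, key):
        """Register a key and return a thunk that yields its value later."""
        if key not in self._cache and key not in self._pending:
            self._pending.append(key)
        return lambda: self._get(key)

    def _get(self, key):
        if key not in self._cache:
            self.dispatch()
        return self._cache[key]

    def dispatch(self):
        if not self._pending:
            return
        keys, self._pending = self._pending, []
        for k, v in zip(keys, self._batch_fn(keys)):
            self._cache[k] = v

calls = []  # records every backend round trip

def fetch_authors(ids):
    calls.append(list(ids))
    return [f"author-{i}" for i in ids]

loader = BatchLoader(fetch_authors)
posts = [{"id": 1, "author_id": 10}, {"id": 2, "author_id": 11}, {"id": 3, "author_id": 10}]
thunks = [loader.load(p["author_id"]) for p in posts]  # resolvers only register keys
names = [t() for t in thunks]                          # first access triggers one batch
```

Three posts resolve their authors with a single backend call, and the repeated author id is fetched once.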

Async APIs

For long-running tasks — accept immediately, process in background, client polls for result

When to use: Image/video processing, report generation, ML inference, bulk imports — any operation that takes seconds to minutes. Don't make the client wait. Accept the request, queue the work, return a status URL.
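The accept, queue, return-a-status-URL flow can be sketched in memory. Everything here is illustrative: `JOBS` stands in for a real queue and database, and `submit_job`, `run_worker_once`, `get_status`, and the `/api/v1/jobs/...` path are hypothetical names.

```python
import uuid

JOBS = {}  # job_id -> job record; in-memory stand-in for a queue plus a job table

def submit_job(payload):
    """Accept the request immediately and hand back a status URL (HTTP 202)."""
    job_id = str(uuid.uuid4())
    JOBS[job_id] = {"status": "queued", "payload": payload, "result": None}
    return 202, {"job_id": job_id, "status_url": f"/api/v1/jobs/{job_id}"}

def run_worker_once():
    """Process one queued job; a real system would run this in a separate worker."""
    for job in JOBS.values():
        if job["status"] == "queued":
            job["status"] = "processing"
            job["result"] = {"thumbnail": job["payload"]["image"] + ".thumb"}
            job["status"] = "done"
            return True
    return False

def get_status(job_id):
    """Return (status_code, body, headers) for a polling client."""
    job = JOBS.get(job_id)
    if job is None:
        return 404, {"error": "unknown job"}, {}
    if job["status"] != "done":
        # Tell pollers how long to wait before asking again.
        return 200, {"status": job["status"]}, {"Retry-After": "5"}
    return 200, {"status": "done", "result": job["result"]}, {}
```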

▸ Async Request Lifecycle (Image Processing Example)

Polling

▸ Client calls GET /status periodically
▸ Simple — no infra needed
▸ Add Retry-After: 5 header
▸ Wasteful for long jobs
Use: Short jobs, browser apps

Webhook

▸ Server POSTs result to client URL
▸ No wasted requests — push
▸ HMAC signature for security
▸ Client needs public endpoint
Use: Server-to-server, Stripe
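The HMAC signature step for webhooks can be sketched as follows. This is a minimal example, assuming a shared secret and a hex-encoded HMAC-SHA256 of the raw body; `sign` and `verify` are hypothetical helper names, and real providers often add a timestamp to the signed payload to limit replay.

```python
import hashlib
import hmac

def sign(secret: bytes, body: bytes) -> str:
    """Sender side: hex HMAC-SHA256 over the raw request body."""
    return hmac.new(secret, body, hashlib.sha256).hexdigest()

def verify(secret: bytes, body: bytes, signature_header: str) -> bool:
    """Receiver side: recompute and compare in constant time."""
    expected = sign(secret, body)
    return hmac.compare_digest(expected.encode(), signature_header.encode())
```

The receiver must verify against the raw bytes it received, before any JSON parsing or re-serialization, since any change in whitespace or key order changes the digest.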

WebSocket

▸ Persistent conn, server pushes
▸ Instant notification — no delay
▸ Bidirectional (cancel jobs too)
▸ Connection management overhead
Use: Real-time UIs, live progress

Real-world: Stripe payment intents (create → webhook on completion). AWS S3 multipart upload (initiate → upload parts → complete). GitHub Actions (trigger workflow → poll or webhook for result). Vercel (deploy → poll build status).

Idempotent APIs

Same request N times = same effect as once. Critical for payments, orders, any operation that must not duplicate

▸ Why Retries Are Dangerous

User transfers $100. Network glitch → client retries automatically. Without idempotency, the server deducts $100 twice. The user loses $200. This is the duplicate processing problem — and it happens in production.

▸ 3 Failure Scenarios During an API Call

▸ Solution: Idempotency Key Flow (Stripe Pattern — Atomic Lock)

⚠ Why SETNX (atomic) is critical: A naive if(!exists(key)) { process(); save(key); } has a race condition — two concurrent retries both see "key not found" and both process the payment. SETNX (SET if Not eXists) is a single atomic Redis operation that checks AND sets in one step. Only the first request wins the lock. All subsequent requests see "key exists" and get the cached response.

Failure Scenario	Retry Safe?	With Idempotency Key
Request fails before reaching server	✓ Safe	Key not consumed — retry works normally
Server processing interrupted	⚠ Unsafe	✓ Safe — key marks partial, server resumes or rejects
Response lost in transit	⚠ Unsafe	✓ Safe — key already processed, returns cached result

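Implementation (concurrency-safe): Use an atomic set-if-absent in Redis: SETNX, or preferably SET key value NX EX 86400, which acquires the lock and sets the TTL in a single command (a separate EXPIRE after SETNX can be lost if the process crashes in between). Key = client-generated UUID, Value = {status: "processing"}. After payment completes, update value to {status: "done", response: ...}. On retry: the set fails (key exists) → read stored response → return immediately; if the status is still "processing", reject or ask the client to retry later. TTL 24h auto-cleans keys. Stripe accepts an Idempotency-Key header on POST requests.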
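The atomic-lock flow can be sketched without a Redis server. In this illustration `IdempotencyStore.set_if_absent` is a hypothetical in-memory stand-in for Redis set-if-not-exists, `handle_payment` and `charge` are made-up names, and failure handling (what to do if `charge` raises) is deliberately left out.

```python
import threading

class IdempotencyStore:
    """In-memory stand-in for Redis; set_if_absent mimics an atomic SET ... NX."""
    def __init__(self):
        self._lock = threading.Lock()
        self._data = {}

    def set_if_absent(self, key, value):
        with self._lock:               # check and set happen as one step
            if key in self._data:
                return False
            self._data[key] = value
            return True

    def get(self, key):
        with self._lock:
            return self._data.get(key)

    def put(self, key, value):
        with self._lock:
            self._data[key] = value

def handle_payment(store, idempotency_key, amount, charge):
    """Run charge() at most once per key; replay the stored response on retries."""
    if store.set_if_absent(idempotency_key, {"status": "processing"}):
        response = charge(amount)
        store.put(idempotency_key, {"status": "done", "response": response})
        return 200, response
    record = store.get(idempotency_key)
    if record["status"] == "done":
        return 200, record["response"]
    # Another request holds the lock and has not finished yet.
    return 409, {"error": "request with this key is still processing"}
```

Only the request that wins `set_if_absent` ever calls `charge`; concurrent retries either get the cached response or a 409 telling them to try again shortly.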

Key guarantees: Exactly-once side effects (payment charges once). No double charges even under concurrent retries. Concurrency-safe via atomic lock (not check-then-act). Network-retry resilient — clients can safely retry on any timeout.

SOAP

XML over HTTP — enterprise legacy contract style

Aspect	SOAP
Format	XML envelope (header + body)
Contract	WSDL — strict, machine-readable
Security	WS-Security (signed/encrypted parts)
Use today	Banking, telco, government, legacy ERP

SOAP vs REST

Aspect	SOAP	REST
Protocol	Strict XML envelope (Header + Body)	HTTP methods (GET/POST/PUT/DELETE)
Contract	WSDL (machine-generated clients)	OpenAPI (optional, human-friendly)
Transport	HTTP, SMTP, JMS (transport-agnostic)	HTTP only
Security	WS-Security (message-level encryption)	TLS + OAuth2 (transport-level)
State	Can be stateful (WS-ReliableMessaging)	Stateless by design
Payload	Verbose: 10-100× larger payloads than REST/JSON	Lightweight: JSON, minimal overhead

When SOAP still wins: Banking/finance (WS-Security for signed transactions), Government (strict contracts, audit trails), Legacy integration (SAP, Oracle ERP). If you're building new: use REST or gRPC. If integrating with enterprise: expect SOAP.

Why heavier than REST: XML parsing, verbose envelope, mandatory schemas, stateful sessions.

CORS

Browser-enforced cross-origin policy

Header	Direction	Purpose	Example
Origin	Request →	Browser sends the requesting origin	`Origin: https://app.com`
Access-Control-Allow-Origin	← Response	Server declares which origins are allowed	`*` or `https://app.com`
Access-Control-Allow-Methods	← Response	Allowed HTTP methods	`GET, POST, PUT, DELETE`
Access-Control-Allow-Headers	← Response	Allowed custom headers	`Authorization, Content-Type`
Access-Control-Max-Age	← Response	Cache preflight result (seconds)	`86400` (24 hours)
Access-Control-Allow-Credentials	← Response	Allow cookies/auth headers	`true` (cannot use with `*` origin)

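Simple vs Preflight: Simple requests (GET, HEAD, or POST with only CORS-safelisted headers and a Content-Type of application/x-www-form-urlencoded, multipart/form-data, or text/plain) go directly: the browser adds Origin and checks the response. Anything else (PUT/DELETE, custom headers such as Authorization, or Content-Type: application/json) triggers a preflight OPTIONS request first. The server must respond with the allowed methods/headers before the browser sends the real request.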

Common CORS mistakes: Using * with credentials (browsers reject this). Forgetting OPTIONS handler (preflight fails → request blocked). Not caching preflight (Max-Age=0 → OPTIONS on every request = 2× latency). Reflecting Origin without validation (security vulnerability — allows any site).
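The safe alternative to reflecting Origin blindly is an explicit allowlist. The sketch below is framework-agnostic and illustrative: `ALLOWED_ORIGINS`, `cors_headers`, and `preflight_response` are hypothetical names, and the 403 for unknown origins is one possible choice (omitting the CORS headers is enough for the browser to block the read).

```python
ALLOWED_ORIGINS = {"https://app.com", "https://admin.app.com"}  # hypothetical allowlist

def cors_headers(origin, allow_credentials=True):
    """Echo the origin only if it is on the allowlist; never answer with '*'."""
    if origin not in ALLOWED_ORIGINS:
        return {}
    headers = {
        "Access-Control-Allow-Origin": origin,
        "Vary": "Origin",  # caches must not reuse this response for other origins
    }
    if allow_credentials:
        headers["Access-Control-Allow-Credentials"] = "true"
    return headers

def preflight_response(origin):
    """Answer an OPTIONS preflight with the allowed methods and headers."""
    headers = cors_headers(origin)
    if not headers:
        return 403, {}
    headers.update({
        "Access-Control-Allow-Methods": "GET, POST, PUT, DELETE",
        "Access-Control-Allow-Headers": "Authorization, Content-Type",
        "Access-Control-Max-Age": "86400",
    })
    return 204, headers
```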


OpenAPI / Swagger

Machine-readable spec for REST APIs

What you get from the spec	Tools
Interactive docs	Swagger UI, Redoc
Client SDKs	openapi-generator (Java/Go/TS…)
Mock server	Prism, Stoplight
Contract tests	Dredd, Schemathesis
Gateway config	Kong, AWS API Gateway import

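paths:
  /users/{id}:
    get:
      parameters:
        - in: path
          name: id
          required: true          # path parameters must be marked required
          schema: { type: string }
      responses:
        '200':
          description: The requested user
          content:
            application/json:
              schema: { $ref: '#/components/schemas/User' }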

▸ OpenAPI Development Workflow

Workflow: design → lint (Spectral) → commit YAML → CI generates SDKs + docs → server validates against same spec.

API Versioning

Three places you can put a version

Style	Example	Pros	Cons
URL path	`/v1/users`	Cacheable, obvious, browseable	URL churns on bumps
Header	`Accept: application/vnd.api.v2+json`	Clean URLs, content-negotiation native	Hidden, harder to test in browser
Query param	`?version=2`	Quick to try	Pollutes cache keys

▸ Versioning Styles at a Glance

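Default to URL path: it is the most visible and cache-friendly option (Stripe's /v1 prefix is the familiar example; Stripe and GitHub also pin finer-grained versions via request headers). Bump major version only on breaking changes; add fields backward-compatibly otherwise.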
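A server that accepts all three styles needs a precedence order. The sketch below is one possible policy (path first, then Accept header, then query parameter); `resolve_version` is a hypothetical helper, and the media type pattern simply mirrors the example in the table above.

```python
import re

_PATH_RE = re.compile(r"^/v(\d+)/")
_ACCEPT_RE = re.compile(r"application/vnd\.api\.v(\d+)\+json")

def resolve_version(path, headers, query, default=1):
    """Pick the API version from the URL path, the Accept header, or ?version=N."""
    match = _PATH_RE.match(path)
    if match:
        return int(match.group(1))
    match = _ACCEPT_RE.search(headers.get("Accept", ""))
    if match:
        return int(match.group(1))
    version = query.get("version", "")
    if version.isdigit():
        return int(version)
    return default
```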

Pagination — Cursor vs Offset

Keep responses small and stable

	Offset	Cursor
Cost	O(skip + limit) on DB	O(limit)
Stability	Items shift on insert/delete	Stable
Jump-to-page	Yes	No (sequential only)
Use	Admin tables, small sets	Feeds, infinite scroll, deep lists
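The stability difference in the table can be shown on a small list. This is an illustrative sketch: `ITEMS`, `page_by_offset`, and `page_by_cursor` are hypothetical, the cursor is assumed to be a base64-encoded "last seen id", and the list scan stands in for a query such as WHERE id > ? ORDER BY id LIMIT ?.

```python
import base64
import json

ITEMS = [{"id": i, "name": f"item-{i}"} for i in range(1, 8)]  # kept sorted by id

def encode_cursor(last_id):
    """Opaque cursor: clients pass it back without interpreting it."""
    return base64.urlsafe_b64encode(json.dumps({"after": last_id}).encode()).decode()

def decode_cursor(cursor):
    return json.loads(base64.urlsafe_b64decode(cursor.encode()))["after"]

def page_by_offset(items, page=1, limit=3):
    start = (page - 1) * limit
    return items[start:start + limit]

def page_by_cursor(items, cursor=None, limit=3):
    after = decode_cursor(cursor) if cursor else None
    rows = [r for r in items if after is None or r["id"] > after][:limit]
    # A full page may have more rows behind it; a short page is the last one.
    next_cursor = encode_cursor(rows[-1]["id"]) if len(rows) == limit else None
    return rows, next_cursor
```

If a row is inserted at the front after the client has read page 1, the offset version repeats the last row of page 1 on page 2, while the cursor version continues right after the last id it returned.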

gRPC Streaming Modes

Four interaction patterns over one HTTP/2 connection

Each call runs as its own HTTP/2 stream, and many streams share one connection: multiplexed, header-compressed, binary framed.

Real-time Communication

Technologies for pushing data from server to client — choose based on direction + latency needs

Options: Short Polling · Long Polling · WebSocket · SSE (Server-Sent Events) · WebRTC (Peer-to-Peer) · Webhook (Server → Your Server)

Tech	Direction	Latency	Best For	Guarantee
Short Polling	Client → Server (repeated)	N sec	Dashboard refresh, legacy status checks, simple health monitors	Simple, but most requests return nothing new (wasteful)
Long Polling	Server holds connection	~sec	Chat (pre-WS era), low-frequency notifications, JIRA-style updates	Near real-time but 1 conn/client held open
WebSocket	Full duplex	~ms	Live stock prices (Robinhood, Binance), chat (Slack), collaborative editing (Figma), gaming (Chess.com)	Persistent bidirectional — server pushes instantly. ~100K conn/server.
SSE	Server → Client	~ms	AI token streaming (ChatGPT, GitHub Copilot), live news tickers, CI/CD build logs	Auto-reconnect built into browser. Event ID for resuming. Text-only.
WebRTC	Peer-to-peer	Ultra-low	Video/audio calls (Zoom, Google Meet), screen sharing, Discord voice	Direct P2P — no server bandwidth for media. Browser-enforced encryption.
Webhook	Server → Your Server	~sec	Payment events (Stripe), CI/CD triggers (GitHub Actions), order updates (Shopify)	Event-driven HTTP POST with no persistent connection. Senders typically retry on failure, so receivers should expect duplicate deliveries.

Real-world: Slack — WebSocket for messaging. Figma — WebSocket for collab editing. Zoom — WebRTC for video. Robinhood — WebSocket for live stock prices. Socket.IO — auto-fallback to polling. ChatGPT — SSE for token streaming. Stripe — Webhook for payment events.

WebSocket

Persistent, bidirectional communication. Perfect for real-time apps that need instant two-way data flow.

▸ Quick Summary

✓ Strengths	✗ Challenges	⚙ Best Practices
Bidirectional, instant (~ms) · 100K+ connections/server · Binary + text · Persistent connection	Stateful (track clients) · Needs reconnect logic · Load balance complexity · No auto-replay on disconnect	Use wss:// (TLS) · Exponential backoff · Redis Pub/Sub for scaling · Validate messages server-side
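The exponential backoff practice for reconnects can be sketched as a delay schedule. This is a minimal example of the "full jitter" variant; `backoff_delays` and its defaults (0.5s base, 30s cap) are assumptions chosen for illustration, not values from any particular client library.

```python
import random

def backoff_delays(attempts, base=0.5, cap=30.0, rng=random.random):
    """Full-jitter schedule: attempt n waits a random time in [0, min(cap, base * 2**n)]."""
    return [rng() * min(cap, base * (2 ** n)) for n in range(attempts)]
```

A client would sleep for each delay in turn before retrying the connection. The jitter spreads reconnects out so that thousands of clients dropped by the same server restart do not all come back at the same instant.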

▸ Scaling with Redis Pub/Sub

Problem: Multiple servers → events isolated per server → clients on Server B miss updates from Server A.
Solution: All servers subscribe to Redis channels → event fans out to ALL servers → all clients see updates instantly.
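The fan-out can be simulated in a few lines. In this sketch `Broker` is a hypothetical in-memory stand-in for a Redis Pub/Sub channel and `WsServer` for a WebSocket server process; client inboxes are plain lists rather than sockets.

```python
class Broker:
    """In-memory stand-in for one Redis Pub/Sub channel."""
    def __init__(self):
        self._subscribers = []

    def subscribe(self, callback):
        self._subscribers.append(callback)

    def publish(self, message):
        for callback in list(self._subscribers):
            callback(message)

class WsServer:
    """One WebSocket server instance holding its own set of client connections."""
    def __init__(self, name, broker):
        self.name = name
        self.clients = {}  # client_id -> list of messages delivered to that client
        self._broker = broker
        broker.subscribe(self._on_broker_message)

    def connect(self, client_id):
        self.clients[client_id] = []

    def handle_client_message(self, message):
        # Publish to the shared channel instead of only the local clients.
        self._broker.publish(message)

    def _on_broker_message(self, message):
        for inbox in self.clients.values():
            inbox.append(message)

broker = Broker()
server_a, server_b = WsServer("A", broker), WsServer("B", broker)
server_a.connect("alice")
server_b.connect("bob")
server_a.handle_client_message("hello from alice")
```

Bob is connected to Server B but still receives the message that arrived at Server A, because both servers subscribe to the same channel.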

Load Balancing: Sticky sessions (pin client to server) vs connection migration (client reconnects) vs shared Redis store (state survives server change).

Real-world: Slack (millions of connections), Figma (collaborative editing), Binance (market data). Typical: 100K–500K connections/server with millisecond-level latency.

Server-Sent Events (SSE) — Deep Dive

One-way server-to-client push over HTTP. Built-in auto-reconnect, event IDs for replay, automatic browser handling.

Middle ground: Polling is wasteful (most requests return nothing new), and WebSocket is overkill when data only flows one way. SSE fits server-only push, with auto-reconnect built in.

▸ SSE Event Stream Format
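An event on the wire is a block of "field: value" lines ended by a blank line. The helper below is a small sketch of that framing; `format_sse` is a hypothetical name, and the response is assumed to be served with Content-Type: text/event-stream.

```python
def format_sse(data, event=None, event_id=None, retry_ms=None):
    """Serialize one Server-Sent Event; multi-line data becomes several data: lines."""
    lines = []
    if retry_ms is not None:
        lines.append(f"retry: {retry_ms}")   # reconnect delay hint, in milliseconds
    if event_id is not None:
        lines.append(f"id: {event_id}")      # echoed back as Last-Event-ID on reconnect
    if event is not None:
        lines.append(f"event: {event}")      # named event type for addEventListener
    for part in data.split("\n"):
        lines.append(f"data: {part}")
    return "\n".join(lines) + "\n\n"         # the blank line terminates the event

HEARTBEAT = ": ping\n\n"  # comment line; keeps idle connections alive, ignored by clients
```

For example, `format_sse("hello", event="message", event_id="42")` produces the three lines `id: 42`, `event: message`, `data: hello`, followed by a blank line.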

▸ Auto-Reconnection with Event Replay
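On reconnect the browser sends the last id it saw in a Last-Event-ID header, and the server resends everything newer. A minimal server-side sketch, assuming numeric, monotonically increasing ids and a bounded in-memory buffer (`EventLog` is a hypothetical name):

```python
class EventLog:
    """Bounded buffer of recent events, used to replay what a client missed."""
    def __init__(self, max_events=1000):
        self._events = []      # list of (event_id, data), ids strictly increasing
        self._next_id = 1
        self._max = max_events

    def append(self, data):
        event_id = self._next_id
        self._next_id += 1
        self._events.append((event_id, data))
        self._events = self._events[-self._max:]  # drop the oldest beyond the cap
        return event_id

    def replay_after(self, last_event_id):
        """Events newer than the Last-Event-ID header value (None means send everything)."""
        last = int(last_event_id) if last_event_id else 0
        return [(i, d) for i, d in self._events if i > last]
```

Events older than the buffer cannot be replayed, so a client that was away too long needs a full state resync instead.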

▸ Quick Summary

✓ Strengths	✗ Limitations	⚙ Common Fixes
Auto-reconnect built-in · Event replay (Last-Event-ID) · Standard HTTP · 10K+ connections/server	Server-only (unidirectional) · Text-only (no binary) · 6 conn/domain (HTTP/1.1) · Proxy buffering issues · No IE support	Proxy buffering: proxy_buffering off (nginx) · Connection limits: use HTTP/2 · Idle timeout: heartbeat every 30s · Reconnect storms: send retry: 5000 (value is in milliseconds)

▸ Use Cases

Use	Example
Live prices / market data	Robinhood, Finnhub, Binance
AI token streaming	ChatGPT, GitHub Copilot
Build logs, CI/CD output	GitHub Actions, Jenkins, CircleCI
Live notifications	Gmail, Slack, email

Performance: 10K–100K connections/server, 2–5KB memory/connection, roughly 10 bytes of framing per event versus 400+ bytes of HTTP headers per polling request.

WebSocket vs SSE — Design Choices

When to pick each technology based on application requirements

▸ Feature Comparison Matrix

Feature	WebSocket	SSE	Long Polling
Communication	Bidirectional ✓	Server only	Client asks repeatedly
Protocol Overhead	2-14 bytes/msg	10-50 bytes/msg	400+ bytes/msg
Browser Support	All modern (IE10+)	All modern (no IE)	Universal
Binary Support	✓ Yes	✗ Text only	✓ Yes
Auto-Reconnect	Manual required	Built-in browser	Built-in (polling loop)
Message Replay	Manual required	Built-in (Last-Event-ID)	No standard
HTTP/2 Multiplexing	No (separate connection)	✓ Yes (single connection)	✓ Yes (http requests)
Stateful	Very (per-client state)	Mostly (stream state)	Stateless
Proxy Friendly	Sometimes blocked	✓ Standard HTTP	✓ Standard HTTP
Connections/Server	100K–500K	10K–100K	1K–10K
Latency	~1-50ms	~100-200ms	~0.5-5s
Memory/Connection	5-20KB	2-5KB	Minimal

▸ Decision Matrix: Which to Use?

✓ Use WebSocket When:

▸ Client ↔ Server messaging needed
▸ High frequency updates (100s/sec)
▸ Low latency critical (<10ms)
▸ Binary data needed
▸ Multiplayer games, trading apps
▸ Real-time collaboration (Figma)
▸ Chat apps (Slack, Discord)
▸ Live stock/crypto prices

✓ Use SSE When:

▸ Server → Client only (no client send)
▸ Auto-reconnect needed (free feature)
▸ Event replay on disconnect
▸ Simple browser API (EventSource)
▸ AI token streaming (ChatGPT)
▸ Build logs (GitHub Actions)
▸ Live notifications / dashboards
▸ Text/JSON data only

✓ Use Long Polling When:

▸ Serverless environment (timeouts)
▸ IE support required
▸ WebSocket blocked by proxy/firewall
▸ Simple infrequent updates OK
▸ Existing polling infrastructure
▸ Cost sensitive (minimal server state)
▸ Doesn't need real-time urgency
▸ Stateless is a hard requirement

▸ Hybrid Approaches

SSE + HTTP POST: Use SSE for server-to-client push, regular POST for client commands (e.g., Twitch chat, YouTube comments)
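WebSocket + long-polling fallback: Use WebSocket where it works and fall back to HTTP long polling where it is blocked (Socket.IO automates this: it starts on long polling and upgrades to WebSocket when possible)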
WebSocket + Redis: For scale — WS per client, Redis Pub/Sub for multi-server broadcast (Slack, Figma pattern)
WebSocket + Kafka: For event sourcing — all events stored in Kafka, clients subscribe via WS (high-scale trading systems)

▸ Common Failure Scenarios

Scenario	WebSocket Impact	SSE Impact
Network disconnection	Connection drops, client must reconnect + resync state	Browser auto-reconnects, replays events via Last-Event-ID ✓
Server restart	All clients lose connection, must reconnect	Clients reconnect, get missed events if stored ✓
Proxy timeouts (>60s idle)	Connection dies, must detect + reconnect	Heartbeat prevents timeout ✓
High load spike	100K+ connections: high memory, CPU consumed	Fewer connections, easier to scale with multi-server ✓
Message ordering	Not guaranteed across reconnects	Event IDs allow ordering verification ✓
Browser refresh	Connection lost, full state resync needed	Can optionally restore via session storage + server replay ✓

Key Insight: SSE excels at resilience (auto-reconnect, event replay), WebSocket excels at latency & bidirectionality. Most real-time apps benefit from a hybrid approach: SSE for notifications, WebSocket for interactive features.