Store frequently accessed data in a fast layer to improve
latency (serve from cache → faster),
throughput (reduce backend load), and
cost efficiency (fewer DB calls),
with trade-offs in
consistency (stale data risk),
invalidation (hard to update/expire), and
memory usage (extra storage).
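Cache-Aside (Lazy)
Pro: App controls the caching logic; only requested data is cached.
Con: A miss costs 3 round trips (cache lookup, DB read, cache write); entries go stale if the DB is updated directly.
Read-Through
Pro: Simple app code; the cache fetches from the DB automatically on a miss.
Con: Depends on a cache library or provider that supports a DB loader plugin.
Write-Around
Pro: DB is the source of truth; writes bypass the cache, so it is not filled with data that may never be read.
Con: A previously cached copy stays stale until it expires or is invalidated; recently written data misses on its first read.
Write-Back (Write-Behind)
Pro: Lowest write latency; cache and DB are eventually consistent.
Con: Data loss risk if the cache crashes before the DB sync.
Write-Through
Pro: Reads are always fresh; cache and DB stay in sync because every write goes to both synchronously.
Con: Higher write latency (each write waits for the DB, roughly doubling it); infrequently read data gets cached too.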
| Strategy | Read Path | Write Path | Consistency | Best For |
| --- | --- | --- | --- | --- |
| Cache-Aside | App → Cache → DB on miss | App → DB (cache invalidated) | Eventual | General purpose (most common) |
| Read-Through | App → Cache (auto-fetches DB) | n/a | Eventual | Simpler app code |
| Write-Around | App → Cache → DB on miss | App → DB directly | Eventual | Write-heavy, read-rarely data |
| Write-Back | App → Cache | App → Cache → async DB | Eventual | High write throughput |
| Write-Through | App → Cache | App → Cache → sync DB | Strong | Read-heavy, consistency needed |
Invalidation: TTL (simple, stale until expiry) · Event-driven (CDC or app triggers a delete, near real-time) · Version key (new version = automatic miss). Eviction: LRU (most common) · LFU · FIFO.
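LRU eviction, named above as the most common policy, can be sketched with the standard library's `OrderedDict`. The class name and capacity handling are illustrative assumptions; real caches usually approximate LRU rather than tracking exact order.

```python
from collections import OrderedDict

class LRUCache:
    """Evicts the least recently used key once capacity is exceeded."""
    def __init__(self, capacity):
        self.capacity = capacity
        self._items = OrderedDict()      # oldest access first, newest last

    def get(self, key):
        if key not in self._items:
            return None
        self._items.move_to_end(key)     # a read counts as a use
        return self._items[key]

    def put(self, key, value):
        self._items[key] = value
        self._items.move_to_end(key)     # a write counts as a use
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)   # drop the least recently used
```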
Thundering Herd: Cache expires → thousands hit DB simultaneously. Fix: mutex on cache miss, probabilistic early expiration, stale-while-revalidate.
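The "mutex on cache miss" fix can be sketched as follows, assuming a single process with threads. `SingleFlightCache` and `run_demo` are hypothetical names; across multiple app servers the lock would have to live in a shared store instead.

```python
import threading
import time

class SingleFlightCache:
    """On a miss, only one thread per key calls the loader; the rest wait."""
    def __init__(self, loader):
        self._loader = loader
        self._values = {}
        self._locks = {}                    # key -> lock (grows with key count)
        self._guard = threading.Lock()      # protects the _locks dict

    def get(self, key):
        if key in self._values:
            return self._values[key]        # fast path: cache hit
        with self._guard:
            lock = self._locks.setdefault(key, threading.Lock())
        with lock:
            if key not in self._values:     # re-check after acquiring the lock
                self._values[key] = self._loader(key)
            return self._values[key]

def run_demo(n_threads=20):
    """Fires n_threads concurrent reads of one cold key; returns (results, loader calls)."""
    calls = []
    def slow_loader(key):
        calls.append(key)
        time.sleep(0.05)                    # simulate a slow DB query
        return key.upper()
    cache = SingleFlightCache(slow_loader)
    results = []
    threads = [threading.Thread(target=lambda: results.append(cache.get("v")))
               for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results, len(calls)
```

In the demo, all 20 threads receive the value while the simulated database is queried once.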
Real-world: Facebook runs Memcached at very large scale and built TAO, a separate cache-backed graph store. Twitter caches timelines in Redis. Target: cache hit rate >95%.
Redis — Data Structures
In-memory data store — sub-ms latency, 100K–1M ops/sec. Cache + data structures + messaging
Pipelining — batch multiple commands in one TCP round trip — up to 10x throughput gain.
No Query Planner — commands are direct operations — no SQL parsing, no optimizer.
RAM ~100ns · SSD ~100µs · HDD ~10ms
Single GET/SET → 100K–1M ops/sec
With Pipelining → up to 10x gain
Bottleneck = NETWORK, not CPU → single thread is enough
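A back-of-the-envelope model shows why the network round trip dominates and where a roughly 10x pipelining gain can come from. The function name and the timings (100 µs round trip, 10 µs execution per command) are assumptions chosen for illustration, not measurements.

```python
def total_time_us(n_commands, rtt_us, exec_us, batch_size=1):
    """Time to run n_commands when each batch pays one network round trip."""
    round_trips = -(-n_commands // batch_size)   # ceiling division
    return round_trips * rtt_us + n_commands * exec_us

# Assumed numbers: 1000 commands, 100 µs round trip, 10 µs execution each.
unpipelined = total_time_us(1000, rtt_us=100, exec_us=10)                 # one RTT per command
pipelined = total_time_us(1000, rtt_us=100, exec_us=10, batch_size=100)   # one RTT per 100 commands
speedup = unpipelined / pipelined
```

With these assumed values the unpipelined run takes 110,000 µs and the pipelined run 11,000 µs; the actual gain depends on the ratio of round-trip time to execution time.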
Redis 6.0+: I/O threads handle network reads and writes; command execution is still single-threaded. Redis 7.0: functions, multi-part AOF, sharded pub/sub.
Watch out: RAM-bound · avoid KEYS * and SMEMBERS on huge sets (use SCAN / SSCAN) · big keys block the event loop · use Redis Cluster to shard beyond single-node limits.
Redis as Cache
App checks Redis first — cache hit returns instantly, cache miss fetches from DB and populates cache
Pattern: App checks Redis first → cache hit returns instantly · cache miss → fetch from DB → write to Redis with TTL → serve.
Strategies: Cache-aside (most common; app manages the cache). Write-through (write to cache + DB together). Write-back (write to cache, async flush to DB). Read-through (cache fetches from DB on miss).
Eviction Policies: allkeys-lru (evict least recently used across all keys; best for cache). allkeys-lfu (evict least frequently used). volatile-lru (evict least recently used among keys that have a TTL). noeviction (return an error on writes when full; for data store use).
Cache problems: Thundering herd: many requests hit the DB on the same cache miss → use a SETNX lock or probabilistic early expiry. Cache penetration: queries for non-existent keys always miss → cache null values with a short TTL. Cache avalanche: many keys expire simultaneously → add random jitter to TTLs.
Anti-patterns: No TTL: stale data forever. Cache everything: wastes RAM on cold data. No memory limit or eviction policy: memory grows until writes fail or the process is OOM-killed. Inconsistent invalidation: cache and DB disagree.
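The fixes for cache penetration (cache null values with a short TTL) and cache avalanche (jittered TTLs) can be sketched together. Everything here is illustrative: the sentinel, the TTL values, and the 10% jitter fraction are assumptions, and the dict-based `cache` only records the chosen TTL without enforcing expiry.

```python
import random

NEGATIVE_TTL = 30       # assumed short TTL (seconds) for "not found" markers
BASE_TTL = 300          # assumed normal TTL (seconds)
NOT_FOUND = "__nil__"   # hypothetical sentinel cached in place of a missing row

def jittered_ttl(base, fraction=0.1):
    """Spread expiry times by +/- fraction so keys do not all expire together."""
    spread = base * fraction
    return base + random.uniform(-spread, spread)

def fetch(key, cache, db, stats):
    """Look up key; cache misses in the DB too, so repeat lookups skip the DB."""
    if key in cache:
        value, _ttl = cache[key]
        return None if value == NOT_FOUND else value
    stats["db_reads"] += 1
    row = db.get(key)
    if row is None:
        cache[key] = (NOT_FOUND, NEGATIVE_TTL)   # negative caching
        return None
    cache[key] = (row, jittered_ttl(BASE_TTL))
    return row
```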
Redis Pub/Sub & Streams
Real-time messaging built into Redis — from fire-and-forget broadcast to durable event logs
▸ Pub/Sub — Real-Time Broadcast
PUBLISH chat:room1 "Hello!" → sends to all current subscribers
SUBSCRIBE chat:room1 → receives "Hello!" instantly
PSUBSCRIBE chat:* → pattern match — all chat channels
Subscriber joins later → NO history → messages already gone
Guarantees: Real-time delivery (typically <1ms). Fan-out to all current subscribers. Pattern matching. Delivery is at-most-once.
Limitations: No persistence. No replay. No acknowledgment. No consumer groups. Fire-and-forget only.
Use cases: Figma (real-time collaboration signals). Slack (online presence indicators). Cache invalidation across app servers. Chat typing indicators.
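The fire-and-forget semantics above can be modelled in a few lines. This is a toy in-process model, not a Redis client: the `PubSub` class is hypothetical, and `fnmatch` only approximates Redis glob patterns.

```python
from collections import defaultdict
import fnmatch

class PubSub:
    """Toy model: messages reach current subscribers only; nothing is stored."""
    def __init__(self):
        self._channels = defaultdict(list)   # channel -> list of inboxes
        self._patterns = []                  # (pattern, inbox) pairs

    def subscribe(self, channel):
        inbox = []
        self._channels[channel].append(inbox)
        return inbox

    def psubscribe(self, pattern):
        inbox = []
        self._patterns.append((pattern, inbox))
        return inbox

    def publish(self, channel, message):
        receivers = list(self._channels.get(channel, []))
        receivers += [inbox for pattern, inbox in self._patterns
                      if fnmatch.fnmatchcase(channel, pattern)]
        for inbox in receivers:
            inbox.append((channel, message))
        return len(receivers)                # number of receivers reached
```

A subscriber that joins after `publish` returns gets an empty inbox, which is the "no history" behaviour described above.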
▸ Streams: Durable Event Log
Guarantees: Persistence (survives restart). Replay (XRANGE from any point). Consumer groups (competing consumers). At-least-once delivery. Blocking reads (XREADGROUP BLOCK).
Limitations: Single-node throughput is limited (<100K events/sec here, versus millions for Kafka). RAM-bound. No cross-cluster replication. Best as a lightweight Kafka when you already have Redis.
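The Streams guarantees (append-only log, replay, competing consumers, acknowledgment) can be modelled as below. This is a simplified in-memory sketch with a single consumer group and integer IDs; the method names only echo the Redis commands and are not a client API.

```python
class Stream:
    """Toy append-only log with one consumer group (illustration only)."""
    def __init__(self):
        self.entries = []      # (entry_id, payload), never removed
        self.cursor = 0        # index of the next entry the group will hand out
        self.pending = {}      # entry_id -> consumer that has not acked yet

    def xadd(self, payload):
        entry_id = len(self.entries) + 1
        self.entries.append((entry_id, payload))
        return entry_id

    def xrange(self, start=1):
        """Replay: any reader can re-read history from a chosen ID."""
        return [entry for entry in self.entries if entry[0] >= start]

    def read_group(self, consumer, count=1):
        """Competing consumers: each entry goes to exactly one group member."""
        batch = self.entries[self.cursor:self.cursor + count]
        self.cursor += len(batch)
        for entry_id, _payload in batch:
            self.pending[entry_id] = consumer   # tracked until acknowledged
        return batch

    def xack(self, entry_id):
        return self.pending.pop(entry_id, None) is not None
```

Entries left in `pending` are what makes at-least-once delivery possible: work that was never acknowledged can be found and handed to another consumer.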
▸ Cache vs Pub/Sub vs Streams — When to Use What
| Feature | Cache | Pub/Sub | Streams |
| --- | --- | --- | --- |
| Purpose | Read acceleration | Real-time broadcast | Durable event log |
| Persistence | TTL-based | None | Yes (AOF/RDB) |
| Replay | ❌ | ❌ | ✅ XRANGE |
| Fan-out | ❌ | ✅ All subscribers | ✅ Consumer groups |
| Acknowledgment | ❌ | ❌ Fire-and-forget | ✅ XACK |
| Best For | DB offload, sessions | Presence, signals, invalidation | Order pipelines, audit, IoT |
Redis Persistence & High Availability
From single-node to sharded cluster — persistence, replication, and failover
▸ Persistence: RDB vs AOF
RDB (Snapshot)
AOF (Append-Only File)
▸ Redis Deployment Modes
| Mode | Architecture | Sharding | HA | Use Case |
| --- | --- | --- | --- | --- |
| Single Node | One instance | No | No (SPOF) | Dev, small cache, non-critical |
| Sentinel | Master + replicas + sentinel monitors | No | Yes (auto-failover) | HA cache, sessions, moderate load |
| Cluster | N masters (16,384 hash slots) + replicas | Yes | Yes | Large datasets, high throughput, horizontal scale |
| Managed | ElastiCache / MemoryDB / Upstash | Yes | Yes | Production, no ops overhead |
Cluster details: 16,384 hash slots distributed across masters. Key → CRC16(key) % 16384 → slot → node. Each master has 1+ replicas. Gossip protocol for node discovery. MOVED/ASK redirects for client routing. Multi-key ops only within the same slot (use hash tags: {user:123}.profile).
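The key-to-slot mapping, including hash tags, can be sketched with the standard library. This assumes that `binascii.crc_hqx` with an initial value of 0 matches the CRC16 variant Redis Cluster uses (polynomial 0x1021); `hash_slot` is a hypothetical helper name.

```python
import binascii

def hash_slot(key: str) -> int:
    """Map a key to one of 16,384 slots; only a non-empty {tag} is hashed."""
    data = key.encode()
    start = data.find(b"{")
    if start != -1:
        end = data.find(b"}", start + 1)
        if end > start + 1:                # require at least one byte in the tag
            data = data[start + 1:end]
    return binascii.crc_hqx(data, 0) % 16384

# Keys sharing a hash tag land in the same slot, so multi-key ops work on them.
same_slot = hash_slot("{user:123}.profile") == hash_slot("{user:123}.settings")
```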
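Redlock (distributed lock): Acquire the lock on a majority (N/2+1) of independent Redis nodes. Set a TTL to prevent deadlock. Before entering the critical section, check that the lock is still valid (TTL minus the time spent acquiring it, minus a clock-drift allowance). Controversial: Martin Kleppmann argues it is unsafe because it depends on timing assumptions (clock drift, process pauses) and provides no fencing tokens. Alternative: use etcd/ZooKeeper for strong locks.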
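The majority and validity checks can be sketched as a pure function. This covers only the decision step, not the network calls; the drift allowance (1% of the TTL plus 2 ms) is an assumption modelled on common client implementations, and `redlock_decision` is a hypothetical name.

```python
def redlock_decision(acquired_flags, ttl_ms, elapsed_ms):
    """Return (lock_held, validity_ms) given per-node acquisition results."""
    n = len(acquired_flags)
    quorum = n // 2 + 1                       # majority of independent nodes
    drift_ms = ttl_ms // 100 + 2              # assumed clock-drift allowance
    validity_ms = ttl_ms - elapsed_ms - drift_ms
    held = sum(acquired_flags) >= quorum and validity_ms > 0
    return held, validity_ms
```

For example, 3 of 5 nodes with a 10,000 ms TTL and 50 ms spent acquiring leaves 9,848 ms of validity; if acquisition fails, the client should release the lock on every node it reached.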
Limitations: RAM-bound (all data must fit in memory). Single-threaded core (one slow command blocks everything). Not a primary DB; use as cache/accelerator. Async replication: data loss is possible on failover (WAIT blocks until replicas acknowledge, which narrows the window but does not make Redis strongly consistent).
Serve content from edge PoPs globally to improve
latency (closer to users),
throughput (offload origin), and
availability (distributed delivery),
with trade-offs in
consistency (cache freshness) and
invalidation (hard to purge).
Pull (lazy) vs Push (proactive).
Guarantees: Low latency (<50ms from edge). DDoS absorption at edge. Origin offload. Edge computing (Cloudflare Workers) runs logic at edge.
Limitations: Dynamic/personalized content is harder to cache. Cache invalidation complexity. Cost at high invalidation frequency.
▸ CDN Architecture — Edge PoPs Worldwide
▸ Pull CDN vs Push CDN
Pull CDN (Lazy)
Cache on first request. Cache-Control: max-age=3600
Flow: User → Edge (MISS) → Origin → Edge caches → User
Next: User → Edge (HIT, <10ms) ✓
Pro: No upfront cost; auto-populates on demand.
Con: First request is slow (cache miss, cold start).
Use: General web assets, images, API responses.
Push CDN (Proactive)
Pre-populate all PoPs before users request.
Flow: Origin → push to all PoPs on publish
User: User → Edge (always HIT, <5ms) ✓
Pro: Zero cold starts; predictable latency.
Con: Storage cost; must know what to push.
Use: Video segments, firmware, known-hot assets.
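The pull flow (MISS, fetch from origin, cache for `max-age`, then HIT) can be simulated with a small model. `EdgePoP`, `demo_origin`, and the simplified `max-age` parsing are illustrative assumptions; a real edge honours many more Cache-Control directives.

```python
import re

def max_age(cache_control):
    """Extract max-age in seconds from a Cache-Control value (0 if absent)."""
    match = re.search(r"max-age=(\d+)", cache_control)
    return int(match.group(1)) if match else 0

class EdgePoP:
    """Toy pull-through edge cache; time is passed in explicitly."""
    def __init__(self, origin):
        self.origin = origin    # callable: path -> (body, cache_control)
        self.store = {}         # path -> (body, expires_at)

    def get(self, path, now):
        cached = self.store.get(path)
        if cached and cached[1] > now:
            return cached[0], "HIT"
        body, cache_control = self.origin(path)     # MISS: go to origin
        ttl = max_age(cache_control)
        if ttl > 0:
            self.store[path] = (body, now + ttl)
        return body, "MISS"

origin_calls = []   # records each request that reaches the origin

def demo_origin(path):
    origin_calls.append(path)
    return f"body of {path}", "public, max-age=3600"
```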
▸ Scaling with CDN — From 1K to 1B+ Requests/Day
Scaling Principles: Shield layer: an intermediate cache between edge and origin that collapses duplicate misses (100 PoPs miss → 1 request to origin). Tiered TTLs: edge 60s, shield 5min, origin 1h. Request coalescing: 1000 users request the same uncached asset → only 1 request goes to origin. Stale-while-revalidate: serve stale, refresh async.
Pitfalls at Scale: Thundering herd: a hot key expires and all PoPs hit origin. Fix: jittered TTL + coalescing. Cache stampede: a popular item is invalidated during a spike. Fix: lock + stale-while-revalidate. Purge storms: mass invalidation overloads origin. Fix: soft purge (serve stale, refresh async).
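Stale-while-revalidate can be sketched deterministically by passing the clock in and queueing refreshes instead of spawning background work. The class, the state labels, and the versioned loader are hypothetical; a real implementation would refresh asynchronously.

```python
class SWRCache:
    """Serve fresh entries, serve stale ones while queueing a refresh."""
    def __init__(self, loader, ttl, stale_window):
        self.loader = loader
        self.ttl = ttl                      # seconds an entry counts as fresh
        self.stale_window = stale_window    # extra seconds it may be served stale
        self.entries = {}                   # key -> (value, stored_at)
        self.refresh_queue = []             # keys awaiting a background refresh

    def get(self, key, now):
        entry = self.entries.get(key)
        if entry is not None:
            value, stored_at = entry
            age = now - stored_at
            if age < self.ttl:
                return value, "fresh"
            if age < self.ttl + self.stale_window:
                if key not in self.refresh_queue:   # coalesce duplicate refreshes
                    self.refresh_queue.append(key)
                return value, "stale"
        value = self.loader(key)            # absent or too old: blocking load
        self.entries[key] = (value, now)
        return value, "miss"

    def run_refreshes(self, now):
        """Stands in for the asynchronous refresh worker."""
        while self.refresh_queue:
            key = self.refresh_queue.pop(0)
            self.entries[key] = (self.loader(key), now)

def make_versioned_loader():
    """Demo loader whose return value changes on every call."""
    state = {"n": 0}
    def load(key):
        state["n"] += 1
        return f"{key}-v{state['n']}"
    return load
```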
Interview tip: Always mention cache hit ratio as the key CDN metric. A 1-point improvement from 95% → 96% cuts the miss rate from 5% to 4%, which is 20% fewer origin requests. At Netflix scale (100B+ req/day), that is about 1 billion fewer origin calls per day.
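The hit-ratio arithmetic can be checked with integer math. The function names are illustrative, and the 100 billion requests/day figure is the one quoted above.

```python
def origin_requests(total_requests, hit_percent):
    """Requests that miss the CDN and reach the origin."""
    return total_requests * (100 - hit_percent) // 100

def percent_fewer(before, after):
    """Relative reduction in origin requests, in percent."""
    return (before - after) * 100 // before

total = 100_000_000_000                      # 100B requests/day
at_95 = origin_requests(total, 95)           # 5% of traffic misses
at_96 = origin_requests(total, 96)           # 4% of traffic misses
saved_per_day = at_95 - at_96
reduction = percent_fewer(at_95, at_96)
```

The relative saving grows as the hit ratio rises: going from 98% to 99% halves origin traffic.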
Real-world: Netflix Open Connect (custom CDN inside ISPs, serves 95%+ of traffic from ISP-local boxes). Cloudflare (300+ PoPs, serves 20%+ of web traffic). CloudFront (400+ PoPs, Lambda@Edge for compute).
Advanced Caching: Cache Warming: pre-populate the cache before a traffic spike (product launch, Black Friday). Multi-Level: L1 (in-process, Caffeine) → L2 (Redis) → L3 (CDN). Each level closer to the app is faster but smaller. CDC Invalidation: DB change → CDC event → invalidate the specific cache key in real time (no waiting for a stale TTL to expire). Stale-While-Revalidate: serve stale, refresh in background.
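A multi-level lookup can be sketched with two dicts standing in for L1 (in-process) and L2 (a shared cache such as Redis). The function name, the `trace` list, and the promotion-on-hit behaviour are assumptions for illustration; eviction and TTLs are omitted.

```python
def get_multilevel(key, l1, l2, load_from_source, trace):
    """Check the fastest level first; promote values toward L1 on the way back."""
    if key in l1:
        trace.append("L1")
        return l1[key]
    if key in l2:
        trace.append("L2")
        l1[key] = l2[key]            # promote so the next read stays in-process
        return l1[key]
    trace.append("source")
    value = load_from_source(key)    # slowest path: DB or origin
    l2[key] = value
    l1[key] = value
    return value
```

Clearing `l1` (for example on a process restart) leaves `l2` warm, so the next read costs one shared-cache hit instead of a trip to the source.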
Content Delivery & Edge: CDN caches static assets at edge PoPs (Cloudflare, CloudFront). Edge Computing: run logic at the edge (Cloudflare Workers, Lambda@Edge, Vercel Edge Functions). Use for: A/B testing, geo-routing, auth token validation, personalization. Reduces origin load and latency. Limitation: constrained runtime (CPU time, memory) and little or no persistent local state at the edge.