7. Caching — System Design Concepts

🗄️ 7 · CACHING

Caching

Storing data in a faster storage layer (typically RAM) so frequently requested data can be served without repeatedly accessing the source of truth. This reduces latency (faster responses), increases throughput (fewer backend/database operations per request), and improves cost efficiency (less infrastructure needed to handle the same traffic), while introducing trade-offs around consistency (may have stale data), cache invalidation complexity, and additional memory overhead.

▸ Caching Strategies

Strategy	Trade-offs (Pro • Con)	Consistency	Strong Reason to Choose	Why Other Strategies Are Not Ideal	Real-World
Cache-Aside → Hot reads (hot, frequently read data)	Pro: App controls, only hot data cached · Con: Miss = 3 round trips, stale if DB updated directly	Eventual	App controls what's cached — same data viewed repeatedly; cache only popular content	Write-Through wastes RAM on unread data. Read-Through needs library plugin.	Twitter Timeline → read-heavy, eventual OK · GitHub Repos → mostly reads, infrequent updates · Shopify Products → high traffic, rare changes · YouTube Video Metadata → billions of reads, rare edits · Zoom Meeting Metadata → fetched on every join · Netflix Movie Catalog → browse-heavy, titles rarely change
Read-Through → Auto reads (simple application code)	Pro: Simple app code, cache auto-fetches · Con: Needs cache library/plugin for DB	Eventual	Simple app code — cache automatically fetches missing data from the source	Cache-Aside requires manual miss-handling logic in every call.	AWS DynamoDB DAX → reads auto-fetched from DynamoDB · Cloudflare CDN → edge auto-fetches origin on miss · Google Calendar Events → calendar service reads through cache layer · Spotify Playlist Metadata → catalog service auto-populates · LinkedIn Profile Views → profile cache auto-fills from DB
Write-Around → Cold writes (avoid caching cold data)	Pro: DB is source of truth, no cache pollution · Con: Higher write latency, cache may be stale	Eventual	Writes skip cache — huge write volume; most records are never read again	Write-Through caches every write (wastes RAM). Write-Back risks loss.	CloudWatch / Datadog Logs → write millions/sec, rarely re-read · Compliance Audit Trails → write-once, query only on investigation · IoT Sensor Telemetry → massive ingest, batch analytics later · Clickstream Events → firehose writes, only aggregated later · Email Send Logs → stored for compliance, seldom opened
Write-Back → Fast writes (maximum write throughput)	Pro: Lowest write latency, reads always HIT · Con: Data loss risk if cache crashes	Eventual	Fastest writes — absorb massive write traffic before persisting to DB	Write-Through doubles latency. Cache-Aside hits DB on every write.	Gaming Leaderboards → 100K score updates/sec · Uber Driver Location → GPS pings every 4s, batch to DB · IoT Device Heartbeats → buffer 5s, flush batch to DB · View Counters (YouTube) → increment in cache, persist periodically · Chat Typing Indicators → ephemeral, never hits DB
Write-Through → Fresh reads after writes (strong consistency after writes)	Pro: Reads always fresh, zero stale window · Con: Write latency doubles, cold data cached	Strong	Zero stale window — users expect updates to be visible immediately on next read	Cache-Aside has stale gap. Write-Back can lose data.	Facebook TAO → friend list visible instantly after add · Google Calendar (writes) → new event visible to all invitees immediately · Stripe Payment State → charge status fresh on next API call · Slack Channel Membership → join/leave reflected instantly · Amazon Cart → add-to-cart must show on next page load
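
The difference between the two write-heavy rows (Write-Through vs Write-Back) can be sketched in a few lines. This is a minimal illustration only: plain dicts stand in for the database and the cache, and the class names `WriteThroughCache` and `WriteBackCache` are hypothetical, not from any library.

```python
class WriteThroughCache:
    """Every write goes to the backing store and the cache before returning."""

    def __init__(self, store):
        self.store = store   # assumption: a dict standing in for the database
        self.cache = {}

    def put(self, key, value):
        self.store[key] = value   # synchronous DB write (the extra latency)
        self.cache[key] = value   # cache updated in the same call, so no stale window

    def get(self, key):
        return self.cache.get(key, self.store.get(key))


class WriteBackCache:
    """Writes land in the cache only; dirty keys are flushed to the store later."""

    def __init__(self, store):
        self.store = store
        self.cache = {}
        self.dirty = set()

    def put(self, key, value):
        self.cache[key] = value
        self.dirty.add(key)       # not durable yet: lost if the cache dies now

    def flush(self):
        """Persist all dirty keys in one batch and return how many were written."""
        for key in self.dirty:
            self.store[key] = self.cache[key]
        flushed = len(self.dirty)
        self.dirty.clear()
        return flushed


db_a, db_b = {}, {}
wt, wb = WriteThroughCache(db_a), WriteBackCache(db_b)
wt.put("cart:u42", ["sku_001"])   # visible in db_a immediately
wb.put("score:u42", 4850)         # db_b is still empty until flush()
```

Until `wb.flush()` runs, `db_b` has no record of the score, which is exactly the data-loss window the table warns about.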

Invalidation: TTL (simple, stale until expiry) · Event-driven (CDC/app triggers delete, near real-time) · Version key (new version = auto miss). Eviction: LRU (most common) · LFU · FIFO.

Thundering herd / cache stampede: a hot key expires → thousands of requests miss at once and hit the DB simultaneously. Fix: mutex on cache miss, request coalescing (single-flight), probabilistic early expiration, stale-while-revalidate.
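
Request coalescing (single-flight) can be sketched as follows. This is one possible in-process version using only the standard library; the `SingleFlight` class and `load_from_db` function are hypothetical names, and the 0.2s sleep merely simulates a slow database call.

```python
import threading
import time


class SingleFlight:
    """Collapse concurrent loads of the same key into one call to the loader."""

    def __init__(self):
        self._lock = threading.Lock()
        self._inflight = {}  # key -> (done_event, result_holder)

    def do(self, key, loader):
        with self._lock:
            entry = self._inflight.get(key)
            leader = entry is None
            if leader:
                entry = (threading.Event(), {})
                self._inflight[key] = entry
        done, holder = entry
        if leader:
            try:
                holder["value"] = loader()       # only the leader touches the DB
            except Exception as exc:
                holder["error"] = exc
            finally:
                with self._lock:
                    del self._inflight[key]
                done.set()                        # wake every waiting follower
        else:
            done.wait()                           # followers reuse the leader's result
        if "error" in holder:
            raise holder["error"]
        return holder["value"]


calls = []

def load_from_db():
    calls.append(1)
    time.sleep(0.2)          # simulated slow query
    return {"id": 42}

flight = SingleFlight()
results = []
threads = [
    threading.Thread(target=lambda: results.append(flight.do("user:42", load_from_db)))
    for _ in range(8)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Eight concurrent misses produce a single `load_from_db` call here; across multiple app servers the same idea needs a shared lock (for example `SET ... NX EX`) rather than an in-process one.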

Real-world: Facebook runs Memcached at very large scale as a look-aside cache, and its TAO graph store adds its own graph-aware cache tier. Twitter caches timelines in Redis. Target: cache hit rate >95%.

Cache Invalidation & Eviction

"There are only two hard things in CS: cache invalidation and naming things." — Phil Karlton

TTL-Based Expiry

Event-Driven Invalidation

Thundering Herd

Fix: Mutex lock · Probabilistic early expiry · Stale-while-revalidate

Cache Penetration

Fix: Cache null with short TTL · Bloom filter rejects impossible keys

Cache Avalanche

Fix: TTL = base ± random jitter · Never set same TTL for all keys

Method	How It Works	Freshness	Complexity	Best For
TTL (Time-to-Live)	Key auto-expires after N seconds	Stale up to TTL	Low	General purpose, acceptable staleness
Event-Driven Delete	App deletes cache key on DB write	Near real-time	Medium	User profiles, settings
CDC Invalidation	DB change → CDC event → delete key	Real-time (~100ms)	High	Multi-service, decoupled systems
Version Keys	Key includes version: `user:5:v3`	Instant (new key = miss)	Medium	Immutable data, API responses
Double-Delete	Delete before + after DB write (with delay)	Near real-time	Medium	Race condition prevention
Pub/Sub Broadcast	Publish invalidation event to all app nodes	Real-time	Medium	Multi-node local caches
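
The Version Keys row is easy to misread, so here is a minimal sketch of it. Dicts stand in for the cache and for wherever the current version number is stored; `cache_key`, `read`, and `write` are hypothetical helper names.

```python
cache = {}                  # assumption: stand-in for Redis
versions = {"user:5": 3}    # current version per entity (would live in the DB or cache)


def cache_key(entity):
    return f"{entity}:v{versions[entity]}"


def read(entity, load):
    key = cache_key(entity)
    if key not in cache:        # a new version means a new key, so this misses
        cache[key] = load()
    return cache[key]


def write(entity):
    # Bumping the version "invalidates" without deleting anything:
    # the old key is simply never read again and ages out via TTL or eviction.
    versions[entity] += 1


first = read("user:5", lambda: {"name": "John"})
write("user:5")
second = read("user:5", lambda: {"name": "Johnny"})
```

After the write, reads go to `user:5:v4`; the orphaned `user:5:v3` entry still occupies memory, which is why this method is usually paired with a TTL.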

Eviction Policies (when cache is full):

Policy	Evicts	Best For
LRU (Least Recently Used)	Key not accessed longest	General purpose — most common default
LFU (Least Frequently Used)	Key accessed fewest times	Hot/cold data — keeps popular items
FIFO (First In First Out)	Oldest key inserted	Simple, predictable — time-series data
Random	Random key	When access patterns are uniform
TTL-based	Keys closest to expiry first	Mixed workloads with varying freshness needs
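
As an illustration of the LRU row, a small in-process LRU cache can be built on `collections.OrderedDict`. This is a teaching sketch, not how Redis implements eviction (Redis uses an approximated, sampled LRU).

```python
from collections import OrderedDict


class LRUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.items = OrderedDict()   # oldest access first, newest last

    def get(self, key):
        if key not in self.items:
            return None
        self.items.move_to_end(key)          # mark as most recently used
        return self.items[key]

    def put(self, key, value):
        self.items[key] = value
        self.items.move_to_end(key)
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)   # evict the least recently used key


lru = LRUCache(2)
lru.put("a", 1)
lru.put("b", 2)
lru.get("a")       # "a" is now more recent than "b"
lru.put("c", 3)    # over capacity: "b" is evicted
```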

Cache Penetration: Queries for keys that never exist in DB → always miss → DB overloaded. Fix: cache null values with short TTL (60s), or use Bloom filter to reject impossible keys before hitting cache.
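
Both penetration fixes can be combined, as in the sketch below. Everything here is an assumption for illustration: the toy `BloomFilter` class, the dict-based `db` and `cache`, and the `lookup` helper. The key `user:3` is added to the filter without existing in `db` to model a row that was deleted after the filter was built.

```python
import hashlib
import time


class BloomFilter:
    """Toy Bloom filter: no false negatives, a small chance of false positives."""

    def __init__(self, size_bits=1024, hashes=3):
        self.size = size_bits
        self.hashes = hashes
        self.bits = 0

    def _positions(self, key):
        for i in range(self.hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, key):
        for pos in self._positions(key):
            self.bits |= 1 << pos

    def might_contain(self, key):
        return all((self.bits >> pos) & 1 for pos in self._positions(key))


db = {"user:1": "Ada", "user:2": "Lin"}
bloom = BloomFilter()
for k in ("user:1", "user:2", "user:3"):   # user:3 was deleted from db later
    bloom.add(k)

cache = {}        # key -> (value, expires_at); a None value is a cached miss
NULL_TTL = 60     # short TTL for negative entries, as suggested above
db_calls = []


def lookup(key, now=None):
    now = time.time() if now is None else now
    if not bloom.might_contain(key):
        return None                     # definitely absent: skip cache and DB
    hit = cache.get(key)
    if hit and hit[1] > now:
        return hit[0]                   # may be a cached None
    db_calls.append(key)
    value = db.get(key)
    cache[key] = (value, now + (3600 if value is not None else NULL_TTL))
    return value


never_existed = lookup("user:999", now=0)   # rejected by the Bloom filter
deleted_once = lookup("user:3", now=0)      # reaches the DB, caches the miss
deleted_again = lookup("user:3", now=30)    # served from the negative cache
```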

Cache Avalanche: Many keys expire simultaneously → massive DB spike. Fix: add random jitter to TTLs (e.g., TTL = 3600 ± random(0,300)), stagger expiration.
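
The jitter formula above translates directly into code. The function name `jittered_ttl` and the seeded generator are assumptions for the example; the base and spread values are the ones from the text.

```python
import random


def jittered_ttl(base=3600, spread=300, rng=random):
    """Return base +/- up to `spread` seconds so keys written together expire apart."""
    return base + rng.randint(-spread, spread)


rng = random.Random(7)   # seeded only to make the example repeatable
ttls = [jittered_ttl(rng=rng) for _ in range(1000)]
```

With these defaults, a batch of keys written in the same second expires across a 10-minute window instead of all at once.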

Hot Key Problem: One key gets millions of reads → single Redis node overloaded. Fix: local cache (L1 in-process), replicate hot key across multiple slots, or shard the value (key:1, key:2, ... key:N).
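
One way to picture the suffixed-copy fix (key:1 ... key:N) is the sketch below. A dict stands in for a sharded keyspace, and `write_hot` / `read_hot` are hypothetical helpers; in a real cluster the suffixes would hash to different slots and therefore usually to different nodes.

```python
import random

N_SHARDS = 8
store = {}   # assumption: stand-in for a Redis Cluster keyspace


def write_hot(key, value):
    # Write N copies so reads can be spread across N keys (and thus N slots).
    for i in range(1, N_SHARDS + 1):
        store[f"{key}:{i}"] = value


def read_hot(key, rng=random):
    # Each reader picks a random copy, dividing the read load by roughly N.
    return store[f"{key}:{rng.randint(1, N_SHARDS)}"]


write_hot("trending:today", "payload")
reads = {read_hot("trending:today") for _ in range(20)}
```

The cost is write amplification: every update has to touch all N copies, so this suits keys that are read far more often than they change.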

Interview pattern: When discussing cache invalidation, always mention the consistency vs latency tradeoff. TTL = simple but stale. Event-driven = fresh but complex. The right choice depends on how much staleness your users can tolerate.

Redis — Data Structures

In-memory data store — sub-ms latency, 100K–1M ops/sec. Cache + data structures + messaging

Structure	Commands	Use Case
String Single value	SET/GET/INCR/SETNX/EXPIRE `SET session:tok_abc123 '{"user_id":"u42","role":"admin"}' EX 3600` `INCR ratelimit:user42:1705312200` `SET lock:order:ORD-9876 "worker-3" NX EX 30`	Counters · Session tokens · Cache · Distributed lock (SETNX) · Feature flags
Hash Object with fields	HSET/HGET/HGETALL/HDEL/HINCRBY `HSET user:u42 name "John" city "NYC" plan "pro"` `HINCRBY cart:sess_abc item:sku_001 2` `HGETALL config:feature-flags`	User profiles · Cache objects · Shopping cart · Config store
List Ordered list (LIFO/FIFO)	LPUSH/RPOP/LRANGE/LLEN/BRPOP `LPUSH queue:email:send '{"to":"a@b.com","tmpl":"welcome"}'` `BRPOP queue:email:send 30` `LRANGE feed:user42 0 19`	Job queues · Activity feeds · Recent history · Blocking queue
Set Unique items (unordered)	SADD/SMEMBERS/SINTER/SUNION/SISMEMBER `SADD channel:general:online "u42" "u87" "u103"` `SINTER friends:u42 friends:u87` `SISMEMBER blocklist:ip "203.0.113.5"`	Unique visitors · Tags · Deduplication · Common friends · Online users
Sorted Set Unique items ranked by score	ZADD/ZRANGE/ZRANK/ZREVRANK/ZINCRBY `ZADD leaderboard:weekly 4850 "player:u42"` `ZREVRANGE leaderboard:weekly 0 9 WITHSCORES` `ZADD delayed_jobs 1705312260 "job:send_report:u42"`	Leaderboards · Rate limiting (sliding window) · Priority queues · Trending topics · Delayed jobs
HyperLogLog Probabilistic unique count	PFADD/PFCOUNT/PFMERGE `PFADD dau:2025-01-15 "u42" "u87" "u103"` `PFCOUNT dau:2025-01-15` `PFMERGE mau:2025-01 dau:2025-01-01 dau:2025-01-02 ...`	Approximate unique count with 0.81% error at fixed 12KB memory · DAU · Unique page views
Geo Lat/long coordinates	GEOADD/GEODIST/GEORADIUS/GEOPOS/GEOSEARCH `GEOADD drivers:nyc -73.9857 40.7484 "driver:d001"` `GEOSEARCH drivers:nyc FROMLONLAT -73.98 40.75 BYRADIUS 2 km ASC COUNT 10` `GEODIST drivers:nyc "driver:d001" "rider:r042" km`	Nearby drivers (Uber/Lyft) · Store locator · Location tracking · Uses Sorted Set with Geohash
Stream Append-only event log	XADD/XREAD/XREADGROUP/XACK/XRANGE `XADD orders:events * user_id u42 action placed amount 49.99` `XREADGROUP GROUP workers consumer-1 COUNT 10 BLOCK 5000 STREAMS orders:events >` `XACK orders:events workers 1705312200000-0`	Event log · Consumer groups · Lightweight Kafka · Audit trail

Why Redis is So Fast

Multiple design decisions compound together — 100K–1M ops/sec on a single thread

In-Memory — all data in RAM. RAM ~100ns vs disk ~10ms = 100,000x faster.

Single-Threaded Core — no locks, no context switching, no race conditions. Commands execute atomically.

Non-Blocking I/O — epoll multiplexing — single thread watches 100K+ sockets.

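Efficient Internals — SDS strings, ziplist (replaced by listpack in Redis 7.0), skiplist — compact encodings that are CPU cache-friendly.

Simple Protocol (RESP) — human-readable and length-prefixed, so values are read by byte count instead of being scanned for delimiters.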

Pipelining — batch multiple commands in one TCP round trip — up to 10x throughput gain.

No Query Planner — commands are direct operations — no SQL parsing, no optimizer.

RAM ~100ns · SSD ~100µs · HDD ~10ms
Single GET/SET  → 100K–1M ops/sec
With Pipelining → up to 10x gain
Bottleneck = NETWORK, not CPU → single thread is enough

Redis 6.0+: I/O threads for network read/write — still single-threaded for command execution. Redis 7.0: functions, multi-part AOF, sharded pub/sub.

Watch out: RAM-bound · avoid KEYS * (use SCAN) and SMEMBERS on huge sets (use SSCAN) · big keys block the event loop · use Redis Cluster to shard beyond single-node limits.

Redis as Cache

App checks Redis first — cache hit returns instantly, cache miss fetches from DB and populates cache

Pattern: App checks Redis first → cache hit returns instantly · cache miss → fetch from DB → write to Redis with TTL → serve.
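
The pattern above is cache-aside, and a minimal sketch looks like this. The `TTLCache` class is a tiny in-memory stand-in for Redis `GET` / `SET ... EX` / `DEL`, and `get_user` / `update_user` are hypothetical application functions; time is passed in explicitly so the behavior is easy to follow.

```python
import time


class TTLCache:
    """In-memory stand-in for Redis GET / SET ... EX / DEL."""

    def __init__(self):
        self._data = {}   # key -> (value, expires_at)

    def get(self, key, now):
        entry = self._data.get(key)
        if entry is None or entry[1] <= now:
            self._data.pop(key, None)   # expired entries behave like misses
            return None
        return entry[0]

    def set(self, key, value, ttl, now):
        self._data[key] = (value, now + ttl)

    def delete(self, key):
        self._data.pop(key, None)


def get_user(user_id, cache, db, now=None, ttl=300):
    now = time.time() if now is None else now
    key = f"user:{user_id}"
    value = cache.get(key, now)
    if value is None:                      # miss: fetch, populate, serve
        value = db[user_id]
        cache.set(key, value, ttl, now)
    return value


def update_user(user_id, value, cache, db):
    db[user_id] = value                    # write the source of truth first
    cache.delete(f"user:{user_id}")        # then invalidate; next read repopulates


cache, db = TTLCache(), {42: "Ada"}
fresh = get_user(42, cache, db, now=0)     # miss, loads "Ada"
db[42] = "Grace"                           # DB changed behind the cache's back
stale = get_user(42, cache, db, now=10)    # still "Ada" until the TTL expires
expired = get_user(42, cache, db, now=301) # TTL passed, reloads "Grace"
```

The `stale` read shows the "stale if DB updated directly" caveat; going through `update_user` avoids it by deleting the key.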

Strategies: Cache-aside (most common — app manages cache). Write-through (write to cache + DB together). Write-back (write to cache, async flush to DB). Read-through (cache fetches from DB on miss).

Eviction Policies: allkeys-lru (evict least recently used — best for cache). allkeys-lfu (evict least frequently used). volatile-lru (only evict keys with TTL). noeviction (return error when full — for data store use).

Cache problems: Thundering herd — many requests hit DB on same cache miss → use SETNX lock or probabilistic early expiry. Cache penetration — queries for non-existent keys always miss → cache null values with short TTL. Cache avalanche — many keys expire simultaneously → add random jitter to TTLs.

Anti-patterns: No TTL — stale data forever. Cache everything — wastes RAM on cold data. No eviction policy — OOM crash. Inconsistent invalidation — cache and DB disagree.

Redis Pub/Sub

Fire-and-forget real-time broadcast — every subscriber gets every message instantly, but no history, no replay, no persistence

PUBLISH  chat:room1 "Hello!"           → sends to all current subscribers
SUBSCRIBE chat:room1                   → receives "Hello!" instantly
PSUBSCRIBE chat:*                      → pattern match — all chat channels

Guarantees: Real-time delivery (<1ms). Fan-out to all subscribers. Pattern matching.

Limitations: No persistence. No replay. No acknowledgment. No consumer groups. Fire-and-forget only.

Use cases: Figma — real-time collaboration signals. Slack — online presence indicators. Cache invalidation across app servers. Chat typing indicators.

Redis Streams

Durable, replayable append-only event log with consumer groups — a lightweight Kafka built into Redis

XADD   orders * customer_id 5 amount 100 status "pending"  → 1704067200000-0
XGROUP CREATE orders payment-service 0                      → create consumer group
XREADGROUP GROUP payment-service consumer1 STREAMS orders > → read unprocessed
XACK   orders payment-service 1704067200000-0               → acknowledge done
XRANGE orders - +                                           → replay all entries

Guarantees: Persistence (survives restart). Replay (XRANGE from any point). Consumer groups (competing consumers). At-least-once delivery. Blocking reads (XREADGROUP BLOCK).

Limitations: Single node throughput — <100K events/sec (vs Kafka millions). RAM-bound. No cross-cluster replication. Best as lightweight Kafka when you already have Redis.

▸ Pub/Sub vs Streams — When to Use What

Feature	Pub/Sub	Streams
Purpose	Real-time broadcast	Durable event log
Persistence	None	Yes (AOF/RDB)
Replay	✗	✔ XRANGE
Consumer Groups	✗	✔ Competing consumers
Acknowledgment	✗ Fire-and-forget	✔ XACK
Best For	Presence, typing indicators, cache invalidation	Order pipelines, audit logs, IoT events
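
The table's key differences (no history vs. retained log, no acknowledgment vs. XACK) can be modeled in a few lines. This is a toy in-memory model, not the Redis implementation: `PubSub` and `Stream` are hypothetical classes, entry IDs are list indexes, and a single consumer group is assumed.

```python
class PubSub:
    """Fire-and-forget: a message reaches only the subscribers present right now."""

    def __init__(self):
        self.subscribers = []

    def subscribe(self, inbox):
        self.subscribers.append(inbox)

    def publish(self, message):
        for inbox in self.subscribers:
            inbox.append(message)
        return len(self.subscribers)   # like PUBLISH, returns the receiver count


class Stream:
    """Append-only log with one consumer group and explicit acknowledgment."""

    def __init__(self):
        self.entries = []    # retained history
        self.cursor = 0      # next entry the group has not been given yet
        self.pending = {}    # delivered but not yet acknowledged

    def add(self, message):
        self.entries.append(message)
        return len(self.entries) - 1

    def read_group(self, count=10):
        batch = []
        while self.cursor < len(self.entries) and len(batch) < count:
            entry_id = self.cursor
            self.pending[entry_id] = self.entries[entry_id]
            batch.append((entry_id, self.entries[entry_id]))
            self.cursor += 1
        return batch

    def ack(self, entry_id):
        return self.pending.pop(entry_id, None) is not None


bus = PubSub()
receivers = bus.publish("typing:u42")   # nobody is subscribed yet
late_inbox = []
bus.subscribe(late_inbox)               # the earlier message is gone for good

log = Stream()
log.add({"order": 1})
log.add({"order": 2})                   # written before any consumer exists
batch = log.read_group()                # consumer arrives later, still gets both
log.ack(batch[0][0])                    # entry 1 stays pending until acknowledged
```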

Redis Persistence & High Availability

How Redis survives crashes — RDB snapshots for fast recovery, AOF for minimal data loss, replication for failover

▸ Persistence: RDB vs AOF

RDB (Snapshot)

AOF (Append-Only File)

	RDB (Snapshot)	AOF (Append-Only)	Hybrid (RDB + AOF)
How	fork() → child writes .rdb to disk	Log every write command to file	AOF for durability, RDB for fast restart
Data loss	Up to last snapshot interval (1-15 min typical)	≤ 1 second (with appendfsync everysec)	≤ 1 second
Restart speed	Fast — load binary dump	Slow — replay all commands	Fast — load RDB + replay recent AOF
Disk I/O	Low (periodic bulk write)	High (continuous fsync)	Medium
File size	Compact binary	Large (AOF rewrite compacts)	Both files maintained
Best for	Backups, disaster recovery	Durability-critical data	Production default (Redis 4.0+)

AOF fsync options: always (every write — slowest, zero loss) • everysec (flush once/sec — recommended, ≤1s loss) • no (OS decides — fastest, unpredictable loss). AOF rewrite: periodic compaction removes redundant commands.

▸ Replication & Failover

Replication: Async by default — master streams commands to replicas. Replicas serve reads (read scaling). Async = data loss possible on master crash (writes not yet replicated). Use WAIT numreplicas timeout for semi-sync (waits for N replicas to ACK).

Sentinel (auto-failover): 3+ Sentinel processes monitor master. If master unreachable (quorum agrees), Sentinel promotes a replica to master and reconfigures clients. Failover takes ~5-15 seconds when tuned; the actual time depends mainly on the down-after-milliseconds setting. Sentinels themselves use Raft-like leader election.

Failover data loss: If master accepted writes that weren't replicated before crash, those writes are permanently lost. Mitigation: min-replicas-to-write 1 + min-replicas-max-lag 10 — master refuses writes if no replica is within 10s of sync.

Redis Deployment Modes

From single-node dev cache to globally sharded production cluster

Mode	Architecture	Sharding	HA	Use Case
Single Node	One instance	No	No (SPOF)	Dev, small cache, non-critical
Sentinel	Master + replicas + sentinel monitors	No	Yes (auto-failover)	HA cache, sessions, moderate load
Cluster	N masters (16,384 hash slots) + replicas	Yes	Yes	Large datasets, high throughput, horizontal scale
Managed	ElastiCache / MemoryDB / Upstash	Yes	Yes	Production — no ops overhead

Cluster details: 16,384 hash slots distributed across masters. Key → CRC16(key) % 16384 → slot → node. Each master has 1+ replicas. Gossip protocol for node discovery. MOVED/ASK redirects for client routing. Multi-key ops only within same slot (use hash tags: {user:123}.profile).
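
The slot mapping can be reproduced in a few lines. Redis Cluster uses the XMODEM variant of CRC16, which Python's `binascii.crc_hqx` computes when started from 0; the `hash_slot` function name is an assumption for this sketch.

```python
import binascii


def hash_slot(key: str) -> int:
    """Map a key to one of 16,384 slots, honoring {hash tags}."""
    data = key.encode()
    start = data.find(b"{")
    if start != -1:
        end = data.find(b"}", start + 1)
        if end > start + 1:                 # only a non-empty tag counts
            data = data[start + 1:end]      # hash just the text inside the braces
    return binascii.crc_hqx(data, 0) % 16384


profile_slot = hash_slot("{user:123}.profile")
settings_slot = hash_slot("{user:123}.settings")
```

Because both keys share the tag `user:123`, they land in the same slot, which is what makes multi-key operations on them legal in a cluster.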

Limitations: RAM-bound — all data must fit in memory. Single-threaded core — one slow command blocks everything. Not a primary DB — use as cache/accelerator. Async replication — data loss possible on failover (WAIT narrows the window but does not make Redis strongly consistent).

Real-world: Twitter — timeline cache (Redis Cluster). GitHub — job queues (Resque/Sidekiq). Snapchat — rate limiting. Pinterest — graph storage (billions of edges in Redis). Discord — presence, message cache.

Redis Distributed Locks — Redlock & Its Dangers

Redis SETNX gives simple locking — but Redlock is controversial. Know when it's safe and when it breaks.

▸ Simple Lock (Single Redis)

SET lock:order:123 "worker-A" NX EX 30    # Acquire: set if not exists, TTL 30s
# ... do critical work ...
# Safer release with Lua (atomic check-and-delete):
EVAL "if redis.call('get',KEYS[1])==ARGV[1] then return redis.call('del',KEYS[1]) else return 0 end" 1 lock:order:123 "worker-A"

When single-node SETNX is fine: Rate limiting, deduplication, idempotency keys. Acceptable when rare double-execution is tolerable.

▸ Why Redlock Breaks — GC Pause Attack

▸ The Fix: Fencing Tokens

Failure Mode	Redlock Safe?	With Fencing Token
GC / Process Pause	✗ UNSAFE — both in critical section	✔ Safe — stale token rejected
Clock Drift (NTP jump)	✗ UNSAFE — TTL expires early	✔ Safe — token not time-based
Network Partition	✗ UNSAFE — split brain	✔ Safe — highest token wins
Redis Failover	✗ UNSAFE — lock lost on promotion	✔ Safe — fencing protects storage
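
The GC-pause row is the classic case for fencing tokens, sketched below. `LockService` stands in for a consensus-backed lock (etcd, ZooKeeper) that hands out strictly increasing tokens, and `FencedStorage` is the protected resource; both are hypothetical classes for illustration.

```python
import itertools


class LockService:
    """Stand-in for etcd/ZooKeeper: every grant carries a strictly increasing token."""

    def __init__(self):
        self._counter = itertools.count(1)

    def acquire(self):
        return next(self._counter)


class FencedStorage:
    """The resource remembers the highest token seen and rejects older ones."""

    def __init__(self):
        self.highest_token = 0
        self.value = None

    def write(self, token, value):
        if token < self.highest_token:
            return False             # stale lock holder: refuse the write
        self.highest_token = token
        self.value = value
        return True


locks, storage = LockService(), FencedStorage()
token_a = locks.acquire()    # worker A gets token 1, then stalls in a GC pause
token_b = locks.acquire()    # A's lease expires; worker B gets token 2
ok_b = storage.write(token_b, "from B")
ok_a = storage.write(token_a, "from A (stale)")   # A wakes up and tries to write
```

The safety comes from the storage check, not from the lock itself: even if two workers both believe they hold the lock, only the newer token is accepted.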

When to use what: Redis SETNX for efficiency locks (dedup, rate limiting, idempotency). etcd / ZooKeeper + fencing for correctness locks (payments, inventory). Optimistic concurrency (CAS) when you can avoid locks entirely.

Kleppmann's verdict (paraphrased): Redlock is not safe when correctness depends on the lock, and for efficiency-only locks a single-node Redis lock is enough. For correctness, use consensus-based locks (etcd, ZooKeeper, Chubby) with fencing tokens.

Real-world: Google Chubby — Paxos + sequencer. etcd — Raft + revision numbers. ZooKeeper — ephemeral znodes. Stripe uses Redis for idempotency (efficiency) but DB constraints for payment correctness.

Memcached vs Redis

Both are in-memory stores — but Redis = Swiss army knife, Memcached = simple speed demon

	Redis	Memcached
Data Structures	Rich — strings, hashes, lists, sets, sorted sets, streams, geo, HLL	Strings only (key → blob)
Threading	Single-threaded core (I/O threads in 6.0+)	Multi-threaded — scales with CPU cores
Persistence	RDB + AOF — survives restarts	None — pure volatile cache
Replication	Built-in — master-replica, Sentinel, Cluster	None (client-side sharding)
Pub/Sub	Yes — channels + streams	No
Max Value Size	512 MB	1 MB (default)
Memory Efficiency	Higher overhead (metadata per key)	Slab allocator — less fragmentation
Eviction	8 policies (LRU, LFU, volatile, etc.)	LRU only
Scripting	Lua scripts — atomic multi-step ops	No
Best For	Sessions, leaderboards, queues, pub/sub, locks, complex caching	Simple key-value cache at massive scale

When to pick Memcached: You only need simple GET/SET caching, want multi-threaded performance on a single node, and don't need persistence or data structures. Facebook runs Memcached as a look-aside cache serving billions of lookups (its TAO graph store uses a separate, graph-aware cache tier).

When to pick Redis: You need data structures (sorted sets for leaderboards, lists for queues), persistence, pub/sub, Lua scripting, or built-in HA. Most modern systems default to Redis unless they have a specific Memcached use case.

CDN (Content Delivery Network)

Serve content from edge PoPs globally to improve latency (closer to users), throughput (offload origin), and availability (distributed delivery), with trade-offs in consistency (cache freshness) and invalidation (hard to purge). Pull (lazy) vs Push (proactive).

Guarantees: Low latency (<50ms from edge). DDoS absorption at edge. Origin offload. Edge computing (Cloudflare Workers) runs logic at edge.

Limitations: Dynamic/personalized content harder to cache. Cache invalidation complexity. Cost at high invalidation frequency.

▸ CDN Architecture — Edge PoPs Worldwide

▸ Pull CDN vs Push CDN

Pull CDN (Lazy)

Cache on first request. Cache-Control: max-age=3600

Flow: User → Edge (MISS) → Origin → Edge caches → User
Next: User → Edge (HIT, <10ms) ✓

Pro No upfront cost, auto-populates on demand
Con First request slow (cache miss), cold start
Use: General web assets, images, API responses

Push CDN (Proactive)

Pre-populate all PoPs before users request

Flow: Origin → Push to all PoPs on publish
User: User → Edge (always HIT, <5ms) ✓

Pro Zero cold starts, predictable latency
Con Storage cost, must know what to push
Use: Video segments, firmware, known-hot assets

▸ Scaling with CDN — From 1K to 1B+ Requests/Day

Scaling Principles: Shield layer — intermediate cache between edge and origin that collapses duplicate misses (100 PoPs miss → 1 request to origin). Tiered TTLs — edge 60s, shield 5min, origin 1h. Request coalescing — 1000 users request same uncached asset → only 1 goes to origin. Stale-while-revalidate — serve stale, refresh async.

Pitfalls at Scale: Thundering herd — hot key expires, all PoPs hit origin. Fix: jittered TTL + coalescing. Cache stampede — popular item invalidated during spike. Fix: lock + stale-while-revalidate. Purge storms — mass invalidation overloads origin. Fix: soft purge (serve stale, refresh async).

Interview tip: Always mention cache hit ratio as the key CDN metric. A 1-point improvement from 95% → 96% cuts the miss rate from 5% to 4%, i.e. 20% fewer origin requests. At Netflix scale (100B+ req/day), that's a billion or more saved origin calls per day.
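
A worked version of the hit-ratio arithmetic, using the 100B requests/day figure from the tip. The helper name `origin_requests` is an assumption; integer percentages are used to keep the numbers exact.

```python
def origin_requests(total, hit_pct):
    """Requests that miss the CDN and reach the origin, given a hit ratio in percent."""
    return total * (100 - hit_pct) // 100


total = 100_000_000_000                 # 100B requests/day
before = origin_requests(total, 95)     # 5% miss rate
after = origin_requests(total, 96)      # 4% miss rate
saved = before - after
reduction = saved / before              # relative drop in origin traffic
```

One percentage point of hit ratio removes a fifth of origin traffic here because the baseline miss rate is only 5%; the higher the hit ratio already is, the more each extra point matters.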

Real-world: Netflix Open Connect — custom CDN in ISPs, serves 95%+ of traffic from ISP-local boxes. Cloudflare — 300+ PoPs, serves 20%+ of web traffic. CloudFront — 400+ PoPs, Lambda@Edge for compute.

Advanced Caching: Cache Warming — pre-populate cache before traffic spike (product launch, Black Friday). Multi-Level — L1 (in-process, Caffeine) → L2 (Redis) → L3 (CDN). L1 is the fastest and smallest; each later level is larger but slower to reach. CDC Invalidation — DB change → CDC event → invalidate specific cache key in real-time (no stale TTL wait). Stale-While-Revalidate — serve stale, refresh in background.

Content Delivery & Edge: CDN caches static assets at edge PoPs (Cloudflare, CloudFront). Edge Computing — run logic at edge (Cloudflare Workers, Lambda@Edge, Vercel Edge Functions). Use for: A/B testing, geo-routing, auth token validation, personalization. Reduces origin load + latency. Limitation: limited runtime, no persistent state at edge.