System Design Case Study

How does Cloudflare rate-limit 45M+ requests/sec without adding latency?

Design a distributed rate limiter: 45M req/sec, 300+ PoPs, fail-open, <1ms overhead

Problem Statement

How does an edge network rate-limit 45M+ requests/sec without adding latency, distributing rate counters across hundreds of PoPs while using fail-open policies to avoid blocking legitimate traffic during sync delays?

Core challenge: Rate limiting at the edge means no central counter · each of 300+ PoPs must make local decisions. But a distributed attacker can spread requests across PoPs to bypass per-PoP limits. How do you coordinate without adding latency?
45M+ requests/second across all PoPs
300+ edge PoPs, globally distributed
<1ms added latency (rate-check overhead)
Fail-open policy on sync failure · never block legitimate traffic

Architecture · Sliding Window at the Edge

Local counters + async gossip for global coordination

[Figure: Cloudflare Distributed Rate Limiting · End-to-End Architecture. Client layer: ~45M req/sec of global traffic (Americas ~15M, Europe ~12M, Asia-Pacific ~13M, other regions ~5M), Anycast-routed to the nearest PoP; rate-limit key = IP + API key + path. Edge layer: 300+ PoPs (e.g. IAD, CDG, SIN), each deciding allow/deny locally from in-memory, per-worker sliding-window counters with zero network I/O on the hot path (2 counters per key per PoP). Coordination & output layer: ALLOW → origin (200 OK, RateLimit-Remaining header); DENY → 429 Too Many Requests with Retry-After; PoPs gossip counts every 1-5s and tighten local limits when the global rate exceeds the threshold.]
Decision | Choice | Why
Algorithm | Sliding window counter (per-PoP) + approximate global sync | Accurate enough locally with only 2 counters per key · eventually consistent globally
Storage | In-memory counters per worker (no DB) | Zero network I/O on the request path · hot counters stay in CPU cache
Coordination | Async gossip between PoPs every 1-5s | Shares counts without blocking the request path
Fail mode | Fail-open (allow) when sync is stale | Availability > precision · never block legitimate users
Key space | IP + API key + path (configurable) | Flexible: per-IP, per-key, or per-endpoint rules
Response | 429 + Retry-After + RateLimit-* headers | Standard · lets clients back off intelligently
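The Key space and Response rows can be sketched as follows. This is a minimal illustration, not a Cloudflare API. It assumes a hypothetical `rate_limit_key` scheme that joins IP, API key and path with `|`. It also assumes the `RateLimit-Limit` / `RateLimit-Remaining` / `RateLimit-Reset` names from earlier drafts of the IETF RateLimit header fields proposal; later drafts consolidate them. `Decision` and `build_response` are illustrative names.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


def rate_limit_key(ip: str, api_key: Optional[str], path: str) -> str:
    """Hypothetical composite key: IP + API key + path ("-" when no API key)."""
    return "|".join((ip, api_key or "-", path))


@dataclass
class Decision:
    allowed: bool
    limit: int          # requests allowed per window for this rule
    remaining: int      # requests left in the current window (may be negative)
    reset_seconds: int  # seconds until enough quota frees up


def build_response(d: Decision) -> Tuple[Optional[int], Dict[str, str]]:
    """Return (status, headers); status None means "forward to origin with these headers"."""
    headers = {
        "RateLimit-Limit": str(d.limit),
        "RateLimit-Remaining": str(max(d.remaining, 0)),
        "RateLimit-Reset": str(d.reset_seconds),
    }
    if d.allowed:
        return None, headers
    headers["Retry-After"] = str(d.reset_seconds)  # delay in seconds (RFC 9110)
    return 429, headers                             # 429 Too Many Requests (RFC 6585)
```

A denied request such as `build_response(Decision(False, 100, 0, 30))` yields status 429 with `Retry-After: 30` and `RateLimit-Remaining: 0`. An allowed request passes through with only the informational headers.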
Sliding window: Each PoP maintains a sliding window counter per (key, rule). The weighted count is the current bucket plus the previous bucket scaled by how much of it still overlaps the window: weighted = curr + prev · (1 - elapsed/window). Example: 100 req/min limit, 30s into the current bucket with 40 requests so far, previous bucket had 80 → weighted = 40 + 80 · 0.5 = 80 → allow (under 100).
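The per-PoP sliding window counter can be sketched as below. This is a single-threaded illustration that assumes a process-local dict as the counter store. Real per-worker counters would also need eviction and concurrency control, both omitted here. `SlidingWindowCounter` is a hypothetical name. It keeps exactly two counters per key and checks the weighted count before counting the new request, so it follows the allow/deny rule above.

```python
import time
from typing import Callable, Dict, List, Optional


class SlidingWindowCounter:
    """Per-PoP sliding window counter: two fixed buckets per key, weighted by overlap."""

    def __init__(self, limit: int, window: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.limit = limit
        self.window = window
        self.clock = clock
        # key -> [bucket_index, curr_count, prev_count]; no eviction in this sketch
        self._state: Dict[str, List[int]] = {}

    def _roll(self, key: str, now: float) -> List[int]:
        idx = int(now // self.window)
        s = self._state.get(key)
        if s is None:
            s = self._state[key] = [idx, 0, 0]
        elif idx != s[0]:
            # One bucket later: current becomes previous. Longer gap: both are stale.
            s[2] = s[1] if idx - s[0] == 1 else 0
            s[1] = 0
            s[0] = idx
        return s

    def weighted(self, key: str, now: Optional[float] = None) -> float:
        now = self.clock() if now is None else now
        idx, curr, prev = self._roll(key, now)
        elapsed = now - idx * self.window
        return curr + prev * (1 - elapsed / self.window)

    def allow(self, key: str, now: Optional[float] = None) -> bool:
        now = self.clock() if now is None else now
        if self.weighted(key, now) >= self.limit:
            return False  # DENY; rejected requests are not counted
        self._state[key][1] += 1
        return True


# Reproduce the example: 80 requests in the previous minute, 40 so far at 30s in.
rl = SlidingWindowCounter(limit=100, window=60)
for _ in range(80):
    rl.allow("k", now=10)
for _ in range(40):
    rl.allow("k", now=90)
print(rl.weighted("k", now=90))  # 80.0
```

Keeping the previous bucket as a single count trades exactness for memory. The sketch assumes requests in the previous bucket were evenly spread, which is what makes it an approximation rather than a full sliding log.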
Global coordination: Every 1-5s, PoPs gossip their local counts to a coordination layer. The global rate is the sum of all PoP counts. If it exceeds the threshold, every PoP tightens its local limit. Tradeoff: until the next gossip round (up to 1-5s), a distributed attack can exceed the global limit before PoPs react.
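Coordination matters because per-PoP limits multiply. With a per-PoP limit of 100, an attacker sending 99 requests to each of 300 PoPs gets 29,700 requests through. The sketch below shows one possible tightening rule. It assumes each gossip round delivers a per-PoP count for a key, and that the global budget is split in proportion to each PoP's observed traffic. Both the function name `tighten_local_limits` and the proportional policy are assumptions, not Cloudflare's documented behavior.

```python
from typing import Dict


def tighten_local_limits(pop_counts: Dict[str, int], global_limit: int,
                         local_limit: int) -> Dict[str, int]:
    """Given one gossip round of per-PoP counts for a key, return each PoP's new limit."""
    global_rate = sum(pop_counts.values())  # global rate = sum of all PoP counts
    if global_rate <= global_limit:
        # Under the global threshold: every PoP keeps its normal local limit.
        return {pop: local_limit for pop in pop_counts}
    # Over the threshold: split the global budget by each PoP's share of traffic
    # (floor division, so the new limits never sum to more than global_limit).
    return {pop: min(local_limit, global_limit * count // global_rate)
            for pop, count in pop_counts.items()}


print(tighten_local_limits({"IAD": 60, "CDG": 30, "SIN": 10},
                           global_limit=50, local_limit=100))
# {'IAD': 30, 'CDG': 15, 'SIN': 5}
```

An even split (global/N) is the simpler alternative. A proportional split avoids starving the PoP that legitimately carries most of a key's traffic.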
Anti-patterns: Central Redis counter · adds 10-50ms RTT per request (unacceptable at the edge). Fail-closed on sync failure · blocks all traffic during network issues. Fixed window · burst at the window boundary (up to 2× the limit within about a second straddling the boundary).
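To make the fixed-window anti-pattern concrete, this sketch sends one burst through two limiters. The burst is 100 requests just before a minute boundary and 100 just after. One limiter uses a fixed window and the other uses the weighted sliding window described above. The helper names are hypothetical, and both limiters are simplified to a single key with no eviction.

```python
from typing import Dict, List


def fixed_window_admitted(times: List[float], limit: int, window: float) -> int:
    """Fixed window: one counter per aligned bucket, reset at each boundary."""
    counts: Dict[int, int] = {}
    admitted = 0
    for t in times:
        b = int(t // window)
        if counts.get(b, 0) < limit:
            counts[b] = counts.get(b, 0) + 1
            admitted += 1
    return admitted


def sliding_window_admitted(times: List[float], limit: int, window: float) -> int:
    """Sliding window counter: current bucket + previous bucket weighted by overlap."""
    counts: Dict[int, int] = {}
    admitted = 0
    for t in times:
        b = int(t // window)
        overlap = 1 - (t - b * window) / window
        weighted = counts.get(b, 0) + counts.get(b - 1, 0) * overlap
        if weighted < limit:
            counts[b] = counts.get(b, 0) + 1
            admitted += 1
    return admitted


# 100 requests at 59.5s and 100 at 60.5s: 200 requests inside one second.
burst = [59.5] * 100 + [60.5] * 100
print(fixed_window_admitted(burst, limit=100, window=60))    # 200
print(sliding_window_admitted(burst, limit=100, window=60))  # 101
```

The fixed window admits the full 2× burst because its counter resets at 60s. The sliding window still weights the previous 100 requests at about 99%, so it admits only one more.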
Real-world: Cloudflare · sliding window at 300+ PoPs. Stripe · token bucket per API key (100 req/sec). GitHub · 5000 req/hour per authenticated user, with a window that resets hourly (X-RateLimit-Reset). AWS API Gateway · token bucket with burst capacity.
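Stripe and AWS API Gateway are cited above as using token buckets. A token bucket is also the usual way to provide the "burst allowance" listed under the resilience cases. Below is a minimal single-key sketch. The refill rate of 100 tokens/sec and burst capacity of 200 are chosen purely for illustration and are not either vendor's actual configuration. The injectable `clock` lets the behavior be checked deterministically.

```python
import time
from typing import Callable


class TokenBucket:
    """Refills `rate` tokens per second up to `capacity`; each request spends `cost` tokens."""

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate
        self.capacity = capacity
        self.tokens = float(capacity)  # start full so an initial burst is allowed
        self.clock = clock
        self.last = clock()

    def allow(self, cost: float = 1.0) -> bool:
        now = self.clock()
        # Refill for the time elapsed since the last call, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


fake_now = [0.0]
bucket = TokenBucket(rate=100, capacity=200, clock=lambda: fake_now[0])
print(sum(bucket.allow() for _ in range(250)))  # 200: full burst, then empty
fake_now[0] = 0.5
print(sum(bucket.allow() for _ in range(100)))  # 50: half a second of refill
```

Capacity bounds the burst and rate bounds the sustained throughput. A sliding window has only one knob, so this is why token buckets suit APIs that want to absorb short spikes.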

Resilience & Edge Cases

Failure | Impact | Recovery
Gossip network down | PoPs can't share counts · a distributed attacker can bypass the global limit | Fail open: keep allowing traffic. Fall back to tighter static local limits (per-PoP limit = global/N).
PoP overloaded | Counter updates delayed | Shed load at L4 before the rate limiter. Pre-filter known-bad IPs via a blocklist.
Clock skew between PoPs | Window boundaries misaligned | Sync clocks with NTP. The sliding window tolerates small skew (the weighted average smooths it).
Hot key (one API key = 90% of traffic) | That key's counter dominates updates (write contention across workers) | Separate hot-key path with a dedicated counter. Alert on anomalous single-key volume.
Legitimate burst (viral event) | Good traffic rate-limited | Burst allowance (token bucket hybrid). Whitelist known-good keys. Dynamic limit adjustment.
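The gossip-down row can be sketched as a freshness check in front of the local decision. Here, failing open means a missing or stale sync never blocks traffic by itself. Each PoP keeps deciding from its own counters against a static fallback share of global/N. `GossipView`, `allow_request` and the 5s staleness threshold are assumptions; the threshold is chosen to match the 1-5s gossip interval.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class GossipView:
    adjusted_local_limit: int  # limit computed from the last gossip round
    received_at: float         # when that round arrived (seconds)


def allow_request(local_weighted: float, view: Optional[GossipView], now: float,
                  global_limit: int, num_pops: int, max_staleness: float = 5.0) -> bool:
    """Local allow/deny that never waits on, or fails because of, gossip."""
    if view is not None and now - view.received_at <= max_staleness:
        limit = view.adjusted_local_limit  # fresh sync: use the coordinated limit
    else:
        # Stale or missing sync: decide locally against a static share (global/N).
        limit = max(1, global_limit // num_pops)
    return local_weighted < limit


view = GossipView(adjusted_local_limit=30, received_at=100.0)
print(allow_request(40, view, now=102.0, global_limit=1000, num_pops=10))  # False
print(allow_request(40, view, now=110.0, global_limit=1000, num_pops=10))  # True
```

When N is large relative to the limit, global/N becomes very strict. For example, a 100 req/min key across 300 PoPs gets a share of 1. Because Anycast tends to send one client to the same PoP, some designs instead fall back to the normal local limit.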

Interview Cheat Sheet

The 6 things to say for distributed rate limiting

1. Sliding window · weighted current + previous bucket (no boundary burst)
2. In-memory counters · zero network I/O, hot counters stay in CPU cache, <1ms overhead
3. Async gossip between PoPs · share counts every 1-5s without blocking requests
4. Fail-open policy · if sync is stale, allow traffic (availability > precision)
5. 429 + Retry-After + RateLimit headers · standard response, client backs off
6. No central Redis · adds 10-50ms RTT (unacceptable at edge). Local decision + eventual global sync.