How does Cloudflare rate-limit 45M+ req/sec without adding latency?

Design a distributed rate limiter: 45M req/sec, 300+ PoPs, fail-open, <1ms overhead

Concepts Involved

Rate Limiting, API Gateway, Redis, CDN, Load Balancer

Problem Statement

How does an edge network rate-limit 45M+ requests/sec without adding latency, distributing rate counters across hundreds of PoPs while using fail-open policies to avoid blocking legitimate traffic during sync delays?

Core challenge: Rate limiting at the edge means no central counter · each of 300+ PoPs must make local decisions. But a distributed attacker can spread requests across PoPs to bypass per-PoP limits. How do you coordinate without adding latency?

45M+ requests/second · across all PoPs

300+ edge PoPs · globally distributed

<1ms added latency · rate check overhead

Fail-open policy on sync failure · never block legitimate traffic

Architecture · Sliding Window at the Edge

Local counters + async gossip for global coordination

Decision	Choice	Why
Algorithm	Sliding window counter (per-PoP) + approximate global sync	Near-exact locally, eventually consistent globally
Storage	In-memory counters per worker (no DB)	No network or disk I/O on the request path · hot counters stay in CPU cache
Coordination	Async gossip every 1-5s between PoPs	Share counts without blocking request path
Fail mode	Fail-open (allow) when sync is stale	Availability > precision · never block legit users
Key space	IP + API key + path (configurable)	Flexible: per-IP, per-key, per-endpoint rules
Response	429 + Retry-After + RateLimit-* headers	Standard, client can back off intelligently
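
The key-space and response rows above can be illustrated with a small sketch. It assumes the three header names from earlier IETF RateLimit drafts (RateLimit-Limit, RateLimit-Remaining, RateLimit-Reset), since the table only says "RateLimit-*"; the function names and the request dictionary shape are hypothetical.

```python
import math
from typing import Dict, Tuple


def rule_key(request: Dict[str, str], dimensions: Tuple[str, ...]) -> Tuple[str, ...]:
    """Build the counter key from the configured dimensions (e.g. ip, api_key, path)."""
    return tuple(request.get(d, "") for d in dimensions)


def build_rate_limit_response(limit: int, used: float, window_s: float,
                              seconds_into_window: float) -> Tuple[int, Dict[str, str]]:
    """Return (status, headers) for the current weighted count `used`."""
    remaining = max(0, limit - math.ceil(used))
    # Seconds until the current bucket rolls over; never advertise 0.
    reset_s = max(1, math.ceil(window_s - seconds_into_window))
    headers = {
        "RateLimit-Limit": str(limit),          # assumed header name
        "RateLimit-Remaining": str(remaining),  # assumed header name
        "RateLimit-Reset": str(reset_s),        # assumed header name
    }
    if used >= limit:
        headers["Retry-After"] = str(reset_s)   # tells the client when to retry
        return 429, headers
    return 200, headers
```

A per-IP, per-endpoint rule would use `rule_key(request, ("ip", "path"))`, while a per-key rule would use `("api_key",)`.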

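Sliding window: Each PoP maintains a sliding window counter per (key, rule). The estimate is the current bucket's count plus the previous bucket's count, weighted by the fraction of the previous bucket that still overlaps the window. Example: 100 req/min limit with 60s buckets; 30s into the current bucket it holds 40 req, and the previous bucket held 80 req → weighted = 40 + 80 × 0.5 = 80 → allow (under 100).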
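
The weighted-bucket calculation can be sketched as a small in-memory class. This is a minimal illustration, not Cloudflare's implementation: the class name, method names, and the injectable `now` parameter (used so the example is deterministic) are all assumptions.

```python
import time


class SlidingWindowCounter:
    """Approximate sliding window: current bucket + weighted previous bucket."""

    def __init__(self, limit, window_s=60.0, clock=time.monotonic):
        self.limit = limit
        self.window_s = window_s
        self.clock = clock
        self.state = {}  # key -> [bucket_index, current_count, previous_count]

    def _roll(self, key, now):
        idx = int(now // self.window_s)
        st = self.state.setdefault(key, [idx, 0, 0])
        if idx == st[0] + 1:      # moved to the next bucket: current becomes previous
            st[0], st[1], st[2] = idx, 0, st[1]
        elif idx > st[0] + 1:     # idle for more than one bucket: nothing carries over
            st[0], st[1], st[2] = idx, 0, 0
        return st

    def estimate(self, key, now=None):
        now = self.clock() if now is None else now
        st = self._roll(key, now)
        elapsed_frac = (now % self.window_s) / self.window_s
        # The previous bucket counts only for the part still inside the window.
        return st[1] + st[2] * (1.0 - elapsed_frac)

    def allow(self, key, now=None):
        now = self.clock() if now is None else now
        if self.estimate(key, now) >= self.limit:
            return False
        self.state[key][1] += 1
        return True


rl = SlidingWindowCounter(limit=100, window_s=60.0)
for _ in range(80):
    rl.allow("client-a", now=10.0)   # previous bucket: 80 requests
for _ in range(40):
    rl.allow("client-a", now=89.0)   # current bucket: 40 requests
print(rl.estimate("client-a", now=90.0))  # 40 + 80 * 0.5
```

Only two integers per key are kept, which is what makes this cheaper than a per-request timestamp log.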

Global coordination: Every 1-5s, PoPs gossip their local counts to a coordination layer. Global rate = sum of all PoP counts. If the global rate exceeds the threshold, all PoPs tighten their local limits. Tradeoff: for up to one gossip interval (1-5s), a distributed attack can exceed the global limit before the PoPs react.
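
As a rough sketch of the aggregation step: the text does not say how local limits are tightened, so the proportional-share rule below is an assumption, as are the function names and PoP identifiers.

```python
from typing import Dict


def global_count(reports: Dict[str, int]) -> int:
    """Global rate = sum of the counts each PoP gossiped for one (key, rule)."""
    return sum(reports.values())


def tightened_local_limits(reports: Dict[str, int], global_limit: int) -> Dict[str, int]:
    """Return the local limit each PoP should enforce until the next gossip round.

    Assumed policy: while the global count is under the limit, PoPs keep the
    full limit locally; once it is exceeded, each PoP gets a share of the
    global limit proportional to the traffic it reported.
    """
    total = global_count(reports)
    if total <= global_limit:
        return {pop: global_limit for pop in reports}
    # Integer floor division keeps the sum of shares at or below the limit.
    return {pop: count * global_limit // total for pop, count in reports.items()}


reports = {"ams": 60, "sjc": 50, "nrt": 40}   # hypothetical PoP names and counts
print(tightened_local_limits(reports, global_limit=100))
```

Because this runs off the request path, its cost does not affect the per-request latency budget; only the resulting local limit is read during a request.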

Anti-patterns: Central Redis counter · adds 10-50ms RTT per request (unacceptable at edge). Fail-closed on sync failure · blocks all traffic during network issues. Fixed window · allows a burst at the window boundary (up to 2× the limit within about 1s).
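
The fixed-window boundary burst is easy to reproduce. The sketch below, with hypothetical function names, sends 100 requests just before a window boundary and 100 just after, then compares a fixed window with the weighted sliding window; both functions assume timestamps arrive in non-decreasing order.

```python
def fixed_window_allowed(timestamps, limit, window_s):
    """Count accepted requests when each window has an independent counter."""
    counts = {}
    allowed = 0
    for t in timestamps:
        idx = int(t // window_s)
        if counts.get(idx, 0) < limit:
            counts[idx] = counts.get(idx, 0) + 1
            allowed += 1
    return allowed


def sliding_window_allowed(timestamps, limit, window_s):
    """Count accepted requests using current + weighted previous bucket."""
    cur_idx, cur, prev = None, 0, 0
    allowed = 0
    for t in timestamps:
        idx = int(t // window_s)
        if cur_idx is None:
            cur_idx = idx
        elif idx == cur_idx + 1:
            cur_idx, cur, prev = idx, 0, cur
        elif idx > cur_idx + 1:
            cur_idx, cur, prev = idx, 0, 0
        weight = 1.0 - (t % window_s) / window_s
        if cur + prev * weight < limit:
            cur += 1
            allowed += 1
    return allowed


burst = [59.5] * 100 + [60.5] * 100   # 200 requests within one second
print(fixed_window_allowed(burst, limit=100, window_s=60))    # all 200 pass
print(sliding_window_allowed(burst, limit=100, window_s=60))  # 101 pass
```

With the fixed window, the counter resets at t=60, so the second hundred is accepted in full; the sliding window still counts almost all of the previous bucket and rejects nearly all of it.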

Real-world: Cloudflare · sliding window at 300+ PoPs. Stripe · token bucket per API key (100 req/sec). GitHub · 5,000 req/hour per authenticated user. AWS API Gateway · token bucket with burst capacity.
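
For comparison with the sliding window, a token bucket with burst capacity can be sketched as follows. The class and parameter names are hypothetical, and the burst size of 200 is an illustrative assumption; only the 100 req/sec rate comes from the text.

```python
class TokenBucket:
    """Tokens refill at a steady rate; the bucket size bounds the burst."""

    def __init__(self, rate_per_s, burst, now=0.0):
        self.rate = rate_per_s
        self.capacity = burst
        self.tokens = float(burst)   # start full so an initial burst is allowed
        self.last = now

    def allow(self, now, cost=1.0):
        # Refill for the time elapsed since the last call, capped at capacity.
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = max(self.last, now)
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


bucket = TokenBucket(rate_per_s=100, burst=200)
print(sum(bucket.allow(now=0.0) for _ in range(300)))  # burst: 200 of 300 pass
print(sum(bucket.allow(now=0.5) for _ in range(100)))  # 0.5s of refill: 50 pass
```

The sustained rate is capped at `rate_per_s`, while `burst` decides how much short-term excess a client may spend after an idle period.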

Resilience & Edge Cases

Failure	Impact	Recovery
Gossip network down	PoPs can't share counts · distributed attacker bypasses global limit	Fail-open: keep serving traffic on local decisions. Fall back to a tightened local limit (per-PoP limit = global limit / N, where N is the number of PoPs).
PoP overloaded	Counter updates delayed	Shed load at L4 before rate limiter. Pre-filter known-bad IPs via blocklist.
Clock skew between PoPs	Window boundaries misaligned	Use NTP sync. Sliding window is tolerant of small skew (weighted average smooths it).
Hot key (one API key = 90% traffic)	Updates to that key's counter become a contention hotspot	Separate hot-key path with dedicated counter. Alert on anomalous single-key volume.
Legitimate burst (viral event)	Good traffic rate-limited	Burst allowance (token bucket hybrid). Whitelist known-good keys. Dynamic limit adjustment.
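
The gossip-failure row can be sketched as a decision function. This is one possible reading of the table, with hypothetical names and a hypothetical staleness threshold: use the synced limit while it is fresh, fall back to global/N when it is stale, and allow the request if the limiter itself fails.

```python
def fallback_local_limit(global_limit, num_pops):
    """Even split of the global limit across PoPs (per-PoP limit = global/N)."""
    return max(1, global_limit // num_pops)


def decide(local_count, global_limit, num_pops, synced_limit, sync_age_s,
           max_sync_age_s=5.0):
    """Return True to allow the request, False to answer with 429."""
    try:
        if synced_limit is not None and sync_age_s <= max_sync_age_s:
            limit = synced_limit      # fresh gossip data: trust the coordinated limit
        else:
            limit = fallback_local_limit(global_limit, num_pops)  # stale: local only
        return local_count < limit
    except Exception:
        # Fail-open: a broken limiter must never block legitimate traffic.
        return True


print(decide(10, 30_000, 300, synced_limit=50, sync_age_s=1.0))      # fresh sync
print(decide(150, 30_000, 300, synced_limit=None, sync_age_s=999.0)) # stale sync
```

The broad `except` is deliberate here: in a fail-open design, any unexpected limiter error is treated as "allow" rather than surfaced to the client.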

Interview Cheat Sheet

The 6 things to say for distributed rate limiting

1. Sliding window · weighted current + previous bucket (no boundary burst)
2. In-memory counters · no I/O on the request path, hot counters in CPU cache, <1ms overhead
3. Async gossip between PoPs · share counts every 1-5s without blocking requests
4. Fail-open policy · if sync is stale, allow traffic (availability > precision)
5. 429 + Retry-After + RateLimit headers · standard response, client backs off
6. No central Redis · adds 10-50ms RTT (unacceptable at edge). Local decision + eventual global sync.
