How does a payment platform process millions of charges without double-charging, while ensuring idempotent request handling, two-phase state transitions (pending → captured → settled), and at-least-once delivery with server-side deduplication?
Core challenge: The network is unreliable. The client sends a charge request and the server processes it, but the response is lost, so the client retries. Without idempotency, the customer is charged twice. At 10M transactions/day, even a 0.01% duplicate rate means 1,000 double-charged customers every day.
Millions of charges/day across 3M+ merchants
Zero double charges (idempotency guarantee)
2-phase state machine: pending → captured → settled
99.999% availability: payments can't go down
Functional Requirements
Must Have
1. Accept charge request with idempotency key (client-generated UUID)
2. Deduplicate retries: same key returns same result without re-processing
3. Two-phase capture: authorize (hold funds) → capture (charge) → settle (transfer)
4. Atomic state transitions: no partial states visible
5. Support refunds as compensating transactions
6. Audit trail: every state change logged immutably
Out of Scope
- Fraud detection and risk scoring
- Multi-currency and FX
- Merchant onboarding and KYC
- Subscription billing and invoicing
- PCI-DSS card tokenization
Non-Functional Requirements
| Property | Target | Design Impact |
| --- | --- | --- |
| Correctness | Zero double-charges | Idempotency key + atomic state machine + DB constraints |
| Availability | 99.999% (~5.26 min downtime/year) | Multi-region active-active, no single point of failure |
| Latency | <2s end-to-end charge | Most time is card network RTT (~1s). Internal processing <100ms. |
| Durability | Zero transaction loss | Synchronous replication, WAL, ledger is append-only |
| Auditability | Every state change logged immutably | Double-entry ledger, event sourcing for state transitions |
| Compliance | PCI-DSS Level 1 | Card data tokenized, encrypted at rest, access logged |
Idempotency key design: The client generates a UUID before the first attempt. The server stores {key → request_hash + response + created_at}. On retry: if the key exists AND the request matches → return the stored response. If the key exists but the request differs → return 422 (misuse). Keys expire after 24h.
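The lookup rules above can be sketched as a small in-memory store. This is a minimal illustration, not a production design: `IdempotencyStore`, `IdempotencyConflict`, and the `handler` callback are hypothetical names, and a real deployment would keep the entries in Redis or a database and commit them atomically with the charge.

```python
import hashlib
import json
import time
from dataclasses import dataclass

KEY_TTL_SECONDS = 24 * 60 * 60  # 24h expiry, as described in the text


@dataclass
class StoredResult:
    request_hash: str
    response: dict
    created_at: float


class IdempotencyConflict(Exception):
    """Same key reused with a different request body (would map to HTTP 422)."""


class IdempotencyStore:
    """Hypothetical in-memory stand-in for the server-side key store."""

    def __init__(self, now=time.time):
        self._entries = {}
        self._now = now  # injectable clock so expiry can be tested

    @staticmethod
    def _hash(request: dict) -> str:
        # Canonical JSON so that key order does not change the hash.
        canonical = json.dumps(request, sort_keys=True, separators=(",", ":"))
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

    def execute(self, key: str, request: dict, handler):
        entry = self._entries.get(key)
        # Drop the entry if its 24h window has passed.
        if entry is not None and self._now() - entry.created_at >= KEY_TTL_SECONDS:
            del self._entries[key]
            entry = None

        request_hash = self._hash(request)
        if entry is not None:
            if entry.request_hash != request_hash:
                raise IdempotencyConflict(key)  # key reused with a different request
            return entry.response  # retry: replay the stored response

        # First attempt: process once, then remember the result.
        response = handler(request)
        self._entries[key] = StoredResult(request_hash, response, self._now())
        return response
```

Calling `execute` twice with the same key and body runs `handler` once and returns the stored response the second time. The sketch is not safe under concurrent requests for the same key; that case needs a unique constraint or a lock on the key.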
State machine guarantees: Each transition is atomic (one DB transaction). Forward-only: a charge can't go from CAPTURED back to PENDING. Timeout handlers: PENDING > 30 min → auto-void. Compensating actions: a refund creates reverse ledger entries.
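A forward-only state machine with a timeout handler might look like the sketch below. The `VOIDED` state name, the `Payment` class, and the choice to treat a repeated transition as a no-op are assumptions made for illustration. In a real system each transition would be a conditional update inside a DB transaction rather than an in-memory assignment.

```python
from enum import Enum


class State(Enum):
    PENDING = "pending"
    CAPTURED = "captured"
    SETTLED = "settled"
    VOIDED = "voided"  # assumed name for the auto-void outcome


# Forward-only edges; any transition not listed here is rejected.
ALLOWED = {
    State.PENDING: {State.CAPTURED, State.VOIDED},
    State.CAPTURED: {State.SETTLED},
    State.SETTLED: set(),
    State.VOIDED: set(),
}

PENDING_TIMEOUT_SECONDS = 30 * 60  # PENDING > 30 min is auto-voided


class InvalidTransition(Exception):
    pass


class Payment:
    def __init__(self, created_at: float):
        self.state = State.PENDING
        self.created_at = created_at
        self.history = [State.PENDING]  # stand-in for the audit trail

    def transition(self, target: State) -> bool:
        """Return True if the state changed, False for a duplicate no-op."""
        if target == self.state:
            return False  # duplicate delivery of the same transition
        if target not in ALLOWED[self.state]:
            raise InvalidTransition(f"{self.state.name} -> {target.name}")
        self.state = target
        self.history.append(target)
        return True

    def expire_if_stale(self, now: float) -> bool:
        """Void the payment if it has been PENDING longer than the timeout."""
        if self.state is State.PENDING and now - self.created_at > PENDING_TIMEOUT_SECONDS:
            return self.transition(State.VOIDED)
        return False
```

Because a repeated transition returns `False` instead of raising, a duplicate webhook for a state the payment is already in has no effect.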
Failure modes: Network timeout to card network → store as PENDING; async reconciliation resolves it. DB crash mid-transaction → idempotency key not committed → retry is safe. Duplicate webhook from card network → the idempotent state machine ignores duplicate transitions.
Real-world: Stripe uses an Idempotency-Key header on all mutating APIs. PayPal uses request_id for deduplication. Square uses idempotency_key with a 24h window. Adyen uses a reference field for merchant-side dedup. All major payment processors implement this pattern.
Scale Estimation
Back-of-envelope math for a payment platform
Given: Millions of charges/day · 3M+ merchants · 99.999% uptime · Zero double-charges
| Step | Derivation | Result | Design Impact |
| --- | --- | --- | --- |
| 1 | Charges/sec: 10M charges/day ÷ 86,400 s | ~115 charges/sec avg | Not high throughput; correctness matters more than speed |
| 2 | Peak: 115 × 10× (Black Friday) | ~1,150 charges/sec peak | Must handle 10× burst without dropping transactions |
| 3 | Idempotency keys stored: 10M/day × 24h TTL | ~10M active keys | Redis or DB index; fits in memory easily |
| 4 | Ledger entries: 10M charges × 3 entries each | ~30M ledger rows/day | Append-only, partitioned by merchant_id |
| 5 | Webhook deliveries: 10M charges × 3 events each | ~30M webhooks/day | Async delivery with retry queue (exponential backoff) |
| 6 | Uptime: 99.999% = 5.26 min downtime/year | Multi-region active-active | Can't have single region; payment must always work |
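The arithmetic in the estimate can be reproduced in a few lines. The constants are the figures assumed in the estimate (10M charges/day, a 10× peak, 3 ledger entries and 3 webhook events per charge); the variable names are only illustrative.

```python
CHARGES_PER_DAY = 10_000_000
SECONDS_PER_DAY = 86_400
PEAK_MULTIPLIER = 10          # Black Friday burst
ENTRIES_PER_CHARGE = 3        # ledger rows written per charge
EVENTS_PER_CHARGE = 3         # webhook events emitted per charge
AVAILABILITY = 0.99999        # "five nines"
MINUTES_PER_YEAR = 365 * 24 * 60

avg_per_sec = CHARGES_PER_DAY / SECONDS_PER_DAY
peak_per_sec = avg_per_sec * PEAK_MULTIPLIER
active_keys = CHARGES_PER_DAY  # one key per charge, kept for one 24h window
ledger_rows_per_day = CHARGES_PER_DAY * ENTRIES_PER_CHARGE
webhooks_per_day = CHARGES_PER_DAY * EVENTS_PER_CHARGE
downtime_min_per_year = (1 - AVAILABILITY) * MINUTES_PER_YEAR

print(f"avg charges/sec:   {avg_per_sec:.1f}")
print(f"peak charges/sec:  {peak_per_sec:.0f}")
print(f"active keys:       {active_keys:,}")
print(f"ledger rows/day:   {ledger_rows_per_day:,}")
print(f"webhooks/day:      {webhooks_per_day:,}")
print(f"downtime min/year: {downtime_min_per_year:.2f}")
```

The exact average is a little under 116 charges/sec, which the estimate rounds down to ~115. Either way, the conclusion holds that throughput is modest and correctness is the harder problem.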
Data Model
Double-entry ledger + state machine + idempotency store
Double-entry guarantee: Every charge creates at least two ledger entries that sum to zero, for example: debit customer $100, credit merchant $97, credit platform $3. If any entry fails, the whole transaction rolls back. Auditors can verify that total debits = total credits across the entire system.
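The zero-sum invariant can be sketched with an append-only list. `Ledger`, `Entry`, and the sign convention (negative = debit, positive = credit) are assumptions for illustration. Amounts are held in integer cents to avoid floating-point rounding.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    account: str
    amount_cents: int  # assumed convention: negative = debit, positive = credit


class UnbalancedTransaction(Exception):
    pass


class Ledger:
    """Hypothetical append-only ledger that only accepts balanced postings."""

    def __init__(self):
        self._entries = []

    def post(self, entries):
        entries = list(entries)
        if len(entries) < 2:
            raise UnbalancedTransaction("need at least two entries")
        if sum(e.amount_cents for e in entries) != 0:
            raise UnbalancedTransaction("entries do not sum to zero")
        # Validation happens before any write, so a rejected posting
        # leaves the ledger untouched (all-or-nothing).
        self._entries.extend(entries)

    def balance(self, account: str) -> int:
        return sum(e.amount_cents for e in self._entries if e.account == account)

    def total(self) -> int:
        """System-wide sum; zero whenever debits equal credits."""
        return sum(e.amount_cents for e in self._entries)


ledger = Ledger()
# The $100 charge from the text: $97 to the merchant, $3 to the platform.
ledger.post([
    Entry("customer", -10_000),
    Entry("merchant", 9_700),
    Entry("platform", 300),
])
```

A refund would be posted as the mirror image of these entries, which keeps `total()` at zero without modifying any existing row.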
Resilience & Edge Cases
Payment systems must handle every failure mode gracefully: money can never be lost or duplicated
| Failure | Impact | Recovery |
| --- | --- | --- |
| Network timeout to card network | Don't know if charge succeeded | Store as PENDING. Async reconciliation query to card network. Resolve within minutes. |
| DB crash mid-transaction | Idempotency key not committed | Client retries → key doesn't exist → safe to reprocess. No double-charge. |
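The "DB crash mid-transaction" row depends on the idempotency key and the charge row being committed in the same transaction. The sketch below illustrates that with Python's built-in `sqlite3` module and an in-memory database. The table layout, the `create_charge` helper, and the `crash_before_commit` flag are hypothetical, and the crash is simulated with an exception rather than a real process failure.

```python
import sqlite3


def open_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE charges ("
        "id INTEGER PRIMARY KEY, amount_cents INTEGER NOT NULL, state TEXT NOT NULL)"
    )
    # PRIMARY KEY on the idempotency key is the DB constraint that
    # rejects a second insert for the same key.
    conn.execute(
        "CREATE TABLE idempotency_keys ("
        "key TEXT PRIMARY KEY, charge_id INTEGER NOT NULL)"
    )
    conn.commit()
    return conn


def create_charge(conn, key, amount_cents, crash_before_commit=False):
    row = conn.execute(
        "SELECT charge_id FROM idempotency_keys WHERE key = ?", (key,)
    ).fetchone()
    if row is not None:
        return row[0]  # retry of a committed request: reuse the charge

    # "with conn" commits if the block succeeds and rolls back if it raises,
    # so the charge row and the key row appear together or not at all.
    with conn:
        cur = conn.execute(
            "INSERT INTO charges (amount_cents, state) VALUES (?, 'PENDING')",
            (amount_cents,),
        )
        charge_id = cur.lastrowid
        conn.execute(
            "INSERT INTO idempotency_keys (key, charge_id) VALUES (?, ?)",
            (key, charge_id),
        )
        if crash_before_commit:
            raise RuntimeError("simulated crash before commit")
    return charge_id
```

After a simulated crash both tables are empty, so a retry with the same key creates exactly one charge, and further retries return that same charge id. The sketch covers only the database write. The call to the card network happens outside this transaction, which is why the timeout row relies on PENDING plus reconciliation.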