How does a ride-hailing platform match riders to the nearest available driver within 3 seconds, scoring candidates by distance, ETA, and driver rating while avoiding global scans of all drivers?
Core challenge: 5M+ drivers sending GPS updates every 4 seconds. A rider requests a ride. You must find the best driver within 2km in under 3 seconds without scanning all 5M drivers globally. The answer: spatial indexing with H3 hexagonal grid.
5M+
active drivers
GPS every 4 seconds
<3s
match latency
request ? driver assigned
~1.25M
GPS updates / sec
5M drivers · 4s interval
H3 Hex
spatial index
resolution 7 (~5km· cells)
Functional Requirements
Must Have
1. Ingest real-time GPS from all active drivers (every 4s) 2. On ride request, find nearest available drivers within radius 3.Score candidates by distance, ETA, rating, acceptance rate 4.Offer ride to best driver with 15s timeout 5. If declined/timeout, offer to next best (cascade) 6. Handle concurrent requests · don't offer same driver to two riders
Out of Scope
? Route calculation and navigation ? Surge pricing computation ? Payment processing ? Driver onboarding and verification ? Ride tracking after match
Redis Sentinel per city, fallback to expand radius
Consistency
Eventual · driver location 0-4s stale
Acceptable: GPS updates every 4s, matching uses latest available
Fairness
No driver starved, no rider waiting forever
Scoring balances distance + rating + acceptance rate + time since last ride
Concurrency
No double-offer (same driver to 2 riders)
Redis SETNX lock per driver_id with 15s TTL
High-Level Architecture
H3 spatial index + Redis for real-time driver location + scoring service
Why H3 over Geohash? Geohash cells are rectangles · distance from center to edge varies. H3 hexagons have uniform distance from center to all edges. Neighbors are always equidistant. No "edge effects" where nearby points fall in distant cells. Uber open-sourced H3 specifically for this use case.
Scaling: Redis sharded by city. Each city has its own geospatial index. NYC alone: ~80K active drivers. GEORADIUS on 80K entries = ~2ms. Global: 600+ cities, each independently scaled. Driver location TTL=10s auto-removes offline drivers.
Edge cases:No drivers nearby ? expand radius (2km ? 5km ? 10km). All drivers decline ? re-score with lower threshold. GPS drift ? Kalman filter smoothing on driver app. City boundary ? query adjacent H3 cells (k-ring).
Scale Estimation
Back-of-envelope math for geo-spatial matching
Given:5M active drivers · GPS every 4s · <3s match · 15M rides/day
Step
Derivation
Result
Design Impact
1
GPS ingestion: 5M · 4s
1.25M updates/sec
Redis GEOADD must handle 1.25M writes/sec (sharded by city)
2
Match requests: 15M rides · 86400s
~174 matches/sec avg
Low QPS · matching logic can be complex per request
3
Peak matches: 174 · 5· rush hour
~870 matches/sec peak
Still manageable · bottleneck is GPS ingestion, not matching
4
Drivers per city (NYC): ~80K active
GEORADIUS on 80K = ~2ms
Redis handles single-city queries trivially
5
Redis memory: 5M drivers · 100 bytes
~500 MB total
Fits in single Redis node per city (shard for safety)
6
H3 cells (res 7): ~5.16 km· each
~50K cells per city
Efficient spatial partitioning without scanning all drivers
Data Model
Redis geospatial for real-time location · PostgreSQL for ride history
// --- Driver Location (Redis Geospatial) ---
GEOADD drivers:nyc {lng} {lat} driver_123 // update position
GEORADIUS drivers:nyc {lng} {lat} 2 km COUNT 20 ASC // find nearest 20
// --- Driver State (Redis Hash) ---
HSET driver:123 status "available" vehicle "sedan" rating 4.8 h3_cell "abc"
TTL: 10s (auto-expire if no GPS update ? driver considered offline)
// --- Ride Requests (PostgreSQL) ---
rides {
ride_id: UUID
rider_id: UUID
driver_id: UUID NULL -- assigned after match
pickup: POINT(lat, lng)
dropoff: POINT(lat, lng)
status: ENUM(requested, matched, en_route, in_progress, completed, cancelled)
city_id: VARCHAR
created_at: TIMESTAMP
matched_at: TIMESTAMP NULL
}
// --- Match Lock (Redis) ---
SETNX match_lock:driver_123 ride_456 EX 15 // lock driver for 15s
// If driver declines or times out ? DEL match_lock:driver_123 ? offer to next
Resilience & Edge Cases
Failure
Impact
Recovery
Redis node failure
Driver locations lost for one city
Drivers re-register on next GPS update (4s). Sentinel promotes replica.
No drivers in radius
Rider waits
Expand radius: 2km ? 5km ? 10km. Show "searching..." with ETA.
Driver accepts but GPS stale
Driver farther than expected
Re-verify location on accept. If too far, re-match.
Concurrent match (race)
Two riders offered same driver
Redis SETNX lock · only first succeeds. Second gets next candidate.
DB lock (too slow), ZooKeeper (overkill for 15s lock)
GPS Ingestion
Kafka ? Location Service ? Redis
Buffer bursts, replay on failure, decouple from Redis
Direct write to Redis (no buffer, lose data on Redis failure)
Scoring
Custom scoring service (Go)
Low-latency, ETA from routing service, configurable weights
ML model (too slow for <5ms budget at this stage)
Interview Cheat Sheet
The 8 things to say for geo-matching design
1.H3 hexagonal grid · uniform distance, no edge effects, O(1) lat/lng ? cell 2.Redis GEOADD/GEORADIUS · sub-ms spatial queries on 80K drivers per city 3.TTL=10s on driver location · auto-removes offline drivers without explicit deregister 4.SETNX lock · prevents same driver offered to two riders simultaneously 5.Scoring: ETA · 0.4 + rating · 0.3 + distance · 0.3 · not just nearest, but best 6.Cascade on decline · 15s timeout, then offer to next-best candidate 7.City-level sharding · each city is independent Redis instance, no cross-city queries 8.Expand radius on failure · 2km ? 5km ? 10km, with rider notification of wait