How does a CI/CD platform execute 10M+ builds per day using distributed workers, with intelligent artifact caching providing a 10× speedup, job scheduling with dependency graphs, and real-time log streaming to developers?
Core challenge: Builds are bursty (Monday 9am = 10× weekend traffic), each needs an isolated environment, jobs have complex dependency DAGs, and developers expect real-time log output, not "check back in 5 minutes."
10M+ builds / day
10× cache speedup
Distributed worker pool (auto-scale)
Real-time log streaming
Architecture
Job DAG scheduling: Workflow YAML defines jobs with needs: dependencies. The orchestrator builds a DAG → schedules independent jobs in parallel → waits for dependencies before starting downstream jobs. Matrix builds fan out (e.g., test on 5 OS × 3 Node versions = 15 parallel jobs).
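The scheduling step can be sketched as a topological sort that groups jobs into "waves" of mutually independent work. This is a minimal illustration, not any particular platform's orchestrator: the function names schedule_waves and expand_matrix, and the needs mapping of job name to dependency list, are assumptions made for the example.

```python
from itertools import product


def schedule_waves(needs):
    """Group jobs into waves; every job in a wave can run in parallel.

    `needs` maps each job name to the list of jobs it depends on
    (hypothetical shape, modeled on a workflow's `needs:` entries).
    """
    indegree = {job: len(deps) for job, deps in needs.items()}
    dependents = {job: [] for job in needs}
    for job, deps in needs.items():
        for dep in deps:
            if dep not in needs:
                raise ValueError(f"{job!r} needs unknown job {dep!r}")
            dependents[dep].append(job)

    ready = sorted(job for job, count in indegree.items() if count == 0)
    waves, scheduled = [], 0
    while ready:
        waves.append(ready)
        scheduled += len(ready)
        next_ready = []
        for job in ready:
            for child in dependents[job]:
                indegree[child] -= 1
                if indegree[child] == 0:  # all dependencies finished
                    next_ready.append(child)
        ready = sorted(next_ready)

    if scheduled != len(needs):
        raise ValueError("dependency cycle detected")
    return waves


def expand_matrix(axes):
    """Fan a matrix definition out into one parameter dict per combination."""
    keys = list(axes)
    return [dict(zip(keys, combo)) for combo in product(*(axes[k] for k in keys))]


workflow = {"build": [], "lint": [], "test": ["build"], "deploy": ["test", "lint"]}
waves = schedule_waves(workflow)
matrix = expand_matrix({"os": ["a", "b", "c", "d", "e"], "node": [18, 20, 22]})
```

A real orchestrator would start each job as soon as its own dependencies finish rather than waiting for a whole wave, but the waves make the available parallelism easy to see.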
Artifact caching: Cache key = hash(package-lock.json + OS + Node version). On a hit, restore node_modules from the blob store (seconds vs. minutes). Content-addressed storage: the same dependencies across branches share a cache entry. Layer caching for Docker builds (each layer cached independently).
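A rough sketch of the cache-key idea, assuming SHA-256 as the hash and an in-memory dict standing in for the blob store. The names cache_key and BlobCache are hypothetical. Each part is length-prefixed so that different inputs cannot concatenate to the same byte string.

```python
import hashlib


def cache_key(lockfile_bytes, os_name, node_version):
    """Derive a content-addressed key from the lockfile, OS, and Node version."""
    digest = hashlib.sha256()
    for part in (lockfile_bytes, os_name.encode(), node_version.encode()):
        digest.update(len(part).to_bytes(8, "big"))  # length prefix avoids ambiguity
        digest.update(part)
    return digest.hexdigest()


class BlobCache:
    """In-memory stand-in for a blob store keyed by cache key."""

    def __init__(self):
        self._blobs = {}

    def restore(self, key):
        # Returns the cached archive, or None on a miss.
        return self._blobs.get(key)

    def save(self, key, archive):
        # Entries are immutable: identical content always has an identical key.
        self._blobs.setdefault(key, archive)


lockfile = b'{"lockfileVersion": 3}'
key_main = cache_key(lockfile, "linux", "20")
key_feature = cache_key(lockfile, "linux", "20")  # another branch, same lockfile
key_other_os = cache_key(lockfile, "macos", "20")

cache = BlobCache()
miss = cache.restore(key_main)
cache.save(key_main, b"node_modules.tar")
hit = cache.restore(key_feature)
```

Because the key depends only on content, two branches with the same lockfile, OS, and Node version resolve to the same entry, while changing any one input produces a fresh key instead of a stale hit.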
Anti-patterns: No job isolation (one build's side effects break another). Polling for logs (wastes bandwidth, poor UX). No concurrency limits (one repo monopolizes all workers). Caching everything blindly (stale caches cause flaky builds).
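To make the concurrency-limit point concrete, here is one possible dispatcher that caps running jobs per repo and hands out free workers round-robin. The class name FairDispatcher and the per_repo_limit parameter are illustrative assumptions, not a description of any specific platform.

```python
from collections import defaultdict, deque


class FairDispatcher:
    """Round-robin dispatch with a cap on concurrently running jobs per repo."""

    def __init__(self, per_repo_limit):
        self.per_repo_limit = per_repo_limit
        self.queues = defaultdict(deque)  # repo -> pending jobs
        self.running = defaultdict(int)   # repo -> jobs currently running
        self._order = []                  # repos in first-seen order

    def submit(self, repo, job):
        if repo not in self.queues:
            self._order.append(repo)
        self.queues[repo].append(job)

    def dispatch(self, free_workers):
        """Assign up to `free_workers` jobs, one per repo per round."""
        assigned = []
        progress = True
        while free_workers > 0 and progress:
            progress = False
            for repo in self._order:
                if free_workers == 0:
                    break
                under_limit = self.running[repo] < self.per_repo_limit
                if self.queues[repo] and under_limit:
                    assigned.append((repo, self.queues[repo].popleft()))
                    self.running[repo] += 1
                    free_workers -= 1
                    progress = True
        return assigned

    def finish(self, repo):
        # Called when a job completes, freeing one slot for that repo.
        self.running[repo] -= 1


dispatcher = FairDispatcher(per_repo_limit=2)
for n in range(5):
    dispatcher.submit("big", f"b{n}")
dispatcher.submit("small", "s0")
first_batch = dispatcher.dispatch(free_workers=4)
```

With four free workers, the busy repo gets only its two allowed slots and the small repo still gets its job; the fourth worker stays idle rather than letting one repo exceed its cap.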
Real-time log streaming: Workers stream stdout/stderr via WebSocket to a log aggregator. The client connects to the aggregator for a live tail. Logs are also persisted to blob storage for later retrieval. Chunked upload: don't wait for job completion to see output.
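The aggregator's role can be sketched without a network layer. In this simplified model, callbacks stand in for client WebSocket connections and a dict of lists stands in for blob storage; the LogAggregator class and its method names are assumptions for illustration only.

```python
class LogAggregator:
    """Fans log chunks out to live subscribers and keeps them for later reads."""

    def __init__(self):
        self._chunks = {}       # job_id -> list of chunks (blob storage stand-in)
        self._subscribers = {}  # job_id -> callbacks (client socket stand-in)

    def append(self, job_id, chunk):
        """Called by a worker for each chunk of stdout/stderr as it is produced."""
        self._chunks.setdefault(job_id, []).append(chunk)
        for callback in self._subscribers.get(job_id, []):
            callback(chunk)

    def subscribe(self, job_id, callback, from_offset=0):
        """Replay chunks from `from_offset`, then deliver new ones live."""
        for chunk in self._chunks.get(job_id, [])[from_offset:]:
            callback(chunk)
        self._subscribers.setdefault(job_id, []).append(callback)

    def read_all(self, job_id):
        """Retrieve the full persisted log, e.g. after the job has finished."""
        return "".join(self._chunks.get(job_id, []))


aggregator = LogAggregator()
seen = []
aggregator.append("job-1", "npm ci\n")              # emitted before anyone watches
aggregator.subscribe("job-1", seen.append)          # late joiner gets the backlog
aggregator.append("job-1", "added 120 packages\n")  # then live chunks
full_log = aggregator.read_all("job-1")
```

The from_offset parameter lets a client that reconnects resume from the last chunk it saw instead of replaying the whole log.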
Interview Cheat Sheet
1. Job DAG: parse workflow dependencies, maximize parallelism, respect ordering constraints
2. Ephemeral workers: fresh VM/container per job, auto-scale 0→10K based on queue depth
3. Content-addressed caching: hash(lockfile) as cache key, 10× speedup on cache hit
4. Fair-share scheduling: per-org quotas prevent one customer from starving others
5. Real-time logs: WebSocket streaming from worker → aggregator → client browser
6. Isolation: each job gets a clean environment, secrets injected at runtime, destroyed after
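As a small worked example of queue-depth auto-scaling, the target pool size can be computed from pending plus running jobs and clamped to the 0 to 10K range mentioned above. The function desired_workers and its jobs_per_worker parameter are hypothetical; a production autoscaler would also add cooldowns and warm spare capacity.

```python
import math


def desired_workers(queue_depth, running_jobs, jobs_per_worker=1,
                    min_workers=0, max_workers=10_000):
    """Target pool size: enough workers for all pending and running jobs."""
    needed = math.ceil((queue_depth + running_jobs) / jobs_per_worker)
    return max(min_workers, min(max_workers, needed))


idle_target = desired_workers(queue_depth=0, running_jobs=0)
normal_target = desired_workers(queue_depth=7, running_jobs=3)
burst_target = desired_workers(queue_depth=25_000, running_jobs=0)
```

With one job per ephemeral worker, an empty queue scales the pool to zero, and a burst beyond the ceiling is held at the maximum while the remaining jobs wait in the queue.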