How does a video conferencing system handle 300-person meetings with screen sharing, adapting per-participant video quality in real time based on available bandwidth while keeping total meeting latency under 200ms?
Core challenge: With 300 participants, naive P2P would require 300 × 299 = 89,700 streams. Even with an SFU, the server must selectively forward only relevant streams and adapt quality per-receiver without overwhelming any single participant's bandwidth.
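The stream count above is simple arithmetic, and a short sketch makes the gap between a full mesh and an SFU concrete. The function names here are illustrative, not taken from any library.

```python
def mesh_streams(n: int) -> int:
    """Directed media streams in a full P2P mesh: each of n peers sends to n - 1 others."""
    return n * (n - 1)


def sfu_uplinks(n: int) -> int:
    """With an SFU, each participant maintains one upload connection to the server."""
    return n


for n in (5, 25, 300):
    print(f"{n} participants: mesh={mesh_streams(n):,} streams, SFU uplinks={sfu_uplinks(n)}")
```

The SFU still has to decide which of those uploads to forward to each receiver, which is where the rest of the design comes in.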
Scale: 300 participants in a single meeting
Latency: <200ms end-to-end (glass-to-glass)
Reference point: 300M+ daily meeting participants at Zoom's peak (2020+)
Quality: adaptive per participant, based on bandwidth + layout
Architecture · SFU (Selective Forwarding Unit)
Server receives all streams, selectively forwards relevant ones to each participant
Key Design Decisions
Decision | Choice | Why
Topology | SFU (not MCU, not P2P) | An MCU transcodes (expensive CPU). P2P doesn't scale past about 5 participants. An SFU just routes packets.
Simulcast | Each sender encodes 3 layers (hi/med/lo) | SFU picks a layer per receiver without transcoding
SVC | Scalable Video Coding (temporal/spatial layers) | Drop layers mid-stream for congestion control
Active speaker | Audio energy detection → promote to hi-res | Only 1-3 speakers at a time need full quality
Gallery view | 25 thumbnails (180p) + pagination | Bandwidth: 25 × 180p ≈ 5Mbps, vs. 300 × 720p ≈ 600Mbps at 2Mbps each, which is infeasible for a client
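The "Active speaker" decision can be sketched as a smoothed audio-energy tracker. This is a minimal illustration, not any vendor's algorithm: the class name, the smoothing factor, the threshold, and the assumption that samples are floats in [-1, 1] are all hypothetical. A production SFU may instead read audio levels that clients report in an RTP header extension (RFC 6464) rather than computing energy itself.

```python
import math
from typing import Dict, List, Sequence


class ActiveSpeakerDetector:
    """Tracks a smoothed RMS audio level per participant (illustrative sketch)."""

    def __init__(self, smoothing: float = 0.2, threshold: float = 0.02,
                 max_speakers: int = 3) -> None:
        self.smoothing = smoothing        # weight given to the newest frame
        self.threshold = threshold        # levels below this count as silence
        self.max_speakers = max_speakers  # the design assumes 1-3 concurrent speakers
        self.levels: Dict[str, float] = {}

    def update(self, participant_id: str, samples: Sequence[float]) -> None:
        """Fold one audio frame into the participant's smoothed level."""
        rms = math.sqrt(sum(s * s for s in samples) / len(samples)) if samples else 0.0
        previous = self.levels.get(participant_id, 0.0)
        # Exponential moving average, so a single cough does not trigger a switch.
        self.levels[participant_id] = (1 - self.smoothing) * previous + self.smoothing * rms

    def active_speakers(self) -> List[str]:
        """Loudest participants above the threshold, loudest first."""
        loud = [(level, pid) for pid, level in self.levels.items() if level >= self.threshold]
        loud.sort(reverse=True)
        return [pid for _, pid in loud[: self.max_speakers]]


detector = ActiveSpeakerDetector()
for _ in range(10):
    detector.update("alice", [0.5, -0.5] * 80)  # sustained speech
    detector.update("bob", [0.0] * 160)         # silence
print(detector.active_speakers())
```

The SFU would then forward the high-quality layer only for the returned participants and thumbnails for everyone else.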
Cascaded SFUs: users connect to the nearest SFU, and SFUs relay streams between regions.
Simulcast explained: Each sender encodes video at 3 qualities simultaneously (e.g., 1080p + 720p + 180p). The SFU picks which layer to forward to each receiver based on their bandwidth and layout position. No server-side transcoding is needed, just packet routing.
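A minimal sketch of the per-receiver layer choice described above. The `Layer` type, the `select_layer` function, and the bitrate ladder are assumptions made for illustration. The ladder uses 720p/360p/180p rather than the 1080p example so that it roughly matches the 2Mbps and ~3Mbps figures quoted elsewhere on this page; the 360p bitrate is a guess.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Layer:
    name: str
    height: int  # vertical resolution in pixels
    kbps: int    # assumed encoded bitrate


# Hypothetical simulcast ladder; real encoders pick bitrates dynamically.
LAYERS: List[Layer] = [
    Layer("hi", 720, 2000),
    Layer("med", 360, 600),
    Layer("lo", 180, 200),
]


def select_layer(budget_kbps: int, tile_height: int,
                 layers: List[Layer] = LAYERS) -> Optional[Layer]:
    """Pick the layer to forward for one sender to one receiver.

    budget_kbps: bandwidth the receiver can spare for this stream.
    tile_height: height in pixels of the tile the stream is rendered in.
    Returns None when even the lowest layer does not fit (video would be paused).
    """
    affordable = [layer for layer in layers if layer.kbps <= budget_kbps]
    if not affordable:
        return None
    # Prefer the cheapest layer that still fills the tile: no point sending
    # 720p into a 180p thumbnail.
    covering = [layer for layer in affordable if layer.height >= tile_height]
    if covering:
        return min(covering, key=lambda layer: layer.kbps)
    # Otherwise send the best layer the receiver can afford.
    return max(affordable, key=lambda layer: layer.kbps)


print(select_layer(2500, 720))  # large tile, good connection
print(select_layer(2500, 180))  # thumbnail tile
print(select_layer(1000, 720))  # large tile, constrained connection
```

Because every layer already exists as a separate encoded stream from the sender, switching layers is only a change in which packets the SFU forwards.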
Bandwidth math: Active speaker view: 1 × 720p (2Mbps) + 5 × 180p thumbnails (0.5Mbps total) = ~2.5Mbps down. Gallery 5 × 5: 25 × 180p = ~5Mbps down. Upload: 1 simulcast stream = ~3Mbps up. Total per user: ~5-8Mbps, which is feasible on most connections.
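The same arithmetic as a small script, using integer kbps to avoid rounding noise. The per-stream rates are derived from the totals quoted above; note that those totals imply about 100kbps per thumbnail in speaker view but about 200kbps per tile in gallery view, so the two are kept as separate constants. All names are illustrative.

```python
# Per-stream bitrates in kbps, taken from or implied by the figures above.
HI_720P_KBPS = 2000              # one 720p active-speaker stream (2 Mbps)
SPEAKER_THUMB_KBPS = 500 // 5    # 5 thumbnails share 0.5 Mbps -> 100 kbps each
GALLERY_TILE_KBPS = 5000 // 25   # 25 tiles share ~5 Mbps -> 200 kbps each
SIMULCAST_UP_KBPS = 3000         # one simulcast upload (~3 Mbps)


def speaker_view_down_kbps(thumbnails: int = 5) -> int:
    """Download for active-speaker view: one 720p stream plus thumbnails."""
    return HI_720P_KBPS + thumbnails * SPEAKER_THUMB_KBPS


def gallery_down_kbps(tiles: int = 25) -> int:
    """Download for one gallery page of 180p tiles."""
    return tiles * GALLERY_TILE_KBPS


def naive_down_kbps(participants: int = 300) -> int:
    """Download if every participant's 720p stream were forwarded."""
    return participants * HI_720P_KBPS


for label, down in (("speaker view", speaker_view_down_kbps()),
                    ("gallery 5x5", gallery_down_kbps())):
    total = down + SIMULCAST_UP_KBPS
    print(f"{label}: {down / 1000:.1f} Mbps down, {total / 1000:.1f} Mbps total")
print(f"naive 300 x 720p: {naive_down_kbps() / 1000:.0f} Mbps down")
```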
Anti-patterns: MCU for large meetings (CPU cost explodes: decode + re-encode per participant). P2P for >5 users (upload bandwidth × N kills the sender). Single SFU globally (cross-continent latency ruins the experience). No simulcast (the SFU must transcode to adapt quality, which defeats the purpose).
Real-world: Zoom (proprietary SFU with simulcast + SVC). Google Meet (WebRTC SFU with VP9 SVC). Discord (custom SFU for voice; Opus codec, 50ms target). Twilio (SFU-as-a-service, Programmable Video).
Interview Cheat Sheet
The 6 things to say for large video meeting design
1. SFU over MCU: route packets, don't transcode (scales to 300+ without CPU explosion)
2. Simulcast (3 layers): sender encodes high/med/low, SFU picks per receiver's bandwidth
3. Active speaker detection: forward high quality only for speakers, thumbnails for others
4. Cascaded SFUs: one per region, inter-SFU relay for geo-distributed meetings
5. TWCC (transport-wide congestion control) bandwidth estimation: per-receiver, adapt the forwarded layer in real time
6. UDP + FEC: tolerate packet loss without retransmit delay (Opus audio handles 10% loss)
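To illustrate item 6, here is a minimal sketch of XOR-parity forward error correction: one parity packet per group lets the receiver rebuild any single lost packet without waiting for a retransmit. This is a simplification under stated assumptions (equal-length packets, at most one loss per group, hypothetical function names). Real media stacks use more elaborate schemes, such as the XOR-based RTP FEC formats and Opus's in-band FEC.

```python
from typing import List, Optional


def xor_parity(packets: List[bytes]) -> bytes:
    """Build one parity packet as the byte-wise XOR of a group of packets."""
    if len({len(p) for p in packets}) != 1:
        raise ValueError("this sketch assumes equal-length packets")
    parity = bytearray(len(packets[0]))
    for packet in packets:
        for i, byte in enumerate(packet):
            parity[i] ^= byte
    return bytes(parity)


def recover_missing(received: List[Optional[bytes]], parity: bytes) -> List[bytes]:
    """Rebuild a group in which exactly one packet (marked None) was lost."""
    missing = [i for i, packet in enumerate(received) if packet is None]
    if len(missing) != 1:
        raise ValueError("single XOR parity can repair exactly one loss per group")
    rebuilt = bytearray(parity)
    for packet in received:
        if packet is not None:
            for i, byte in enumerate(packet):
                rebuilt[i] ^= byte  # XOR-ing out the survivors leaves the lost packet
    repaired = list(received)
    repaired[missing[0]] = bytes(rebuilt)
    return repaired


group = [b"pkt0", b"pkt1", b"pkt2", b"pkt3"]
parity = xor_parity(group)
arrived = [b"pkt0", None, b"pkt2", b"pkt3"]  # packet 1 lost in transit
print(recover_missing(arrived, parity) == group)
```

The trade-off is extra bandwidth (one parity packet per group here) in exchange for avoiding a round trip, which matters when the whole latency budget is 200ms.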