System Design Case Study

How does Meta classify 500M+ posts/day for policy violations?

Design a content moderation system: 500M posts/day, multi-modal ML, <1% false positive rate, human review routing

Problem Statement

How does a content moderation platform classify 500M+ posts/day across text, images, and video for policy violations while keeping false positives below 1% and routing borderline cases to human reviewers within minutes?

Core challenge: multi-modal content (text + image + video) requires different ML models working together. The system must balance aggressive detection (catch harmful content fast) against low false positives (don't silence legitimate speech), and borderline cases need human judgment within minutes, not hours.
500M+ posts classified/day
<1% false positive rate
Minutes to human review
Multi-modal ML: text + image + video

Architecture

Layer 1 · Ingestion + ML classifiers (500M posts/day, 3 parallel pipelines)
Post ingestion: 500M posts/day (~5,800 posts/sec); multi-modal content (text + image + video) in 100+ languages.
Text: NLP transformer (hate speech, spam, threats, misinformation).
Image: CNN + CLIP (nudity, violence, CSAM, graphic content).
Video: frame sampling (keyframes → per-frame CNN classification).
Confidence score: 0.0 to 1.0 per category. An ensemble combines all modality signals. Multi-label: a post can be hate + violence + spam at once. Target: <1% false positives.
Threshold routing (per-category thresholds): >0.95 → auto-remove; 0.5-0.95 → human review; <0.5 → allow (publish).

Layer 2 · Decision (routing by confidence → outcomes)
Auto-remove (>0.95): instant removal, no human needed. CSAM: hash-match → immediate removal + report. Appeal available after removal. ~5% of all posts are auto-removed.
Human review (0.5-0.95): routed to a review queue by severity. Context needed: satire? news? art? Decision within minutes for P0/P1. ~10% of posts need human review.
Allow (<0.5): published with continuous monitoring; user reports can escalate. ~85% of posts pass cleanly.

Layer 3 · Human review + feedback loop (retrain weekly)
Human review queue (priority-based): child safety (P0) immediate; violence (P1) within minutes; spam (P2) batched hourly. 15K+ human reviewers, specialized by category, with multiple reviewers for P0. Decision → remove or allow, plus an appeal process for users.
Feedback loop (continuous improvement): human decisions → retrain models weekly. Active learning prioritizes uncertain samples. A/B test new models on shadow traffic. Track FP/FN rates per category per region. Adversarial robustness: detect text obfuscation. Retrain signal → updated models.

Summary: <1% false positive | severity routing | multi-label classification | weekly retrain | adversarial robustness
Child safety: lowest threshold (err on removal) | Ensemble scoring across modalities | Shadow testing before model promotion
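The diagram's headline numbers can be sanity-checked with quick arithmetic. This sketch uses only figures quoted in the diagram (500M posts/day, the approximate 5% / 10% / 85% routing split, 15K reviewers) and assumes traffic is spread evenly over the day. Real traffic is not, so peak rates will be higher than the averages it prints.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

POSTS_PER_DAY = 500_000_000
REVIEWERS = 15_000
# Approximate routing split from Layer 2 of the diagram.
ROUTE_SHARE = {"auto_remove": 0.05, "human_review": 0.10, "allow": 0.85}

# Average (not peak) ingest rate, assuming uniform traffic over 24 hours.
posts_per_sec = POSTS_PER_DAY / SECONDS_PER_DAY
# Daily volume landing in each route.
volumes = {route: round(POSTS_PER_DAY * share) for route, share in ROUTE_SHARE.items()}
# Items each reviewer would face if every human-review post were handled individually.
reviews_per_reviewer_per_day = volumes["human_review"] / REVIEWERS

print(f"average ingest: {posts_per_sec:,.0f} posts/sec")
for route, count in volumes.items():
    print(f"{route:>12}: {count:,} posts/day")
print(f"review load: {reviews_per_reviewer_per_day:,.0f} items per reviewer per day")
```

The last figure stands out: sending ~10% of posts to 15K reviewers implies roughly 3,300 items per reviewer per day. That suggests the review share, the reviewer count, or batch handling of low-severity (P2) queues has to absorb most of that load.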
Multi-modal ML pipeline: text goes through NLP transformers, images through CNN + CLIP embeddings, and video through keyframe extraction + per-frame classification. Ensemble scoring combines the modality signals, so a benign caption on a harmful image still gets flagged.
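One simple way to get the property described above (a harmful image is not diluted by a benign caption) is a noisy-OR over per-modality scores for each category. This is an assumption for illustration, not the pipeline's stated method; production ensembles are commonly learned models over the modality outputs. The category names are hypothetical. Because noisy-OR never returns less than the highest single-modality score, a low text score cannot cancel a high image score.

```python
from typing import Dict, Iterable

# Hypothetical category set; each is scored independently (multi-label).
CATEGORIES = ("hate", "violence", "nudity", "spam")


def noisy_or(scores: Iterable[float]) -> float:
    """P(at least one modality fires), treating modalities as independent."""
    p_none = 1.0
    for s in scores:
        p_none *= 1.0 - s
    return 1.0 - p_none


def ensemble(modality_scores: Dict[str, Dict[str, float]]) -> Dict[str, float]:
    """Combine {modality: {category: score}} into {category: score}.

    Multi-label: every category is scored independently, so one post can
    exceed thresholds for hate, violence, and spam at the same time.
    A category a modality does not score counts as 0.0 for that modality.
    """
    return {
        category: noisy_or(s.get(category, 0.0) for s in modality_scores.values())
        for category in CATEGORIES
    }


post = {
    "text": {"hate": 0.02, "violence": 0.01, "spam": 0.03},  # benign caption
    "image": {"violence": 0.97, "nudity": 0.05},             # graphic image
}
scores = ensemble(post)
print({c: round(s, 3) for c, s in scores.items()})
# violence stays at roughly 0.97: the benign caption cannot pull it down
```

Noisy-OR assumes the modality signals are independent, so two moderately confident modalities (0.6 and 0.6) combine to 0.84. When signals are correlated, that overstates confidence, which is one reason a learned combiner or per-category calibration would be preferred in practice.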
Threshold-based routing: high confidence (>0.95) → auto-remove. Medium (0.5-0.95) → human review queue, prioritized by severity. Low (<0.5) → allow but monitor. Child safety content has the lowest threshold, erring on the side of removal.
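A minimal sketch of per-category threshold routing feeding a severity-ordered review queue. The category names, the child-safety and spam thresholds, and the default P1 priority for unlisted categories are hypothetical; only the 0.95 / 0.5 default split and the P0 (child safety) / P1 (violence) / P2 (spam) ordering come from the text. A post is auto-removed if any category clears its removal bar; otherwise the most severe category in its review band decides which queue it joins.

```python
import heapq
import itertools
from typing import Dict, List, Optional, Tuple

# Hypothetical per-category thresholds: (auto_remove_above, review_above).
# Child safety gets the lowest bars so borderline cases err toward removal;
# the default mirrors the 0.95 / 0.5 split described above.
DEFAULT_THRESHOLDS = (0.95, 0.50)
THRESHOLDS: Dict[str, Tuple[float, float]] = {
    "child_safety": (0.80, 0.20),
    "violence": (0.95, 0.50),
    "spam": (0.98, 0.70),
}

# Review priority from the diagram: P0 child safety, P1 violence, P2 spam.
# Unlisted categories default to P1 (an assumption).
PRIORITY: Dict[str, int] = {"child_safety": 0, "violence": 1, "spam": 2}
DEFAULT_PRIORITY = 1


def thresholds_for(category: str) -> Tuple[float, float]:
    return THRESHOLDS.get(category, DEFAULT_THRESHOLDS)


def most_severe(categories: List[str]) -> str:
    return min(categories, key=lambda c: PRIORITY.get(c, DEFAULT_PRIORITY))


def route(scores: Dict[str, float]) -> Tuple[str, Optional[str]]:
    """Map one post's per-category scores to (decision, triggering category)."""
    removable = [c for c, s in scores.items() if s > thresholds_for(c)[0]]
    if removable:
        return "auto_remove", most_severe(removable)
    reviewable = [c for c, s in scores.items() if s > thresholds_for(c)[1]]
    if reviewable:
        return "human_review", most_severe(reviewable)
    return "allow", None


class ReviewQueue:
    """Min-heap on (priority, arrival order): P0 items always pop first,
    and items of equal priority pop in FIFO order."""

    def __init__(self) -> None:
        self._heap: List[Tuple[int, int, str]] = []
        self._arrival = itertools.count()

    def push(self, post_id: str, category: str) -> None:
        priority = PRIORITY.get(category, DEFAULT_PRIORITY)
        heapq.heappush(self._heap, (priority, next(self._arrival), post_id))

    def pop(self) -> str:
        return heapq.heappop(self._heap)[2]

    def __len__(self) -> int:
        return len(self._heap)


queue = ReviewQueue()
posts = {
    "p1": {"spam": 0.85},
    "p2": {"violence": 0.70, "spam": 0.60},
    "p3": {"child_safety": 0.30},
    "p4": {"violence": 0.99},
    "p5": {"hate": 0.10},
}
for post_id, post_scores in posts.items():
    decision, category = route(post_scores)
    print(post_id, decision, category)
    if decision == "human_review":
        queue.push(post_id, category)
print([queue.pop() for _ in range(len(queue))])  # p3 (P0), p2 (P1), p1 (P2)
```

Keying the heap on (priority, arrival counter) keeps equal-priority items first-in, first-out and never falls through to comparing post IDs. Known-CSAM hash matching, which the diagram handles before any scoring, is outside this sketch.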
Anti-patterns: a single threshold for all categories (hate speech, spam, and nudity each need their own thresholds); no human in the loop (ML alone can't handle context or satire); batch retraining only (adversarial content evolves daily).
Feedback loop: human review decisions feed back into model training. Track false positive/negative rates per category. A/B test new models on shadow traffic before promotion. Active learning: prioritize uncertain samples for labeling.
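A minimal sketch of two pieces of the feedback loop, under stated assumptions. Uncertainty sampling (one common form of active learning) picks the posts whose score sits nearest the 0.5 decision boundary for labeling, and per-category, per-region error rates are computed from human decisions. The record layout is hypothetical. The text does not say which "false positive rate" the <1% target refers to, so the sketch reports both the classic FPR (false positives over all benign posts) and the FDR (share of flagged posts that were wrong).

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def uncertainty_sample(scores: Dict[str, float], k: int) -> List[str]:
    """Active learning by uncertainty sampling: return the k post IDs whose
    model score is closest to 0.5, i.e. where the model is least sure."""
    return sorted(scores, key=lambda post_id: abs(scores[post_id] - 0.5))[:k]


# Hypothetical labeled outcome: (category, region, model_flagged, human_says_violating).
Record = Tuple[str, str, bool, bool]


def error_rates(records: List[Record]) -> Dict[Tuple[str, str], Dict[str, float]]:
    """Per (category, region) error rates from human-labeled outcomes."""
    counts: Dict[Tuple[str, str], Dict[str, int]] = defaultdict(
        lambda: {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    )
    for category, region, flagged, violating in records:
        c = counts[(category, region)]
        if flagged and violating:
            c["tp"] += 1
        elif flagged:
            c["fp"] += 1
        elif violating:
            c["fn"] += 1
        else:
            c["tn"] += 1

    def ratio(num: int, den: int) -> float:
        # Simplification: report 0.0 when there is no data for the denominator.
        return num / den if den else 0.0

    return {
        key: {
            "fpr": ratio(c["fp"], c["fp"] + c["tn"]),  # FP / all truly benign posts
            "fnr": ratio(c["fn"], c["fn"] + c["tp"]),  # FN / all truly violating posts
            "fdr": ratio(c["fp"], c["fp"] + c["tp"]),  # FP / all flagged posts
        }
        for key, c in counts.items()
    }
```

Rates computed only from posts that reached review are biased toward borderline cases; an unbiased estimate needs labels on a random sample of all traffic.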

Interview Cheat Sheet

1. Multi-modal classifiers: separate models for text/image/video, ensemble for final score
2. Confidence thresholds: auto-action at high confidence, human review for borderline
3. Severity-based routing: child safety immediate, hate speech prioritized, spam batched
4. Feedback loop: human decisions retrain models, active learning on uncertain samples
5. False positive minimization: per-category thresholds, appeal process, shadow testing
6. Adversarial robustness: text obfuscation detection, steganography checks, rapid model updates
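Text obfuscation detection typically starts by normalizing text before classification, so that leetspeak, zero-width characters, fullwidth letters, accents, and in-word punctuation all map to one canonical form. This sketch uses only Python's standard library (`unicodedata`, `re`); the substitution table and separator rule are hypothetical and deliberately small. Homoglyphs from other scripts (for example Cyrillic "а" standing in for Latin "a") are not folded by NFKC and would need a confusables mapping such as the one in Unicode Technical Standard #39.

```python
import re
import unicodedata

# Hypothetical leetspeak substitutions; a real system would learn or curate these.
LEET = str.maketrans({"0": "o", "1": "i", "3": "e", "4": "a", "5": "s",
                      "7": "t", "@": "a", "$": "s"})
# Zero-width characters often inserted to split words invisibly.
ZERO_WIDTH = re.compile("[\u200b\u200c\u200d\u2060\ufeff]")
# Punctuation used as in-word separators, e.g. "h.a.t.e" or "h-a-t-e".
IN_WORD_SEPARATOR = re.compile(r"(?<=\w)[.\-_*](?=\w)")


def normalize_for_matching(text: str) -> str:
    """Canonical form for classifier/blocklist matching (not for display)."""
    text = unicodedata.normalize("NFKC", text)  # fullwidth/compatibility forms -> plain
    text = ZERO_WIDTH.sub("", text)             # strip invisible joiners
    # Decompose accented letters and drop the combining marks: "â" -> "a".
    text = "".join(ch for ch in unicodedata.normalize("NFKD", text)
                   if not unicodedata.combining(ch))
    text = text.lower().translate(LEET)         # undo common digit/symbol swaps
    return IN_WORD_SEPARATOR.sub("", text)      # join "h.a.t.e" -> "hate"


for sample in ["H\u200b4T3", "\uff48.\uff41.\uff54.\uff45", "h\u00e2te"]:
    print(repr(sample), "->", normalize_for_matching(sample))
```

Because the normalized form also rewrites legitimate text (for example "example.com" becomes "examplecom"), it is best fed to classifiers alongside the original text rather than replacing it.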