How does a content moderation platform classify 500M+ posts/day across text, images, and video for policy violations while keeping false positives below 1% and routing borderline cases to human reviewers within minutes?
Core challenge: Multi-modal content (text + image + video) requires different ML models working together. Must balance aggressive detection (catch harmful content fast) with low false positives (don't silence legitimate speech). Borderline cases need human judgment in minutes, not hours.
500M+ posts classified / day
<1% false positive rate
Minutes to human review
Multi-modal: text + image + video ML
Architecture
Multi-modal ML pipeline: Text goes through NLP transformers, images through CNN + CLIP embeddings, video through keyframe extraction + per-frame classification. Ensemble scoring combines modality signals: a benign caption on a harmful image still gets flagged.
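A minimal sketch of the ensemble step, assuming each modality model emits a violation probability in [0, 1]. The source does not say how the scores are combined; a noisy-OR is used here as one plausible choice because its result is never lower than the strongest single signal. The function names and example scores are hypothetical.

```python
from typing import Dict


def ensemble_score(modality_scores: Dict[str, float]) -> float:
    """Combine per-modality violation probabilities with a noisy-OR.

    noisy-OR = 1 - prod(1 - p_i), so one confident modality is enough
    to push the combined score high.
    """
    remaining = 1.0  # probability that no modality signals a violation
    for name, score in modality_scores.items():
        if not 0.0 <= score <= 1.0:
            raise ValueError(f"score for {name!r} must be in [0, 1], got {score}")
        remaining *= 1.0 - score
    return 1.0 - remaining


def mean_score(modality_scores: Dict[str, float]) -> float:
    """Plain average, shown only for contrast: benign modalities dilute it."""
    return sum(modality_scores.values()) / len(modality_scores)


# Hypothetical post: harmless caption, harmful image.
post = {"text": 0.02, "image": 0.97}
combined = ensemble_score(post)
diluted = mean_score(post)
```

With these inputs the noisy-OR stays above the image score, while the plain average drops below 0.5 and would let the post slip into the "allow" band.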
Threshold-based routing: High confidence (>0.95) → auto-remove. Medium (0.5-0.95) → human review queue prioritized by severity. Low (<0.5) → allow but monitor. Child safety content has the lowest threshold: err on the side of removal.
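The routing rule can be sketched as a small lookup plus two comparisons. The default cut-offs (0.95 and 0.5) come from the text above; the lower child-safety values, the category names, and the `route` function are assumptions made for illustration only.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    AUTO_REMOVE = "auto_remove"
    HUMAN_REVIEW = "human_review"
    ALLOW_AND_MONITOR = "allow_and_monitor"


@dataclass(frozen=True)
class Thresholds:
    remove_above: float  # scores strictly above this are auto-removed
    review_from: float   # scores at or above this go to human review


# Defaults taken from the text: >0.95 remove, 0.5-0.95 review, <0.5 allow.
DEFAULT_THRESHOLDS = Thresholds(remove_above=0.95, review_from=0.5)

# Hypothetical override: the source only says child safety has the
# lowest threshold, not what the numbers are.
CATEGORY_THRESHOLDS = {
    "child_safety": Thresholds(remove_above=0.60, review_from=0.10),
}


def route(category: str, score: float) -> Action:
    """Map a (category, confidence score) pair to a moderation action."""
    t = CATEGORY_THRESHOLDS.get(category, DEFAULT_THRESHOLDS)
    if score > t.remove_above:
        return Action.AUTO_REMOVE
    if score >= t.review_from:
        return Action.HUMAN_REVIEW
    return Action.ALLOW_AND_MONITOR
```

Keeping thresholds in a per-category table means the same score can be handled differently by category: under these assumed values, `route("spam", 0.7)` goes to human review while `route("child_safety", 0.7)` is removed automatically.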
Anti-patterns: Single threshold for all categories (hate speech ≠ spam ≠ nudity). No human-in-the-loop: ML alone can't handle context or satire. Batch retraining only: adversarial content evolves daily.
Feedback loop: Human review decisions feed back into model training. Track false positive/negative rates per category. A/B test new models on shadow traffic before promotion. Active learning: prioritize uncertain samples for labeling.
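Two pieces of this loop are easy to sketch: per-category error rates computed from human decisions, and uncertainty sampling for active learning. The `Review` record, the function names, and the choice of "distance from 0.5" as the uncertainty measure are assumptions, not the platform's actual method. The error rates are only meaningful if the reviewed set includes a sample of content the model allowed, not just flagged items.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple


class Review(NamedTuple):
    category: str
    model_flagged: bool         # what the model decided
    human_says_violation: bool  # ground truth from the reviewer


def error_rates(reviews: Iterable[Review]) -> Dict[str, Dict[str, float]]:
    """False positive and false negative rates per category."""
    counts = defaultdict(lambda: {"fp": 0, "fn": 0, "neg": 0, "pos": 0})
    for r in reviews:
        c = counts[r.category]
        if r.human_says_violation:
            c["pos"] += 1
            c["fn"] += not r.model_flagged  # missed violation
        else:
            c["neg"] += 1
            c["fp"] += r.model_flagged      # legitimate post flagged
    return {
        cat: {
            "fpr": c["fp"] / c["neg"] if c["neg"] else 0.0,
            "fnr": c["fn"] / c["pos"] if c["pos"] else 0.0,
        }
        for cat, c in counts.items()
    }


def pick_for_labeling(scores: Dict[str, float], budget: int) -> List[str]:
    """Uncertainty sampling: label the items whose score is closest to 0.5."""
    return sorted(scores, key=lambda item_id: abs(scores[item_id] - 0.5))[:budget]
```

Labels collected this way target the region where the model is least sure, which is usually where an extra human decision changes the next model the most.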
Interview Cheat Sheet
1. Multi-modal classifiers: separate models for text/image/video, ensemble for final score
2. Confidence thresholds: auto-action at high confidence, human review for borderline
3. Severity-based routing: child safety immediate, hate speech priority, spam batch
4. Feedback loop: human decisions retrain models, active learning on uncertain samples
5. False positive minimization: per-category thresholds, appeal process, shadow testing
6. Adversarial robustness: text obfuscation detection, steganography checks, rapid model updates
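Severity-based routing of borderline cases can be sketched as a priority queue built on Python's `heapq`. The severity ranks, the tie-breaking by model score, and the `ReviewQueue` class are hypothetical; the source only states the ordering child safety, then hate speech, then spam.

```python
import heapq
import itertools
from typing import List, Tuple

# Hypothetical ranks: lower numbers are reviewed sooner.
SEVERITY_RANK = {"child_safety": 0, "hate_speech": 1, "spam": 2}


class ReviewQueue:
    """Human review queue ordered by severity, then by model confidence."""

    def __init__(self) -> None:
        self._heap: List[Tuple[int, float, int, str]] = []
        self._arrival = itertools.count()  # keeps FIFO order among exact ties

    def push(self, post_id: str, category: str, score: float) -> None:
        # Unknown categories sort after all known ones.
        rank = SEVERITY_RANK.get(category, len(SEVERITY_RANK))
        # Negate the score so higher-confidence items pop first within a rank.
        heapq.heappush(self._heap, (rank, -score, next(self._arrival), post_id))

    def pop(self) -> str:
        """Return the post ID a reviewer should look at next."""
        return heapq.heappop(self._heap)[-1]

    def __len__(self) -> int:
        return len(self._heap)


queue = ReviewQueue()
queue.push("p1", "spam", 0.90)
queue.push("p2", "hate_speech", 0.60)
queue.push("p3", "child_safety", 0.55)
queue.push("p4", "hate_speech", 0.80)
order = [queue.pop() for _ in range(len(queue))]
```

Here `order` comes out as `["p3", "p4", "p2", "p1"]`: the child-safety post jumps ahead of everything despite its lower score, and spam waits until the higher-severity items are cleared.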