Model Policy Lead, Video Policy

TikTok
Full-time
Remote friendly (San Jose, California, United States)
United States
$147,000 - $270,000 USD yearly
Trust & Safety

This role combines technical judgment, operational rigor, and policy intuition. You'll work closely with Engineering, Product, and Ops teams to manage how policy is embedded in model behavior, measured through our platform quality metrics, and improved through model iterations and targeted interventions. You'll also ensure that policy changes, often made to improve human reviewer precision, are consistently propagated across all machine enforcement pathways, maintaining unified and transparent enforcement standards.

You will lead policy governance across four model enforcement streams central to TikTok's AI moderation systems:

1. At-Scale Moderation Models (ML Classifiers): Own policy alignment and quality monitoring for high-throughput classifiers processing hundreds of millions of videos daily. These models rely on static training data and operate without prompt logic, requiring careful threshold setting, false positive/negative analysis, and drift tracking.

2. At-Scale AI Moderation (LLM/CoT-Based): Oversee CoT-based AI moderation systems handling millions of cases per day. Your team produces chain-of-thought (CoT) reasoning, structured labeling guidelines, and dynamic prompts to interpret complex content and provide policy assessments. The team also manages accuracy monitoring, labeling frameworks, and precision fine-tuning.

3. Model Change Management: Ensure consistent enforcement across human and machine systems as policies evolve. You will lead the synchronization of changes across ML classifiers, AI models, labeling logic, and escalation flows to maintain unified, up-to-date enforcement standards.

4. Next-Bound AI Projects (SOTA Models): Drive development of high-accuracy, LLM-based models used to benchmark and audit at-scale enforcement. These projects are highly experimental and sit at the forefront of LLM application in real-world policy enforcement and quality validation.