Research Scientist @Google; CS PhD from @USC in NLP; Work on making machines into reliable, intelligent, and user-friendly tools for all.
Oct 31 • 6 tweets • 3 min read
🧠🚀 Excited to introduce Supervised Reinforcement Learning—a framework that leverages expert trajectories to teach small LMs how to reason through hard problems without losing their minds. 🤯
📉 Too weak for RLVR (RL with verifiable rewards): Can't find correct answers to reinforce.
🤯 Too small for SFT Distillation: Giant model strategies are alien concepts (way too off-policy) for them to grasp.
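To make the RLVR point concrete, here is a minimal sketch of why "no correct answers" means "nothing to reinforce". It assumes a binary verifiable reward and a group-relative baseline (advantage = reward minus group mean, scaled by group std), which is a common choice but not something this thread specifies. The function name `group_advantages` and the rollout counts are hypothetical.

```python
from statistics import mean, pstdev


def group_advantages(rewards, eps=1e-6):
    """Group-relative advantages: (r - mean) / (std + eps).

    Assumption: a group-normalized baseline; the thread does not
    say which RLVR variant it has in mind.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hard problem, weak model: all 8 sampled rollouts are wrong (reward 0).
all_wrong = group_advantages([0.0] * 8)

# Same problem, but one rollout happens to be correct (reward 1).
one_right = group_advantages([0.0] * 7 + [1.0])

print(all_wrong)   # every advantage is 0.0, so the policy gets no update signal
print(one_right)   # the correct rollout gets a positive advantage, the rest negative
```

When every rollout fails, every advantage is zero and the policy gradient vanishes; a single success is enough to restore a learning signal, which is exactly what a too-weak model cannot produce on hard problems.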