I-Hung Hsu
Research Scientist @Google; CS PhD from @USC in NLP; working on making machines reliable, intelligent, and user-friendly tools for all.
Oct 31
🧠🚀 Excited to introduce Supervised Reinforcement Learning—a framework that leverages expert trajectories to teach small LMs how to reason through hard problems without losing their minds. 🤯

Better than SFT && RLVR.

Read more: huggingface.co/papers/2510.25…

#llms #RL #reasoning

The struggle is real for small LMs on hard reasoning. 😣

📉 Too weak for RLVR: Can't find correct answers to reinforce.
🤯 Too small for SFT distillation: Giant-model strategies are too alien (way too off-policy) for them to grasp.

They need a bridge, not just more data.