Latest Twitter Threads by @IHung_Hsu on Thread Reader App

Oct 31, 2025 • 6 tweets • 3 min read

🧠🚀 Excited to introduce Supervised Reinforcement Learning—a framework that leverages expert trajectories to teach small LMs how to reason through hard problems without losing their minds. 🤯

Better than SFT && RLVR.

Read more:

#llms #RL #reasoning huggingface.co/papers/2510.25…

The struggle is real for small LMs on hard reasoning. 😣

📉 Too weak for RLVR: Can't find correct answers to reinforce.
🤯 Too small for SFT Distillation: Giant model strategies are alien concepts (way too off-policy) for them to grasp.

They need a bridge, not just more data.
