Jyo Pari
Working on continual learning | PhD @MIT
Sep 5, 2025
For agents to improve over time, they can’t afford to forget what they’ve already mastered.

We found that supervised fine-tuning forgets more than RL when training on a new task!

Want to find out why? 👇

We fully swept hyperparameters for both methods and plotted the Pareto frontier.

Our finding holds for LLMs, robotics foundation models, and even a 3-layer MLP.
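To make the comparison concrete, here is a minimal sketch (not the thread's actual analysis code) of how a Pareto frontier can be pulled out of such a sweep, assuming each run is logged as a pair of scores: performance on the new task and retention on the old task.

```python
# Minimal sketch: extract the Pareto frontier from a hyperparameter sweep.
# Each run is assumed to be logged as (new_task_score, old_task_score);
# the frontier keeps the runs that no other run beats on both axes.

def pareto_frontier(runs):
    """Return runs not dominated on both scores (strictly better on at least one)."""
    frontier = []
    for i, (new_i, old_i) in enumerate(runs):
        dominated = any(
            (new_j >= new_i and old_j >= old_i) and (new_j > new_i or old_j > old_i)
            for j, (new_j, old_j) in enumerate(runs)
            if j != i
        )
        if not dominated:
            frontier.append((new_i, old_i))
    return sorted(frontier)


# Hypothetical sweep results, one point per hyperparameter setting
# (illustrative numbers only, not the paper's data).
sft_runs = [(0.82, 0.40), (0.75, 0.55), (0.60, 0.70)]
rl_runs = [(0.80, 0.65), (0.74, 0.78), (0.58, 0.85)]
print("SFT frontier:", pareto_frontier(sft_runs))
print("RL frontier: ", pareto_frontier(rl_runs))
```

Plotting one frontier per method then shows, at matched new-task performance, how much old-task performance each method retains.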
Jun 13, 2025
What if an LLM could update its own weights?

Meet SEAL🦭: a framework where LLMs generate their own training data (self-edits) to update their weights in response to new inputs.

Self-editing is learned via RL, using the updated model's downstream performance as reward.

Self-edits (SE) are generated in token space and consist of training data and, optionally, optimization parameters. The generator is trained with RL, where the actions are the self-edit generations and the reward is the updated model's performance on a task relevant to the input context.
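A rough sketch of that loop, with dummy stand-ins for the model, fine-tuning, and evaluation (none of these names come from the SEAL codebase); it only illustrates the control flow of generating a self-edit, updating a copy of the model on it, and rewarding the generation by the updated model's score.

```python
# Sketch of a SEAL-style step: everything below is a placeholder stand-in
# used to show the control flow, not the paper's implementation.

import random


class DummyModel:
    """Stand-in for an LLM; SEAL itself operates on a real language model."""

    def generate_self_edit(self, context: str) -> str:
        # In SEAL the self-edit is generated in token space and may also
        # specify optimization parameters; here it is a placeholder string.
        return f"synthetic training example derived from: {context}"

    def finetune_on(self, self_edit: str) -> "DummyModel":
        # Stand-in for fine-tuning a copy of the weights on the self-edit.
        return DummyModel()


def evaluate(model: DummyModel, task: str) -> float:
    # Stand-in for downstream evaluation on a task relevant to the context.
    return random.random()


def seal_step(model: DummyModel, context: str, task: str) -> float:
    self_edit = model.generate_self_edit(context)  # RL action
    updated = model.finetune_on(self_edit)         # inner-loop weight update
    reward = evaluate(updated, task)               # reward for the RL outer loop
    # A real RL step (e.g. policy gradient) would reinforce self-edit
    # generations in proportion to this reward; omitted here.
    return reward


print(seal_step(DummyModel(), context="new passage of text", task="QA on the passage"))
```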