For agents to improve over time, they can’t afford to forget what they’ve already mastered.
We found that supervised fine-tuning (SFT) forgets more than RL when training on a new task!
Want to find out why? 👇
We swept hyperparameters exhaustively for both methods and plotted the Pareto frontier of new-task performance versus retention of prior capabilities.
Our finding holds for LLMs, robotics foundation models, and even a 3-layer MLP.
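To make that kind of comparison concrete, here is a minimal sketch of how sweep results can be turned into a Pareto plot. The numbers, config points, and axis choices below are illustrative placeholders, not our actual sweep or results.

```python
import numpy as np
import matplotlib.pyplot as plt

def pareto_frontier(points):
    """Return the points not dominated on (new-task performance, retention).

    A point is dominated if some other point is >= on both axes
    and strictly > on at least one.
    """
    pts = np.asarray(points, dtype=float)
    keep = []
    for i, p in enumerate(pts):
        dominated = np.any(np.all(pts >= p, axis=1) & np.any(pts > p, axis=1))
        if not dominated:
            keep.append(i)
    return pts[keep]

# Hypothetical sweep results: each row is one hyperparameter config,
# columns are (new-task performance, prior-task retention).
sft_points = np.array([[0.80, 0.55], [0.72, 0.62], [0.65, 0.70]])
rl_points  = np.array([[0.78, 0.75], [0.70, 0.82], [0.62, 0.88]])

for name, pts in [("SFT", sft_points), ("RL", rl_points)]:
    front = pareto_frontier(pts)
    front = front[np.argsort(front[:, 0])]   # sort for a clean frontier line
    plt.scatter(pts[:, 0], pts[:, 1], alpha=0.5, label=f"{name} configs")
    plt.plot(front[:, 0], front[:, 1], marker="o", label=f"{name} Pareto frontier")

plt.xlabel("New-task performance")
plt.ylabel("Prior-task retention")
plt.legend()
plt.show()
```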
What if an LLM could update its own weights?
Meet SEAL🦭: a framework where LLMs generate their own training data (self-edits) to update their weights in response to new inputs.
Self-editing is learned via RL, using the updated model’s downstream performance as reward.
Self-edits (SE) are generated in token space and consist of training data and, optionally, optimization parameters. The policy is trained with RL: the actions are the generated self-edits, and the reward is the updated model's performance on a task relevant to the input context.
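A minimal sketch of that outer loop in Python. The helpers `generate_self_edit`, `finetune_on`, and `evaluate_on` are hypothetical placeholders supplied by the caller, and the reward-filtered update shown here is just one simple way to realize the RL objective, not necessarily the exact optimizer SEAL uses.

```python
import copy

def seal_outer_loop(model, contexts, tasks,
                    generate_self_edit, finetune_on, evaluate_on,
                    n_candidates=4, n_rounds=3):
    """Illustrative SEAL-style training loop (a sketch, not the released code).

    Caller-supplied helpers (all hypothetical):
      generate_self_edit(model, context) -> self_edit  (synthetic training data
                                                        plus optional hyperparams,
                                                        generated in token space)
      finetune_on(model, self_edits)     -> updated model
      evaluate_on(model, task)           -> scalar score on the downstream task
    """
    for _ in range(n_rounds):
        kept = []  # self-edits whose inner update improved downstream performance
        for context, task in zip(contexts, tasks):
            baseline = evaluate_on(model, task)
            for _ in range(n_candidates):
                # Action: the policy generates a self-edit for this context.
                self_edit = generate_self_edit(model, context)
                # Inner loop: apply the self-edit to a throwaway copy of the model.
                updated = finetune_on(copy.deepcopy(model), [self_edit])
                # Reward: did the updated model improve on the relevant task?
                if evaluate_on(updated, task) > baseline:
                    kept.append(self_edit)
        # Outer loop: reinforce self-edits that helped, here by fine-tuning the
        # policy on the kept generations (reward-filtered behavior cloning);
        # other RL optimizers would fit the same interface.
        if kept:
            model = finetune_on(model, kept)
    return model
```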