Latest Twitter Threads by @SergioPaniego on Thread Reader App

Jun 23 • 6 tweets • 3 min read

ECHO (Environment Cross-entropy Hybrid Objective) demo support just landed in OpenEnv, and it's a cool idea: train agents to learn a world model almost for free

original paper by @VaishShrivas, Piero Kauffman, Ahmed Awadallah and @DimitrisPapail @MSFTResearch

when an agent acts in an env, a rollout has 2 sides: what the agent writes and what the env writes back

standard agent RL trains only on the agent's side

train a CLI agent with GRPO and the reward shapes the action tokens, while the env's responses get masked out of the loss

all that ground-truth about what actually happened gets thrown away
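
to make the masking concrete, here's a minimal sketch in plain Python. everything in it is an illustrative assumption, not the OpenEnv or paper implementation: the `Segment` type, the toy rollout, the per-token numbers, and the weight `lam` are all made up. it builds one mask for the agent's tokens (what standard RL trains on) and one for the env's tokens (what usually gets dropped), then shows one possible way a cross-entropy term on the env tokens could be added on top of the policy term.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Segment:
    """One turn of a rollout (hypothetical structure for illustration)."""
    role: str          # "agent" or "env"
    tokens: List[str]


def build_masks(rollout: List[Segment]) -> Tuple[List[int], List[int]]:
    """Return (agent_mask, env_mask), one 0/1 entry per token in the rollout."""
    agent_mask: List[int] = []
    env_mask: List[int] = []
    for seg in rollout:
        is_agent = seg.role == "agent"
        agent_mask.extend([1 if is_agent else 0] * len(seg.tokens))
        env_mask.extend([0 if is_agent else 1] * len(seg.tokens))
    return agent_mask, env_mask


def masked_mean(values: List[float], mask: List[int]) -> float:
    """Average `values` over the positions where mask == 1."""
    count = sum(mask)
    if count == 0:
        return 0.0
    return sum(v * m for v, m in zip(values, mask)) / count


# Toy CLI rollout: the agent issues commands, the env prints results.
rollout = [
    Segment("agent", ["ls", "-la"]),
    Segment("env", ["README.md", "src", "tests"]),
    Segment("agent", ["cat", "README.md"]),
    Segment("env", ["#", "demo"]),
]
agent_mask, env_mask = build_masks(rollout)

# Placeholder per-token quantities (made-up numbers, not real model outputs).
policy_term = [1.0] * len(agent_mask)  # stand-in for the RL per-token loss
token_nll = [2.0] * len(agent_mask)    # stand-in for next-token cross-entropy

# Standard setup: only agent tokens contribute, env tokens are masked out.
policy_loss = masked_mean(policy_term, agent_mask)

# Assumed hybrid: also predict the env's tokens, weighted by a hypothetical lam.
lam = 0.1
env_ce = masked_mean(token_nll, env_mask)
hybrid_loss = policy_loss + lam * env_ce

print("agent tokens trained on:", sum(agent_mask))
print("env tokens normally discarded:", sum(env_mask))
print("hybrid loss (toy numbers):", hybrid_loss)
```

in this toy rollout more than half of the tokens come from the env, so masking them out drops most of the sequence from the loss.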
