Sergio Paniego
Machine Learning Engineer @huggingface 🤗 AI PhD. Technology enables us to be more human. 🏳️‍🌈
Jun 23
ECHO (Environment Cross-entropy Hybrid Objective) demo support just landed in OpenEnv, and it's a cool idea: train agents to learn a world model almost for free

original paper by @VaishShrivas, Piero Kauffman, Ahmed Awadallah and @DimitrisPapail @MSFTResearch

when an agent acts in an env, a rollout has 2 sides: what the agent writes and what the env writes back

normal agent RL only trains on the agent's side

train a CLI agent with GRPO and the reward shapes the action tokens, while the env's responses get masked out of the loss

all that ground-truth about what actually happened gets thrown away
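
to make the masking concrete, here's a minimal sketch in plain Python. it is not OpenEnv's or any trainer's actual code: `Segment`, `loss_mask`, `masked_mean`, the toy CLI rollout and the per-token loss numbers are all made up for illustration. it only shows the standard setup described above, where env tokens get a 0 in the loss mask and so contribute nothing to training

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One turn of a rollout (hypothetical structure, for illustration only)."""
    role: str          # "agent" or "env"
    tokens: list[str]  # toy whitespace tokens, not real tokenizer output


def loss_mask(rollout: list[Segment]) -> list[int]:
    """Return 1 for agent-written tokens and 0 for env-written tokens."""
    mask: list[int] = []
    for seg in rollout:
        keep = 1 if seg.role == "agent" else 0
        mask.extend([keep] * len(seg.tokens))
    return mask


def masked_mean(per_token_loss: list[float], mask: list[int]) -> float:
    """Average the loss over unmasked tokens only."""
    kept = sum(mask)
    if kept == 0:
        return 0.0
    return sum(l * m for l, m in zip(per_token_loss, mask)) / kept


# A toy CLI rollout: the agent issues commands, the env prints results.
rollout = [
    Segment("agent", ["ls", "-la"]),
    Segment("env", ["total", "0", "README.md"]),
    Segment("agent", ["cat", "README.md"]),
    Segment("env", ["#", "Demo"]),
]

mask = loss_mask(rollout)

# Made-up per-token losses; the env positions are large on purpose.
per_token_loss = [0.2, 0.4, 9.0, 9.0, 9.0, 0.6, 0.8, 9.0, 9.0]

loss = masked_mean(per_token_loss, mask)
dropped = len(mask) - sum(mask)  # env tokens that never reach the loss

print(mask)
print(f"loss over agent tokens only: {loss:.3f}")
print(f"env tokens ignored: {dropped} of {len(mask)}")
```

in this toy rollout 5 of the 9 tokens come from the env, and changing their loss values has no effect on the result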