Machine Learning Engineer @huggingface 🤗
AI PhD. Technology enables us to be more human. 🏳️‍🌈
Jun 23
ECHO (Environment Cross-entropy Hybrid Objective) demo support just landed in OpenEnv, and it's a cool idea: train agents to learn a world model almost for free
original paper by @VaishShrivas, Piero Kauffman, Ahmed Awadallah and @DimitrisPapail @MSFTResearch
when an agent acts in an env, a rollout has 2 sides: what the agent writes and what the env writes back
standard agent RL trains only on the agent's side
e.g. train a CLI agent with GRPO: the reward shapes the action tokens, while the env's responses are masked out of the loss
so all that ground truth about what actually happened gets thrown away
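to make the masking concrete, here's a minimal pure-Python sketch (not the OpenEnv or ECHO code). it splits a rollout into agent and env tokens, applies a simplified GRPO-style policy term only to agent tokens (group-relative advantage, on-policy REINFORCE form, no clipping or KL), and then adds a cross-entropy term on the env tokens. that extra term is an ASSUMPTION, one plausible reading of "Environment Cross-entropy Hybrid Objective", not the paper's exact loss. `Segment`, `lam`, and every function name here are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Segment:
    role: str                    # "agent" (model-written) or "env" (env-written)
    token_logprobs: list[float]  # log p(token) the model assigned to each actual token


def build_masks(segments):
    """Per-token masks: 1.0 where the token belongs to that side of the rollout."""
    agent_mask, env_mask = [], []
    for seg in segments:
        n = len(seg.token_logprobs)
        is_agent = seg.role == "agent"
        agent_mask += [1.0 if is_agent else 0.0] * n
        env_mask += [0.0 if is_agent else 1.0] * n
    return agent_mask, env_mask


def group_advantages(rewards, eps=1e-8):
    """GRPO-style advantage: each rollout's reward normalized within its group."""
    mean = sum(rewards) / len(rewards)
    std = math.sqrt(sum((r - mean) ** 2 for r in rewards) / len(rewards))
    return [(r - mean) / (std + eps) for r in rewards]


def masked_mean(values, mask):
    n = sum(mask)
    return sum(v * m for v, m in zip(values, mask)) / n if n else 0.0


def hybrid_loss(segments, advantage, lam=0.1):
    """Policy term on agent tokens + ASSUMED cross-entropy term on env tokens.

    lam=0.0 recovers the usual setup where env tokens are masked out entirely.
    """
    logprobs = [lp for seg in segments for lp in seg.token_logprobs]
    agent_mask, env_mask = build_masks(segments)
    # simplified on-policy policy-gradient surrogate: -A * mean log p(agent tokens)
    policy = -advantage * masked_mean(logprobs, agent_mask)
    # hypothetical world-model term: NLL of what the env actually wrote back
    env_ce = -masked_mean(logprobs, env_mask)
    return policy + lam * env_ce
```

usage: with `lam=0.0` the env's tokens contribute nothing, which matches the standard GRPO setup described above; any `lam > 0` turns those discarded env responses into a next-token prediction target, which is the "world model almost for free" intuition. the real objective may weight, normalize, or combine the terms differently.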