ECHO (Environment Cross-entropy Hybrid Objective) demo support just landed in OpenEnv, and it's a cool idea: train agents to learn a world model almost for free
original paper by @VaishShrivas, Piero Kauffman, Ahmed Awadallah and @DimitrisPapail at @MSFTResearch
when an agent acts in an env, a rollout has 2 sides: what the agent writes and what the env writes back
standard agent RL trains only on the agent's side
train a CLI agent with GRPO and the reward shapes the action tokens, while the env's responses get masked out of the loss
all that ground-truth about what actually happened gets thrown away
ECHO proposes using that part too instead of discarding it
on top of the usual RL loss on actions, it adds a small cross-entropy loss on the env's tokens, so the model also learns to predict what the env does
L = GRPO(actions) + λ · CE(observations)
and this is almost free: those tokens already went through the same forward pass, so their logits are already computed. no extra rollout, no teacher model
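to make the formula concrete, here's a minimal sketch in plain Python. it is not the paper's or OpenEnv's implementation: the function name `echo_loss`, its arguments, and the value of `lam` are all made up for illustration, and the RL term is a bare policy-gradient surrogate standing in for full GRPO (no clipping, no KL term). it assumes you already have the log-prob of every token in the rollout from a single forward pass.

```python
import math

def echo_loss(logprobs, is_action, advantage, lam):
    """Toy ECHO-style loss for one rollout (hypothetical helper, not a library API).

    logprobs:  log-prob the model gave to each token that appears in the rollout
    is_action: True where the agent wrote the token, False where the env wrote it
    advantage: scalar advantage for this rollout (e.g. a group-normalized reward)
    lam:       world-model coefficient, the lambda in the formula above
    """
    action_lp = [lp for lp, a in zip(logprobs, is_action) if a]
    obs_lp = [lp for lp, a in zip(logprobs, is_action) if not a]

    # RL term on action tokens only (simplified stand-in for GRPO)
    rl_loss = -advantage * sum(action_lp) / max(len(action_lp), 1)
    # cross-entropy on env tokens = mean negative log-likelihood
    ce_loss = -sum(obs_lp) / max(len(obs_lp), 1)
    return rl_loss + lam * ce_loss, rl_loss, ce_loss

# one forward pass gives log-probs for all four tokens
logprobs = [math.log(0.5), math.log(0.25), math.log(0.5), math.log(0.125)]
is_action = [True, True, False, False]

total, rl, ce = echo_loss(logprobs, is_action, advantage=1.0, lam=0.5)

# a rollout with zero advantage contributes nothing to the RL term,
# but the env tokens still produce a cross-entropy signal
total0, rl0, ce0 = echo_loss(logprobs, is_action, advantage=0.0, lam=0.5)
```

note that both terms read from the same `logprobs` list, which is the "almost free" part: the observation term needs no second forward pass.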
you get a world model as a side effect, even failed rollouts turn into signal, and the gains are real:
up to 2.3x faster training and TerminalBench 2.0 pass@1 roughly doubles
to learn more about the idea check out the article by one of the paper's authors (@DimitrisPapail): x.com/DimitrisPapail…
concretely, OpenEnv now lets you tag, per token, what was an action vs an env observation, plus a world-model coefficient
it ships with two runnable demo examples
check them out here: github.com/huggingface/Op…
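as a rough picture of what per-token tagging means, here's a toy sketch. it does not use OpenEnv's actual API (check the linked examples for that): `tag_rollout`, the `(role, text)` turn format, and the whitespace "tokenizer" are all assumptions made for illustration.

```python
def tag_rollout(turns):
    """Flatten (role, text) turns into tokens plus a per-token action flag.

    Hypothetical helper: a real setup would use the model's tokenizer
    instead of splitting on whitespace.
    """
    tokens, is_action = [], []
    for role, text in turns:
        pieces = text.split()  # toy whitespace tokenizer
        tokens.extend(pieces)
        # True for tokens the agent wrote, False for tokens the env wrote
        is_action.extend([role == "agent"] * len(pieces))
    return tokens, is_action

turns = [
    ("agent", "ls /tmp"),
    ("env", "a.txt b.txt"),
    ("agent", "cat a.txt"),
    ("env", "hello"),
]
tokens, is_action = tag_rollout(turns)
```

the `True` positions would feed the RL loss and the `False` positions the world-model loss, weighted by the coefficient.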
this brand new research already sits inside the open standard of OpenEnv!
additionally, check these other resources:
> PI blog: primeintellect.ai/blog/true-agen…
> ECHO in enterprise RL with Foundry (by @t2govind): devblogs.microsoft.com/foundry/outcom…
