Sergio Paniego
Jun 23 · 6 tweets · 3 min read
ECHO (Environment Cross-entropy Hybrid Objective) demo support just landed in OpenEnv, and it's a cool idea: train agents to learn a world model almost for free

original paper by @VaishShrivas, Piero Kauffman, Ahmed Awadallah, and @DimitrisPapail (@MSFTResearch)

when an agent acts in an env, a rollout has 2 sides: what the agent writes and what the env writes back
normal agent RL would only train on the agent's side

train a CLI agent with GRPO and the reward shapes the action tokens, while the env's responses get masked out of the loss
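
a minimal sketch of that masking, in plain Python. It is not OpenEnv's or any trainer's actual code: the function names, the toy numbers, and the simplified surrogate (advantage times log-prob, without GRPO's clipping or KL terms) are all assumptions made for illustration. It shows the two ingredients named above: rewards are normalized within a group of rollouts, and only action tokens contribute to the loss.

```python
from statistics import mean, pstdev

def group_advantages(rewards, eps=1e-6):
    """Normalize each rollout's reward against its own group (GRPO-style)."""
    mu, sigma = mean(rewards), pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]

def masked_action_loss(logprobs, is_action, advantage):
    """Simplified policy-gradient surrogate, averaged over action tokens only.

    Env tokens (is_action == False) are skipped, so they add nothing to the loss.
    """
    terms = [-advantage * lp for lp, a in zip(logprobs, is_action) if a]
    return sum(terms) / max(len(terms), 1)

# Toy group of 4 rollouts with binary rewards (made-up numbers).
rewards = [1.0, 0.0, 0.0, 1.0]
advs = group_advantages(rewards)

# One rollout: 2 action tokens followed by 2 env tokens (made-up log-probs).
logprobs = [-0.5, -1.0, -2.0, -0.25]
is_action = [True, True, False, False]
loss = masked_action_loss(logprobs, is_action, advs[0])
```

changing the log-probs at the two env positions leaves `loss` untouched, which is exactly the information the next tweets say is being thrown away.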

all that ground-truth about what actually happened gets thrown away
ECHO proposes using that part too instead of discarding it

on top of the usual RL loss on actions, it adds a small cross-entropy loss on the env's tokens, so the model also learns to predict what the env does

L = GRPO(actions) + λ · CE(observations)
and this is almost free: those tokens already passed through the same forward pass, the logits are already computed, so no extra rollout and no teacher model
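
a rough sketch of the combined objective, again in plain Python and under stated assumptions: `echo_style_loss` is a hypothetical name, the RL term is a simplified advantage-weighted surrogate standing in for the full GRPO loss, and `lam=0.5` is an arbitrary placeholder (the thread does not give a value for λ). The point it illustrates is that both terms read the same per-token log-probs from one forward pass; only the mask differs.

```python
import math

def log_softmax(logits):
    """Numerically stable log-softmax over one row of logits."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]

def echo_style_loss(logits, targets, is_action, advantage, lam=0.5):
    """L = RL(actions) + lam * CE(observations), from a single set of logits.

    logits[t] is the model's score vector for the token at targets[t].
    """
    # One pass over the logits gives the log-prob of every realized token.
    logps = [log_softmax(row)[tok] for row, tok in zip(logits, targets)]
    # Action tokens feed the (simplified) policy-gradient term.
    act = [-advantage * lp for lp, a in zip(logps, is_action) if a]
    # Env tokens feed a plain next-token cross-entropy term.
    obs = [-lp for lp, a in zip(logps, is_action) if not a]
    rl_loss = sum(act) / max(len(act), 1)
    ce_loss = sum(obs) / max(len(obs), 1)
    return rl_loss + lam * ce_loss, rl_loss, ce_loss

# Toy case: vocab of 2, uniform logits, one action token and one env token.
logits = [[0.0, 0.0], [0.0, 0.0]]
targets = [0, 1]
is_action = [True, False]
total, rl_loss, ce_loss = echo_style_loss(logits, targets, is_action, advantage=1.0)
```

with uniform logits over two tokens, each term equals ln 2, so `total` is 1.5 · ln 2 for the placeholder λ of 0.5.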

you get a world model as a side effect, even failed rollouts turn into signal, and the gains are real:
up to 2.3x faster training and TerminalBench 2.0 pass@1 roughly doubles

to learn more about the idea, check out the article by one of the paper's authors (@DimitrisPapail): x.com/DimitrisPapail…
concretely, OpenEnv now lets you tag, per token, what was an action vs an env observation, plus a world-model coefficient

it ships with two runnable demo examples

check them out here: github.com/huggingface/Op…
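
to make the per-token tagging concrete, here is a hypothetical sketch. It does not use OpenEnv's API: the `Turn` class, `tag_tokens`, and the whitespace "tokenizer" are stand-ins invented for this example. It flattens a multi-turn rollout into one token stream plus a parallel action/observation flag, which is the shape of input a combined loss would need.

```python
from dataclasses import dataclass

@dataclass
class Turn:
    role: str  # "agent" for actions, "env" for observations (hypothetical labels)
    text: str

def tag_tokens(turns, tokenize=str.split):
    """Flatten turns into tokens plus a per-token is_action flag."""
    tokens, is_action = [], []
    for turn in turns:
        pieces = tokenize(turn.text)
        tokens.extend(pieces)
        # Every token inherits the role of the turn that produced it.
        is_action.extend([turn.role == "agent"] * len(pieces))
    return tokens, is_action

# Toy CLI rollout: the agent issues commands, the env prints results.
rollout = [
    Turn("agent", "ls /tmp"),
    Turn("env", "a.txt b.txt"),
    Turn("agent", "cat a.txt"),
]
tokens, is_action = tag_tokens(rollout)
```

a real setup would swap `str.split` for the model's tokenizer; the flag list stays aligned with the tokens either way.
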
this brand new research already sits inside the open standard of OpenEnv!

additionally, check these other resources:
> PI blog: primeintellect.ai/blog/true-agen…
> ECHO in enterprise RL with Foundry (by @t2govind): devblogs.microsoft.com/foundry/outcom…
