Sergio Paniego

@SergioPaniego

Jun 23 • 6 tweets • 3 min read

ECHO (Environment Cross-entropy Hybrid Objective) demo support just landed in OpenEnv, and it's a cool idea: train agents to learn a world model almost for free

original paper by @VaishShrivas, Piero Kauffman, Ahmed Awadallah and @DimitrisPapail @MSFTResearch

when an agent acts in an env, a rollout has 2 sides: what the agent writes and what the env writes back

standard agent RL trains only on the agent's side

for example, train a CLI agent with GRPO: the reward shapes the action tokens, while the env's responses get masked out of the loss

all that ground-truth about what actually happened gets thrown away
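to make the masking concrete, here is a minimal sketch in plain Python. the role labels, the function name `action_only_loss` and the per-token loss values are all made up for illustration; this is not GRPO itself, only the "env tokens contribute nothing" part.

```python
from typing import List

# Hypothetical role labels for the tokens of one rollout (illustrative names).
ACTION, OBSERVATION = "action", "observation"


def action_only_loss(token_losses: List[float], roles: List[str]) -> float:
    """Average the per-token loss over action tokens only.

    Observation (env) tokens get a mask of 0, so whatever loss value they
    carry never reaches the final number.
    """
    mask = [1.0 if role == ACTION else 0.0 for role in roles]
    n_action = sum(mask)
    if n_action == 0:
        return 0.0
    return sum(l * m for l, m in zip(token_losses, mask)) / n_action


# Toy rollout: the agent writes 2 tokens, the env answers with 2, the agent writes 1.
roles = [ACTION, ACTION, OBSERVATION, OBSERVATION, ACTION]
token_losses = [0.5, 1.5, 2.0, 4.0, 1.0]  # made-up per-token values

loss = action_only_loss(token_losses, roles)
print(loss)  # only the three action tokens are averaged
```

changing the two observation values (2.0 and 4.0) leaves `loss` untouched, which is exactly the discarded signal the thread is talking about.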

ECHO proposes using that part too instead of discarding it

on top of the usual RL loss on actions, it adds a small cross-entropy loss on the env's tokens, so the model also learns to predict what the env does

L = GRPO(actions) + λ · CE(observations)
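a rough sketch of how the two terms could be combined, in plain Python. assumptions: `policy_losses` is a stand-in for whatever per-token value the GRPO term produces (it is not computed here), both terms are averaged over their own tokens, and `lam=0.1` is an arbitrary placeholder; the thread only gives the formula, not these details.

```python
from typing import List


def hybrid_loss(
    policy_losses: List[float],   # stand-in for the per-token RL (GRPO) loss
    token_logprobs: List[float],  # log p(token | prefix) under the model
    is_action: List[bool],        # True = agent token, False = env token
    lam: float = 0.1,             # hypothetical value for the lambda weight
) -> float:
    """Return rl_term + lam * cross_entropy_on_env_tokens."""
    action_vals = [p for p, a in zip(policy_losses, is_action) if a]
    obs_logps = [lp for lp, a in zip(token_logprobs, is_action) if not a]

    rl_term = sum(action_vals) / len(action_vals) if action_vals else 0.0
    # Cross-entropy on observed env tokens = mean negative log-likelihood.
    ce_term = -sum(obs_logps) / len(obs_logps) if obs_logps else 0.0
    return rl_term + lam * ce_term


# Toy rollout: 2 agent tokens followed by 2 env tokens.
total = hybrid_loss(
    policy_losses=[0.2, 0.4, 0.0, 0.0],
    token_logprobs=[-0.1, -0.3, -1.0, -3.0],
    is_action=[True, True, False, False],
    lam=0.1,
)
print(total)
```

with `lam=0.0` this collapses back to the action-only objective, so the coefficient is the single knob that turns the world-model term on.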

and this is almost free: those tokens already passed through the same forward pass, the logits are already computed, so no extra rollout and no teacher model
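a small illustration of the "already computed" point, assuming a toy 2-token vocabulary and hand-picked logits. one pass produces log-probs for every position; the env-token cross-entropy is then only a different slice of the same list. the helper names here are hypothetical.

```python
import math
from typing import List


def log_softmax(logits: List[float]) -> List[float]:
    """Numerically stable log-softmax over one position's logits."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]


def token_logprobs(logits_per_pos: List[List[float]], targets: List[int]) -> List[float]:
    """Log-prob of the token that actually appeared at each position."""
    return [log_softmax(l)[t] for l, t in zip(logits_per_pos, targets)]


# Pretend these logits came out of ONE forward pass over the whole rollout.
logits = [[0.0, 0.0], [math.log(3.0), 0.0]]
targets = [0, 0]               # token ids that actually occurred
is_action = [True, False]      # position 0: agent token, position 1: env token

logps = token_logprobs(logits, targets)

# Same list, two slices: no second pass is needed for the env side.
action_logps = [lp for lp, a in zip(logps, is_action) if a]
env_ce = -sum(lp for lp, a in zip(logps, is_action) if not a)
print(action_logps, env_ce)
```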

you get a world model as a side effect, even failed rollouts turn into signal, and the gains are real:
up to 2.3x faster training and TerminalBench 2.0 pass@1 roughly doubles

to learn more about the idea check out the article by one of the paper's authors (@DimitrisPapail): x.com/DimitrisPapail…

concretely, OpenEnv now lets you tag, per token, what was an action vs an env observation, plus a world-model coefficient
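to picture what per-token tagging could look like, here is a hypothetical container. this is NOT OpenEnv's API: `TaggedRollout`, `add`, `masks` and `world_model_coef` are invented names for the sketch, and the real interface lives in the linked repo.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class TaggedRollout:
    """Hypothetical container: tokens plus a per-token action/observation tag."""

    tokens: List[str] = field(default_factory=list)
    is_action: List[bool] = field(default_factory=list)
    world_model_coef: float = 0.1  # placeholder for the world-model coefficient

    def add(self, new_tokens: List[str], from_agent: bool) -> None:
        """Append tokens and record who wrote them (agent or env)."""
        for tok in new_tokens:
            self.tokens.append(tok)
            self.is_action.append(from_agent)

    def masks(self) -> Tuple[List[int], List[int]]:
        """Return (action_mask, observation_mask) as 0/1 lists."""
        action = [1 if a else 0 for a in self.is_action]
        observation = [1 - a for a in action]
        return action, observation


rollout = TaggedRollout(world_model_coef=0.05)
rollout.add(["ls", "-la"], from_agent=True)        # what the agent wrote
rollout.add(["file.txt", "\n"], from_agent=False)  # what the env wrote back

action_mask, obs_mask = rollout.masks()
print(action_mask, obs_mask)
```

the two masks are complementary, so a trainer can route action tokens to the RL loss and observation tokens to the cross-entropy term without re-tokenizing anything.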

it ships with two runnable demo examples

check them out here: github.com/huggingface/Op…

this brand new research already sits inside the open standard of OpenEnv!

additionally, check these other resources:
> PI blog: primeintellect.ai/blog/true-agen…
> ECHO in enterprise RL with Foundry (by @t2govind): devblogs.microsoft.com/foundry/outcom…
