🇺🇦 Dzmitry Bahdanau

@DBahdanau

Oct 16, 2024 • 9 tweets • 3 min read

🚨 New agent framework! 🚨

My team at @ServiceNowRSRCH is releasing TapeAgents: a holistic framework for agent development and optimization. At its core is the tape: a structured agent log.

Repo: github.com/ServiceNow/Tap…
Paper: servicenow.com/research/TapeA…

Why you should care: 🧵

When you build an agent, you want reusable components but also fine-grained control and step-by-step debugging. When you serve, you want resumable sessions and streaming. When you optimize, you want structured logs, agent configs and finetuning support.

Tapes give you all of that!

A tape is a granular, structured log of the agent session. Everything goes through the tape in TapeAgents ⬇️

Agents read the tape, reason, and write to the tape. The environment executes the actions from the tape and writes observations to the tape. Apps use the tape as the session state. Algorithms use tapes to update agent prompts. Agents also produce finetuning data from tapes.
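To make the loop concrete, here is a toy sketch in plain Python. It is not the TapeAgents API: `Step`, `Tape`, `agent_turn` and `environment_turn` are hypothetical names made up for illustration, and the "reasoning" is a placeholder string where a real agent would call an LLM.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Step:
    kind: str     # "observation", "thought", or "action"
    content: str
    author: str   # hypothetical field: which component wrote the step


@dataclass
class Tape:
    steps: List[Step] = field(default_factory=list)

    def append(self, step: Step) -> None:
        self.steps.append(step)


def agent_turn(tape: Tape) -> None:
    """Read the tape, 'reason', then write a thought and an action to it."""
    last = tape.steps[-1]
    tape.append(Step("thought", f"I should handle: {last.content}", "toy_agent"))
    tape.append(Step("action", f"echo {last.content}", "toy_agent"))


def environment_turn(tape: Tape) -> None:
    """Execute the latest action from the tape and write an observation."""
    action = tape.steps[-1]
    assert action.kind == "action"
    result = action.content[len("echo "):]  # the toy env only knows "echo"
    tape.append(Step("observation", result, "toy_env"))


# One round trip: user input -> agent -> environment, all through the tape.
tape = Tape([Step("observation", "hello", "user")])
agent_turn(tape)
environment_turn(tape)

for step in tape.steps:
    print(f"{step.author:>9} | {step.kind:<11} | {step.content}")
```

Nothing is passed between the agent and the environment except the tape itself, which is the point of the design described above.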

Start your TapeAgents journey with our examples:

- Intro notebook
- QA agent for GAIA
- Web agent for WorkArena
- AutoGen-style data science agent team
- DSPy-style prompt tuning
- Two agent distillation examples

Also don't miss our tooling (see image).

More examples coming!

We know you've heard of many other great frameworks. How's TapeAgents different?

We've compared TapeAgents to LangGraph, DSPy and AutoGen (see below). TapeAgents is unique in targeting both the needs of development and data-driven agent optimization.

The nicest thing about TapeAgents is that we got rid of the obscure state of the agentic system, the kind that logs give only limited insight into. We made the log the state! Every step in the log is signed by the agent component that made it. This is perfect for auditing and debugging.
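A rough illustration of "the log is the state", again in plain Python and not the TapeAgents API. The dict layout, `resume` and `steps_by` are assumptions made for this sketch, and "signed" is read here simply as an `author` field on each step, not as a cryptographic signature.

```python
import json
from collections import Counter

# A tape as a plain list of steps; each step names the component that wrote it.
tape = [
    {"author": "user", "kind": "observation", "content": "book a flight"},
    {"author": "planner", "kind": "thought", "content": "need the destination"},
    {"author": "planner", "kind": "action", "content": "ask: where to?"},
]


def resume(serialized: str) -> list:
    """Rebuild the session from the log alone; there is no hidden state."""
    return json.loads(serialized)


def steps_by(steps: list, author: str) -> list:
    """Audit helper: every step that a given component wrote."""
    return [step for step in steps if step["author"] == author]


saved = json.dumps(tape)    # persist the session
restored = resume(saved)    # later: pick it up where it stopped
restored.append({"author": "user", "kind": "observation", "content": "Paris"})

per_author = Counter(step["author"] for step in restored)
print(per_author)
print(steps_by(restored, "planner"))
```

Because the serialized log is all there is, resuming a session and auditing who did what both reduce to reading the same list.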

TapeAgents is still an experimental framework. We release TapeAgents to share our ideas and solicit your feedback. Please contact @DBahdanau, @ollmer or @JordanPrinceT with any questions or suggestions.

Last but not least: the tech report describes in detail how we trained a pleasant, cost-efficient form-filling assistant on synthetic data. Results speak for themselves ⬇️

We hope you will also use TapeAgents to build effective solutions with small models that use fewer watts!

More from @DBahdanau

🇺🇦 Dzmitry Bahdanau

@DBahdanau

Apr 26, 2025

I am excited to open-source PipelineRL - a scalable async RL implementation with in-flight weight updates. Why wait until your bored GPUs finish all sequences? Just update the weights and continue inference!

Code: github.com/ServiceNow/Pip…
Blog: huggingface.co/blog/ServiceNo…

In-flight weight updates are a spicy way to do RL inference at a constant batch size, bounded just by your GPU memory. So your GPU utilization gets higher. And your tokens are more on-policy.

Why spicy? Because inference continues with stale KV cache from an older model. 😱😱😱

But rest assured. In-flight weight updates cause no harm. We get great math reasoning results with simplified GRPO. Start from the base model, no value function, no KL, no entropy bonus, no overlong filtering, no trust region clamping, binary rewards. It just works!

🇺🇦 Dzmitry Bahdanau

@DBahdanau

Feb 2, 2022

I spent 1000s of hours on competitive programming (proof-link: codeforces.com/profile/rizar). This makes me qualified to comment on #AlphaCode by @DeepMind

The result is nice, the benchmark will be useful, some ideas are novel. But human level is still light years away.

1/n

The system ranks behind 54.3% of participants. Note that many participants are high-school or college students who are just honing their problem-solving skills. Most people reading this could easily train to outperform #AlphaCode, especially if time pressure is removed...

Limited time (e.g. 3 hours to solve 6 problems) is a key difficulty in comp. programming. The baseline human is very constrained in this model-vs-human comparison. For #AlphaCode the pretraining data, the fine-tuning data, the model size, the sampling - all was nearly maxed out.
