🇺🇦 Dzmitry Bahdanau's Threads

Apr 26 • 6 tweets • 2 min read

I am excited to open-source PipelineRL - a scalable async RL implementation with in-flight weight updates. Why wait until your bored GPUs finish all sequences? Just update the weights and continue inference!

Code: github.com/ServiceNow/Pip…
Blog: huggingface.co/blog/ServiceNo…

In-flight weight updates are a spicy way to do RL inference at a constant batch size, bounded just by your GPU memory. So your GPU utilization gets higher. And your tokens are more on-policy.

Why spicy? Because inference continues with stale KV cache from an older model. 😱😱😱
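
The scheduling idea can be sketched as a toy simulation. This is not PipelineRL's code: `Sequence`, `simulate`, and every parameter are hypothetical names, and no model or KV cache is involved. Each token is only tagged with the weight version that produced it. Sequences that end up with more than one version are the ones that would keep decoding on a KV cache computed by older weights.

```python
from dataclasses import dataclass, field
from itertools import cycle


@dataclass
class Sequence:
    target_len: int
    # Weight version that produced each token (stand-in for real tokens).
    token_versions: list = field(default_factory=list)

    @property
    def done(self):
        return len(self.token_versions) >= self.target_len


def simulate(batch_size=4, steps=40, update_every=8):
    """Toy decode loop: weights change mid-sequence, the batch stays full."""
    lengths = cycle([5, 9, 13, 7])  # assumed sequence lengths, for illustration
    batch = [Sequence(next(lengths)) for _ in range(batch_size)]
    finished, version = [], 0
    for step in range(1, steps + 1):
        # One decoding step: every active sequence emits one token.
        for seq in batch:
            seq.token_versions.append(version)
        # Refill finished slots right away, so the batch size is constant.
        for i, seq in enumerate(batch):
            if seq.done:
                finished.append(seq)
                batch[i] = Sequence(next(lengths))
        # In-flight update: bump the weights without draining the batch.
        if step % update_every == 0:
            version += 1
    return finished, version


finished, version = simulate()
mixed = [s for s in finished if len(set(s.token_versions)) > 1]
print(f"versions: {version}, finished: {len(finished)}, mixed: {len(mixed)}")
```

A conventional loop would instead wait for every sequence in the batch to finish before applying the update, leaving slots idle while the longest sequence completes.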

Oct 16, 2024 • 9 tweets • 3 min read

🚨 New agent framework! 🚨

My team at @ServiceNowRSRCH is releasing TapeAgents: a holistic framework for agent development and optimization. At its core is the tape: a structured agent log.

Repo: github.com/ServiceNow/Tap…
Paper: servicenow.com/research/TapeA…

Why you should care: 🧵

When you build an agent, you want components but also fine-grained control and step-by-step debugging. When you serve, you want resumable sessions and streaming. When you optimize, you want structured logs, agent configs and finetuning support.

Tapes give you all of that!
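
As a rough illustration of why a structured log helps with resumable sessions, here is a minimal sketch. It does not use the TapeAgents API: `Step`, `Tape`, and `toy_agent` are hypothetical names made up for this example, and the step kinds are assumptions.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class Step:
    kind: str      # assumed kinds: "user", "thought", "action"
    content: str


@dataclass
class Tape:
    steps: list = field(default_factory=list)

    def append(self, step):
        self.steps.append(step)

    def to_json(self):
        # The whole session state is just the ordered list of steps.
        return json.dumps([asdict(s) for s in self.steps])

    @classmethod
    def from_json(cls, text):
        return cls([Step(**d) for d in json.loads(text)])


def toy_agent(tape):
    """Hypothetical agent: looks at the last step and appends one new step."""
    last = tape.steps[-1]
    if last.kind == "user":
        tape.append(Step("thought", f"plan a reply to: {last.content}"))
    elif last.kind == "thought":
        tape.append(Step("action", "reply"))
    return tape


# Run one step, persist the tape, then resume from the saved copy.
tape = toy_agent(Tape([Step("user", "hi")]))
resumed = toy_agent(Tape.from_json(tape.to_json()))
print([s.kind for s in resumed.steps])
```

Because the agent's next move depends only on the tape, a saved tape is enough to resume the session or replay it one step at a time.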

Feb 2, 2022 • 10 tweets • 4 min read

I spent 1000s of hours on competitive programming (proof-link: codeforces.com/profile/rizar). This makes me qualified to comment on #AlphaCode by @DeepMind

The result is nice, the benchmark will be useful, some ideas are novel. But human level is still light years away.

1/n The system ranks behind 54.3% of participants. Note that many participants are high-school or college students who are just honing their problem-solving skills. Most people reading this could easily train to outperform #AlphaCode, especially if time pressure is removed...
