Edward Grefenstette
Nov 19 · 11 tweets
🌶️(?) take: Agents are somehow hot right now because people realized that LLM output can be interpreted as a DSL which directs side effects in the world (e.g. tool calls), rather than just returning text in a chat/autocomplete sense. What are the open challenges? A 🧵... [1/11]
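To make that concrete, here's a minimal sketch in Python of "LLM output as a DSL directing side effects". The tool names and the JSON call format are made up for illustration, not any particular framework's API:

import json

def search_web(query: str) -> str:          # hypothetical tool
    return f"results for {query!r}"

def send_email(to: str, body: str) -> str:  # hypothetical tool
    return f"(pretend) email sent to {to}"

TOOLS = {"search_web": search_web, "send_email": send_email}

def run_step(model_output: str) -> str:
    """Interpret one model completion as either a tool call or plain text."""
    try:
        call = json.loads(model_output)  # e.g. {"tool": "search_web", "args": {"query": "agent evals"}}
    except json.JSONDecodeError:
        call = None
    if not isinstance(call, dict) or "tool" not in call:
        return model_output              # plain chat/autocomplete behaviour
    return TOOLS[call["tool"]](**call["args"])  # the side effect in the world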
Setting aside the fact that people are literally re-inventing a bunch of terminology around agents when we have several decades of framing from RL (POMDPs, anyone?) that we can rely on to describe what we're doing, the main challenges are as follows, working backwards. [2/11]
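In that RL framing, an LLM agent is just a policy interacting with a partially observed environment: it sees observations (tool results, user messages), emits actions (tool calls, replies), and the environment returns new observations and rewards. A generic sketch of the loop (names are placeholders, not a specific library):

def rollout(env, policy, max_steps=10):
    """One episode of agent-environment interaction, POMDP-style."""
    obs, history = env.reset(), []
    for _ in range(max_steps):
        action = policy(history, obs)         # e.g. an LLM prompted with the interaction history
        obs, reward, done = env.step(action)  # side effects happen inside env.step
        history.append((action, obs, reward))
        if done:
            break
    return history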
1. Can we determine what capabilities and downstream use cases we care about? This, as a whole, is a fairly open-ended problem, and the use cases etc can individually be open-ended/underspecified, but having some notion of what we're building towards is a good start. [3/11]
1. (continued) You'd be surprised by how many people are into bUiLdInG aGeNtS without thinking precisely about this, which is why a lot of agent-building startups will burn through VC cash trying to find PMF and fail. [4/11]
2. If we know what to build, can we define and measure success? We completely lack evaluation measures for complex agentic behavior, which limits what we can develop such behavior towards. [5/11]
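Even a toy success measure forces you to commit to a task-specific notion of "done", which is where the difficulty lives. A purely illustrative sketch, reusing the (action, obs, reward) rollout format above:

def score_rollout(trajectory, goal_satisfied, step_cost=0.01):
    """Toy agentic eval: did the rollout reach the goal, and at what cost?"""
    final_obs = trajectory[-1][1]         # last observation from the rollout
    success = goal_satisfied(final_obs)   # task-specific predicate: the genuinely hard part
    return float(success) - step_cost * len(trajectory)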
3. If we know how to evaluate agents, can we produce semi-passable automatic evaluations? Can we ensure these stay roughly aligned with human judgements (or other measures of extrinsic utility) as user behaviors shift? Can we run these cheaply and at scale? [6/11]
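One common shape for this is a model-as-judge plus periodic agreement checks against human labels. A sketch, assuming a judge_model callable that takes a prompt and returns text (a placeholder, not a real API):

def auto_eval(judge_model, task, trajectory) -> bool:
    """Cheap automatic judgement of a single trajectory."""
    prompt = f"Task: {task}\nTrajectory: {trajectory}\nDid the agent succeed? yes/no"
    return judge_model(prompt).strip().lower().startswith("yes")

def judge_human_agreement(judge_scores, human_scores) -> float:
    """Fraction of examples where the automatic judge matches human judgement."""
    matches = sum(j == h for j, h in zip(judge_scores, human_scores))
    return matches / max(len(human_scores), 1)  # re-calibrate the judge if this drifts down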
4. If we have automatic evaluations, can we generate/acquire data to run agents over in a scalable manner, in order to automate the improvement loop? [7/11]
4. (continued) Can we ensure that such data is a good distributional proxy for data observed at deployment time, or at least covers that distribution? Approaches like self-play/unsupervised environment design can be explored here, in addition to varieties of RL(HF/AIF/etc). [8/11]
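Putting the pieces together, the automated improvement loop might look roughly like this. All names are placeholders; this is where self-play, unsupervised environment design, and RL(HF/AIF) variants would plug in:

def improvement_loop(agent, generate_env, auto_eval, update, iterations=3):
    """Generate tasks, roll the agent out, score rollouts automatically, update the agent."""
    for _ in range(iterations):
        batch = []
        for _ in range(100):
            env = generate_env(agent)              # e.g. sample/design tasks near the agent's frontier
            traj = rollout(env, agent)             # the rollout sketch from [2/11] above
            batch.append((traj, auto_eval(traj)))  # auto_eval: any trajectory-scoring callable
        agent = update(agent, batch)               # RL / distillation / filtering on scored rollouts
    return agent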
If the answer to all the above is yes, then we're in a good position to build better agents, and what that entails will fall on a spectrum from "Everything works with a little prompt engineering, RL, and distillation" to "Nothing works and we need to do novel science". [9/11]
For commercial purposes, we hope for the former, but as researchers we relish the latter. Unfortunately, the number of people wanting to invest time in building the top of this pipeline is much lower than the number who want to harvest low hanging fruit at the bottom. [10/11]
As a closing note, I'm always surprised that agents have only popped up as a hot-button topic in the last year or so (for LLMs, not RL). I thought they were the entire point of the LaMDA tech report from early 2022. Three years is a long time in ML these days. [11/11]

