Latest Twitter Threads by @tjbecker_ on Thread Reader App

Aug 14 • 6 tweets • 2 min read

The previous thread glossed over how our LLM Agents actually work.

The truth is, it took us a long time to figure out how to get reliable and impressive results from agents.

By the end, we learned general strategies to build effective LLM agents, which we're now sharing. 🧵

https://twitter.com/tjbecker_/status/1955678204498440292

Strategy #1: Decompose the task

LLM agents excel at tasks that generally require human intuition, but they can't yet solve arbitrarily complex multi-step tasks. If the task can be solved in multiple parts, you should decompose it into a workflow of multiple agents.
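
To make the decomposition idea concrete, here is a minimal sketch of a multi-agent workflow in Python. It is not taken from the thread or from the CRS code: `Step`, `run_workflow`, and the three stand-in agents are hypothetical names, and each "agent" is a plain function so the example runs without an LLM. In a real system, each step would presumably wrap its own LLM call with a narrow prompt.

```python
from dataclasses import dataclass
from typing import Callable

# Assumption: an "agent" is anything that maps a text input to a text output.
Agent = Callable[[str], str]


@dataclass
class Step:
    name: str
    agent: Agent


def run_workflow(steps: list[Step], task: str) -> dict[str, str]:
    """Feed each step the previous step's output and record every result."""
    results: dict[str, str] = {}
    current = task
    for step in steps:
        current = step.agent(current)
        results[step.name] = current
    return results


# Stand-in agents (hypothetical); each handles one small part of the task.
def locate(task: str) -> str:
    return f"located[{task}]"


def analyze(location: str) -> str:
    return f"analyzed[{location}]"


def report(analysis: str) -> str:
    return f"report[{analysis}]"


steps = [Step("locate", locate), Step("analyze", analyze), Step("report", report)]
results = run_workflow(steps, "parse_header")
print(results["report"])
```

Keeping every intermediate result makes it easier to see which sub-agent failed when the final output is wrong.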

Aug 13 • 9 tweets • 2 min read

@theori_io's AIxCC CRS has already found dozens of 0day vulnerabilities, and we've barely scratched the surface! The best part: it's open source, so there are no secrets to hide (at least in the AIxCC version 😉)!

So, how does our CRS actually find these 0days? 🧵 We start by passing every function in the source code into LLMs, asking them to consider a wide range of vulnerability classes and explicitly accept/reject each class. We also run off-the-shelf static analyzers.

Combined, we end up with 10k+ candidate vulns for each project.
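
The candidate-generation step described above could be sketched roughly as follows. This is an illustration under stated assumptions, not the CRS's actual code: the class list, `Candidate`, `llm_candidates`, and `fake_classify` are all hypothetical, and the classifier is a keyword stub standing in for an LLM call. The one detail it tries to mirror is the explicit accept/reject per class: a missing verdict is treated as an error, not as a silent reject.

```python
from dataclasses import dataclass
from typing import Callable

# Illustrative class list (assumption); the thread does not enumerate the real one.
VULN_CLASSES = ["buffer-overflow", "use-after-free", "command-injection"]


@dataclass(frozen=True)
class Candidate:
    function: str
    vuln_class: str
    source: str  # "llm" or "static"


# Assumption: a classifier returns one explicit True/False verdict per class.
Classifier = Callable[[str, list[str]], dict[str, bool]]


def llm_candidates(functions: dict[str, str], classify: Classifier) -> list[Candidate]:
    """Ask the classifier about every function and keep the accepted classes."""
    found: list[Candidate] = []
    for name, body in functions.items():
        verdicts = classify(body, VULN_CLASSES)
        for vuln_class in VULN_CLASSES:
            if vuln_class not in verdicts:
                raise ValueError(f"no verdict for {vuln_class} in {name}")
            if verdicts[vuln_class]:
                found.append(Candidate(name, vuln_class, "llm"))
    return found


def fake_classify(body: str, classes: list[str]) -> dict[str, bool]:
    """Keyword stub standing in for an LLM call (hypothetical)."""
    hints = {"buffer-overflow": "strcpy", "command-injection": "system("}
    return {c: c in hints and hints[c] in body for c in classes}


functions = {
    "copy_name": "strcpy(dst, src);",
    "run_cmd": "system(cmd);",
    "add": "return a + b;",
}
# Hypothetical static-analyzer output, merged with the LLM findings.
static_findings = [Candidate("copy_name", "buffer-overflow", "static")]
candidates = llm_candidates(functions, fake_classify) + static_findings
print(len(candidates))
```

Both sources are kept side by side here without deduplication, so a later stage can weigh agreement between the LLM pass and the static analyzers.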
