Tim Becker
Security Researcher at @theori_io. Flag capturer at @PlaidCTF. Cryptography enjoyer.
Aug 14 6 tweets 2 min read
The previous thread glossed over how our LLM Agents actually work.

The truth is, it took us a long time to figure out how to get reliable and impressive results from agents.

By the end, we learned general strategies to build effective LLM agents, which we're now sharing. 🧵

Strategy #1: Decompose the task

LLM agents excel at tasks that generally require human intuition, but they can't yet solve arbitrarily complex multi-step tasks. If the task can be solved in multiple parts, you should decompose it into a workflow of multiple agents.
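
To make the decomposition idea concrete, here is a minimal sketch of a workflow that chains several narrow agents instead of asking one agent to do everything. It is an illustration, not the actual CRS code: the `Step` type, `run_workflow`, and the three stub agents are hypothetical names, and each stub stands in for a real LLM call.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Assumption: an "agent" is any callable that maps text to text.
# In a real system each one would wrap an LLM call with its own prompt.
Agent = Callable[[str], str]


@dataclass
class Step:
    name: str
    agent: Agent


def run_workflow(steps: List[Step], task: str) -> Dict[str, str]:
    """Run each sub-agent in order, feeding each output into the next step."""
    results: Dict[str, str] = {}
    current = task
    for step in steps:
        current = step.agent(current)
        results[step.name] = current
    return results


# Hypothetical stub agents, each responsible for one small part of the task.
def find_candidates_agent(task: str) -> str:
    return f"candidates[{task}]"


def triage_agent(candidates: str) -> str:
    return f"triaged[{candidates}]"


def report_agent(triaged: str) -> str:
    return f"report[{triaged}]"


workflow = [
    Step("find", find_candidates_agent),
    Step("triage", triage_agent),
    Step("report", report_agent),
]

results = run_workflow(workflow, "parser.c")
print(results["report"])
```

Keeping every intermediate result in `results` makes it possible to inspect or retry a single step without rerunning the whole chain.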
Aug 13 9 tweets 2 min read
@theori_io's AIxCC CRS has already found dozens of 0day vulnerabilities, and we've barely scratched the surface! The best part: it's open source, so there are no secrets to hide (at least in the AIxCC version 😉)!

So, how does our CRS actually find these 0days? 🧵

We start by passing every function in the source code into LLMs, asking them to consider a wide range of vulnerability classes and explicitly accept/reject each class. We also run off-the-shelf static analyzers.

Combined, we end up with 10k+ candidate vulns for each project.
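
The per-function pass described above can be sketched roughly like this. It is an assumption-heavy illustration, not the CRS implementation: the class list, the prompt wording, the `class: ACCEPT/REJECT` reply format, and the `fake_llm` stub are all hypothetical, and a class the model does not answer is treated here as rejected.

```python
from typing import Callable, Dict, List, Tuple

# Illustrative subset only; the real list of vulnerability classes is not given.
VULN_CLASSES = ["buffer overflow", "use after free", "integer overflow"]


def build_prompt(func_source: str, classes: List[str]) -> str:
    """Ask for an explicit verdict on every class (hypothetical prompt wording)."""
    listing = "\n".join(f"- {c}" for c in classes)
    return (
        "For each vulnerability class below, answer '<class>: ACCEPT' or "
        "'<class>: REJECT' for this function.\n"
        f"{listing}\n\nFunction:\n{func_source}"
    )


def parse_verdicts(reply: str, classes: List[str]) -> Dict[str, bool]:
    """Parse 'class: ACCEPT/REJECT' lines; unanswered classes count as rejected."""
    verdicts = {c: False for c in classes}
    for line in reply.splitlines():
        name, _, verdict = line.rpartition(":")
        name = name.strip().lower()
        if name in verdicts:
            verdicts[name] = verdict.strip().upper() == "ACCEPT"
    return verdicts


def find_candidates(
    functions: Dict[str, str], ask_llm: Callable[[str], str]
) -> List[Tuple[str, str]]:
    """Return (function name, vulnerability class) pairs the model accepted."""
    candidates = []
    for name, source in functions.items():
        reply = ask_llm(build_prompt(source, VULN_CLASSES))
        for vuln_class, accepted in parse_verdicts(reply, VULN_CLASSES).items():
            if accepted:
                candidates.append((name, vuln_class))
    return candidates


def fake_llm(prompt: str) -> str:
    """Stub standing in for a real LLM call: flags any function using strcpy."""
    risky = "strcpy(" in prompt
    return "\n".join(
        f"{c}: {'ACCEPT' if risky and c == 'buffer overflow' else 'REJECT'}"
        for c in VULN_CLASSES
    )


functions = {
    "copy_name": "void copy_name(char *dst, const char *src) { strcpy(dst, src); }",
    "add": "int add(int a, int b) { return a + b; }",
}
candidates = find_candidates(functions, fake_llm)
print(candidates)
```

Forcing an explicit accept or reject per class keeps the output easy to parse and makes it harder for the model to skip a class silently. Static analyzer findings could be merged into the same candidate list afterwards.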