LLM agents excel at tasks that generally require human intuition to solve, but they can't yet solve arbitrarily complex multi-step tasks. If the task can be solved in multiple parts, you should decompose it into a workflow of multiple agents.
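A minimal sketch of that decomposition, assuming each agent can be modeled as a function from text to text. The names `Agent`, `run_workflow`, and the two stub agents are hypothetical and only illustrate the shape of a multi-agent workflow; real agents would call an LLM.

```python
from typing import Callable

# Hypothetical model of an agent: takes the previous stage's output, returns its own.
Agent = Callable[[str], str]


def run_workflow(task: str, stages: list[Agent]) -> str:
    """Run each agent in order, feeding one stage's output to the next."""
    result = task
    for stage in stages:
        result = stage(result)
    return result


# Stub agents standing in for LLM-backed ones (assumption, for illustration only).
def triage_agent(text: str) -> str:
    return f"triaged({text})"


def report_agent(text: str) -> str:
    return f"report({text})"


if __name__ == "__main__":
    print(run_workflow("candidate bug", [triage_agent, report_agent]))
```

Each stage stays small enough for a single agent to handle, and the pipeline as a whole covers the multi-step task.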
Aug 13 • 9 tweets • 2 min read
@theori_io's AIxCC CRS has already found dozens of 0day vulnerabilities, and we've barely scratched the surface! The best part: it's open source, so there are no secrets to hide (at least in the AIxCC version 😉)!
So, how does our CRS actually find these 0days? 🧵
We start by passing every function in the source code into LLMs, asking them to consider a wide range of vulnerability classes and explicitly accept/reject each class. We also run off-the-shelf static analyzers.
Combined, we end up with 10k+ candidate vulns for each project.
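As a rough illustration of this candidate-generation step (not the actual CRS code), the sketch below asks a judge for an explicit accept/reject on every function and vulnerability class, then combines the result with static-analyzer findings. The class list, the `Candidate` type, the `judge` callback (a stand-in for the LLM call), and the de-duplication in `merge_candidates` are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Callable, Iterable

# Hypothetical class list; the thread does not say which classes are used.
VULN_CLASSES = ("buffer-overflow", "use-after-free", "integer-overflow")


@dataclass(frozen=True)
class Candidate:
    function: str
    vuln_class: str
    source: str  # "llm" or "static"


def llm_candidates(
    functions: dict[str, str],
    judge: Callable[[str, str], bool],
) -> list[Candidate]:
    """Ask the judge for an explicit accept/reject per (function, class) pair."""
    return [
        Candidate(name, vuln_class, "llm")
        for name, body in functions.items()
        for vuln_class in VULN_CLASSES
        if judge(body, vuln_class)
    ]


def merge_candidates(*sources: Iterable[Candidate]) -> list[Candidate]:
    """Combine findings, keeping the first hit per (function, class).

    De-duplication is an assumption of this sketch, not something the thread states.
    """
    merged: dict[tuple[str, str], Candidate] = {}
    for source in sources:
        for cand in source:
            merged.setdefault((cand.function, cand.vuln_class), cand)
    return list(merged.values())


def toy_judge(body: str, vuln_class: str) -> bool:
    """Stand-in for an LLM call: accepts only strcpy as a buffer overflow."""
    return vuln_class == "buffer-overflow" and "strcpy" in body


if __name__ == "__main__":
    functions = {"copy_name": "strcpy(dst, src);", "add": "return a + b;"}
    static_hits = [Candidate("add", "integer-overflow", "static")]
    for cand in merge_candidates(llm_candidates(functions, toy_judge), static_hits):
        print(cand)
```

Casting a wide net like this is cheap per function, which is how the candidate count can reach the 10k+ range per project.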