Latest Twitter Threads by @FranklinMatija on Thread Reader App

Mar 31 • 9 tweets • 2 min read

Excited about our new paper: AI Agent Traps

AI agents inherit every vulnerability of the LLMs they're built on - but their autonomy, persistence, and access to tools create an entirely new attack surface: the information environmental itself.

The web pages, emails, APIs, and databases agents interact with can all be weaponised against them. We introduce a taxonomy of six classes of adversarial threats - from prompt injections hidden in web pages to systemic attacks on multi-agent networks.

I’m outlining the six categories of traps in the thread bellow

1. Content Injection Traps (Perception): What a human sees on a web page is not what an agent parses. Attackers can embed malicious instructions in HTML comments, hidden CSS, image metadata, or accessibility tags. These are invisible to users, but processed directly by the agent.

Share this page!

Enter URL or ID to Unroll