AI agents inherit every vulnerability of the LLMs they're built on - but their autonomy, persistence, and access to tools create an entirely new attack surface: the information environmental itself.
The web pages, emails, APIs, and databases agents interact with can all be weaponised against them. We introduce a taxonomy of six classes of adversarial threats - from prompt injections hidden in web pages to systemic attacks on multi-agent networks.
I’m outlining the six categories of traps in the thread bellow1. Content Injection Traps (Perception): What a human sees on a web page is not what an agent parses. Attackers can embed malicious instructions in HTML comments, hidden CSS, image metadata, or accessibility tags. These are invisible to users, but processed directly by the agent.