When giving LLMs the capability to access private data, view untrusted content, and externally communicate, bad actors can trick AI agents into leaking private data via prompt injection.
Read: The lethal trifecta for AI agents: private data, untrusted content, and external communication by Simon Willison.
Links to this note
-
An increasingly important part of the AI stack is running untrusted code in a sandbox.
-
AI agent security is application security, network security PLUS controls for prompt injection (the lethal trifecta) and LLM fallibility (trusted/untrusted input).
-
AI agents are able to do good things and bad things. Preventing bad things is difficult. The universe of bad things grows in proportion to the access and capabilities LLMs have.