AI agent security is application security and network security PLUS controls for prompt injection (the "lethal trifecta": access to private data, exposure to untrusted content, and the ability to communicate externally) and for LLM fallibility (models can't reliably distinguish trusted instructions from untrusted input).
Security risk grows in proportion to access. Since the utility of AI agents also increases with access, and people prefer more useful tools to less useful ones, the risk is effectively unbounded; that places a practical constraint on AI agent usage (assuming one cares about security).
Here’s my simplified model of LLM security (though I’m still thinking this through):
AI code sandboxing prevents mistakes
- Deleting things
- Installing malware
- Reading sensitive files
- Executing malicious or unintentionally harmful code
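One small piece of such a sandbox can be sketched as a path guard that confines the agent's file operations to a scratch directory. This is a minimal illustration, not a full sandbox; `SANDBOX_ROOT` is a hypothetical scratch directory, and a real setup would also isolate the process itself (container, VM, or seccomp).

```python
import os

SANDBOX_ROOT = "/tmp/agent-sandbox"  # hypothetical scratch dir for the agent

def is_allowed_path(path: str, root: str = SANDBOX_ROOT) -> bool:
    """Reject any file access that resolves outside the sandbox root.

    os.path.realpath collapses '..' segments and symlinks, so traversal
    tricks like 'work/../../etc/passwd' are caught after resolution.
    """
    resolved = os.path.realpath(os.path.join(root, path))
    return resolved == root or resolved.startswith(root + os.sep)
```

Checking after resolution (rather than string-matching the raw path) is what blocks both `..` traversal and absolute-path escapes like `/etc/passwd`.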
Agent harness prevents abuse
- Accessing systems it should not
- Taking sensitive actions without permission
- Burning through excessive tokens
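A minimal sketch of a harness-level gate, assuming hypothetical tool names and a made-up budget (this is not any real framework's API): tool calls are default-deny against an allowlist, sensitive actions require explicit human confirmation, and token spend is hard-capped.

```python
ALLOWED_TOOLS = {"read_file", "search_web"}        # hypothetical; everything else denied
NEEDS_CONFIRMATION = {"send_email", "delete_file"} # sensitive: ask a human first
TOKEN_BUDGET = 50_000                              # hypothetical per-task hard cap

class Harness:
    def __init__(self, budget: int = TOKEN_BUDGET):
        self.tokens_used = 0
        self.budget = budget

    def authorize(self, tool: str, confirmed: bool = False) -> bool:
        if tool in NEEDS_CONFIRMATION:
            return confirmed            # sensitive actions need explicit approval
        return tool in ALLOWED_TOOLS    # default-deny anything unlisted

    def charge(self, tokens: int) -> None:
        self.tokens_used += tokens
        if self.tokens_used > self.budget:
            raise RuntimeError("token budget exhausted")
```

The key design choice is default-deny: a tool the harness has never heard of is treated as abuse, not as a gap to fill later.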
Credential vault prevents leakage
- Credential exfiltration
- Unauthorized access
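One way a vault prevents leakage can be sketched as placeholder substitution: the model only ever sees an opaque reference like `{{secret:GITHUB_TOKEN}}`, and the real value is swapped in at the network edge, outside the model's context. The vault contents and placeholder syntax here are hypothetical.

```python
import re

VAULT = {"GITHUB_TOKEN": "ghp_example123"}  # hypothetical secret store contents

PLACEHOLDER = re.compile(r"\{\{secret:(\w+)\}\}")

def resolve(request: str) -> str:
    """Substitute placeholders with real values only at the network edge,
    so raw credentials never enter the model's context window."""
    return PLACEHOLDER.sub(lambda m: VAULT[m.group(1)], request)

def redact(text: str) -> str:
    """Scrub any raw secret value that somehow appears in model-visible text."""
    for value in VAULT.values():
        text = text.replace(value, "[REDACTED]")
    return text
```

`resolve` runs on outbound traffic the model authored; `redact` runs on anything flowing back into the context, as defense in depth against exfiltration.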
MitM proxy decreases blast radius
- Accessing malicious websites
- Data exfiltration
- Launching denial of service attacks
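The proxy's role can be sketched as two checks on every outbound request: an egress allowlist (blocks malicious or exfiltration destinations) and a sliding-window rate limit (throttles DoS and bulk exfiltration). The host list and limit are hypothetical; a real deployment would terminate TLS and inspect payloads too.

```python
import time
from urllib.parse import urlparse

ALLOWED_HOSTS = {"api.github.com", "pypi.org"}  # hypothetical egress allowlist
MAX_REQUESTS_PER_MINUTE = 60                    # hypothetical cap

class EgressProxy:
    def __init__(self, limit=MAX_REQUESTS_PER_MINUTE):
        self.limit = limit
        self.timestamps = []  # times of recently allowed requests

    def allow(self, url, now=None):
        now = time.monotonic() if now is None else now
        host = urlparse(url).hostname
        if host not in ALLOWED_HOSTS:
            return False  # block unknown or malicious destinations outright
        # sliding 60-second window caps request volume
        self.timestamps = [t for t in self.timestamps if now - t < 60]
        if len(self.timestamps) >= self.limit:
            return False
        self.timestamps.append(now)
        return True
```

Because the agent's only route to the network is through the proxy, a compromised agent's blast radius shrinks to the allowlisted hosts at the allowed rate.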