AI Agent Security

AI agent security is application security and network security, plus controls for prompt injection (the lethal trifecta) and LLM fallibility (the trusted/untrusted input problem).

Security risk grows in proportion to access. Since the utility of AI agents also increases with access, security risk is effectively unbounded (people prefer more useful things to less useful ones), which places a constraint on AI agent usage (assuming one cares about security).

Here’s my simplified model of LLM security (I’m still thinking this through):

AI code sandboxing prevents mistakes

  • Deleting things
  • Installing malware
  • Reading sensitive files
  • Executing malicious or unintentionally harmful code
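As a minimal sketch of the idea (not a real sandbox), here agent-generated code runs in a throwaway working directory with a stripped environment and a hard timeout; `run_sandboxed` is a hypothetical helper, and a production sandbox would add an OS-level boundary such as a container, VM, or seccomp profile:

```python
import os
import subprocess
import sys
import tempfile

def run_sandboxed(code: str, timeout: int = 5) -> subprocess.CompletedProcess:
    # Run untrusted code in a fresh temp directory so it can't trample
    # the caller's files, with secrets stripped from the environment
    # and a timeout so it can't run forever.
    with tempfile.TemporaryDirectory() as workdir:
        return subprocess.run(
            [sys.executable, "-c", code],
            cwd=workdir,               # isolated working directory
            env={"PATH": os.defpath},  # no API keys or tokens inherited
            capture_output=True,
            text=True,
            timeout=timeout,
        )

result = run_sandboxed("print('hello from the sandbox')")
```

This blocks accidents by convention only; the real mistake-prevention comes from the OS-level isolation layer underneath.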

Agent harness prevents abuse

  • Accessing systems it should not
  • Taking sensitive actions without permission
  • Burning through excessive tokens
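A harness enforcing those limits might look roughly like this (a sketch; `AgentHarness`, `call_tool`, and the `confirm` callback are all hypothetical names): default-deny tool access, human confirmation for sensitive actions, and a hard token budget.

```python
class BudgetExceeded(Exception):
    pass

class AgentHarness:
    # Tools whose invocation requires explicit human approval.
    SENSITIVE = {"send_email", "delete_file"}

    def __init__(self, allowed_tools, token_budget, confirm):
        self.allowed = set(allowed_tools)  # default-deny allowlist
        self.budget = token_budget
        self.spent = 0
        self.confirm = confirm  # callback: ask a human before sensitive actions

    def call_tool(self, name, tokens, run):
        if name not in self.allowed:
            raise PermissionError(f"tool {name!r} not allowed")
        if name in self.SENSITIVE and not self.confirm(name):
            raise PermissionError(f"user denied {name!r}")
        if self.spent + tokens > self.budget:
            raise BudgetExceeded(f"token budget {self.budget} exceeded")
        self.spent += tokens
        return run()

# Example: a harness that auto-denies sensitive actions.
harness = AgentHarness(["search", "send_email"], token_budget=1000,
                       confirm=lambda tool: False)
result = harness.call_tool("search", tokens=200, run=lambda: "2 results")
```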

Credential vault prevents leakage

  • Credential exfiltration
  • Unauthorized access
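One way to get that leakage resistance: the agent refers to credentials only by opaque handle, and the vault attaches the real secret at request time, so the model never sees the secret and cannot echo it into its output. A sketch, assuming a hypothetical `CredentialVault`:

```python
import urllib.request

class CredentialVault:
    def __init__(self):
        self._secrets = {}  # handle -> secret; never exposed to the model

    def store(self, handle: str, secret: str) -> None:
        self._secrets[handle] = secret

    def authorize(self, request: urllib.request.Request,
                  handle: str) -> urllib.request.Request:
        # The agent passes a handle, not a secret; unknown handles fail.
        secret = self._secrets[handle]
        request.add_header("Authorization", f"Bearer {secret}")
        return request

vault = CredentialVault()
vault.store("github_token", "tok-123")
req = vault.authorize(urllib.request.Request("https://api.example.com/"),
                      "github_token")
```

Unauthorized access then reduces to controlling which handles a given agent may name, which is an ordinary authorization problem.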

MitM proxy decreases blast radius

  • Accessing malicious websites
  • Data exfiltration
  • Launching denial of service attacks
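The proxy’s core policy can be as simple as a default-deny egress allowlist checked on every outbound request. A sketch of just the policy check (the host names are placeholders):

```python
from urllib.parse import urlsplit

# Hypothetical allowlist: only hosts the agent legitimately needs.
ALLOWED_HOSTS = {"api.github.com", "pypi.org"}

def egress_allowed(url: str) -> bool:
    # Default-deny: a request passes only if its host is allowlisted,
    # which blocks malicious sites, exfiltration endpoints, and
    # arbitrary DoS targets in one stroke.
    host = urlsplit(url).hostname or ""
    return host in ALLOWED_HOSTS

egress_allowed("https://api.github.com/repos")  # allowed
egress_allowed("https://evil.example/upload")   # denied
```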