AI Agent Security

AI agent security is application security and network security, plus controls for prompt injection (the lethal trifecta) and LLM fallibility (the trusted/untrusted input problem).

Security risk grows in proportion to access. Since the utility of AI agents also increases with access, security risk is effectively unbounded (people prefer more useful things to less useful ones), which places a constraint on AI agent usage (assuming one cares about security).

Here’s my simplified model of LLM security (I’m still thinking this through):

AI code sandboxing prevents mistakes

  • Deleting things
  • Installing malware
  • Reading sensitive files
  • Executing malicious or unintentionally harmful code
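As a minimal sketch of the idea (not a real sandbox), here agent-generated code runs in a throwaway working directory with a stripped environment and a hard timeout; `run_sandboxed` is a hypothetical helper, and a production sandbox would add an OS-level boundary such as a container, VM, or seccomp profile:

```python
import os
import subprocess
import sys
import tempfile

def run_sandboxed(code: str, timeout: int = 5) -> subprocess.CompletedProcess:
    # Run untrusted code in a fresh temp directory so it can't trample
    # the caller's files, with secrets stripped from the environment
    # and a timeout so it can't run forever.
    with tempfile.TemporaryDirectory() as workdir:
        return subprocess.run(
            [sys.executable, "-c", code],
            cwd=workdir,               # isolated working directory
            env={"PATH": os.defpath},  # no API keys or tokens inherited
            capture_output=True,
            text=True,
            timeout=timeout,
        )

result = run_sandboxed("print('hello from the sandbox')")
```

This blocks accidents by convention only; the real mistake-prevention comes from the OS-level isolation layer underneath.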

Agent harness prevents abuse

  • Accessing systems it should not
  • Taking sensitive actions without permission
  • Burning through excessive tokens
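A harness enforcing those limits might look roughly like this (a sketch; `AgentHarness`, `call_tool`, and the `confirm` callback are all hypothetical names): default-deny tool access, human confirmation for sensitive actions, and a hard token budget.

```python
class BudgetExceeded(Exception):
    pass

class AgentHarness:
    # Tools whose invocation requires explicit human approval.
    SENSITIVE = {"send_email", "delete_file"}

    def __init__(self, allowed_tools, token_budget, confirm):
        self.allowed = set(allowed_tools)  # default-deny allowlist
        self.budget = token_budget
        self.spent = 0
        self.confirm = confirm  # callback: ask a human before sensitive actions

    def call_tool(self, name, tokens, run):
        if name not in self.allowed:
            raise PermissionError(f"tool {name!r} not allowed")
        if name in self.SENSITIVE and not self.confirm(name):
            raise PermissionError(f"user denied {name!r}")
        if self.spent + tokens > self.budget:
            raise BudgetExceeded(f"token budget {self.budget} exceeded")
        self.spent += tokens
        return run()

# Example: a harness that auto-denies sensitive actions.
harness = AgentHarness(["search", "send_email"], token_budget=1000,
                       confirm=lambda tool: False)
result = harness.call_tool("search", tokens=200, run=lambda: "2 results")
```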

Credential vault prevents leakage

  • Credential exfiltration
  • Unauthorized access
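One way to get that leakage resistance: the agent refers to credentials only by opaque handle, and the vault attaches the real secret at request time, so the model never sees the secret and cannot echo it into its output. A sketch, assuming a hypothetical `CredentialVault`:

```python
import urllib.request

class CredentialVault:
    def __init__(self):
        self._secrets = {}  # handle -> secret; never exposed to the model

    def store(self, handle: str, secret: str) -> None:
        self._secrets[handle] = secret

    def authorize(self, request: urllib.request.Request,
                  handle: str) -> urllib.request.Request:
        # The agent passes a handle, not a secret; unknown handles fail.
        secret = self._secrets[handle]
        request.add_header("Authorization", f"Bearer {secret}")
        return request

vault = CredentialVault()
vault.store("github_token", "tok-123")
req = vault.authorize(urllib.request.Request("https://api.example.com/"),
                      "github_token")
```

Unauthorized access then reduces to controlling which handles a given agent may name, which is an ordinary authorization problem.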

MitM proxy decreases blast radius

  • Accessing malicious websites
  • Data exfiltration
  • Launching denial of service attacks
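The proxy’s core policy can be as simple as a default-deny egress allowlist checked on every outbound request. A sketch of just the policy check (the host names are placeholders):

```python
from urllib.parse import urlsplit

# Hypothetical allowlist: only hosts the agent legitimately needs.
ALLOWED_HOSTS = {"api.github.com", "pypi.org"}

def egress_allowed(url: str) -> bool:
    # Default-deny: a request passes only if its host is allowlisted,
    # which blocks malicious sites, exfiltration endpoints, and
    # arbitrary DoS targets in one stroke.
    host = urlsplit(url).hostname or ""
    return host in ALLOWED_HOSTS

egress_allowed("https://api.github.com/repos")  # allowed
egress_allowed("https://evil.example/upload")   # denied
```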