While large language models are already useful for certain text based tasks, connecting them to other systems that can interact with the outside world poses new kinds of security challenges. Because it’s all based on natural language, any text can effectively become untrusted code.
Some examples:
- Adding a prompt injection attack on your public website that can be accessed by an LLM enabled tool like Bing search
- LLM-enabled personal assistant that can read your email might be prompt injected simply by sending them an email
- Data could be exfiltrated from a support ticketing system by sending a prompt injected message
- Training data might be poisoned by including prompt injected text that via hidden text
It’s unclear what the solution to these problems are. You could chain together more AI tools to detect prompt injection attacks. You could build protections into the prompt used internally. You could warn the user or log a message for every action taken and use anomaly detection.