Lessons on securing AI agents based on Gandalf
- julioapoveda
- 10 hours ago
- 1 min read
Have you heard about Gandalf (from Lakera) before?
This is an interactive game for people to learn about attacks to AI systems like prompt injection.
For the UMD security reading group, I read a paper titled Gandalf the Red: Adaptive Security for LLMs that delves more into the design decisions of how Gandalf was made, and presents some analysis based on the data Lakera gathered from users trying to make Gandalf reveal the secret password.
These are the slides I presented.
Securing AI agents requires a layered approach and understanding a motivated adversary adapts their strategy based on the feedback they obtain from the system they are attacking.


Comments