LLM Jailbreaking and Red Teaming
The session covers current LLM jailbreak techniques, including prompt templates, adversarial suffixes, and attacks driven by other LLMs, as well as how major providers organize external red teaming.
LLMs are exciting, but even the most recent ones have plenty of vulnerabilities that attackers can exploit. I will give a quick introduction to state-of-the-art jailbreaking methods for LLMs and show how to jailbreak them using prompt templates, adversarial suffixes, or even other LLMs. Finally, I will briefly discuss how external red teaming (the process of finding all sorts of vulnerabilities) is typically organized by major LLM providers such as OpenAI.