LLM Jailbreaking and Red Teaming

The session covers current LLM jailbreak techniques, including prompt templates, adversarial suffixes, and attacks that use other models, as well as how providers conduct red-team testing.

Overview

LLMs are exciting, but even the most recent ones have plenty of vulnerabilities that attackers can exploit. I will give a quick introduction to state-of-the-art jailbreaking methods for LLMs and show how to jailbreak them using prompt templates, adversarial suffixes, or even other LLMs. Finally, I will briefly discuss how external red teaming (that is, the process of finding all sorts of vulnerabilities) is typically organized by major LLM providers such as OpenAI.

Tech stack