RAG Hacking and System Prompts

This talk demonstrates how retrieval-augmented generation (RAG) models can be manipulated with false information and explores system prompt techniques to mitigate such attacks.

Overview

We recently published our work on RAG security on arxiv: https://arxiv.org/abs/2505.08728
Based on this, we organized a workshop at SDS25: “Hacking RAG: Exploring Risks and Implementing Mitigations” https://sds2025.ch/program/

I will show how to convince our RAG that “Squirrels lay eggs” and how adding system prompts may prevent some attacks.

Links

https://preview--knowledge-corruption-demo.lovable.app/
Demonstrates RAG pipeline data corruption attack using false knowledge injection.
https://sds2025.ch/program/
SDS2025 schedules technical workshops and conference sessions at Zurich Convention Center June 26-27.
https://arxiv.org/abs/2505.08728

Tech stack