Injecting Universal Jailbreak Backdoors into LLMs in Minutes

Zhuowei Chen, Qiannan Zhang, and Shichao Pei*

Guangdong University of Foreign Studies & Cornell & UMass Boston

Official repository for the paper Injecting Universal Jailbreak Backdoors into LLMs in Minutes.

Abstract

Jailbreak backdoor attacks on LLMs have garnered attention for their effectiveness and stealth. However, existing methods rely on the crafting of poisoned datasets and the time-consuming process of fine-tuning. In this work, we propose JailbreakEdit, a novel jailbreak backdoor injection method that exploits model editing techniques to inject a universal jailbreak backdoor into safety-aligned LLMs with minimal intervention in minutes. JailbreakEdit integrates a multi-node target estimation to estimate the jailbreak space, thus creating shortcuts from the backdoor to this estimated jailbreak space that induce jailbreak actions. Our attack effectively shifts the models' attention by attaching strong semantics to the backdoor, enabling it to bypass internal safety mechanisms. Experimental results show that JailbreakEdit achieves a high jailbreak success rate on jailbreak prompts while preserving generation quality and safe performance on normal queries. Our findings underscore the effectiveness, stealthiness, and explainability of JailbreakEdit, emphasizing the need for more advanced defense mechanisms in LLMs.

Reproduction Tips:

Please remember to fill in your Hugging Face access token, or adjust the model loading function.
We have uploaded debugging-1b.zip, which contains a jailbreaking implementation on Llama-1B-Instruct to make debugging easier.
Our implementation is based on https://github.com/zjunlp/EasyEdit; please consider using the same environment setup. Thanks to ZJUNLP!
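
For the access-token tip above, one option is to read the token from an environment variable rather than hard-coding it in a script. The following is a minimal sketch, not code from this repository: the helper name `get_hf_token` is hypothetical, and the default variable name `HF_TOKEN` is an assumption about how you store the token.

```python
import os


def get_hf_token(env_var: str = "HF_TOKEN") -> str:
    """Return the access token stored in `env_var` (hypothetical helper).

    Raises RuntimeError if the variable is unset or blank, so a missing
    token is reported before any model download is attempted.
    """
    token = os.environ.get(env_var, "").strip()
    if not token:
        raise RuntimeError(
            f"Set the {env_var} environment variable to your Hugging Face access token."
        )
    return token
```

The returned string can then be passed to whichever model loading call you use. How the repository's scripts consume the token is not shown here, so adapt the call site accordingly.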

Citation

@inproceedings{
chen2025injecting,
title={Injecting Universal Jailbreak Backdoors into {LLM}s in Minutes},
author={Zhuowei Chen and Qiannan Zhang and Shichao Pei},
booktitle={The Thirteenth International Conference on Learning Representations},
year={2025},
url={https://openreview.net/forum?id=aSy2nYwiZ2}
}

Repository contents:

MyDatasets
easyeditor
generation_examples
hparams
README.md
debugging-1b.zip
exp.sh
jailbreakEdit.py
mmlu.py
overview.jpg
