Official repo for our paper “DiffuGuard: How Intrinsic Safety is Lost and Found in Diffusion Large Language Models”.
Create a virtual environment and install dependencies.
conda create -n diffuguard python==3.11
conda activate diffuguard
pip install -r requirements.txt
Download dLLMs locally (recommended for stability) and keep the paths handy.
bash hf_models/model_download.sh
Optionally set evaluator credentials if you plan to compute ASR with the OpenAI scripts.
# for analysis/evaluation.py
export OPENAI_API_KEY=your_key
export OPENAI_BASE_URL=your_url
- Logits heatmap (Figure 2):
python analysis/heatmap.py
(edit MODEL_PATH and the dataset in the file if needed)
- Random remasking (Figure 3):
bash analysis/exp_remask_randomness.sh
- Token injection (Figures 4 & 5):
bash analysis/exp_token_injection.sh
- Batch evaluation of results:
bash analysis/eval.sh
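Before launching the batch evaluation, it can help to confirm that the evaluator credentials are actually set. The following is a minimal sketch, assuming the evaluation script reads both OPENAI_API_KEY and OPENAI_BASE_URL from the environment; the helper name missing_evaluator_vars is hypothetical and not part of this repo.

```python
import os

# Variable names taken from the export lines above; whether the evaluation
# script needs both of them is an assumption.
REQUIRED_VARS = ("OPENAI_API_KEY", "OPENAI_BASE_URL")


def missing_evaluator_vars(env=None):
    """Return the names of required variables that are unset or blank."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name, "").strip()]
```

Calling missing_evaluator_vars() with no argument checks the current process environment; an empty list means both variables are present.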
Run DiffuGuard on LLaDA with PAD attack, hidden audit, and repair:
python models/jailbreakbench_llada.py \
--model_path hf_models/LLaDA-8B-Instruct \
--attack_method PAD \
--attack_prompt path/to/prompts.json \
--output_json out_llada.json \
--steps 64 --gen_length 128 --block_length 128 \
--sp_mode hidden --sp_threshold 0.35 \
--refinement_steps 8 --remask_ratio 0.9
Dream and MMaDA runners have similar usage:
python models/jailbreakbench_dream.py \
--model_path hf_models/Dream-v0-Instruct-7B \
--attack_method pad \
--attack_prompt path/to/prompts.json \
--output_json out_dream.json \
--gen_length 128 --steps 64 --mask_counts 36 \
--sp_mode hidden --sp_threshold 0.35 \
--refinement_steps 8 --remask_ratio 0.9
python models/jailbreakbench_mmada.py \
--model_path hf_models/MMaDA-8B-MixCoT \
--attack_method PAD \
--attack_prompt path/to/prompts.json \
--output_json out_mmada.json \
--steps 64 \
--sp_mode hidden --sp_threshold 0.35 \
--refinement_steps 8 --remask_ratio 0.9
Key arguments:
- steps, gen_length, block_length: diffusion steps and decoding span
- remasking: off | low_confidence | adaptive_step (annealed randomness)
- sp_mode: off | hidden; sp_threshold
- refinement_steps (e.g., 8), remask_ratio (e.g., 0.9)
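Because the three runners share most of their flags, the commands above can be assembled from one shared dictionary when sweeping settings. This is only a sketch: build_command and SHARED are hypothetical names introduced here, and the flags and values are copied from the example commands, not from the runners' argument parsers.

```python
import shlex


def build_command(script, **flags):
    """Turn keyword arguments into a `python script --name value ...` argv list."""
    argv = ["python", script]
    for name, value in flags.items():
        argv += [f"--{name}", str(value)]
    return argv


# Flags that appear in all three example commands above.
SHARED = dict(
    attack_prompt="path/to/prompts.json",
    steps=64,
    sp_mode="hidden",
    sp_threshold=0.35,
    refinement_steps=8,
    remask_ratio=0.9,
)

# LLaDA additionally takes gen_length and block_length in the example above.
llada_cmd = build_command(
    "models/jailbreakbench_llada.py",
    model_path="hf_models/LLaDA-8B-Instruct",
    attack_method="PAD",
    output_json="out_llada.json",
    gen_length=128,
    block_length=128,
    **SHARED,
)

# Print a copy-pasteable shell line for inspection.
print(shlex.join(llada_cmd))
```

The resulting list can be handed to subprocess.run(llada_cmd, check=True) from the repo root; changing a value in SHARED then applies to every runner built from it.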