DiffuGuard

Official repo for our paper “DiffuGuard: How Intrinsic Safety is Lost and Found in Diffusion Large Language Models”.

Initial Setup

Create a virtual environment and install dependencies.

conda create -n diffuguard python==3.11
conda activate diffuguard
pip install -r requirements.txt

Download dLLMs locally (recommended for stability) and keep paths handy.

bash hf_models/model_download.sh

Optionally set evaluator credentials if you plan to compute ASR with the OpenAI scripts.

# for analysis/evaluation.py
export OPENAI_API_KEY=your_key
export OPENAI_BASE_URL=your_url

Core Experiments (Analysis)

Logits heatmap (Figure 2):
- python analysis/heatmap.py (edit MODEL_PATH and dataset in the file if needed)
Random remasking (Figure 3):
- bash analysis/exp_remask_randomness.sh
Token injection (Figures 4 & 5):
- bash analysis/exp_token_injection.sh
Batch evaluation of results:
- bash analysis/eval.sh

Quick Start

Run DiffuGuard on LLaDA with PAD attack, hidden audit, and repair:

python models/jailbreakbench_llada.py \
  --model_path hf_models/LLaDA-8B-Instruct \
  --attack_method PAD \
  --attack_prompt path/to/prompts.json \
  --output_json out_llada.json \
  --steps 64 --gen_length 128 --block_length 128 \
  --sp_mode hidden --sp_threshold 0.35 \
  --refinement_steps 8 --remask_ratio 0.9

Dream and MMaDA runners have similar usage:

python models/jailbreakbench_dream.py --model_path hf_models/Dream-v0-Instruct-7B --attack_method pad --attack_prompt path/to/prompts.json --output_json out_dream.json --gen_length 128 --steps 64 --mask_counts 36 --sp_mode hidden --sp_threshold 0.35 --refinement_steps 8 --remask_ratio 0.9
python models/jailbreakbench_mmada.py --model_path hf_models/MMaDA-8B-MixCoT --attack_method PAD --attack_prompt path/to/prompts.json --output_json out_mmada.json --steps 64 --sp_mode hidden --sp_threshold 0.35 --refinement_steps 8 --remask_ratio 0.9

Key Hyperparameters

steps, gen_length, block_length: diffusion steps and decoding span
remasking: off | low_confidence | adaptive_step (annealed randomness)
sp_mode: off | hidden; sp_threshold
refinement_steps (e.g., 8), remask_ratio (e.g., 0.9)

Name		Name	Last commit message	Last commit date
Latest commit History 3 Commits
analysis		analysis
hf_models		hf_models
models		models
refine_prompt		refine_prompt
utility		utility
.gitignore		.gitignore
README.md		README.md
defense_utils.py		defense_utils.py
evaluation.py		evaluation.py
requirements.txt		requirements.txt
test.sh		test.sh

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

DiffuGuard

Initial Setup

Core Experiments (Analysis)

Quick Start

Key Hyperparameters

About

Uh oh!

Releases

Packages

Languages

niez233/DiffuGuard

Folders and files

Latest commit

History

Repository files navigation

DiffuGuard

Initial Setup

Core Experiments (Analysis)

Quick Start

Key Hyperparameters

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages