Figure 1: The SECURE framework decomposes parameter updates into a component along the safety-aligned direction (d_aligned) and an orthogonal component, suppressing harmful updates via subspace regularization.
**Key Idea:** SECURE leverages the alignment direction (the weight difference between the safety-aligned and base models) as an anchor. By decomposing parameter updates along this direction and constraining the orthogonal components through a novel regularization term, it keeps fine-tuning within the "narrow safety basin," achieving both strong safety preservation and task performance.
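Below is a minimal, illustrative PyTorch sketch of this idea. The helper name, the update definition Δ = θ − θ_aligned, and the weighting `lambda_reg` are assumptions for illustration only, not code taken from the repository:

```python
import torch

def subspace_regularizer(theta, theta_aligned, theta_base, lambda_reg=1.0):
    """Penalize the part of the fine-tuning update that is orthogonal to the
    alignment direction d = theta_aligned - theta_base.

    Illustrative sketch only; the actual SECURE regularizer may differ.
    """
    reg = 0.0
    for name in theta:
        d = (theta_aligned[name] - theta_base[name]).flatten()   # alignment direction
        delta = (theta[name] - theta_aligned[name]).flatten()    # fine-tuning update
        # Projection of the update onto the alignment direction.
        proj = (delta @ d) / (d @ d + 1e-12) * d
        orth = delta - proj                                       # orthogonal component
        reg = reg + orth.pow(2).sum()
    return lambda_reg * reg

# Usage sketch: add the penalty to the task loss during fine-tuning, e.g.
# loss = task_loss + subspace_regularizer(dict(model.named_parameters()),
#                                         aligned_state_dict, base_state_dict)
```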
```bash
# Create conda environment
conda create -n SECURE python=3.9
conda activate SECURE
cd SECURE

# Install dependencies
pip install -r requirements.txt

# Create model storage directory (if needed)
mkdir -p ckpts/
```

| Model | HuggingFace Link | Notes |
|---|---|---|
| Llama-2-7B-Chat | TheBloke/Llama-2-7B-Chat-fp16 | Safety-aligned model |
| Llama-2-7B-base | meta-llama/Llama-2-7b-hf | Base model |
| Beaver-Dam-7B | PKU-Alignment/beaver-dam-7b | Safety evaluation model |
Note: Download the models listed in the table above to the ckpts/ folder.
```
SECURE/
├── ckpts/
│   ├── Llama-2-7B-Chat-fp16/
│   ├── Llama-2-7b-hf/
│   └── beaver-dam-7b/
├── configs/
├── ft_datasets/
└── ... (other project folders)
```

- Llama-2 models require access approval on HuggingFace
- All models should be placed under `ckpts/`
- Use exact folder names as shown above
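One way to fetch these checkpoints into `ckpts/` is via `huggingface_hub` (a sketch; it assumes you have accepted the Llama-2 license on HuggingFace and are logged in, e.g. with `huggingface-cli login`):

```python
from huggingface_hub import snapshot_download

# Map HuggingFace repos to the local folder names expected by SECURE.
models = {
    "TheBloke/Llama-2-7B-Chat-fp16": "ckpts/Llama-2-7B-Chat-fp16",
    "meta-llama/Llama-2-7b-hf": "ckpts/Llama-2-7b-hf",
    "PKU-Alignment/beaver-dam-7b": "ckpts/beaver-dam-7b",
}

for repo_id, local_dir in models.items():
    snapshot_download(repo_id=repo_id, local_dir=local_dir)
```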
Training scripts are organized by dataset under `scripts/`, supporting Agnews, Alpaca, GSM8K, and SST2.
```bash
# For Agnews dataset (default 1k_p_0.1 mode)
bash scripts/agnews/SECURE_reg1_p_0.1.sh > finetuned_logs/agnews/SECURE_reg1_p_0.1.log 2>&1 &

# Other datasets
bash scripts/alpaca/SECURE_reg1_p_0.1.sh > finetuned_logs/alpaca/SECURE_reg1_p_0.1.log 2>&1 &
bash scripts/gsm8k/SECURE_reg1_p_0.1.sh > finetuned_logs/gsm8k/SECURE_reg1_p_0.1.log 2>&1 &
bash scripts/SST2/SECURE_reg1_p_0.1.sh > finetuned_logs/SST2/SECURE_reg1_p_0.1.log 2>&1 &
```

Configure training via the `--mode` parameter. Note: you can modify `--mode` in the `.sh` script files to implement the different experimental setups described in the paper.
| Mode | Description |
|---|---|
| `1k_p_0` | 1k samples, 0% harmful data |
| `1k_p_0.05` | 1k samples, 5% harmful data |
| `1k_p_0.1` | 1k samples, 10% harmful data (default) |
| `1k_p_0.15` | 1k samples, 15% harmful data |
| `1k_p_0.2` | 1k samples, 20% harmful data |
| `0.5k_p_0.1` | 500 samples, 10% harmful data |
| `1.5k_p_0.1` | 1500 samples, 10% harmful data |
| `2k_p_0.1` | 2000 samples, 10% harmful data |
| `2.5k_p_0.1` | 2500 samples, 10% harmful data |
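For reference, each mode string encodes the fine-tuning sample count and the fraction of harmful data, e.g. `1k_p_0.1` means 1,000 samples with a 10% poison ratio. A purely illustrative decoder of this naming convention (not part of the repository; the scripts consume the raw `--mode` string):

```python
def parse_mode(mode: str) -> tuple[int, float]:
    """Decode a mode string like '1k_p_0.1' into (sample_count, poison_ratio)."""
    size_part, ratio_part = mode.split("_p_")
    n_samples = int(float(size_part.rstrip("k")) * 1000)  # '1.5k' -> 1500
    poison_ratio = float(ratio_part)                       # '0.1' -> 0.1
    return n_samples, poison_ratio

assert parse_mode("1k_p_0.1") == (1000, 0.1)
assert parse_mode("2.5k_p_0.1") == (2500, 0.1)
```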
Safety (poison) evaluation:

```bash
cd evaluation/poison_evaluation

# Run for Agnews
bash scripts/agnews/eval_agnews.sh > scripts/agnews/eval_agnews.log 2>&1 &

# Other datasets
bash scripts/alpaca/eval_alpaca.sh > scripts/alpaca/eval_alpaca.log 2>&1 &
bash scripts/gsm8k/eval_gsm8k.sh > scripts/gsm8k/eval_gsm8k.log 2>&1 &
bash scripts/SST2/eval_SST2.sh > scripts/SST2/eval_SST2.log 2>&1 &
```
Utility evaluation:

```bash
# For Agnews
cd evaluation/utility_evaluation/agnews
bash scripts/eval.sh > scripts/eval.log 2>&1 &

# For GSM8K/SST2
cd ../gsm8k && bash scripts/eval.sh
cd ../SST2 && bash scripts/eval.sh

# Alpaca requires LLM-Judge
cd ../alpaca
# Follow instructions in the directory's README.md
```

Project structure:

```
SECURE/
├── ckpts/ # Model checkpoints
├── configs/ # Training configurations
├── evaluation/
│ ├── poison_evaluation/ # Safety assessment scripts
│ └── utility_evaluation/ # Task performance evaluation
├── finetuned_logs/ # Training logs
├── finetuned_models/ # Fine-tuned model outputs
├── ft_datasets/ # Processed datasets
├── images/ # Figures for documentation
├── scripts/
│ ├── agnews/ # Dataset-specific scripts
│ ├── alpaca/
│ ├── gsm8k/
│ └── SST2/
├── utils/ # Utility functions
├── LICENSE
└── requirements.txt
```

This repository is built upon the following open-source projects:
- LLMs-Finetuning-Safety
- SafeLoRA
- Booster
- llm-landscape (for safety landscape visualization)
We sincerely thank the authors of these projects for their foundational contributions. Their work provided critical inspiration and technical references for this research. Special thanks to the LLM safety community for driving innovation in this field.
