AttnDiff is a lightweight fingerprinting method for estimating similarity between large language models. Instead of comparing hidden states, AttnDiff builds a fingerprint from head-level attention differences under paired prompts (e.g., original vs. corrupted).
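To make the idea concrete, here is a minimal, self-contained sketch (not the package's actual implementation, which lives in src/attndiff/core/): run the same model on an original prompt and its corrupted counterpart with attention outputs enabled, summarize each head, and take the difference. The model name, the prompt pair, and the per-head entropy statistic are illustrative assumptions only.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumption: any Hugging Face causal LM; replace with a local path if needed.
model_name = "meta-llama/Llama-2-7b-hf"
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name).eval()

def head_summary(prompt: str) -> torch.Tensor:
    """Return a (num_layers, num_heads) tensor summarizing attention per head."""
    inputs = tok(prompt, return_tensors="pt")
    with torch.no_grad():
        out = model(**inputs, output_attentions=True)
    layers = []
    for attn in out.attentions:                   # each: (1, heads, seq, seq)
        p = attn.clamp_min(1e-12)
        entropy = -(p * p.log()).sum(-1)          # per-head attention entropy (illustrative statistic)
        layers.append(entropy.mean(-1).squeeze(0))  # average over query positions -> (heads,)
    return torch.stack(layers)                    # (layers, heads)

original = "Paris is the capital of France."
corrupted = "Paris is the capital of Brazil."     # paired, corrupted prompt
fingerprint = head_summary(original) - head_summary(corrupted)
print(fingerprint.shape)                          # e.g. torch.Size([32, 32]) for Llama-2-7B

The real fingerprints are produced by the attndiff-compute tool described below; this snippet only illustrates the original-vs-corrupted pairing.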
Using uv (Recommended)
# Install uv
curl -LsSf https://astral.sh/uv/install.sh | sh
# Clone and install
git clone https://github.com/zhb0119/AttnDiff.git
cd AttnDiff
uv sync
Compute fingerprints:
# Edit scripts/batch_compute.sh to configure models and device
bash scripts/batch_compute.sh
Compare fingerprints:
You can use pre-computed fingerprints provided in the repository:
uv run attndiff-compare \
--base output/comput_W/fingerprint_Llama-2-7B.json \
--dir output/comput_W \
--cka linear
Note: The repository includes pre-computed fingerprints for several open-source models in
output/comput_W/. You can use these to quickly test the comparison functionality without computing fingerprints yourself.
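To peek at what one of these pre-computed fingerprints contains before running any comparison, a schema-agnostic inspection is enough (the exact JSON layout is defined by attndiff-compute, so no field names are assumed here):

import json
from pathlib import Path

path = Path("output/comput_W/fingerprint_Llama-2-7B.json")
with path.open() as f:
    fp = json.load(f)

# Print only the top-level structure; the schema itself comes from attndiff-compute.
if isinstance(fp, dict):
    for key, value in fp.items():
        print(key, type(value).__name__)
else:
    print(type(fp).__name__, "of length", len(fp))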
AttnDiff/
├── src/attndiff/ # Package source code
│ ├── core/ # Core algorithms
│ ├── cli/ # CLI tools
│ └── utils/ # Utilities
├── tools/ # Model manipulation tools
│ ├── model-merging/ # Model merging tools
│ └── model-pruning/ # Model pruning tools
├── scripts/ # Batch processing scripts
├── tests/ # Unit tests
├── examples/ # Usage examples
├── dataset/ # Dataset directory
├── output/ # Output directory
│ ├── attention/ # Attention files
│ └── comput_W/ # Fingerprints
├── pyproject.toml # UV/pip configuration
└── README.md
Create dataset/dataset.json:
[
{
"id": 1,
"topic": "Mathematics",
"original": "...",
"corrupted": "..."
},
{
"id": 2,
"topic": "Programming",
"original": "...",
"corrupted": "..."
}
]
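If you prefer to generate the file programmatically, the following sketch writes entries in the expected id/topic/original/corrupted layout. The word-shuffling corruption is only an illustrative assumption; substitute whatever perturbation your experiments require.

import json
import random

# Hypothetical prompt pairs; the "corrupted" variant here just shuffles the trailing words.
prompts = [
    ("Mathematics", "The sum of the angles of a triangle is 180 degrees."),
    ("Programming", "A binary search runs in logarithmic time on sorted input."),
]

def corrupt(text: str) -> str:
    words = text.rstrip(".").split()
    tail = words[-3:]
    random.shuffle(tail)
    return " ".join(words[:-3] + tail) + "."

dataset = [
    {"id": i + 1, "topic": topic, "original": text, "corrupted": corrupt(text)}
    for i, (topic, text) in enumerate(prompts)
]

with open("dataset/dataset.json", "w") as f:
    json.dump(dataset, f, indent=2, ensure_ascii=False)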
Recommended: Use batch script
# Edit scripts/batch_compute.sh to configure model paths and device
bash scripts/batch_compute.sh
Advanced: Manual computation
# From pre-extracted attention files
uv run attndiff-compute \
--original output/attention/model_att_origin.json \
--corrupted output/attention/model_att_perturb.json \
--mode diff \
--out output/comput_W/fingerprint_model.json
# Or let the tool auto-extract attentions from model
uv run attndiff-compute \
--model_name /path/to/your/model \
--attn_device cuda:0 \
--mode diff \
--out output/comput_W/fingerprint_your_model.json
Arguments:
| Argument | Description |
|---|---|
| --model_name | Model name or local path |
| --original | Path to original attention JSON |
| --corrupted | Path to corrupted attention JSON |
| --mode | diff (default), orig, or base |
| --attn_device | Device for attention extraction (e.g., cuda:0) |
| --out | Output fingerprint path |
# Compare all fingerprints in directory
uv run attndiff-compare \
--base output/comput_W/fingerprint_base.json \
--dir output/comput_W \
--cka linear
# Compare specific layer
uv run attndiff-compare \
--base output/comput_W/fingerprint_Llama-2-7B.json \
--dir output/comput_W \
--cka linear \
--layer 1
Arguments:
| Argument | Description |
|---|---|
| --base | Base fingerprint JSON (required) |
| --dir | Directory containing fingerprints |
| --cka | CKA type (linear) |
| --layer | Compare specific layer (1-based, optional) |
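For reference, --cka linear selects linear Centered Kernel Alignment (CKA). Below is a standalone sketch of the linear CKA score between two matrices; the fingerprint shapes are placeholder assumptions, and the actual comparison logic is implemented by attndiff-compare.

import numpy as np

def linear_cka(x: np.ndarray, y: np.ndarray) -> float:
    """Linear CKA between two (n_samples, n_features) matrices."""
    x = x - x.mean(axis=0, keepdims=True)   # center each feature
    y = y - y.mean(axis=0, keepdims=True)
    hsic = np.linalg.norm(y.T @ x, "fro") ** 2
    norm_x = np.linalg.norm(x.T @ x, "fro")
    norm_y = np.linalg.norm(y.T @ y, "fro")
    return float(hsic / (norm_x * norm_y))

# Example with random stand-ins for two fingerprints (shapes are purely hypothetical).
rng = np.random.default_rng(0)
fp_a = rng.normal(size=(32, 32))
fp_b = fp_a + 0.1 * rng.normal(size=(32, 32))
print(round(linear_cka(fp_a, fp_b), 4))   # close to 1.0 for highly similar matrices

A score near 1.0 indicates closely aligned fingerprints, while unrelated models yield scores much closer to 0.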
AttnDiff has been systematically evaluated across multiple model manipulation dimensions to assess its robustness and effectiveness in model similarity estimation.
| Category | Type | Description | Methods/Variants |
|---|---|---|---|
| Fine-tuning | Instruction | Instruction-tuned models | SFT, instruction alignment |
| Preference Opt. | PPO/DPO | Preference optimization | PPO-LoRA, DPO fine-tuning |
| Model Merging | Weight | Linear interpolation | Weight averaging, SLERP |
| | Distribution | Behavior-based merging | Task vectors, model soups |
| | Mergekit Strategies | Eight merging methods | Breadcrumbs, Ties, Della, Task, DARE+Ties, DARE+Task |
| Model Pruning | Structured | Layer/head removal | Sheared models, layer pruning |
| | Unstructured | Weight sparsification | Sparse models, magnitude pruning |
| | LLM-Pruner | Importance-based | Random, L1 norm, Taylor importance |
| Model Distillation | Reasoning | Knowledge distillation | Open-R1, DeepSeek-R1 |
| | Logit-based | Output matching | Watermark distillation |
| Quantization | GPTQ | Post-training quantization | Int4, Int8 compression |
| Cross-Family | Architecture | Different model families | Llama, Qwen, Gemma, Mistral |
| | Scale | Model sizes | 1.5B, 2B, 7B, 8B, 14B parameters |
| MoE | Mixtral | Mixture of Experts | 8x7B sparse models |
The following table categorizes all models used in experiments by their manipulation type:
| Category | Type | Base Model | Derivative Models |
|---|---|---|---|
| Fine-tuning | Instruction | Llama-2-7B | Llama-2-finance-7b, Vicuna-1.5-7b, WizardMath-7b, Chinese-LLaMA-2-7b, CodeLLaMA-7b, Llemma-7b |
| Preference Opt. | PPO/DPO | Llama-2-7B | llama-2-7b-ppo-v0.1-reward, llama-2-7b-ppo-lora-v0.1, tulu-2-dpo-7b, llama2-7b-dpo |
| Model Merging | Weight | Shisa-gamma-7b-v1, WizardMath-7b-1.1, Abel-7b-002 | Evollm-jp-7b |
| | Distribution | Llama-2-7B, OpenLLaMA-2-7b, mpt-7b | Fusellm-7b |
| Model Pruning | Structured | Llama-2-7B | Sheared-llama-1.3b, Sheared-llama-1.3b-pruned, Sheared-llama-1.3b-sharegpt, Sheared-llama-2.7b, Sheared-llama-2.7b-pruned |
| | Unstructured | Llama-2-7B | Sparse-llama-2-7b, Wanda-llama-2-7b, GBLM-llama-2-7b |
| Ablation | Related | Llama-2-7B | CodeLlama-7b, Llama-2-finance-7B, Vicuna-7B-v1.5, Chinese-LLama-2-7B, WizardMath-7B-V1.0, llemma_7b, Sheared-LLaMA-1.3B, Sheared-LLaMA-1.3B-Pruned, Sheared-LLaMA-1.3B-ShareGPT, Sheared-LLaMA-2.7B, Sheared-LLaMA-2.7B-Pruned, Sheared-LLaMA-2.7B-ShareGPT, Sparse-llama-2-7b, Wanda-llama-2-7b, GBLM-llama-2-7b |
| | Unrelated | Llama-2-7B | Llama3-8B, mpt-7b, Qwen2.5-1.5B, Qwen2.5-3B, Qwen2.5-7B, Qwen2.5-14B, Qwen2.5-Math-7B, gemma-2-2b, Gemma-7B-it, Yi-6B |
| Pilot | Discovery | Llama-2-7B | Llama-2-7B, CodeLlama-7b-hf, WizardMath-7B-V1.0, llemma_7b, Qwen2.5-7B |
| Model Distillation | Reasoning | Llama-3.1-8B, Qwen2.5-7B, Qwen2.5-14B | Llama-3.1-8B-Instruct-Open-R1-Distill, Qwen2.5-7B-Open-R1-Distill, DeepSeek-R1-Distill-Qwen-14B |
| | Logit-based | Llama-2-7B | llama-2-7b-logit-watermark-distill-kgw-k1-gamma0.25-delta2 |
| MoE | Mixtral | Mixtral-8x7B | Instruct_Mixtral-8x7B-v0.1_Dolly15K, Nous-Hermes-2-Mixtral-8x7B-DPO, openbuddy-mixtral-8x7b-v15.4 |
| Cross-Family | Qwen2.5 | Qwen2.5-7B | Qwen2.5-Coder-1.5B, Qwen2.5-Math-1.5B, Qwen2.5-1.5B-Instruct |
| | | Qwen2.5-14B | Qwen2.5-14B-Instruct, oxy-1-small, Qwen2.5-14B-Gutenberg-Instruct-Slerpeno |
| | Gemma-2 | gemma-2-2b | gemma-2-2b-neogenes-ita, gemma-2-baku-2b, gemma2-2b-merged |
| | Mistral | Mistral-7B-v0.3 | AQUA-7B, spellcheck-mistral-7b, Mistral-7B-Instruct-demi-merge-v0.3-7B |
| Quantization | GPTQ | Qwen2.5-7B | Qwen2.5-7B-Instruct-GPTQ-Int8, Qwen/Qwen2.5-7B-Instruct-GPTQ-Int4 |
| | | Llama-2-7B | TheBloke/Llama-2-7B-Chat-GPTQ |
| | | Llama-3.1-8B | iqbalamo93/Meta-Llama-3.1-8B-Instruct-GPTQ-Q_8, DaraV/LLaMA-3.1-8B-Instruct-INT4-GPTQ |
| | | Mistral-7B-v0.3 | RedHatAI/Mistral-7B-Instruct-v0.3-GPTQ-4bit |
The following table provides Hugging Face repository links for key models used in experiments:
If you use AttnDiff in your research, please cite:
@misc{zhang2026attndiffattentionbaseddifferentialfingerprinting,
title={AttnDiff: Attention-based Differential Fingerprinting for Large Language Models},
author={Haobo Zhang and Zhenhua Xu and Junxian Li and Shangfeng Sheng and Dezhang Kong and Meng Han},
year={2026},
eprint={2604.05502},
archivePrefix={arXiv},
primaryClass={cs.CR},
url={https://arxiv.org/abs/2604.05502},
}
Contributions are welcome! Please open an issue or pull request on GitHub.
This project is licensed under the MIT License - see the LICENSE file for details.
Thank you for your interest in AttnDiff! We hope this tool aids your research.
- Built with PyTorch
- Uses Hugging Face Transformers
- Package management with UV

