2-GAO: Spatial-Semantic Consistent Industrial Defect Synthesis via Training-Free Latent Optimization
Tianwei Mu, Guangzhou Institute of Industrial Intelligence
This system is a training-free, contrastive defect-generation tool built on Stable Diffusion. Guided by reference defective samples, it synthesizes defects on defect-free images via attention-map optimization and feature alignment.
- [2026-02-09] The complete, reproducible program is now available, including the core pipeline, batch processing, and the ablation study.
- [2025-08-21] 2-GAO core code and results are released.
Results:
Links to the original image data (place each dataset in its corresponding folder):
- MVTec-AD dataset: Google Drive
- VISA dataset: Google Drive
- Concrete crack dataset: Google Drive
Paper
- Awaiting publication
Figure: The overall framework consists of five stages: (1) VAE Encoding, (2) IoA Alignment, (3) Forward Diffusion, (4) Attention-Guided Reverse Optimization, and (5) Decoding.
Figure: Comprehensive comparison of original, reference, and synthesized defect images for 4 industrial object categories using 2-GAO: (a) bottle, (b) cable, (c) capsule, and (d) carpet.
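Stage (2) of the framework, IoA Alignment, presumably checks how well a proposed defect region sits on the object. A minimal sketch of an Intersection-over-Area computation between a defect mask and an object mask (our reading of the term, not the repo's code; the function name is illustrative):

```python
import numpy as np

def ioa(defect_mask: np.ndarray, object_mask: np.ndarray) -> float:
    """Intersection over Area: the fraction of the defect mask that
    falls on the object (1.0 = fully on-object, 0.0 = no overlap)."""
    defect_area = (defect_mask > 0).sum()
    if defect_area == 0:
        return 0.0
    overlap = np.logical_and(defect_mask > 0, object_mask > 0).sum()
    return float(overlap) / float(defect_area)
```

Unlike IoU, this score is normalized by the defect's own area, so a small defect fully inside a large object still scores 1.0.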
- GPU: NVIDIA GPU (8GB+ VRAM recommended)
- RAM: 16GB+
- Storage: 10GB+ free space
- OS: Windows 10/11, Linux, macOS
- Python: 3.8–3.11 (recommended 3.10)
- CUDA: 11.8+ (for GPU acceleration)
```bash
git clone https://github.com/mutianwei521/2GAO.git
cd 2GAO
```

Create and activate a virtual environment:

```bash
python -m venv venv
# Windows:
venv\Scripts\activate
# Linux/macOS:
source venv/bin/activate
```

Install dependencies:

```bash
# Windows (recommended):
pip install -r requirements_windows.txt
# Linux/macOS:
pip install -r requirements.txt
```

Verify the installation:

```bash
python -c "import torch; print('PyTorch:', torch.__version__)"
python -c "import diffusers; print('Diffusers:', diffusers.__version__)"
```

Project structure:

```
2GAO/
├── main_contrastive.py              # Main entry
├── contrastive_defect_generator.py  # Core generator
├── smart_prompt_generator.py        # Smart prompt generator
├── attention_heatmap_extractor.py   # Attention extractor
├── run_ablation.py                  # Ablation study runner
├── requirements.txt                 # Linux dependencies
├── requirements_windows.txt         # Windows dependencies
├── batch_generate_mvtec.py          # MVTec dataset batch
├── batch_generate_visa.py           # VISA dataset batch
├── batch_generate_concrete.py       # Concrete dataset batch
├── batch_generate_custom.py         # Custom dataset batch
├── test/
│   ├── quick_test_mvtec.py          # MVTec quick test
│   ├── quick_test_visa.py           # VISA quick test
│   ├── quick_test_concrete.py       # Concrete quick test
│   ├── evaluate_visa_metrics.py     # VISA evaluation
│   ├── evaluate_concrete_metrics.py # Concrete evaluation
│   └── evaluate_mvtec_metrics.py    # MVTec evaluation
├── visaImages/                      # VISA dataset images
├── concreteImages/                  # Concrete dataset images
├── mvtecImages/                     # MVTec dataset images
├── outputs_visa/                    # VISA output directory
├── outputs_concrete/                # Concrete output directory
└── outputs_mvtec/                   # MVTec output directory
```
Main entry point for single image defect generation.
```bash
python main_contrastive.py \
    --prompt "bottle crack" \
    --good-dir "images/good" \
    --bad-dir "images/bad" \
    --output-dir "outputs" \
    --num-inference-steps 100 \
    --r 0.25 \
    --enable-feature-alignment \
    --save-attention-heatmaps
```

Core generator module (imported by main_contrastive.py).
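In the command above, `--r 0.25` is (we believe) the forward-diffusion noise ratio: only a fraction of the schedule's noise is added before the attention-guided reverse pass, so most of the clean image content is preserved. A minimal sketch of that mapping, with illustrative names that are our assumption, not the repo's API:

```python
def start_timestep(r: float, num_inference_steps: int) -> int:
    """Map the noise ratio r (0 < r <= 1) to the number of forward
    steps actually applied; reverse optimization then denoises only
    these last r * T steps instead of starting from pure noise."""
    if not 0.0 < r <= 1.0:
        raise ValueError("r must be in (0, 1]")
    return max(1, round(r * num_inference_steps))
```

With the defaults above (`--r 0.25`, 100 steps), generation would denoise from step 25, editing the image rather than regenerating it.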
Extracts and visualizes attention heatmaps.
Generates optimized prompts based on image content.
MVTec dataset quick test.
```bash
python test/quick_test_mvtec.py --category bottle --num-defects 2
```

Parameters:
- `--category`: MVTec category (bottle, cable, capsule, etc.)
- `--num-defects`: Number of defects (1-4)
VISA dataset quick test.
```bash
python test/quick_test_visa.py --category candle --num-defects 2
```

Parameters:
- `--category`: VISA category (candle, capsules, cashew, etc.)
- `--num-defects`: Number of defects (1-4)
Concrete crack dataset quick test.
```bash
python test/quick_test_concrete.py --category CFD --num-defects 2
```

Parameters:
- `--category`: Concrete category (CFD, CRACK500, DeepCrack, etc.)
- `--num-defects`: Number of defects (1-4)
MVTec dataset batch generation (15 categories).
```bash
python batch_generate_mvtec.py \
    --mvtec-dir "mvtecImages" \
    --output-dir "outputs_mvtec" \
    --num-samples 50 \
    --num-defects 1 2 3 4
```

MVTec Categories: bottle, cable, capsule, carpet, grid, hazelnut, leather, metal_nut, pill, screw, tile, toothbrush, transistor, wood, zipper
VISA dataset batch generation (12 categories).
```bash
python batch_generate_visa.py \
    --visa-dir "visaImages" \
    --output-dir "outputs_visa" \
    --num-samples 50 \
    --num-defects 1 2 3 4
```

VISA Categories: candle, capsules, cashew, chewinggum, fryum, macaroni1, macaroni2, pcb1, pcb2, pcb3, pcb4, pipe_fryum
Concrete crack dataset batch generation (8 categories).
```bash
python batch_generate_concrete.py \
    --concrete-dir "concreteImages" \
    --output-dir "outputs_concrete" \
    --num-samples 50 \
    --num-defects 1 2 3 4
```

Concrete Categories: CFD, CRACK500, DeepCrack, Eugen, Rissbilder, Volker, crack, cracktree200
Custom dataset batch generation. See CUSTOM_DATASET.md for details.
```bash
python batch_generate_custom.py \
    --config "config/custom_dataset.yaml" \
    --output-dir "outputs_custom"
```

Evaluate VISA dataset generation results.
```bash
python test/evaluate_visa_metrics.py \
    --output-dir "outputs_visa" \
    --save-csv
```

Metrics: I-AUC, I-F1, P-AUC, P-F1, PRO, IS, LPIPS
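Of the metrics above, I-AUC is the image-level AUROC over anomaly scores. As a self-contained reference (numpy only; the function name is ours, not the evaluation script's), it can be computed rank-style via the Mann-Whitney U statistic:

```python
import numpy as np

def image_auc(scores: np.ndarray, labels: np.ndarray) -> float:
    """Image-level AUROC: the probability that a defective image (label 1)
    scores higher than a defect-free one (label 0); ties count half."""
    pos = scores[labels == 1]
    neg = scores[labels == 0]
    if len(pos) == 0 or len(neg) == 0:
        raise ValueError("need both positive and negative samples")
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))
```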
Output Structure: outputs_visa/{category}/defect_{N}/
Evaluate Concrete dataset generation results.
```bash
python test/evaluate_concrete_metrics.py \
    --output-dir "outputs_concrete" \
    --save-csv
```

Output Structure: outputs_concrete/defect_{N}/{category}/
Evaluate MVTec dataset generation results.
```bash
python test/evaluate_mvtec_metrics.py \
    --output-dir "outputs_mvtec" \
    --save-csv
```

Output Tables:
- Table A1: Category IS/LPIPS
- Table A2: Scenario metrics (I-AUC, I-F1, P-AUC, P-F1, PRO)
- Table A3: Detailed per-category per-defect metrics
Output Structure: outputs_mvtec/{category}/{1,2,3,4}/
Reproduces the ablation experiments from Section 2.4 and the Appendix, generating attention maps and figures via DAAM attention extraction.
```bash
# Run all ablation experiments (generates Figures 6-12)
python run_ablation.py --mode all

# Semantic Ambiguity Verification (Figure 7)
# Compares attention distributions for generic vs. specific prompts
python run_ablation.py --mode semantic

# Attention Guidance Verification (Figure 8)
# Compares results with and without Focus Loss & Suppression Loss
python run_ablation.py --mode attention

# Latent Entanglement Verification (Figures 9-11)
# Multi-defect attention disentanglement (2/3/4 defects)
python run_ablation.py --mode entanglement

# IoA Alignment Verification (Figure 12)
# Geometric validity: partial/no-overlap correction
python run_ablation.py --mode ioa

# Hyperparameter Sensitivity (Figure 6)
# IoA threshold, diffusion steps, optimization steps
python run_ablation.py --mode hyperparameter

# Print Tables S2-S6 only
python run_ablation.py --mode tables
```

Generated Figures:
- Figure 6: Hyperparameter sensitivity analysis (4 subplots)
- Figure 7: Semantic ambiguity - generic vs specific prompts
- Figure 8: Attention guidance - Focus/Suppression Loss effect
- Figures 9-11: Latent entanglement - multi-defect disentanglement
- Figure 12: IoA alignment - geometric validity
Ablation Components (Table S2):
- w/o Prompt Guidance: Semantic consistency (I-AUC: 75.63%)
- w/o Attention Guidance: Spatial precision (PRO: 75.64%)
- w/o Contrastive Loss: Multi-defect disentanglement (PRO: 78.95%)
- w/o IoA Alignment: Geometric validity (PRO: 82.34%)
- Full Model: All components (I-AUC: 100%, PRO: 99.90%)
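The Focus and Suppression losses ablated above are not spelled out in this README. One plausible toy formulation (entirely our assumption, with illustrative names) treats the attention map as a distribution and penalizes mass outside the target defect mask while rewarding mass inside it:

```python
import numpy as np

def focus_suppression_loss(attn: np.ndarray, mask: np.ndarray):
    """Toy attention-guidance losses (illustrative, not the repo's code).

    attn: non-negative attention map, same shape as the binary mask.
    Focus loss is low when attention concentrates inside the defect mask;
    Suppression loss is the attention mass leaking outside it.
    """
    attn = attn / (attn.sum() + 1e-8)      # normalize to a distribution
    inside = float((attn * mask).sum())    # mass inside the defect region
    focus = 1.0 - inside
    suppression = float((attn * (1 - mask)).sum())
    return focus, suppression
```

Minimizing both during the reverse-optimization stage would push the denoiser to place the defect where the mask says, which is consistent with the spatial-precision drop reported for "w/o Attention Guidance".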
| Directory | Content | Description |
|---|---|---|
| `feathered_blend/` | `*.png` | Final blended images (main output) |
| `non_feathered_blend/` | `*.png` | Non-feathered blended images |
| `comparison_grid/` | `*.png` | Before/after comparison |
| `combined_defect_masks/` | `*.png` | Generated defect masks |
| `defect_heatmaps/` | `*.png` | Defect probability heatmaps |
| `original_good/` | `*.png` | Source good images |
| `reference_bad/` | `*.png` | Source defect images |
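The feathered blend listed above presumably composites the generated defect onto the clean image through a softened mask, so the seam is not visible. A minimal numpy sketch of that idea (our assumption, not the repo's implementation; names are illustrative):

```python
import numpy as np

def feather_mask(mask: np.ndarray, passes: int = 3) -> np.ndarray:
    """Soften a binary mask with repeated 3x3 box blurs (cheap feathering)."""
    soft = mask.astype(float)
    for _ in range(passes):
        padded = np.pad(soft, 1, mode="edge")
        # average each pixel's 3x3 neighborhood
        soft = sum(
            padded[dy:dy + soft.shape[0], dx:dx + soft.shape[1]]
            for dy in range(3) for dx in range(3)
        ) / 9.0
    return soft

def feathered_blend(good: np.ndarray, defect: np.ndarray,
                    mask: np.ndarray) -> np.ndarray:
    """Alpha-blend the defect image over the good image
    using the feathered mask as a per-pixel alpha channel."""
    alpha = feather_mask(mask)[..., None]   # broadcast over color channels
    return alpha * defect + (1 - alpha) * good
```

The non-feathered variant would be the same composite with the raw binary mask as alpha, which is what `non_feathered_blend/` appears to hold.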
- PARAMETERS.md - Parameter reference guide
- CUSTOM_DATASET.md - Custom dataset tutorial
```bash
# Generate
python batch_generate_visa.py \
    --visa-dir "visaImages" \
    --output-dir "outputs_visa" \
    --num-samples 50

# Evaluate
python test/evaluate_visa_metrics.py \
    --output-dir "outputs_visa" \
    --save-csv
```

```bash
# Generate
python batch_generate_mvtec.py \
    --mvtec-dir "mvtecImages" \
    --output-dir "outputs_mvtec" \
    --num-samples 50

# Evaluate
python test/evaluate_mvtec_metrics.py \
    --output-dir "outputs_mvtec" \
    --save-csv
```

- Data Preparation: Ensure accurate mask labeling
- Start Simple: Use the defaults first, then fine-tune
- Quality Check: Use `--save-attention-heatmaps` to inspect attention maps
- Hardware Balance: Adjust inference steps to match your GPU
- Evaluate Results: Use the evaluation scripts to compute metrics
The authors express deep gratitude to Dr. Enyang Gao, General Manager of Deep-Sea 3D Vision Co., Ltd., whose invaluable guidance and mentorship made this research possible. The name “2-GAO” serves as a tribute to Dr. Gao’s contributions to the field of computer vision and his continued support of our research endeavors.
Figure: Dr. Gao's portrait.

MIT License
