DeltaVLM

Interactive Remote Sensing Image Change Analysis via Instruction-guided Difference Perception

Pei Deng, Wenqian Zhou, Hanlin Wu*

Share us a ⭐ if this repo helps your research!

Introduction

DeltaVLM introduces Remote Sensing Image Change Analysis (RSICA) — a paradigm combining change detection and visual question answering for multi-turn, instruction-guided exploration of bi-temporal remote sensing images.

Capabilities: Change captioning, binary classification, quantification, localization, open-ended QA, multi-turn dialogue.

Architecture Details

Bi-temporal Vision Encoder (Bi-VE): EVA-ViT-g/14 with last 2 blocks fine-tuned
IDPM with CSRM: Cross-Semantic Relation Measuring to filter irrelevant variations
Instruction-guided Q-Former: Aligns visual differences with user instructions
Frozen Vicuna-7B: Language decoder for response generation

Quick Start

1. Environment Setup

git clone https://github.com/hanlinwu/DeltaVLM.git
cd DeltaVLM

conda create -n deltavlm python=3.10 -y
conda activate deltavlm

pip install torch==2.0.1 torchvision==0.15.2 --index-url https://download.pytorch.org/whl/cu118
pip install -r requirements.txt
pip install -e .

2. Download Pretrained Models

mkdir -p pretrained

# Vicuna-7B (LLM backbone)
huggingface-cli download lmsys/vicuna-7b-v1.5 --local-dir pretrained/vicuna-7b-v1.5

# BERT (Q-Former tokenizer)
huggingface-cli download bert-base-uncased --local-dir pretrained/bert-base-uncased

# DeltaVLM checkpoint
huggingface-cli download hanlinwu/DeltaVLM --local-dir pretrained/deltavlm

3. Run Inference

python scripts/predict.py \
    --image_A path/to/before.png \
    --image_B path/to/after.png \
    --checkpoint pretrained/deltavlm/checkpoint_best.pth \
    --llm_model pretrained/vicuna-7b-v1.5 \
    --bert_model pretrained/bert-base-uncased

Expected Output

Using device: cuda
Loading model from pretrained/deltavlm/checkpoint_best.pth...
Model loaded successfully!
Preprocessing images...
Prompt: Please briefly describe the changes in these two images.
Generating response...

==================================================
Generated Description:
A new building has appeared in the lower right area.
==================================================

Training

Prepare Dataset

Download ChangeChat-105k:

python data/download_changechat.py --output_dir ./data/changechat

Expected Data Structure

data/changechat/
├── images/
│   ├── train/
│   ├── val/
│   └── test/
└── annotations/
    ├── train.json
    ├── val.json
    └── test.json

Start Training

# Single GPU
python scripts/train.py --cfg_path configs/train_stage2.yaml

# Multi-GPU (4 GPUs)
torchrun --nproc_per_node=4 scripts/train.py --cfg_path configs/train_stage2.yaml

Evaluation

python scripts/evaluate.py --cfg_path configs/evaluate.yaml

Metric	Description
BLEU-1/2/3/4	N-gram precision
METEOR	Semantic similarity
ROUGE-L	Longest common subsequence
CIDEr	Consensus-based image description

Multi-turn Dialogue Example

ChangeChat-105k Dataset

Instruction Type	Train	Test
Change Captioning	34,075	1,929
Binary Classification	6,815	1,929
Change Quantification	6,815	1,929
Change Localization	6,815	1,929
Open-ended QA	26,600	7,527
Multi-turn Dialogue	6,815	1,929
Total	87,935	17,172

Citation

If you find this work useful, please cite:

@article{deltavlm2024,
  title={DeltaVLM: Interactive Remote Sensing Image Change Analysis via Instruction-guided Difference Perception},
  author={Deng, Pei and Zhou, Wenqian and Wu, Hanlin},
  journal={IEEE Transactions on Geoscience and Remote Sensing},
  year={2024}
}

Acknowledgement

BLIP-2 by Salesforce
EVA vision encoder
Vicuna language model
LEVIR-CC dataset
LEVIR-MCI dataset

License

This project is licensed under the MIT License - see the LICENSE file for details.

Name		Name	Last commit message	Last commit date
Latest commit History 27 Commits
.github/workflows		.github/workflows
configs		configs
data		data
deltavlm		deltavlm
docs		docs
pretrained		pretrained
scripts		scripts
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
fix.patch		fix.patch
requirements.txt		requirements.txt
setup.py		setup.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

DeltaVLM

Share us a ⭐ if this repo helps your research!

Table of Contents

Introduction

Quick Start

1. Environment Setup

2. Download Pretrained Models

3. Run Inference

Training

Prepare Dataset

Start Training

Evaluation

Multi-turn Dialogue Example

ChangeChat-105k Dataset

Citation

Acknowledgement

License

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

DeltaVLM

Share us a ⭐ if this repo helps your research!

Table of Contents

Introduction

Quick Start

1. Environment Setup

2. Download Pretrained Models

3. Run Inference

Training

Prepare Dataset

Start Training

Evaluation

Multi-turn Dialogue Example

ChangeChat-105k Dataset

Citation

Acknowledgement

License

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages