(repo work in progress)
We introduce a generative perception model that, given a few frames of an object undergoing motion, produces diverse and plausible interpretations of its shape and material.
Create a new environment with conda and install PyTorch. Make sure to use a PyTorch version compatible with NATTEN; our project used the NATTEN 0.17.5 release with torch==2.5.1.
```bash
conda create -n diffmotion python=3.10
conda activate diffmotion
pip install torch==2.5.1 torchvision==0.20.1 torchaudio==2.5.1 --index-url https://download.pytorch.org/whl/cu124
```

Install the other dependencies as follows:

```bash
pip install -r requirements.txt
```

To use the UViT3D-Mixer model with shift-invariant neighborhood attention, you will need to install custom CUDA kernels via NATTEN.
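After installation, a quick check like the following can confirm the environment matches what we used (a minimal sketch; it only verifies that the packages import and report the expected versions):

```python
# Sanity-check the environment: PyTorch 2.5.1 with CUDA, and a NATTEN build
# that imports cleanly (the import fails if the CUDA kernels are missing).
import importlib.metadata as md

import torch
import natten  # noqa: F401

print("torch  :", torch.__version__, "| CUDA available:", torch.cuda.is_available())
print("natten :", md.version("natten"))  # expected: 0.17.5
```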
We include a few sample videos under `test_data/`.
To run on your own inputs, edit `run_inference.py` and list the three frames in order for each test clip.
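As a hypothetical illustration of what such a listing might look like (the actual variable name and structure in `run_inference.py` may differ):

```python
# Hypothetical example of listing your own clips; each clip is an ordered
# triplet of frame paths. Adjust names/paths to match the actual script.
my_clips = [
    [
        "test_data/my_clip/frame1.png",  # frame 1
        "test_data/my_clip/frame2.png",  # frame 2
        "test_data/my_clip/frame3.png",  # frame 3
    ],
    # add more [frame1, frame2, frame3] triplets here, one per clip
]
```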
- Uses random seeds 0–9 by default (see `main()` in `run_inference.py`).
- Saves each result as a 3×3 image grid:
  - Rows: diffuse albedo, surface normals, materials
  - Columns: frame 1 → frame 3 (left to right)
- Outputs are written to the `evals/` directory (see the sketch below for splitting a saved grid back into individual maps).
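If you want to work with the individual maps programmatically, here is a minimal sketch for splitting one saved grid back into its nine panels. It assumes the grid is a single image of equally sized tiles with no padding, and the file name below is only a placeholder:

```python
# Split a 3x3 result grid into per-row, per-frame images.
from PIL import Image

grid = Image.open("evals/exp1_seed0.png")  # hypothetical output file name
tile_w, tile_h = grid.width // 3, grid.height // 3

rows = ["albedo", "normals", "materials"]  # top-to-bottom row order
for r, name in enumerate(rows):
    for c in range(3):                     # columns: frame 1 -> frame 3
        box = (c * tile_w, r * tile_h, (c + 1) * tile_w, (r + 1) * tile_h)
        grid.crop(box).save(f"evals/{name}_frame{c + 1}.png")
```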
```bash
# 1) Create an output folder (once)
mkdir -p evals

# 2) Run inference with your checkpoint and a custom save name
python run_inference.py \
    --ckpt_path "./ckpts/u_vit3d_mixer_e2.ckpt" \
    --save_name "exp1"
```

You can download our pretrained model checkpoint as follows:
```bash
mkdir ckpts
cd ckpts
gdown 1YCtgWeevOqW1ZDLwgpqK3jXJGRkT6Syy
```
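As an optional sanity check, you can load the downloaded file and inspect it (illustrative only; the exact layout of the checkpoint dict depends on how it was saved):

```python
# Load the checkpoint on CPU and peek at its top-level structure.
import torch

ckpt = torch.load("ckpts/u_vit3d_mixer_e2.ckpt", map_location="cpu")
print(type(ckpt))
if isinstance(ckpt, dict):
    print(list(ckpt.keys())[:10])  # e.g. a raw state_dict or a wrapper with metadata
```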
To train the model from scratch, we use the script in `video_model/train_diffusion.py`. Our model is trained on 4 A100 or H100 GPUs for around 200 epochs with a batch size of 16 per GPU (an effective batch size of 64). Training can be performed efficiently with bf16 mixed precision.
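The training script wires this up itself (presumably via the `--mp16` flag below). Purely as an illustration of the general pattern, and not the code in `video_model/train_diffusion.py`, bf16 mixed precision with Hugging Face Accelerate typically looks like this:

```python
# Illustrative sketch: bf16 mixed precision with Accelerate.
# Model, data, and hyperparameters here are placeholders.
import torch
from accelerate import Accelerator

accelerator = Accelerator(mixed_precision="bf16")

model = torch.nn.Linear(8, 8)  # placeholder model
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
model, optimizer = accelerator.prepare(model, optimizer)

x = torch.randn(16, 8)  # per-GPU batch of 16, as in our setup (4 GPUs -> effective 64)
with accelerator.autocast():            # forward pass runs in bf16
    loss = model(x.to(accelerator.device)).pow(2).mean()
accelerator.backward(loss)
optimizer.step()
optimizer.zero_grad()
```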
For instance, a training run can be launched as follows:

```bash
mkdir runs                 # logging files
mkdir saved_video_models   # saved training checkpoints

accelerate launch -m video_model.train_diffusion \
    --train-mode "uvit3d_mixer_all3" \
    --distributed --mp16 \
    --epochs 1 \
    --batch-size 16 \
    --aug-static \
    --aug-reverse \
    --save-name "mixer_test" \
    --root-dir './' \
    --save-every 10 \
    --dataset-root-dir './'
```

Please email xinranhan@g.harvard.edu for the training dataset we used in the paper.
We provide code to generate textured, synthetic data with Mitsuba3, including ground-truth labels of geometry and materials. Please refer to DiffMotion-DataGen.
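For a flavor of how such ground-truth channels can be rendered, here is a minimal Mitsuba 3 sketch that is not the DiffMotion-DataGen code: it renders the built-in Cornell box and adds shading normals as extra channels via the AOV integrator. Scene, sample count, and output names are placeholders.

```python
# Render an RGB image plus a shading-normal AOV with Mitsuba 3.
import numpy as np
import mitsuba as mi

mi.set_variant("scalar_rgb")               # assumes this variant is available

scene_dict = mi.cornell_box()              # built-in example scene
scene_dict["integrator"] = {
    "type": "aov",
    "aovs": "nn:sh_normal",                # add shading normals as extra channels
    "rgb": {"type": "path"},               # nested path tracer for the RGB image
}
scene = mi.load_dict(scene_dict)

img = np.array(mi.render(scene, spp=16))   # RGB plus normal channels; see shape below
print(img.shape)
np.save("example_rgb_and_normals.npy", img)
```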
If you find this repo useful, please consider citing:

```bibtex
@article{han2025generative,
  title={Generative Perception of Shape and Material from Differential Motion},
  author={Han, Xinran Nicole and Nishino, Ko and Zickler, Todd},
  journal={arXiv preprint arXiv:2506.02473},
  year={2025}
}
```

This project builds upon several excellent open source projects:
We thank the authors of those projects and the developers of Mitsuba3 for their valuable contributions to the open source community.