Generative Perception of Shape and Material from Differential Motion

Official PyTorch implementation for the NeurIPS 2025 paper. (Repo work in progress.)

arXiv | Project Page

🍇 Introduction

We introduce a generative perception model that, given a few frames of an object undergoing motion, produces diverse and plausible interpretations of its shape and material.

🍊 Usage

Dependencies

Create a new conda environment and install PyTorch. Make sure to use a PyTorch version compatible with NATTEN; our project used the NATTEN 0.17.5 release with torch==2.5.1.

conda create -n diffmotion python=3.10
conda activate diffmotion
pip install torch==2.5.1 torchvision==0.20.1 torchaudio==2.5.1 --index-url https://download.pytorch.org/whl/cu124
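
Optionally, verify that a CUDA-enabled PyTorch build is active before proceeding:

python -c "import torch; print(torch.__version__, torch.cuda.is_available())"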

Then install the remaining dependencies:

pip install -r requirements.txt

To use the UViT3D-Mixer model with shift-invariant neighborhood attention, you will need to install custom CUDA kernels via NATTEN.
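
NATTEN publishes prebuilt wheels for specific torch/CUDA combinations. As a sketch (the exact wheel tag below is an assumption, so check NATTEN's install instructions for the tag matching your torch and CUDA versions):

# Prebuilt wheel (tag is an assumption; verify against NATTEN's install matrix)
pip install natten==0.17.5+torch250cu124 -f https://shi-labs.com/natten/wheels/

# Alternatively, build from source (requires a local CUDA toolkit matching your torch build)
pip install natten==0.17.5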

🍬 Inference

We include a few sample videos under test_data/.
To run on your own inputs, edit run_inference.py and list the three frames in order for each test clip.

What the script does

  • Uses random seeds 0–9 by default (see main() in run_inference.py).
  • Saves each result as a 3×3 image grid:
    • Rows: diffuse albedo, surface normals, materials
    • Columns: frame 1 → frame 3 (left to right)
  • Outputs are written to the evals/ directory.

Quick start

# 1) Create an output folder (once)
mkdir -p evals

# 2) Run inference with your checkpoint and a custom save name
python run_inference.py \
  --ckpt_path "./ckpts/u_vit3d_mixer_e2.ckpt" \
  --save_name "exp1"

Model checkpoint

You can download our pretrained model checkpoint as follows:

mkdir ckpts
cd ckpts
gdown 1YCtgWeevOqW1ZDLwgpqK3jXJGRkT6Syy
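
The download above uses gdown; if it is not already available in your environment (it may or may not be covered by requirements.txt), install it first:

pip install gdown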

🫐 Training

To train the model from scratch, use the script video_model/train_diffusion.py. Our model is trained on 4 A100 or H100 GPUs for around 200 epochs with a batch size of 16 per GPU (effective batch size 64). Training can be run efficiently with bf16 mixed precision.

For instance,

mkdir runs # logging files
mkdir saved_video_models # saved training checkpoints
accelerate launch -m video_model.train_diffusion \
  --train-mode "uvit3d_mixer_all3" \
  --distributed --mp16 \
  --epochs 1 \
  --batch-size 16 \
  --aug-static \
  --aug-reverse \
  --save-name "mixer_test" \
  --root-dir './' \
  --save-every 10 \
  --dataset-root-dir './'
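
If you prefer to pin the paper's 4-GPU bf16 setup at the accelerate level instead of relying on a saved accelerate config, accelerate's standard launch flags can be added as below; these are accelerate's own options, not flags defined by this repo, and the script's --distributed/--mp16 switches may already cover the same behavior (the save name here is just an example):

accelerate launch --num_processes 4 --mixed_precision bf16 \
  -m video_model.train_diffusion \
  --train-mode "uvit3d_mixer_all3" \
  --distributed --mp16 \
  --epochs 200 \
  --batch-size 16 \
  --aug-static \
  --aug-reverse \
  --save-name "mixer_full" \
  --root-dir './' \
  --save-every 10 \
  --dataset-root-dir './'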

Dataset

Please email xinranhan@g.harvard.edu for the training dataset we used in the paper.

🍒 Synthetic Data Generation

We provide code to generate textured synthetic data with ground-truth geometry and material labels using Mitsuba3. Please refer to DiffMotion-DataGen.

🌰 Citation

If you find this repo useful, please consider citing:

@article{han2025generative,
  title={Generative Perception of Shape and Material from Differential Motion},
  author={Han, Xinran Nicole and Nishino, Ko and Zickler, Todd},
  journal={arXiv preprint arXiv:2506.02473},
  year={2025}
}

🍰 Acknowledgement

This project builds upon several excellent open source projects. We thank the authors of those projects and the developers of Mitsuba3 for their valuable contributions to the open source community.
