Weining Ren1 Xiao Tan2 Kai Han1
1The University of Hong Kong 2Baidu AMU
- [March 6, 2026] Initial Release
While recent feed-forward 3D reconstruction models accelerate reconstruction by jointly inferring dense geometry and camera poses in a single pass, their reliance on dense attention incurs quadratic complexity, creating a computational bottleneck that severely limits inference speed.
To resolve this, we introduce Speed3R, an end-to-end trainable model inspired by the core principle of Structure-from-Motion that a sparse set of keypoints is sufficient for robust estimation. Speed3R features a dual-branch attention mechanism where the compression branch creates a coarse contextual prior to guide the selection branch, which performs fine-grained attention only on the most informative image tokens. This strategy mimics the efficiency of traditional keypoint matching, achieving a remarkable 12.4x inference speedup on 1000-view sequences, while introducing a minimal, controlled trade-off in accuracy. Validated on standard benchmarks with both VGGT and π³ backbones, our method delivers high-quality reconstructions at a fraction of computational cost, paving the way for efficient large-scale scene modeling.
First, clone the repository and install the required packages.

```bash
git clone https://github.com/Visual-AI/speed3r.git
cd speed3r
pip install -r requirements.txt
pip install triton==3.3.1
```

Try our example inference script. You can run it on a directory of images or a video file.
If the automatic download from Hugging Face is slow, you can download the model checkpoint manually from Speed3R_Pi3 and specify its local path using the --ckpt argument.
```bash
# Run with the default example video
python example.py
```

You can also launch a local Gradio demo for an interactive experience.
```bash
# Install demo-specific requirements
pip install -r requirements_demo.txt

# Launch the demo
python demo_gradio.py
```

The model takes a tensor of images and outputs a dictionary containing the reconstructed geometry.
- Input: A `torch.Tensor` of shape $B \times N \times 3 \times H \times W$, with pixel values in the range `[0, 1]`.
- Output: A `dict` with the following keys:
  - `points`: Global point cloud, unprojected from `local_points` using `camera_poses` (`torch.Tensor`, $B \times N \times H \times W \times 3$).
  - `local_points`: Per-view local point maps (`torch.Tensor`, $B \times N \times H \times W \times 3$).
  - `conf`: Confidence scores for local points, as raw logits; apply `torch.sigmoid()` to obtain probabilities in `[0, 1]`, higher is better (`torch.Tensor`, $B \times N \times H \times W \times 1$).
  - `camera_poses`: Camera-to-world transformation matrices, $4 \times 4$ in OpenCV convention (`torch.Tensor`, $B \times N \times 4 \times 4$).
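As a quick illustration of consuming these outputs, the sketch below builds dummy tensors with the documented shapes (random values, not real model output) and shows the sigmoid-on-logits step for `conf` and how to read camera centers from the camera-to-world matrices:

```python
import torch

# Dummy outputs matching the documented shapes (B=1, N=2, H=W=56).
B, N, H, W = 1, 2, 56, 56
results = {
    'points': torch.randn(B, N, H, W, 3),
    'conf': torch.randn(B, N, H, W, 1),            # raw logits
    'camera_poses': torch.eye(4).expand(B, N, 4, 4),
}

# Convert raw confidence logits to probabilities, then keep confident points only.
probs = torch.sigmoid(results['conf'])             # probabilities in [0, 1]
mask = probs.squeeze(-1) > 0.5                     # (B, N, H, W) boolean mask
confident_points = results['points'][mask]         # (num_confident, 3)

# Camera centers are the translation column of the camera-to-world matrices.
centers = results['camera_poses'][..., :3, 3]      # (B, N, 3)
print(confident_points.shape, centers.shape)
```

The same masking applies to real model outputs; only the threshold (here 0.5) is a per-application choice.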
Here is a minimal example of how to run the model on a batch of images.

```python
import torch
from pi3.models.pi3_sparse import Pi3_Sparse
from pi3.utils.basic import load_images_as_tensor  # helper for loading image sequences

# --- Setup ---
device = 'cuda' if torch.cuda.is_available() else 'cpu'
model = Pi3_Sparse.from_pretrained("weining17/Speed3R_Pi3").to(device).eval()
# or download the checkpoint manually from
# https://huggingface.co/weining17/Speed3R_Pi3/tree/main/model.safetensors

# --- Load Data ---
# Load a sequence of N images into a tensor.
# imgs shape: (N, 3, H, W), values in [0, 1]
imgs = load_images_as_tensor('path/to/your/data', interval=10).to(device)

# --- Inference ---
print("Running model inference...")
# Use mixed precision for better performance on compatible GPUs
dtype = torch.bfloat16 if torch.cuda.is_available() and torch.cuda.get_device_capability()[0] >= 8 else torch.float16
with torch.no_grad():
    with torch.amp.autocast('cuda', dtype=dtype):
        # Add a batch dimension -> (1, N, 3, H, W)
        results = model(imgs[None])
print("Reconstruction complete!")

# Access outputs: results['points'], results['camera_poses'], and results['local_points'].
```

- Release Speed3R-VGGT code & ckpt
- Release the training code
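For quick inspection, the `points` output from the inference example can be dumped to an ASCII PLY file and opened in any point-cloud viewer. The `save_ply` helper below is a hypothetical sketch, not part of the repository:

```python
import numpy as np

def save_ply(points: np.ndarray, path: str) -> None:
    # points: any array reshapable to (M, 3); writes a minimal ASCII PLY point cloud.
    points = points.reshape(-1, 3)
    header = "\n".join([
        "ply",
        "format ascii 1.0",
        f"element vertex {len(points)}",
        "property float x",
        "property float y",
        "property float z",
        "end_header",
    ])
    with open(path, "w") as f:
        f.write(header + "\n")
        np.savetxt(f, points, fmt="%.6f")

# With results from the inference example, this would be:
# save_ply(results['points'][0].float().cpu().numpy(), 'scene.ply')
pts = np.random.rand(100, 3).astype(np.float32)  # stand-in point cloud
save_ply(pts, "scene.ply")
```

In practice you would first mask out low-confidence points (via `torch.sigmoid(results['conf'])`) before exporting.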
- Currently, the model only supports resolutions that are multiples of 56, rather than 14.
- We tested the method with Triton version 3.3.1; lower versions may cause numerical errors.
- Currently, the kernel only supports bf16/fp16.
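One way to satisfy the multiple-of-56 constraint is to resize inputs to the nearest valid resolution before inference. The `resize_to_multiple` helper below is a hypothetical sketch, not a function provided by the repository:

```python
import torch
import torch.nn.functional as F

def resize_to_multiple(imgs: torch.Tensor, multiple: int = 56) -> torch.Tensor:
    # imgs: (N, 3, H, W) in [0, 1]; returns images whose H and W are rounded
    # to the nearest multiple of `multiple` (never below one multiple).
    _, _, h, w = imgs.shape
    new_h = max(multiple, round(h / multiple) * multiple)
    new_w = max(multiple, round(w / multiple) * multiple)
    if (new_h, new_w) == (h, w):
        return imgs
    return F.interpolate(imgs, size=(new_h, new_w), mode='bilinear', align_corners=False)

imgs = torch.rand(2, 3, 518, 700)       # an arbitrary, non-conforming resolution
imgs = resize_to_multiple(imgs)
print(imgs.shape)                        # H, W now multiples of 56
```

Bilinear resizing slightly distorts aspect ratio when H or W rounds in opposite directions; padding is an alternative if exact pixel geometry matters.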
Our work builds upon several fantastic open-source projects. We'd like to express our gratitude to the authors of:
If you find our work useful, please consider citing:
```bibtex
@article{ren2026speed3r,
  title={Speed3R: Sparse Feed-forward 3D Reconstruction Models},
  author={Ren, Weining and Tan, Xiao and Han, Kai},
  journal={arXiv preprint arXiv:xxxxxxx},
  year={2026}
}
```

This project adopts a dual-licensing strategy following Pi3:
| Component | License | Commercial Use |
|---|---|---|
| Code (Scripts, Tools, Logic) | BSD 3-Clause | Permitted |
| Model Weights (Pi3 Weights) | CC BY-NC 4.0 | Strictly Non-Commercial |
Note on Model Weights: Due to the nature of the training datasets, the model weights are restricted to non-commercial research and educational purposes only. Redistribution of the weights must maintain this restriction.