Official code release for Dream2Flow.
Karthik Dharmarajan, Wenlong Huang*, Jiajun Wu, Li Fei-Fei†, Ruohan Zhang†
* Corresponding author, † Equal advising
Stanford University
Dream2Flow is a library for generating 3D object flow from video sources and planning robot trajectories to match that flow.
- 3D Object Flow: Clean abstractions for representing and visualizing 3D trajectories of objects.
- Video Sources: Support for generating videos via Google Veo 3 or playing back from files.
- Motion Planning: Trajectory optimizer using PyRoki for joint-space optimization.
- Visualization: Interactive 3D visualization using Viser.
- TODO: Release RL reward design and code
Create and activate a conda environment named `dream2flow`:

```bash
conda create -n dream2flow python=3.10 pip -y
conda activate dream2flow
```

Install the base package:

```bash
python -m pip install -e .
```

To include the motion planning stack, install the optional planner extra:

```bash
python -m pip install -e ".[planner]"
```

Tracking and visualization dependencies:
- CoTrackerV3: clone it into a local `deps/` folder and install it from there.

```bash
mkdir -p deps
git clone https://github.com/facebookresearch/co-tracker.git deps/co-tracker
python -m pip install -e ./deps/co-tracker
python -m pip install matplotlib flow_vis tqdm tensorboard
```

- SpatialTrackerV2: for depth estimation, we use the output from SpatialTrackerV2. Please also install it inside `deps/`.

```bash
mkdir -p deps
git clone https://github.com/henry123-boy/SpaTrackerV2.git deps/SpaTrackerV2
python -m pip install -r ./deps/SpaTrackerV2/requirements.txt
```

- torch-cluster: install the wheel that matches your PyTorch and CUDA build, using the PyG wheel index referenced by the upstream project.

```bash
python -m pip install torch-cluster -f https://data.pyg.org/whl/torch-${TORCH}+${CUDA}.html
```

Replace `${TORCH}` and `${CUDA}` with the versions that match your environment, or follow the upstream source-build instructions if you prefer to compile locally.
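If you are unsure which wheel index to use, the values can be read off the active environment. A minimal sketch, assuming a standard PyTorch install and the usual PyG index naming (e.g. `torch-2.4.0+cu121.html`):

```python
# Print the PyG wheel index URL matching the installed PyTorch build.
import torch

torch_ver = torch.__version__.split("+")[0]  # e.g. "2.4.0"
cuda_ver = torch.version.cuda                # e.g. "12.1", or None for CPU-only builds
suffix = f"cu{cuda_ver.replace('.', '')}" if cuda_ver else "cpu"
print(f"https://data.pyg.org/whl/torch-{torch_ver}+{suffix}.html")
```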
Upstream installation references:
- CoTrackerV3: https://github.com/facebookresearch/co-tracker?tab=readme-ov-file#installation-instructions
- SpatialTrackerV2: https://github.com/henry123-boy/SpaTrackerV2?tab=readme-ov-file#set-up-the-environment
- torch-cluster: https://github.com/rusty1s/pytorch_cluster?tab=readme-ov-file#installation
To download the Hugging Face scene data into the local `data/` folder with the Hugging Face CLI:

```bash
hf download kdharmarajan123/Dream2Flow --repo-type dataset --local-dir ./data
```

| Task | Scene Preview | Hugging Face Data |
|---|---|---|
| put bread | ![]() | [Dream2Flow dataset](https://huggingface.co/datasets/kdharmarajan123/Dream2Flow) |

Download the scene data from the corresponding Hugging Face link, then place the extracted scene folder inside `data/`. For example, after downloading the `put_bread` scene, the files should live under `data/put_bread/`.
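Equivalently, you can fetch the dataset from Python with `huggingface_hub`; a minimal sketch using the library's standard `snapshot_download` API:

```python
# Download the Dream2Flow scene data into ./data (same effect as the CLI above).
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="kdharmarajan123/Dream2Flow",
    repo_type="dataset",
    local_dir="./data",
)
```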
The package includes two runnable scripts under dream2flow/scripts/. Both are organized around a scene directory: one directory per scene, containing the files for that scene. Each prompt suggests a default filename in parentheses, interpreted relative to the chosen scene directory. You can press Enter to accept the default, or provide a full path to override it.
Run with:

```bash
python -m dream2flow.scripts.create_3d_flow
```

Pipeline choices:

- Video generation method: `[1] local file`, `[2] Veo 3`
- Depth estimation mode: `[1] playback`, `[2] generate`
Scene file defaults (an example scene layout follows the list):

- Camera calibration: `camera_calibration_info.json`
- Start RGB image: `camera_rgb.png`
- Scene data: `scene_data.yaml`
- Local video file: `rgb.mp4`
- Playback depth frames: `depth_frames.pt`
- Initial depth for generated depth: `initial_depth.pt`
- Saved 2D tracks: `tracks_2d.pt`
- Output 3D flow result: `object_flow_result.pt`
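Put together, a fully populated scene directory would look like the layout below; which files must actually exist depends on the pipeline choices, and the output files appear only after a run:

```
data/put_bread/
├── camera_calibration_info.json
├── camera_rgb.png
├── scene_data.yaml
├── rgb.mp4                 # local file video source
├── depth_frames.pt         # playback depth mode (written in generate mode)
├── initial_depth.pt        # generate depth mode
├── tracks_2d.pt            # written by the script
└── object_flow_result.pt   # written by the script
```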
Input logic:

- The script always reads the scene metadata from `scene_data.yaml`
- If video generation uses `local file`, it reads a video file, defaulting to `rgb.mp4`
- If video generation uses `Veo 3`, it uses the start image and language instruction to generate a new video in the scene directory
- If depth estimation uses `playback`, it reads `depth_frames.pt`
- If depth estimation uses `generate`, it reads `initial_depth.pt` and writes `depth_frames.pt`
- After video generation and depth preparation, the script runs CoTrackerV3 offline tracking, saves `tracks_2d.pt`, and lifts the tracks into 3D
Outputs:

- Saves an `ObjectFlowResult` `.pt` file, by default at `<scene_dir>/object_flow_result.pt` (see the loading sketch after this list)
- Saves `tracks_2d.pt` in the scene directory
- Opens a Viser session, by default at `http://localhost:8080`
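To confirm a run produced a usable result, a quick inspection sketch, assuming the file is a standard torch pickle of the library's `ObjectFlowResult` (its exact fields are defined by Dream2Flow, so the sketch lists them rather than guessing names):

```python
# Inspect the saved 3D flow result without assuming its field names.
import torch

result = torch.load("data/put_bread/object_flow_result.pt", weights_only=False)
print(type(result).__name__)  # expected: ObjectFlowResult
print([name for name in dir(result) if not name.startswith("_")])  # available fields
```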
Required file formats:

- Camera calibration JSON: created by `CameraCalibration.save(...)`
- Camera calibration JSON structure: top-level mapping keyed by camera name
- Camera entry fields:
  - `intrinsics`: 3x3 numeric matrix
  - `extrinsics`: 4x4 numeric matrix
- Start RGB image: RGB `.png` image
- Scene data file: YAML mapping with `instruction`, `object_name`, and optional `robot_start_joints` (example after this list)
- Local video file: `.mp4` video readable by OpenCV
- Depth frames tensor `.pt`: `torch.Tensor` with shape `(T, H, W)` (packaging sketch after this list)
- Initial depth tensor `.pt`: `torch.Tensor` with shape `(H, W)` or `(1, H, W)`
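For concreteness, a minimal `scene_data.yaml` might look like the following; the instruction, object name, and joint values are illustrative placeholders:

```yaml
# Hypothetical scene_data.yaml; all values are placeholders.
instruction: "put the bread on the plate"
object_name: "bread"
robot_start_joints: [0.0, -0.785, 0.0, -2.356, 0.0, 1.571, 0.785]  # optional
```

And a sketch for packaging your own depth data into the expected tensor formats, assuming you already have per-frame `(H, W)` depth arrays from some external source:

```python
# Package depth frames into the .pt formats described above.
import numpy as np
import torch

depth_maps = [np.load(f"frame_{i:04d}.npy") for i in range(60)]  # hypothetical (H, W) arrays
depth_frames = torch.from_numpy(np.stack(depth_maps)).float()    # shape (T, H, W)
torch.save(depth_frames, "data/put_bread/depth_frames.pt")

# Single starting frame for "generate" depth mode: (H, W) or (1, H, W).
torch.save(depth_frames[0], "data/put_bread/initial_depth.pt")
```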
This script requires the planner dependencies:

```bash
python -m pip install -e ".[planner]"
```

Run with:

```bash
python -m dream2flow.scripts.plan_and_visualize_flow
```

Scene file defaults:

- Flow result: `object_flow_result.pt`
- Trajectory optimizer config: `trajectory_optimization_config.yaml`
- Initial joints: `initial_joints.txt`
- Initial pose: `initial_pose.txt`
- Trajectory optimization output: `trajectory_optimization_plan.pt`
Input logic:

- The script first looks for the flow result in the scene directory, unless you override it with a full path
- The script looks for `initial_joints.txt` and `initial_pose.txt` in the scene directory before prompting for inline values
- If `trajectory_optimization_config.yaml` is not present in the scene directory, the script falls back to the packaged defaults and then applies any explicit overrides
Default robot behavior:

- The planner takes a single `urdf_path` string and a `target_link_name`
- By default, `urdf_path` is `panda_description`
- The default target link is `panda_hand_tcp`
- The planner first tries to interpret `urdf_path` as a `robot_descriptions` package name, then falls back to loading it as a filesystem path (sketched after this list)
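For illustration, the described lookup order behaves roughly like the sketch below. This is a hand-written approximation, not the planner's actual code, and `resolve_urdf` is a hypothetical helper:

```python
# Approximate urdf_path resolution: robot_descriptions name first, then filesystem path.
import importlib
import os

def resolve_urdf(urdf_path: str) -> str:  # hypothetical helper, illustration only
    try:
        # e.g. "panda_description" -> the URDF bundled with robot_descriptions
        module = importlib.import_module(f"robot_descriptions.{urdf_path}")
        return module.URDF_PATH
    except ImportError:
        # Fall back to treating urdf_path as a path on disk.
        if os.path.isfile(urdf_path):
            return urdf_path
        raise FileNotFoundError(f"Could not resolve URDF: {urdf_path}")

print(resolve_urdf("panda_description"))
```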
Required file formats:

- Flow result `.pt`: a serialized `ObjectFlowResult` saved by Dream2Flow, typically from `dream2flow.scripts.create_3d_flow`
- Planner YAML config: YAML mapping compatible with `TrajectoryOptimizerConfig` (example after this list)
- Planner YAML supported keys:
  - `urdf_path`: string or null
  - `target_link_name`: string
  - `path_length_weight`: float
  - `particle_matching_weight`: float
  - `max_iterations`: integer
  - `visualize`: boolean
  - `max_num_timesteps_for_optimization`: integer
- Initial joints text file: a single comma-separated list of joint values for the scene's robot (example after this list)
- Initial pose text file: comma-separated `x,y,z,qx,qy,qz,qw`
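For reference, a config exercising every supported key might look like this; all values are placeholders rather than tuned defaults:

```yaml
# Hypothetical trajectory_optimization_config.yaml; values are placeholders.
urdf_path: panda_description   # or null to keep the packaged default
target_link_name: panda_hand_tcp
path_length_weight: 1.0
particle_matching_weight: 10.0
max_iterations: 500
visualize: true
max_num_timesteps_for_optimization: 50
```

The two text files are single comma-separated lines. With placeholder numbers for a 7-DoF Panda-style arm, `initial_joints.txt` might contain `0.0,-0.785,0.0,-2.356,0.0,1.571,0.785`, and `initial_pose.txt` might contain `0.4,0.0,0.3,0.0,0.0,0.0,1.0` (identity orientation at x=0.4, z=0.3).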
Outputs:

- Saves a planner result `.pt` file containing (inspection sketch after this list):
  - `joint_trajectory`: tensor with shape `(T, J)`
  - `ee_trajectory`: tensor with shape `(T, 7)`
- Default output path: `<scene_dir>/trajectory_optimization_plan.pt`
- Opens a Viser session showing both the object flow and the planned trajectory
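To sanity-check a saved plan, a minimal sketch assuming the file is a standard torch pickle; whether the payload exposes dict keys or attributes depends on how Dream2Flow serializes it, so adapt the access accordingly:

```python
# Inspect a saved trajectory optimization plan.
import torch

plan = torch.load("data/put_bread/trajectory_optimization_plan.pt", weights_only=False)
if isinstance(plan, dict):
    joints, ee = plan["joint_trajectory"], plan["ee_trajectory"]
else:
    joints, ee = plan.joint_trajectory, plan.ee_trajectory
print(joints.shape)  # expected (T, J)
print(ee.shape)      # expected (T, 7): x, y, z, qx, qy, qz, qw
```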
If you find our work useful, please consider citing:
```bibtex
@article{dharmarajan2025dream2flow,
  title={Dream2Flow: Bridging Video Generation and Open-World Manipulation with 3D Object Flow},
  author={Dharmarajan, Karthik and Huang, Wenlong and Wu, Jiajun and Fei-Fei, Li and Zhang, Ruohan},
  journal={arXiv preprint arXiv:2512.24766},
  year={2025}
}
```