# MaskControl: Spatio-Temporal Control for Masked Motion Synthesis (ICCV 2025 -- Oral & Award Candidate)
If you find our code or paper helpful, please consider starring our repository and citing:

```bibtex
@inproceedings{Pinyoanuntapong2025MaskControl,
  title     = {MaskControl: Spatio-Temporal Control for Masked Motion Synthesis},
  author    = {Pinyoanuntapong, Ekkasit and Saleem, Muhammad and Karunratanakul, Korrawe and Wang, Pu and Xue, Hongfei and Chen, Chen and Guo, Chuan and Cao, Junli and Ren, Jian and Tulyakov, Sergey},
  booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
  pages     = {9955--9965},
  year      = {2025}
}
```
🟢 Oct/14/2025 - Added training script
🟢 Oct/14/2025 - Updated env name from 'ControlMM' to 'MaskControl'
- Joint Control (GMD, OmniControl, and MMM Evaluation)
- ProMoGen Evaluation
- STMC Evaluation
- Joint Control
- Obstacle Avoidance
- Body Part Timeline Control
- Retrain MoMask with Cross Entropy for All Positions
- Add Logits Regularizer
Our code is built on top of MoMask. If you encounter any issues, please refer to the MoMask repository for setup and troubleshooting instructions.
```bash
conda env create -f environment.yml
conda activate MaskControl
pip install git+https://github.com/openai/CLIP.git
pip install -r requirements.txt
```

```bash
bash prepare/download_models.sh
```
For evaluation only:

```bash
bash prepare/download_evaluator.sh
bash prepare/download_glove.sh
```
You have two options here:
- Skip getting data, if you just want to generate motions using your own descriptions.
- Get full data, if you want to re-train and evaluate the model.
(a). Full data (text + motion)
HumanML3D - Follow the instructions in the HumanML3D repository, then copy the resulting dataset into our repository:

```bash
cp -r ../HumanML3D/HumanML3D ./dataset/HumanML3D
```
```bash
python train_ctrlnet.py \
    --name CtrlNet_.5XEnt.5TTT__traj_NoRetrainTrans \
    --trans_name t2m_nlayer8_nhead6_ld384_ff1024_cdp0.1_rvq6ns \
    --gpu_id 5 \
    --dataset_name t2m \
    --batch_size 64 \
    --vq_name rvq_nq6_dc512_nc512_noshare_qdp0.2 \
    --xent .5 \
    --ctrl_loss .5 \
    --control trajectory
```
| Argument | Example Value | Description |
|---|---|---|
| `--name` | `CtrlNet_.5XEnt.5TTT__traj_NoRetrainTrans` | Name of the experiment. Used for logging and checkpoint saving. |
| `--trans_name` | `t2m_nlayer8_nhead6_ld384_ff1024_cdp0.1_rvq6ns` | Name of the pretrained masked transformer checkpoint. |
| `--gpu_id` | `5` | GPU device ID to use for training. |
| `--dataset_name` | `t2m` | Dataset name. |
| `--batch_size` | `64` | Number of samples per training batch. |
| `--vq_name` | `rvq_nq6_dc512_nc512_noshare_qdp0.2` | Name of the pretrained VQ-VAE checkpoint. |
| `--xent` | `0.5` | Weight for the cross-entropy loss (Logits Consistency Loss). |
| `--ctrl_loss` | `0.5` | Weight for the control loss (Motion Consistency Loss). |
| `--control` | `trajectory` | Type of control signal (i.e., trajectory, random, cross). |
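Conceptually, `--xent` and `--ctrl_loss` weight a token-level objective against a motion-level objective. A minimal sketch of how the two weights might combine, assuming illustrative tensor shapes and an L1 formulation (this is not the repository's actual implementation):

```python
import torch
import torch.nn.functional as F

def combined_loss(logits, target_tokens, pred_joints, ctrl_joints, ctrl_mask,
                  xent_weight=0.5, ctrl_weight=0.5):
    """Sketch of weighting the two training objectives.

    logits:        (B, T, V) predicted token logits
    target_tokens: (B, T)    ground-truth motion token indices
    pred_joints:   (B, T, J, 3) joints decoded from predicted tokens
    ctrl_joints:   (B, T, J, 3) control signal (desired joint positions)
    ctrl_mask:     (B, T, J, 3) 1 where a control signal is given, else 0
    """
    # Logits Consistency Loss: cross-entropy over the token vocabulary.
    xent = F.cross_entropy(logits.permute(0, 2, 1), target_tokens)
    # Motion Consistency Loss: L1 error, evaluated only at controlled positions.
    per_elem = F.l1_loss(pred_joints, ctrl_joints, reduction="none")
    ctrl = (per_elem * ctrl_mask).sum() / ctrl_mask.sum().clamp(min=1)
    return xent_weight * xent + ctrl_weight * ctrl
```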
Evaluation with trajectory (pelvis) control:

```bash
python eval_t2m_trans_res.py \
    --res_name tres_nlayer8_ld384_ff1024_rvq6ns_cdp0.2_sw \
    --dataset_name t2m \
    --ctrl_name 'z2024-08-23-01-27-51_CtrlNet_randCond1-196_l1.1XEnt.9TTT__fixRandCond' \
    --gpu_id 0 \
    --ext 0_each100Last600CtrnNet \
    --control trajectory \
    --density -1 \
    --each_iter 100 \
    --last_iter 600 \
    --ctrl_net T
```
Evaluation with cross-combination joint control:

```bash
python eval_t2m_trans_res.py \
    --res_name tres_nlayer8_ld384_ff1024_rvq6ns_cdp0.2_sw \
    --dataset_name t2m \
    --ctrl_name 'z2024-08-27-21-07-55_CtrlNet_randCond1-196_l1.5XEnt.5TTT__cross' \
    --gpu_id 4 \
    --ext 0_each100_last600_ctrlNetT \
    --control cross \
    --density -1 \
    --each_iter 100 \
    --last_iter 600 \
    --ctrl_net T
```
The following joints can be controlled:

`[pelvis, left_foot, right_foot, head, left_wrist, right_wrist]`
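A control signal over these joints can be thought of as a sparse (frame, joint) mask over the motion sequence. A minimal sketch, assuming the 22-joint HumanML3D skeleton (the joint indices below are illustrative assumptions, not the repository's actual mapping):

```python
import numpy as np

# Assumed HumanML3D-style joint indices -- check the repository for the real mapping.
CONTROL_JOINTS = {
    "pelvis": 0, "left_foot": 10, "right_foot": 11,
    "head": 15, "left_wrist": 20, "right_wrist": 21,
}

def make_control_mask(seq_len, joint_names, frame_ids, n_joints=22):
    """Binary mask marking which (frame, joint) positions carry a control signal."""
    mask = np.zeros((seq_len, n_joints), dtype=bool)
    rows = np.asarray(frame_ids)
    for name in joint_names:
        mask[rows, CONTROL_JOINTS[name]] = True
    return mask
```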
| Argument | Description |
|---|---|
| `--res_name` | Name of the residual transformer. |
| `--ctrl_name` | Name of the control transformer (the VQ and Masked Transformer are also saved in this checkpoint). |
| `--gpu_id` | GPU ID to use. |
| `--ext` | Log name used for saving results, stored in `checkpoints/t2m/{ctrl_name}/eval/{ext}`. |
| `--control` | Type of random joint control: • `trajectory` → pelvis only • `random` → uniform random joints • `cross` → random combinations, see section [A.11 CROSS COMBINATION] • any single joint: `pelvis`, `l_foot`, `r_foot`, `head`, `left_wrist`, `right_wrist`, `lower` • `all` → all joints |
| `--density` | Number of control frames: • `1`, `2`, `5` → exact number of control frames • `49` → 25% of ground-truth length • `196` → 100% of ground-truth length (if GT length < 196, 49/196 are converted proportionally) |
| `--each_iter` | Number of logits optimization iterations at each unmask step. |
| `--last_iter` | Number of logits optimization iterations at the last unmask step. |
| `--ctrl_net` | Enable ControlNet with Logits Regularizer: `T` or `F`. |
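The proportional conversion for `--density` can be sketched as follows (a guess at the arithmetic described above, not the evaluation script's actual code):

```python
def num_control_frames(density, gt_length, max_len=196):
    """Map the --density argument to a number of control frames.

    1, 2, 5 -> exact counts; 49 -> ~25% and 196 -> 100% of the ground-truth
    length, scaled proportionally when gt_length < max_len.
    """
    if density in (49, 196):
        return max(1, round(gt_length * density / max_len))
    return density
```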
```bash
python -m generation.control_joint --path_name ./output/control2 --iter_each 100 --iter_last 600
```
| Argument | Type | Default | Description |
|---|---|---|---|
| `--path_name` | str | `./output/test` | Output directory to save the optimization results. |
| `--iter_each` | int | `100` | Number of logits optimization steps at each unmasking step. |
| `--iter_last` | int | `600` | Number of logits optimization steps at the final unmasking step. |
| `--show` | flag | `False` | If set, automatically opens the result HTML visualization after execution. |
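Conceptually, `--iter_each` / `--iter_last` run a test-time logits-optimization loop: gradient steps on the token logits so that the decoded motion matches the control signal at controlled positions. A toy sketch, where the differentiable decode and the optimizer settings are assumptions for illustration:

```python
import torch

def optimize_logits(logits, decode_fn, ctrl_joints, ctrl_mask, n_iters=100, lr=0.1):
    """Nudge token logits so the decoded motion matches the control signal.

    decode_fn maps soft token probabilities to joint positions; in the real
    pipeline this would be the VQ codebook lookup plus motion decoder.
    """
    logits = logits.detach().clone().requires_grad_(True)
    opt = torch.optim.Adam([logits], lr=lr)
    for _ in range(n_iters):
        probs = torch.softmax(logits, dim=-1)   # soft token distribution
        motion = decode_fn(probs)               # differentiable decode
        err = (motion - ctrl_joints).abs() * ctrl_mask
        loss = err.sum() / ctrl_mask.sum().clamp(min=1)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return logits.detach()
```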
Example 1 -- Trajectory avoidance

```bash
python -m generation.avoidance --path_name ./output/avoidance1 --iter_each 100 --iter_last 600
```

Example 2 -- Head avoidance

```bash
python -m generation.avoidance2 --path_name ./output/avoidance2 --iter_each 100 --iter_last 600
```
This code is distributed under the CC BY-NC-ND 4.0 license (see LICENSE-CC-BY-NC-ND-4.0).
Note that our code depends on other libraries, including MoMask, OmniControl, GMD, MMM, TLControl, STMC, ProgMoGen, TEMOS, and BAMM, each of which has its own license that must also be followed.