☆ NeurIPS 2025 ☆

Jiahao Wang1 · Hualian Sheng2 · Sijia Cai2,† · Weizhan Zhang1,* · Caixia Yan1 · Yachuang Feng2 · Bing Deng2 · Jieping Ye2

1Xi'an Jiaotong University  2Alibaba Cloud
Please give us a star ⭐ on GitHub if you like our work.
This is the official code of EchoShot, which allows users to generate multiple video shots of the same person, each controlled by a customized prompt. Currently it supports text-to-multi-shot portrait video generation. Hope you have fun with this demo!
- September 18, 2025: 🎉🎉🎉 EchoShot has been accepted to NeurIPS 2025!
- July 15, 2025: 🔥 EchoShot-1.3B-preview is now available at HuggingFace!
- July 15, 2025: 🎉 Inference and training code released.
- May 25, 2025: We propose EchoShot, a multi-shot portrait video generation model.
First, clone the repository:
git clone https://github.com/D2I-ai/EchoShot
cd EchoShot
Then create a conda environment and install the required packages:
conda create -n echoshot python=3.10
conda activate echoshot
pip install -r requirements.txt
Since EchoShot is built on Wan2.1, you first need to download the Wan2.1-T2V-1.3B base model:
pip install "huggingface_hub[cli]"
huggingface-cli download Wan-AI/Wan2.1-T2V-1.3B --local-dir ./models/Wan2.1-T2V-1.3B
Then download the EchoShot model:
huggingface-cli download JonneyWang/EchoShot --local-dir ./models/EchoShot
We recommend organizing the local directories as follows:
EchoShot
├── ...
├── dataset
│   ├── video
│   │   ├── 1.mp4
│   │   ├── 2.mp4
│   │   └── ...
│   └── train.json
├── models
│   ├── Wan2.1-T2V-1.3B
│   │   └── ...
│   └── EchoShot
│       ├── EchoShot-1.3B-preview.pth
│       └── ...
└── ...
For optimal performance, we highly recommend using an LLM for prompt extension. We support prompt extension via the Dashscope API:
- Apply for a `dashscope.api_key` in advance (EN | CN).
- Configure the environment variable `DASH_API_KEY` to specify the Dashscope API key (see the example below). For users of Alibaba Cloud's international site, you also need to set the environment variable `DASH_API_URL` to 'https://dashscope-intl.aliyuncs.com/api/v1'. For more detailed instructions, please refer to the Dashscope documentation.
- The `qwen-plus` model is used for extension.
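For example, set the variables in your shell before sampling (the key value below is a placeholder):

```bash
# Dashscope credentials for prompt extension (placeholder key value).
export DASH_API_KEY="your-dashscope-api-key"
# Only needed for users of Alibaba Cloud's international site:
export DASH_API_URL="https://dashscope-intl.aliyuncs.com/api/v1"
```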
You can specify `DASH_API_KEY` and other important configurations in `generate.sh`. Then run the following to start sampling:
bash generate.sh
If you do not want to use prompt extension, remove the `--use_prompt_extend` flag from `generate.sh` and run:
bash generate.sh
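For orientation, `generate.sh` is a thin wrapper around the Python entry point. The sketch below is hypothetical: apart from `--use_prompt_extend` and the model paths, the script name and flag names are assumptions, so consult the shipped `generate.sh` for the actual interface.

```bash
#!/bin/bash
# Hypothetical sketch of generate.sh. Except for --use_prompt_extend and the
# model paths, the entry-point and flag names below are assumptions.
export DASH_API_KEY="your-dashscope-api-key"

python generate.py \
    --ckpt_dir ./models/Wan2.1-T2V-1.3B \
    --echoshot_ckpt ./models/EchoShot/EchoShot-1.3B-preview.pth \
    --prompt "first shot caption | second shot caption" \
    --use_prompt_extend
```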
If you want to train your own version of the model, please prepare the dataset, which should include the video files and a corresponding JSON annotation file; we provide an example in `dataset/train.json` for reference. All training configurations are stored in `config_train.py`, where you can make modifications according to your needs. Once everything is set up, run the following command to start training:
bash train.sh
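To illustrate what an annotation might look like, the snippet below writes a minimal, hypothetical `train.json` pairing one video with per-shot captions. The field names are illustrative assumptions only; mirror the actual schema of the shipped `dataset/train.json`.

```bash
# Write a minimal, hypothetical train.json. The field names are assumptions;
# copy the real schema from the dataset/train.json shipped with the repo.
cat > dataset/train.json << 'EOF'
[
  {
    "video": "video/1.mp4",
    "captions": [
      "Shot 1: a close-up of the person speaking to the camera.",
      "Shot 2: the same person walking through a park at dusk."
    ]
  }
]
EOF
```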
We would like to express our sincere thanks to the Wan team for their support.
If you are inspired by our work, please cite our paper:
@article{wang2025echoshot,
title={EchoShot: Multi-Shot Portrait Video Generation},
author={Wang, Jiahao and Sheng, Hualian and Cai, Sijia and Zhang, Weizhan and Yan, Caixia and Feng, Yachuang and Deng, Bing and Ye, Jieping},
journal={arXiv preprint arXiv:2506.15838},
year={2025}
}