
ComfyUI-Kandinsky


A custom node for ComfyUI that integrates Kandinsky 5.0, a powerful family of open-source text-to-video diffusion models.



About The Project

This project brings the state-of-the-art Kandinsky 5.0 T2V Lite text-to-video model into the ComfyUI ecosystem. Kandinsky 5 is a latent diffusion pipeline built on a Flow Matching and Diffusion Transformer (DiT) backbone, capable of generating high-quality video from text prompts.

It leverages a powerful combination of Qwen2.5-VL and CLIP for text conditioning and the HunyuanVideo VAE for latent space encoding, enabling a nuanced understanding of prompts and impressive visual results.

(Example workflow screenshot)

This custom node suite provides all the necessary tools to run the Kandinsky 5 pipeline natively in ComfyUI, including a custom sampler for its specific inference loop and efficient memory management to run on consumer-grade hardware.
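To build intuition for the inference loop described above, here is a minimal, hypothetical sketch of flow-matching sampling with Euler integration. The `toy_velocity` function is a stand-in for the DiT (which in the real pipeline is conditioned on text embeddings); the names and time convention are illustrative, not this node's actual API.

```python
# Minimal, illustrative sketch of a flow-matching inference loop
# (hypothetical helper -- NOT this node's actual sampler API).
# The DiT predicts a velocity field v(x, t); integrating it with
# plain Euler steps carries the latent from noise at t = 1 toward
# a clean sample at t = 0.

def toy_velocity(x, t):
    # Stand-in for the DiT; the real model conditions on text embeddings.
    return [-xi for xi in x]

def flow_match_sample(x, steps, velocity=toy_velocity):
    dt = 1.0 / steps
    t = 1.0
    for _ in range(steps):
        v = velocity(x, t)                          # model forward pass
        x = [xi + dt * vi for xi, vi in zip(x, v)]  # Euler update
        t -= dt
    return x

latent = flow_match_sample([1.0, -2.0, 0.5], steps=50)
```

With this toy velocity the state simply decays toward zero; in the real pipeline the velocity field is learned, and the final latent is decoded to pixels by the HunyuanVideo VAE.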

✨ Key Features:

  • Native Kandinsky 5.0 Integration: runs the full Kandinsky 5 T2V Lite pipeline inside ComfyUI.
  • High-Quality Video Generation: text-to-video output from the Flow Matching DiT backbone.
  • Custom Sampler Node: implements the model's specific inference loop.
  • Efficient Memory Management: designed to run on consumer-grade hardware.
  • Multiple Model Variants: supports SFT (high quality), no-CFG (faster), and distilled (fastest) model versions.
  • Familiar ComfyUI Workflow: loader, text-encode, latent, and sampler nodes follow standard ComfyUI conventions.

(back to top)

🚀 Getting Started

The easiest way to install is via ComfyUI Manager. Search for ComfyUI-Kandinsky and click "Install".

Alternatively, to install manually:

  1. Clone the Repository: Navigate to your ComfyUI/custom_nodes/ directory and clone this repository:

    git clone https://github.com/wildminder/ComfyUI-Kandinsky.git
  2. Install Dependencies: This node relies on packages from the original Kandinsky repository. Navigate into the cloned ComfyUI-Kandinsky directory and install the required dependencies:

    cd ComfyUI-Kandinsky
    pip install -r requirements.txt
  3. Download Models: This node does not automatically download models. You must download the required models and place them in the correct ComfyUI directories. See the Model Zoo table below for links.

    • Place Kandinsky DiT models (.safetensors) in ComfyUI/models/diffusion_models/kandinsky/.
    • Place the HunyuanVideo VAE in ComfyUI/models/vae/.
    • Place the CLIP-L and Qwen2.5-VL text encoders in ComfyUI/models/clip/.
  4. Start/Restart ComfyUI: Launch ComfyUI. The Kandinsky nodes will appear under the Kandinsky category.
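After placing the files, a quick sanity check like the following (a hypothetical helper, not part of this node) can confirm the folders from step 3 exist and are non-empty. Run it from your ComfyUI root.

```python
# Hypothetical helper (not part of this node) to sanity-check that the
# model folders from step 3 exist and contain files.
from pathlib import Path

def check_model_dirs(comfy_root="."):
    """Map each expected model folder to True if it exists and is non-empty."""
    expected = [
        "models/diffusion_models/kandinsky",  # Kandinsky DiT .safetensors
        "models/vae",                         # HunyuanVideo VAE
        "models/clip",                        # CLIP-L and Qwen2.5-VL encoders
    ]
    root = Path(comfy_root)
    return {d: (root / d).is_dir() and any((root / d).iterdir())
            for d in expected}

for folder, ok in check_model_dirs().items():
    print(f"{folder}: {'ok' if ok else 'missing or empty'}")
```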

Model Zoo

The Kandinsky 5 Loader node uses the config name to identify the correct checkpoint file from the kandinsky/ subdirectory in your diffusion_models folder.

Kandinsky DiT Models

| Model | Config Name | Duration | Hugging Face Link |
|---|---|---|---|
| Kandinsky 5.0 T2V Lite SFT 5s | config_5s_sft.yaml | 5s | 🤗 HF |
| Kandinsky 5.0 T2V Lite SFT 10s | config_10s_sft.yaml | 10s | 🤗 HF |
| Kandinsky 5.0 T2V Lite pretrain 5s | config_5s_pretrain.yaml | 5s | 🤗 HF |
| Kandinsky 5.0 T2V Lite pretrain 10s | config_10s_pretrain.yaml | 10s | 🤗 HF |
| Kandinsky 5.0 T2V Lite no-CFG 5s | config_5s_nocfg.yaml | 5s | 🤗 HF |
| Kandinsky 5.0 T2V Lite no-CFG 10s | config_10s_nocfg.yaml | 10s | 🤗 HF |
| Kandinsky 5.0 T2V Lite distill 5s | config_5s_distil.yaml | 5s | 🤗 HF |
| Kandinsky 5.0 T2V Lite distill 10s | config_10s_distil.yaml | 10s | 🤗 HF |
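The config names in the table follow a regular pattern; a small illustrative helper (hypothetical, not the loader's actual resolution code) shows how a duration and variant combine into a config name:

```python
# Illustrative helper showing the config naming pattern from the table
# above (hypothetical -- the loader's real resolution logic may differ).
def config_for_variant(duration_s, variant):
    """e.g. config_for_variant(5, "sft") -> "config_5s_sft.yaml" """
    known = {"sft", "pretrain", "nocfg", "distil"}
    if variant not in known:
        raise ValueError(f"unknown variant: {variant!r}")
    return f"config_{duration_s}s_{variant}.yaml"
```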

Required Dependency Models

These are common models used in many ComfyUI workflows and are required for the Kandinsky pipeline.

| Model | Purpose | Hugging Face Link |
|---|---|---|
| HunyuanVideo VAE | Latent Encoding/Decoding | 🤗 HF |
| HunyuanVideo VAE bf16 | Latent Encoding/Decoding | 🤗 HF (ComfyUI) |
| CLIP-ViT-L-14 | Text Conditioning | 🤗 HF |
| Qwen2.5-VL-7B fp8 scaled | Text Conditioning | 🤗 HF (ComfyUI) |
| Qwen2.5-VL-7B bf16 | Text Conditioning | 🤗 HF (Kijai) |

(back to top)

🛠️ Node Parameters

Note

Output quality depends heavily on the prompt: the result is shaped by both your user prompt and the underlying system prompt applied by the Qwen2.5-VL encoder. Experiment with descriptive phrasing to achieve the best results.

Kandinsky 5 Loader

  • variant: Select the Kandinsky DiT model variant to load. The name corresponds to the config files.

Kandinsky 5 Text Encode

  • clip: The standard CLIP-L model.
  • qwen_vl: The Qwen2.5-VL model. Must be loaded with the qwen_image type in the CLIPLoader node.
  • text: The positive text prompt describing the desired video.
  • negative_text: The negative text prompt describing what to avoid.
  • content_type: Sets the internal prompt template for either video or image generation.

Empty Kandinsky 5 Latent

  • width/height: The dimensions of the video to be generated.
  • time_length: The desired duration of the video in seconds. Set to 0 for single image generation.
  • batch_size: The number of videos to generate in one run.

Kandinsky 5 Sampler

  • seed: The random seed used for creating the initial noise.
  • steps: The number of sampling steps. Should generally match the model type (e.g., 50 for sft models, 16 for distill models).
  • cfg: Classifier-Free Guidance scale. Higher values increase adherence to the prompt.
  • scheduler_scale: Controls the timestep distribution during sampling.
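As background for the cfg parameter, classifier-free guidance combines an unconditional prediction with a text-conditioned one and extrapolates toward the prompt. The sketch below uses toy lists in place of the model's actual outputs and is a hypothetical illustration, not this node's implementation.

```python
# Background sketch of classifier-free guidance (CFG), hypothetical
# helper: the guided prediction moves away from the unconditional
# branch and toward the text-conditioned one by a factor of cfg_scale.
def apply_cfg(uncond, cond, cfg_scale):
    return [u + cfg_scale * (c - u) for u, c in zip(uncond, cond)]

# cfg_scale = 1.0 reproduces the conditional prediction unchanged;
# larger values push harder toward the prompt.
```

Note that CFG requires two model evaluations per step; the no-CFG variants avoid the extra unconditional pass, which is part of why they are faster.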

(back to top)

📊 Performance

Video generation is computationally intensive. As a baseline, generating a 5-second video (768x512) with the pretrain_5s model on an NVIDIA 4070Ti (16GB VRAM) can take approximately 15 minutes. Distilled models will be significantly faster.

(back to top)

⚠️ Risks and Limitations

  • Potential for Misuse: The ability to generate video from text could be misused. Users of this node must not use it to create content that infringes upon the rights of individuals or is intended to mislead or harm. It is strictly forbidden to use this for any illegal or unethical purposes.
  • Technical Limitations: The model may occasionally struggle with very long, complex prompts or maintaining perfect temporal consistency.
  • Language Support: The model is trained primarily on English and has a strong understanding of Russian concepts. Performance on other languages is not guaranteed.
  • This node is released for research and development purposes. Please use it responsibly.

(back to top)

══════════════════════════════════

Beyond the code, I believe in the power of community and continuous learning. You are invited to join 'TokenDiff AI News' and the 'TokenDiff Community Hub'.

TokenDiff AI News


🗞️ AI for every home, creativity for every mind!

TokenDiff Community Hub


💬 Questions, help, and thoughtful discussion.

══════════════════════════════════

License

This custom node is subject to its own repository license. The Kandinsky 5 model and its components are subject to the license provided by the original authors at the AI Forever Kandinsky-5 repository.

(back to top)

Acknowledgments

  • The AI Forever team for creating and open-sourcing the incredible Kandinsky 5 project.
  • Qwen Team for Qwen2.5-VL.
  • OpenAI for CLIP.
  • Tencent for the HunyuanVideo VAE.
  • The ComfyUI team for their powerful and extensible platform.

(back to top)
