A custom node for ComfyUI that integrates Kandinsky 5.0, a powerful family of open-source text-to-video diffusion models.
This project brings the state-of-the-art Kandinsky 5.0 T2V Lite text-to-video model into the ComfyUI ecosystem. Kandinsky 5 is a latent diffusion pipeline built on a Flow Matching and Diffusion Transformer (DiT) backbone, capable of generating high-quality video from text prompts.
It leverages a powerful combination of Qwen2.5-VL and CLIP for text conditioning and the HunyuanVideo VAE for latent space encoding, enabling a nuanced understanding of prompts and impressive visual results.
This custom node suite provides all the necessary tools to run the Kandinsky 5 pipeline natively in ComfyUI, including a custom sampler for its specific inference loop and efficient memory management to run on consumer-grade hardware.
✨ Key Features:
- Native Kandinsky 5.0 Integration
- High-Quality Video Generation
- Custom Sampler Node
- Efficient Memory Management
- Multiple Model Variants: Supports SFT (high quality), no-CFG (faster), and distilled (fastest) model versions.
- Familiar ComfyUI Workflow
The easiest way to install is via ComfyUI Manager. Search for ComfyUI-Kandinsky and click "Install".
Alternatively, to install manually:
1. **Clone the Repository:** Navigate to your `ComfyUI/custom_nodes/` directory and clone this repository:

   ```shell
   git clone https://github.com/wildminder/ComfyUI-Kandinsky.git
   ```

2. **Install Dependencies:** This node relies on packages from the original Kandinsky repository. Navigate into the cloned `ComfyUI-Kandinsky` directory and install the required dependencies:

   ```shell
   cd ComfyUI-Kandinsky
   pip install -r requirements.txt
   ```

3. **Download Models:** This node does not automatically download models. You must download the required models and place them in the correct ComfyUI directories. See the Model Zoo table below for links.

   - Place Kandinsky DiT models (`.safetensors`) in `ComfyUI/models/diffusion_models/kandinsky/`.
   - Place the HunyuanVideo VAE in `ComfyUI/models/vae/`.
   - Place the CLIP-L and Qwen2.5-VL text encoders in `ComfyUI/models/clip/`.

4. **Start/Restart ComfyUI:** Launch ComfyUI. The Kandinsky nodes will appear under the `Kandinsky` category.
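To double-check the model placement described in the install steps, a small script like the following can verify the expected directories exist. This is a convenience sketch, not part of the node; adjust `COMFY` to your installation root, and note that the checkpoint filenames themselves vary by variant.

```python
from pathlib import Path

# Adjust this to point at your ComfyUI installation root.
COMFY = Path("ComfyUI")

# Expected model locations per the install steps above.
expected = [
    COMFY / "models" / "diffusion_models" / "kandinsky",  # Kandinsky DiT checkpoints
    COMFY / "models" / "vae",                             # HunyuanVideo VAE
    COMFY / "models" / "clip",                            # CLIP-L + Qwen2.5-VL encoders
]

for directory in expected:
    status = "ok" if directory.is_dir() else "missing"
    print(f"{status:8} {directory}")
```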
The Kandinsky 5 Loader node uses the config name to identify the correct checkpoint file from the kandinsky/ subdirectory in your diffusion_models folder.
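The actual lookup logic lives in the node's source; as an illustration only, resolving a config name to a checkpoint in the `kandinsky/` subdirectory might look like the sketch below. The function name, matching rule, and example filename are hypothetical, not the node's real implementation.

```python
from pathlib import Path

def resolve_checkpoint(config_name: str,
                       models_dir: str = "ComfyUI/models/diffusion_models") -> Path:
    """Illustrative sketch: map a config name such as 'config_5s_sft.yaml'
    to a .safetensors checkpoint in the kandinsky/ subdirectory."""
    # 'config_5s_sft.yaml' -> variant stem '5s_sft'
    stem = Path(config_name).stem.removeprefix("config_")
    kandinsky_dir = Path(models_dir) / "kandinsky"
    # Pick the first checkpoint whose filename contains the variant stem.
    for ckpt in sorted(kandinsky_dir.glob("*.safetensors")):
        if stem in ckpt.name:
            return ckpt
    raise FileNotFoundError(f"No checkpoint matching '{stem}' in {kandinsky_dir}")
```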
| Model | Config Name | Duration | Hugging Face Link |
|---|---|---|---|
| Kandinsky 5.0 T2V Lite SFT 5s | `config_5s_sft.yaml` | 5s | 🤗 HF |
| Kandinsky 5.0 T2V Lite SFT 10s | `config_10s_sft.yaml` | 10s | 🤗 HF |
| Kandinsky 5.0 T2V Lite pretrain 5s | `config_5s_pretrain.yaml` | 5s | 🤗 HF |
| Kandinsky 5.0 T2V Lite pretrain 10s | `config_10s_pretrain.yaml` | 10s | 🤗 HF |
| Kandinsky 5.0 T2V Lite no-CFG 5s | `config_5s_nocfg.yaml` | 5s | 🤗 HF |
| Kandinsky 5.0 T2V Lite no-CFG 10s | `config_10s_nocfg.yaml` | 10s | 🤗 HF |
| Kandinsky 5.0 T2V Lite distill 5s | `config_5s_distil.yaml` | 5s | 🤗 HF |
| Kandinsky 5.0 T2V Lite distill 10s | `config_10s_distil.yaml` | 10s | 🤗 HF |
These are common models used in many ComfyUI workflows and are required for the Kandinsky pipeline.
| Model | Purpose | Hugging Face Link |
|---|---|---|
| HunyuanVideo VAE | Latent Encoding/Decoding | 🤗 HF |
| HunyuanVideo VAE bf16 | Latent Encoding/Decoding | 🤗 HF ComfyUI |
| CLIP-ViT-L-14 | Text Conditioning | 🤗 HF |
| Qwen2.5-VL-7B fp8 scaled | Text Conditioning | 🤗 HF ComfyUI |
| Qwen2.5-VL-7B bf16 | Text Conditioning | 🤗 HF Kijai |
Note
Output quality depends strongly on both your prompt and the internal system prompt used by the Qwen2.5-VL encoder. Experiment with detailed, descriptive phrasing to achieve the best results.
- `variant`: Select the Kandinsky DiT model variant to load. The name corresponds to the config files.
- `clip`: The standard CLIP-L model.
- `qwen_vl`: The Qwen2.5-VL model. Must be loaded with the `qwen_image` type in the CLIPLoader node.
- `text`: The positive text prompt describing the desired video.
- `negative_text`: The negative text prompt describing what to avoid.
- `content_type`: Sets the internal prompt template for either `video` or `image` generation.
- `width` / `height`: The dimensions of the video to be generated.
- `time_length`: The desired duration of the video in seconds. Set to `0` for single-image generation.
- `batch_size`: The number of videos to generate in one run.
- `seed`: The random seed used for creating the initial noise.
- `steps`: The number of sampling steps. Should generally match the model type (e.g., 50 for `sft` models, 16 for `distill` models).
- `cfg`: The Classifier-Free Guidance scale. Higher values increase adherence to the prompt.
- `scheduler_scale`: Controls the timestep distribution during sampling.
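The step-count guidance above can be collected into a small preset lookup. Only the `sft`/`distill` step counts come from the text; the `cfg` values and the fallback behavior for `pretrain` are illustrative assumptions you should tune for your own workflows.

```python
# Illustrative sampler presets. The 50/16 step counts follow the guidance
# above; the cfg values are assumed defaults, not official recommendations.
SAMPLER_PRESETS = {
    "sft":     {"steps": 50, "cfg": 5.0},  # high quality, slowest
    "nocfg":   {"steps": 50, "cfg": 1.0},  # no classifier-free guidance pass
    "distill": {"steps": 16, "cfg": 1.0},  # distilled model, fastest
}

def preset_for(config_name: str) -> dict:
    """Pick a preset from the variant suffix in a config name.
    Note the configs spell the distilled variant 'distil' (one l);
    pretrain and sft both fall back to the sft preset here."""
    if "distil" in config_name:
        return SAMPLER_PRESETS["distill"]
    if "nocfg" in config_name:
        return SAMPLER_PRESETS["nocfg"]
    return SAMPLER_PRESETS["sft"]

print(preset_for("config_5s_distil.yaml"))  # -> {'steps': 16, 'cfg': 1.0}
```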
Video generation is computationally intensive. As a baseline, generating a 5-second video (768x512) with the pretrain_5s model on an NVIDIA 4070Ti (16GB VRAM) can take approximately 15 minutes. Distilled models will be significantly faster.
- Potential for Misuse: The ability to generate video from text could be misused. Users of this node must not use it to create content that infringes upon the rights of individuals or is intended to mislead or harm. It is strictly forbidden to use this for any illegal or unethical purposes.
- Technical Limitations: The model may occasionally struggle with very long, complex prompts or maintaining perfect temporal consistency.
- Language Support: The model is trained primarily on English and also has a strong understanding of Russian. Performance on other languages is not guaranteed.
- This node is released for research and development purposes. Please use it responsibly.
Beyond the code, I believe in the power of community and continuous learning. I invite you to join the 'TokenDiff AI News' and 'TokenDiff Community Hub':

- 🗞️ AI for every home, creativity for every mind!
- 💬 Questions, help, and thoughtful discussion.
This custom node is subject to its own repository license. The Kandinsky 5 model and its components are subject to the license provided by the original authors at the AI Forever Kandinsky-5 repository.
- The AI Forever team for creating and open-sourcing the incredible Kandinsky 5 project.
- Qwen Team for Qwen2.5-VL.
- OpenAI for CLIP.
- Tencent for the HunyuanVideo VAE.
- The ComfyUI team for their powerful and extensible platform.
