High-performance Video Super Resolution for ComfyUI with VRAM optimization.
Run FlashVSR on 8GB-24GB+ GPUs without artifacts. Features intelligent resource management, 5 VAE options, and auto-downloading models.
Registry Link: https://registry.comfy.org/publishers/naxci1/nodes/ComfyUI-FlashVSR_Stable
- 🎬 Video Super Resolution: 2x or 4x upscaling using FlashVSR diffusion models
- 🧠 5 VAE Options: Choose from Wan2.1, Wan2.2, LightVAE, TAE variants for optimal VRAM/quality trade-off
- 📊 Pre-Flight Resource Check: Intelligent VRAM estimation with settings recommendations
- ⚡ Auto-Download: Models download automatically from HuggingFace if missing
- 🛡️ OOM Protection: Automatic recovery with progressive fallback (tiled VAE → tiled DiT → chunking); see the sketch below
- 🔧 Unified Pipeline: All modes share optimized processing logic
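
For intuition, here is a minimal sketch of the progressive-fallback idea behind the OOM protection. Function and setting names are illustrative, not the node's internal API:

```python
import torch

def run_with_fallback(pipeline, frames, settings):
    """Retry with progressively cheaper settings after a CUDA OOM.
    Illustrative sketch only; the node's actual recovery logic may differ."""
    # Each fallback trades speed for lower peak VRAM.
    fallbacks = [
        {},                                          # try the user's settings as-is
        {"tiled_vae": True},                         # 1) tile the VAE decode
        {"tiled_vae": True, "tiled_dit": True},      # 2) also tile the DiT
        {"tiled_vae": True, "tiled_dit": True,
         "frame_chunk_size": 20},                    # 3) also chunk frames
    ]
    for overrides in fallbacks:
        try:
            return pipeline(frames, **{**settings, **overrides})
        except torch.cuda.OutOfMemoryError:
            torch.cuda.empty_cache()  # release cached blocks before retrying
    raise RuntimeError("Out of memory even with maximum fallbacks")
```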
- Changelog - Full version history
- Sample Workflow
- HuggingFace Models
This node is optimized for various hardware configurations. Here are some guidelines:
| VRAM | Mode | Tiling | Chunk Size | Precision | Notes |
|---|---|---|---|---|---|
| 24GB+ | full or tiny | Disabled | 0 (All) | bf16/auto | Max quality/speed. |
| 16GB | tiny | `tiled_vae=True` | 0 or ~100 | bf16/auto | Enable `keep_models_on_cpu`. |
| 12GB | tiny | `tiled_vae=True`, `tiled_dit=True` | ~50 | fp16 | Use sparse_sage attention. |
| 8GB | tiny-long | Required | ~20 | fp16 | Must use tiling and chunking. |
- Attention Mode: Use `sparse_sage_attention` for the best balance of speed and memory. `flash_attention_2` is faster but requires specific hardware/installation.
- Precision: `bf16` (BFloat16) is recommended for RTX 3000/4000/5000 series. It is faster and preserves dynamic range better than `fp16`.
- Chunking: Use `frame_chunk_size` to process videos in segments. This moves processed frames to CPU RAM, preventing VRAM saturation on long clips (see the sketch after this list).
- Resize Input: If the input video is large (e.g., 1080p), use the `resize_factor` parameter to reduce the input to `0.5x` before processing. This drastically reduces VRAM usage and allows 4x upscaling of the resized result (net 2x output). For small videos, leave it at `1.0`.
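
A minimal sketch of the chunking pattern described above, assuming `model` stands in for the loaded upscaling pipeline (illustrative, not the node's exact code):

```python
import torch

def upscale_in_chunks(model, frames, chunk_size=50):
    """Process a long clip in fixed-size chunks, parking results in CPU RAM.
    `frames` is a (N, C, H, W) tensor on the CPU."""
    outputs = []
    for start in range(0, frames.shape[0], chunk_size):
        chunk = frames[start:start + chunk_size].cuda()  # one chunk in VRAM at a time
        with torch.no_grad():
            upscaled = model(chunk)
        outputs.append(upscaled.cpu())                   # offload result to system RAM
        del chunk, upscaled
        torch.cuda.empty_cache()                         # keep peak VRAM bounded
    return torch.cat(outputs, dim=0)
```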
Before processing, FlashVSR now performs an intelligent pre-flight check that:
- Estimates VRAM Requirements: Calculates approximate VRAM needed based on resolution, frames, scale, and settings.
- Checks Available Resources: Uses `torch.cuda.mem_get_info()` for accurate real-time VRAM availability.
- Provides Recommendations: If OOM is predicted, suggests optimal settings (see the sketch after the examples below).
Example console output:
```
============================================================
🔍 PRE-FLIGHT RESOURCE CHECK
💻 RAM: 15.4GB / 95.8GB
💾 VRAM Available: 14.2GB
📊 Estimated VRAM Required: 12.8GB
✅ Safe to proceed. Estimated ~12.8GB needed, 14.2GB available.
============================================================
```

If VRAM is insufficient:

```
⚠️ Current settings require ~18.5GB but only 8.0GB available.
💡 Recommended Optimal Settings:
   • chunk_size = 32
   • tiled_vae = True
   • tiled_dit = True
   • resize_factor = 0.6
```
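
The availability check itself boils down to a `torch.cuda.mem_get_info()` comparison. A minimal sketch, assuming the VRAM estimate is computed elsewhere (the node's estimation formula is not reproduced here):

```python
import torch

def preflight_check(est_vram_gb: float, headroom: float = 0.95) -> bool:
    """Compare an estimated requirement against real-time free VRAM."""
    free_bytes, total_bytes = torch.cuda.mem_get_info()  # (free, total) in bytes
    free_gb = free_bytes / 1024**3
    if est_vram_gb <= free_gb * headroom:
        print(f"✅ Safe to proceed. ~{est_vram_gb:.1f}GB needed, {free_gb:.1f}GB available.")
        return True
    print(f"⚠️ Current settings require ~{est_vram_gb:.1f}GB but only {free_gb:.1f}GB available.")
    print("💡 Consider tiled_vae=True, tiled_dit=True, a smaller chunk_size, or a lower resize_factor.")
    return False
```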
| VAE Type | VRAM Usage | Speed | Quality | Best For |
|---|---|---|---|---|
| Wan2.1 | 8-12 GB | Baseline | ⭐⭐⭐⭐⭐ | Maximum quality, 24GB+ VRAM |
| Wan2.2 | 8-12 GB | Baseline | ⭐⭐⭐⭐⭐ | Improved normalization for Wan2.2 models |
| LightVAE_W2.1 | 4-5 GB | 2-3x faster | ⭐⭐⭐⭐ | 8-16GB VRAM, speed priority |
| TAE_W2.2 | 6-8 GB | 1.5x faster | ⭐⭐⭐⭐ | Temporal consistency priority |
| LightTAE_HY1.5 | 3-4 GB | 3x faster | ⭐⭐⭐⭐ | HunyuanVideo compatible, minimum VRAM |
| Your VRAM | Recommended VAE | Additional Settings |
|---|---|---|
| 8GB | LightTAE_HY1.5 or LightVAE_W2.1 | `tiled_vae=True`, `tiled_dit=True`, `chunk_size=16` |
| 12GB | LightVAE_W2.1 or Wan2.1 | `tiled_vae=True` |
| 16GB | Any VAE | Optional tiling for long videos |
| 24GB+ | Wan2.1 or Wan2.2 | Maximum quality, no restrictions |
All VAE models auto-download from HuggingFace if not found locally:
| VAE Selection | File | Direct Download Link |
|---|---|---|
| Wan2.1 | Wan2.1_VAE.pth | Download |
| Wan2.2 | Wan2.2_VAE.pth | Download |
| LightVAE_W2.1 | lightvaew2_1.pth | Download |
| TAE_W2.2 | taew2_2.safetensors | Download |
| LightTAE_HY1.5 | lighttaehy1_5.pth | Download |
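
This style of on-demand download is what `huggingface_hub` provides. A hedged sketch (the `repo_id` below is a placeholder, not the node's actual model repository):

```python
from huggingface_hub import hf_hub_download

# Placeholder repo_id: substitute the actual FlashVSR model repository.
vae_path = hf_hub_download(
    repo_id="<publisher>/FlashVSR-models",  # hypothetical
    filename="Wan2.1_VAE.pth",
    local_dir="ComfyUI/models/FlashVSR",    # download destination
)
print(f"VAE available at: {vae_path}")
```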
8GB VRAM (low VRAM):

- Mode: tiny-long
- VAE: LightVAE_W2.1 or LightTAE_HY1.5
- Tiled VAE: ✅ Enabled
- Tiled DiT: ✅ Enabled
- Chunk Size: 16-32
- Resize Factor: 0.5-0.8
- Keep Models on CPU: ✅ Enabled

12-16GB VRAM (balanced):

- Mode: tiny
- VAE: Wan2.1 or LightVAE_W2.1
- Tiled VAE: ✅ Enabled
- Tiled DiT: Optional
- Chunk Size: 50-100
- Resize Factor: 1.0
- Keep Models on CPU: Optional

24GB+ VRAM (maximum quality):

- Mode: full or tiny
- VAE: Wan2.1 or Wan2.2
- Tiled VAE: ❌ Disabled
- Tiled DiT: ❌ Disabled
- Chunk Size: 0 (all frames)
- Resize Factor: 1.0
- Keep Models on CPU: ❌ Disabled
At the end of each run, you'll see a summary:
```
============================================================
📊 PROCESSING SUMMARY
⏱️ Total Processing Time: 130.08s (1.54 FPS)
📥 Input Resolution: 276x206 (200 frames)
📤 Output Resolution: 552x412 (200 frames)
📈 Peak VRAM Used: 12.4 GB
============================================================
```
Hover over any input in ComfyUI to see tooltips. Full parameter list:
| Parameter | Description |
|---|---|
| model | FlashVSR model version |
| mode | tiny (fast), tiny-long (lowest VRAM), full (highest quality) |
| vae_model | VAE architecture (5 options, auto-download) |
| scale | Upscaling factor: 2x or 4x |
| color_fix | Wavelet color transfer. Highly recommended. |
| tiled_vae | Spatial tiling for VAE. Reduces VRAM, slower. |
| tiled_dit | Spatial tiling for DiT. Required for 4K output. |
| tile_size | Tile dimensions. Smaller = less VRAM. |
| overlap | Tile overlap for seamless blending. |
| unload_dit | Unload DiT before VAE decode. |
| frame_chunk_size | Process N frames at a time. 0 = all. |
| enable_debug | Verbose console logging. |
| keep_models_on_cpu | Offload to system RAM when idle. |
| resize_factor | Downscale the input before processing (0.3-1.0). Useful for large videos that are then upscaled back. |
| attention_mode | Attention kernel: sparse_sage, flash_attention_2, sdpa, block_sparse |
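
For intuition about `color_fix`: wavelet color transfer keeps the upscaled frame's fine detail but takes its low-frequency color from the source. A simplified sketch, with a Gaussian blur standing in for the wavelet low-pass (not the node's exact implementation):

```python
import torch
import torch.nn.functional as F

def wavelet_color_fix(output: torch.Tensor, source: torch.Tensor) -> torch.Tensor:
    """Recolor `output` (C, H, W, values in [0, 1]) with `source` low frequencies."""
    def low_pass(x, k=21, sigma=8.0):
        # Gaussian blur applied per channel as the low-pass filter.
        coords = torch.arange(k, dtype=x.dtype, device=x.device) - k // 2
        g = torch.exp(-coords**2 / (2 * sigma**2))
        g = g / g.sum()
        kernel = (g[:, None] * g[None, :]).view(1, 1, k, k).repeat(x.shape[0], 1, 1, 1)
        return F.conv2d(x[None], kernel, padding=k // 2, groups=x.shape[0])[0]

    # Match the low-res source to the upscaled resolution first.
    source = F.interpolate(source[None], size=output.shape[-2:],
                           mode="bilinear", align_corners=False)[0]
    detail = output - low_pass(output)               # high-frequency structure to keep
    return (detail + low_pass(source)).clamp(0, 1)   # low-frequency color from source
```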
FlashVSR includes a full-featured CLI that mirrors all ComfyUI node parameters for standalone video upscaling.
```bash
# Basic 2x upscale
python cli_main.py --input video.mp4 --output upscaled.mp4 --scale 2

# 4x upscale with tiling for lower VRAM
python cli_main.py --input video.mp4 --output upscaled.mp4 --scale 4 \
  --tiled_vae --tiled_dit --tile_size 256 --tile_overlap 24

# Long video with chunking to prevent OOM
python cli_main.py --input long_video.mp4 --output upscaled.mp4 \
  --frame_chunk_size 50 --mode tiny-long

# Low VRAM mode (8GB GPUs)
python cli_main.py --input video.mp4 --output upscaled.mp4 --scale 2 \
  --vae_model LightVAE_W2.1 --tiled_vae --tiled_dit \
  --frame_chunk_size 20 --resize_factor 0.5
```

All arguments map 1:1 with ComfyUI node inputs. Run `python cli_main.py --help` for full details.
| Argument | Description |
|---|---|
| `--input`, `-i` | Input video file path (e.g., video.mp4) |
| `--output`, `-o` | Output video file path (e.g., upscaled.mp4) |
| Argument | Type | Default | Description |
|---|---|---|---|
| `--model` | choice | FlashVSR-v1.1 | Model version: FlashVSR, FlashVSR-v1.1 |
| `--mode` | choice | tiny | Operation mode: tiny, tiny-long, full |
| `--vae_model` | choice | Wan2.1 | VAE model: Wan2.1, Wan2.2, LightVAE_W2.1, TAE_W2.2, LightTAE_HY1.5 |
| `--force_offload` | flag | True | Force offload models to CPU after execution |
| `--no_force_offload` | flag | - | Disable force offloading |
| `--precision` | choice | auto | Precision: fp16, bf16, auto |
| `--device` | string | auto | Device: cuda:0, cuda:1, cpu, auto |
| `--attention_mode` | choice | sparse_sage_attention | Attention: sparse_sage_attention, block_sparse_attention, flash_attention_2, sdpa |
| Argument | Type | Default | Description |
|---|---|---|---|
| `--scale` | int | 2 | Upscaling factor: 2 or 4 |
| `--color_fix` | flag | True | Apply wavelet-based color correction |
| `--no_color_fix` | flag | - | Disable color correction |
| `--tiled_vae` | flag | False | Enable spatial tiling for VAE decoder |
| `--tiled_dit` | flag | False | Enable spatial tiling for DiT |
| `--tile_size` | int | 256 | Tile size for DiT processing (32-1024) |
| `--tile_overlap` | int | 24 | Overlap pixels between tiles (8-512) |
| `--unload_dit` | flag | False | Unload DiT before VAE decoding |
| `--sparse_ratio` | float | 2.0 | Sparse attention control (1.5-2.0) |
| `--kv_ratio` | float | 3.0 | Key/Value cache ratio (1.0-3.0) |
| `--local_range` | int | 11 | Local attention window: 9 or 11 |
| `--seed` | int | 0 | Random seed for reproducibility |
| `--frame_chunk_size` | int | 0 | Process N frames at a time (0 = all) |
| `--enable_debug` | flag | False | Enable verbose logging |
| `--keep_models_on_cpu` | flag | True | Keep models in CPU RAM when idle |
| `--no_keep_models_on_cpu` | flag | - | Keep models in VRAM |
| `--resize_factor` | float | 1.0 | Resize input before processing (0.1-1.0) |
| Argument | Type | Default | Description |
|---|---|---|---|
| `--fps` | float | input FPS | Output video FPS |
| `--codec` | string | libx264 | Video codec: libx264, libx265, h264_nvenc |
| `--crf` | int | 18 | Quality (0-51, lower = better) |
| `--start_frame` | int | 0 | Start frame index (0-indexed) |
| `--end_frame` | int | -1 | End frame index (-1 = all frames) |
| `--models_dir` | string | ./models | Custom models directory path |
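
For example, combining the I/O flags above (a hypothetical invocation; check `python cli_main.py --help` for exact frame-range semantics):

```bash
# Upscale roughly the first 300 frames only and encode with x265
python cli_main.py --input video.mp4 --output clip_upscaled.mp4 --scale 2 \
  --start_frame 0 --end_frame 300 --codec libx265 --crf 18
```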
```bash
cd ComfyUI/custom_nodes
git clone https://github.com/naxci1/ComfyUI-FlashVSR_Stable.git
python -m pip install -r ComfyUI-FlashVSR_Stable/requirements.txt
```

📢 Turing architecture or older GPUs (GTX 16 series, RTX 20 series, and earlier): install `triton<3.3.0`:

```bash
# Windows
python -m pip install -U "triton-windows<3.3.0"

# Linux
python -m pip install -U "triton<3.3.0"
```
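
Optionally, verify which Triton build ended up installed:

```bash
python -c "import triton; print(triton.__version__)"
```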
Download the FlashVSR folder from HuggingFace:
```
ComfyUI/models/FlashVSR/
├── LQ_proj_in.ckpt
├── TCDecoder.ckpt
├── diffusion_pytorch_model_streaming_dmd.safetensors
└── Wan2.1_VAE.pth (or auto-downloads)
```
💡 VAE files auto-download from HuggingFace if not present. Only the DiT model and other components need manual download.
See CHANGELOG.md for full version history.
- 🚀 Pre-Flight Resource Calculator with settings recommendations
- 🎨 5 VAE options: Wan2.1, Wan2.2, LightVAE_W2.1, TAE_W2.2, LightTAE_HY1.5
- ⬇️ Auto-download VAE models from HuggingFace
- 🐛 Fixed black borders and video corruption
- ⚡ Unified processing pipeline for all modes
- 🛡️ 95% VRAM threshold for OOM recovery
- FlashVSR @OpenImagingLab
- Sparse_SageAttention @jt-zhang
- ComfyUI @comfyanonymous
- Wan2.2 @Wan-Video
- LightX2V @ModelTC
- LightX2V Autoencoders @lightx2v
MIT License - see LICENSE for details.
