ComfyUI-FlashVSR_Stable

High-performance Video Super Resolution for ComfyUI with VRAM optimization.

Run FlashVSR on 8GB-24GB+ GPUs without artifacts. Features intelligent resource management, 5 VAE options, and auto-downloading models.



Registry Link: https://registry.comfy.org/publishers/naxci1/nodes/ComfyUI-FlashVSR_Stable


✨ Key Features

  • 🎬 Video Super Resolution: 2x or 4x upscaling using FlashVSR diffusion models
  • 🧠 5 VAE Options: Choose from Wan2.1, Wan2.2, LightVAE, TAE variants for optimal VRAM/quality trade-off
  • 📊 Pre-Flight Resource Check: Intelligent VRAM estimation with settings recommendations
  • ⚡ Auto-Download: Models download automatically from HuggingFace if missing
  • 🛡️ OOM Protection: Automatic recovery with progressive fallback (tiled VAE → tiled DiT → chunking)
  • 🔧 Unified Pipeline: All modes share optimized processing logic

📋 Quick Links


Performance & VRAM Optimization

This node is optimized for various hardware configurations. Here are some guidelines:

VRAM Tiers & Settings

| VRAM | Mode | Tiling | Chunk Size | Precision | Notes |
|---|---|---|---|---|---|
| 24GB+ | `full` or `tiny` | Disabled | 0 (all) | bf16/auto | Max quality/speed |
| 16GB | `tiny` | `tiled_vae=True` | 0 or ~100 | bf16/auto | Enable `keep_models_on_cpu` |
| 12GB | `tiny` | `tiled_vae=True`, `tiled_dit=True` | ~50 | fp16 | Use `sparse_sage` attention |
| 8GB | `tiny-long` | Required | ~20 | fp16 | Must use tiling and chunking |

Performance Enhancements

  • Attention Mode: Use sparse_sage_attention for the best balance of speed and memory. flash_attention_2 is faster but requires specific hardware/installation.
  • Precision: bf16 (BFloat16) is recommended for RTX 3000/4000/5000 series. It is faster and preserves dynamic range better than fp16.
  • Chunking: Use frame_chunk_size to process videos in segments. This moves processed frames to CPU RAM, preventing VRAM saturation on long clips.
  • Resize Input: If the input video is large (e.g., 1080p), use the resize_factor parameter to reduce input size to 0.5x before processing. This drastically reduces VRAM usage and allows for 4x upscaling of the resized result (net 2x output). For small videos, leave at 1.0.
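The chunking behavior described above can be sketched as a small helper (a hypothetical illustration of the `frame_chunk_size` semantics, not the node's actual code), where `0` means "process all frames in one pass":

```python
def chunk_ranges(num_frames: int, frame_chunk_size: int):
    """Yield (start, end) frame-index ranges for chunked processing.

    frame_chunk_size == 0 means "process all frames at once",
    mirroring the node's frame_chunk_size parameter.
    """
    if frame_chunk_size <= 0:
        yield (0, num_frames)
        return
    for start in range(0, num_frames, frame_chunk_size):
        yield (start, min(start + frame_chunk_size, num_frames))

# A 200-frame clip with frame_chunk_size=50 is processed in 4 passes:
print(list(chunk_ranges(200, 50)))  # [(0, 50), (50, 100), (100, 150), (150, 200)]
```

Each completed range would be moved to CPU RAM before the next one starts, which is what keeps VRAM usage flat on long clips.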

Pre-Flight Resource Check (NEW)

Before processing, FlashVSR now performs an intelligent pre-flight check that:

  1. Estimates VRAM Requirements: Calculates approximate VRAM needed based on resolution, frames, scale, and settings.
  2. Checks Available Resources: Uses torch.cuda.mem_get_info() for accurate real-time VRAM availability.
  3. Provides Recommendations: If OOM is predicted, suggests optimal settings.
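The three steps above can be sketched roughly as follows. The estimation coefficients here are purely illustrative placeholders, not FlashVSR's real formula, and the live availability query is shown only as a comment since it requires a CUDA device:

```python
def estimate_vram_gb(width, height, scale, frames, bytes_per_elem=2):
    """Very rough activation-memory estimate (hypothetical coefficients):
    output pixels x assumed feature channels x in-flight frame window."""
    out_pixels = (width * scale) * (height * scale)
    inflight_frames = min(frames, 8)   # assumed streaming window
    bytes_needed = out_pixels * 64 * inflight_frames * bytes_per_elem
    return bytes_needed / 1024**3

def preflight_check(required_gb, available_gb):
    """Compare the estimate against availability, as in the console output.
    # Real availability would come from: free, _ = torch.cuda.mem_get_info()
    """
    if required_gb <= available_gb:
        return (f"✅ Safe to proceed. Estimated ~{required_gb:.1f}GB needed, "
                f"{available_gb:.1f}GB available.")
    return (f"⚠️ Current settings require ~{required_gb:.1f}GB but only "
            f"{available_gb:.1f}GB available.")
```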

Example console output:

============================================================
🔍 PRE-FLIGHT RESOURCE CHECK
💻 RAM: 15.4GB / 95.8GB
💾 VRAM Available: 14.2GB
📊 Estimated VRAM Required: 12.8GB
✅ Safe to proceed. Estimated ~12.8GB needed, 14.2GB available.
============================================================

If VRAM is insufficient:

⚠️ Current settings require ~18.5GB but only 8.0GB available.
💡 Recommended Optimal Settings:
  • chunk_size = 32
  • tiled_vae = True
  • tiled_dit = True
  • resize_factor = 0.6

🎨 VAE Model Selection

VAE Type Comparison

| VAE Type | VRAM Usage | Speed | Quality | Best For |
|---|---|---|---|---|
| Wan2.1 | 8-12 GB | Baseline | ⭐⭐⭐⭐⭐ | Maximum quality, 24GB+ VRAM |
| Wan2.2 | 8-12 GB | Baseline | ⭐⭐⭐⭐⭐ | Improved normalization for Wan2.2 models |
| LightVAE_W2.1 | 4-5 GB | 2-3x faster | ⭐⭐⭐⭐ | 8-16GB VRAM, speed priority |
| TAE_W2.2 | 6-8 GB | 1.5x faster | ⭐⭐⭐⭐ | Temporal consistency priority |
| LightTAE_HY1.5 | 3-4 GB | 3x faster | ⭐⭐⭐⭐ | HunyuanVideo compatible, minimum VRAM |

VAE Selection Guide

| Your VRAM | Recommended VAE | Additional Settings |
|---|---|---|
| 8GB | LightTAE_HY1.5 or LightVAE_W2.1 | `tiled_vae=True`, `tiled_dit=True`, `chunk_size=16` |
| 12GB | LightVAE_W2.1 or Wan2.1 | `tiled_vae=True` |
| 16GB | Any VAE | Optional tiling for long videos |
| 24GB+ | Wan2.1 or Wan2.2 | Maximum quality, no restrictions |
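The selection guide above can be encoded as a small lookup helper (a hypothetical convenience function following this table, not part of the node's API):

```python
def recommend_vae(vram_gb: float):
    """Map available VRAM to a recommended VAE and extra settings,
    following the VAE selection guide table."""
    if vram_gb < 12:
        return "LightTAE_HY1.5", {"tiled_vae": True, "tiled_dit": True,
                                  "chunk_size": 16}
    if vram_gb < 16:
        return "LightVAE_W2.1", {"tiled_vae": True}
    if vram_gb < 24:
        return "Wan2.1", {}  # any VAE works; tiling optional for long videos
    return "Wan2.1", {}      # or Wan2.2; no restrictions at 24GB+

print(recommend_vae(8))   # ('LightTAE_HY1.5', {'tiled_vae': True, 'tiled_dit': True, 'chunk_size': 16})
```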

Auto-Download

All VAE models auto-download from HuggingFace if not found locally:

| VAE Selection | File | Direct Download Link |
|---|---|---|
| Wan2.1 | `Wan2.1_VAE.pth` | Download |
| Wan2.2 | `Wan2.2_VAE.pth` | Download |
| LightVAE_W2.1 | `lightvaew2_1.pth` | Download |
| TAE_W2.2 | `taew2_2.safetensors` | Download |
| LightTAE_HY1.5 | `lighttaehy1_5.pth` | Download |

📖 Best Practices / Settings Guide

Low VRAM (8-12GB) Configuration

Mode: tiny-long
VAE: LightVAE_W2.1 or LightTAE_HY1.5
Tiled VAE: ✅ Enabled
Tiled DiT: ✅ Enabled
Chunk Size: 16-32
Resize Factor: 0.5-0.8
Keep Models on CPU: ✅ Enabled

Medium VRAM (16GB) Configuration

Mode: tiny
VAE: Wan2.1 or LightVAE_W2.1
Tiled VAE: ✅ Enabled
Tiled DiT: Optional
Chunk Size: 50-100
Resize Factor: 1.0
Keep Models on CPU: Optional

High VRAM (24GB+) Configuration

Mode: full or tiny
VAE: Wan2.1 or Wan2.2
Tiled VAE: ❌ Disabled
Tiled DiT: ❌ Disabled
Chunk Size: 0 (all frames)
Resize Factor: 1.0
Keep Models on CPU: ❌ Disabled
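The three configurations above can be captured as preset dictionaries (names and values taken from this guide; the low-VRAM preset uses the lower end of each range, and the key names are illustrative, not the node's internal identifiers):

```python
# Presets mirroring the Low / Medium / High VRAM configurations above.
PRESETS = {
    "low":    {"mode": "tiny-long", "vae": "LightVAE_W2.1",
               "tiled_vae": True,  "tiled_dit": True,
               "chunk_size": 16, "resize_factor": 0.5,
               "keep_models_on_cpu": True},
    "medium": {"mode": "tiny", "vae": "Wan2.1",
               "tiled_vae": True,  "tiled_dit": False,
               "chunk_size": 50, "resize_factor": 1.0,
               "keep_models_on_cpu": False},
    "high":   {"mode": "full", "vae": "Wan2.1",
               "tiled_vae": False, "tiled_dit": False,
               "chunk_size": 0, "resize_factor": 1.0,
               "keep_models_on_cpu": False},
}
```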

Processing Summary

At the end of each run, you'll see a summary:

============================================================
📊 PROCESSING SUMMARY
⏱️ Total Processing Time: 130.08s (1.54 FPS)
📥 Input Resolution: 276x206 (200 frames)
📤 Output Resolution: 552x412 (200 frames)
📈 Peak VRAM Used: 12.4 GB
============================================================

🔧 Node Parameters

Hover over any input in ComfyUI to see tooltips. Full parameter list:

| Parameter | Description |
|---|---|
| `model` | FlashVSR model version |
| `mode` | `tiny` (fast), `tiny-long` (lowest VRAM), `full` (highest quality) |
| `vae_model` | VAE architecture (5 options, auto-download) |
| `scale` | Upscaling factor: 2x or 4x |
| `color_fix` | Wavelet color transfer. Highly recommended. |
| `tiled_vae` | Spatial tiling for VAE. Reduces VRAM, slower. |
| `tiled_dit` | Spatial tiling for DiT. Required for 4K output. |
| `tile_size` | Tile dimensions. Smaller = less VRAM. |
| `overlap` | Tile overlap for seamless blending. |
| `unload_dit` | Unload DiT before VAE decode. |
| `frame_chunk_size` | Process N frames at a time. 0 = all. |
| `enable_debug` | Verbose console logging. |
| `keep_models_on_cpu` | Offload to system RAM when idle. |
| `resize_factor` | Downscale large inputs before processing (0.3-1.0); the result is then upscaled by `scale`. |
| `attention_mode` | Attention kernel: `sparse_sage`, `flash_attention_2`, `sdpa`, `block_sparse` |

💻 Command-Line Interface (CLI)

FlashVSR includes a full-featured CLI that mirrors all ComfyUI node parameters for standalone video upscaling.

Quick Start

# Basic 2x upscale
python cli_main.py --input video.mp4 --output upscaled.mp4 --scale 2

# 4x upscale with tiling for lower VRAM
python cli_main.py --input video.mp4 --output upscaled.mp4 --scale 4 \
    --tiled_vae --tiled_dit --tile_size 256 --tile_overlap 24

# Long video with chunking to prevent OOM
python cli_main.py --input long_video.mp4 --output upscaled.mp4 \
    --frame_chunk_size 50 --mode tiny-long

# Low VRAM mode (8GB GPUs)
python cli_main.py --input video.mp4 --output upscaled.mp4 --scale 2 \
    --vae_model LightVAE_W2.1 --tiled_vae --tiled_dit \
    --frame_chunk_size 20 --resize_factor 0.5

CLI Arguments Reference

All arguments map 1:1 with ComfyUI node inputs. Run python cli_main.py --help for full details.

Required Arguments

| Argument | Description |
|---|---|
| `--input`, `-i` | Input video file path (e.g., `video.mp4`) |
| `--output`, `-o` | Output video file path (e.g., `upscaled.mp4`) |

Pipeline Initialization (from FlashVSRNodeInitPipe)

| Argument | Type | Default | Description |
|---|---|---|---|
| `--model` | choice | `FlashVSR-v1.1` | Model version: `FlashVSR`, `FlashVSR-v1.1` |
| `--mode` | choice | `tiny` | Operation mode: `tiny`, `tiny-long`, `full` |
| `--vae_model` | choice | `Wan2.1` | VAE model: `Wan2.1`, `Wan2.2`, `LightVAE_W2.1`, `TAE_W2.2`, `LightTAE_HY1.5` |
| `--force_offload` | flag | True | Force offload models to CPU after execution |
| `--no_force_offload` | flag | - | Disable force offloading |
| `--precision` | choice | `auto` | Precision: `fp16`, `bf16`, `auto` |
| `--device` | string | `auto` | Device: `cuda:0`, `cuda:1`, `cpu`, `auto` |
| `--attention_mode` | choice | `sparse_sage_attention` | Attention: `sparse_sage_attention`, `block_sparse_attention`, `flash_attention_2`, `sdpa` |

Processing Parameters (from FlashVSRNodeAdv)

| Argument | Type | Default | Description |
|---|---|---|---|
| `--scale` | int | 2 | Upscaling factor: 2 or 4 |
| `--color_fix` | flag | True | Apply wavelet-based color correction |
| `--no_color_fix` | flag | - | Disable color correction |
| `--tiled_vae` | flag | False | Enable spatial tiling for VAE decoder |
| `--tiled_dit` | flag | False | Enable spatial tiling for DiT |
| `--tile_size` | int | 256 | Tile size for DiT processing (32-1024) |
| `--tile_overlap` | int | 24 | Overlap pixels between tiles (8-512) |
| `--unload_dit` | flag | False | Unload DiT before VAE decoding |
| `--sparse_ratio` | float | 2.0 | Sparse attention control (1.5-2.0) |
| `--kv_ratio` | float | 3.0 | Key/Value cache ratio (1.0-3.0) |
| `--local_range` | int | 11 | Local attention window: 9 or 11 |
| `--seed` | int | 0 | Random seed for reproducibility |
| `--frame_chunk_size` | int | 0 | Process N frames at a time (0 = all) |
| `--enable_debug` | flag | False | Enable verbose logging |
| `--keep_models_on_cpu` | flag | True | Keep models in CPU RAM when idle |
| `--no_keep_models_on_cpu` | flag | - | Keep models in VRAM |
| `--resize_factor` | float | 1.0 | Resize input before processing (0.1-1.0) |
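The interaction of `--resize_factor` and `--scale` determines the net output size, as a quick sketch shows (a hypothetical helper illustrating the arithmetic, not code from the CLI):

```python
def output_resolution(width, height, resize_factor, scale):
    """Net output size: the input is first shrunk by resize_factor,
    then upscaled by scale."""
    w = int(width * resize_factor)
    h = int(height * resize_factor)
    return w * scale, h * scale

# 1080p input at resize_factor=0.5 with 4x scale → 3840x2160 (net 2x)
print(output_resolution(1920, 1080, 0.5, 4))  # (3840, 2160)
```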

Video I/O Parameters

| Argument | Type | Default | Description |
|---|---|---|---|
| `--fps` | float | input FPS | Output video FPS |
| `--codec` | string | `libx264` | Video codec: `libx264`, `libx265`, `h264_nvenc` |
| `--crf` | int | 18 | Quality (0-51, lower = better) |
| `--start_frame` | int | 0 | Start frame index (0-indexed) |
| `--end_frame` | int | -1 | End frame index (-1 = all frames) |
| `--models_dir` | string | `./models` | Custom models directory path |
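For scripted batch runs, a `cli_main.py` command line can be assembled from these arguments, for example with a small helper like this (a hypothetical wrapper, not shipped with the node; boolean flags become bare switches, other values become `--key value` pairs):

```python
import subprocess

def build_cli_command(input_path, output_path, scale=2, **flags):
    """Assemble a cli_main.py invocation from the arguments above."""
    cmd = ["python", "cli_main.py", "--input", input_path,
           "--output", output_path, "--scale", str(scale)]
    for key, value in flags.items():
        if value is True:
            cmd.append(f"--{key}")        # bare flag, e.g. --tiled_vae
        elif value is not False:
            cmd += [f"--{key}", str(value)]
    return cmd

cmd = build_cli_command("video.mp4", "out.mp4", scale=4,
                        tiled_vae=True, frame_chunk_size=50)
print(" ".join(cmd))
# subprocess.run(cmd, check=True)  # uncomment to actually run
```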

🚀 Installation

Step 1: Install the Node

cd ComfyUI/custom_nodes
git clone https://github.com/naxci1/ComfyUI-FlashVSR_Stable.git
python -m pip install -r ComfyUI-FlashVSR_Stable/requirements.txt

📢 Turing architecture or older GPUs (GTX 16 series, RTX 20 series, and earlier): Install triton<3.3.0:

# Windows (quote the specifier so the shell does not treat < as redirection)
python -m pip install -U "triton-windows<3.3.0"
# Linux
python -m pip install -U "triton<3.3.0"

Step 2: Download Models

Download the FlashVSR folder from HuggingFace:

ComfyUI/models/FlashVSR/
├── LQ_proj_in.ckpt
├── TCDecoder.ckpt
├── diffusion_pytorch_model_streaming_dmd.safetensors
└── Wan2.1_VAE.pth  (or auto-downloads)

💡 VAE files auto-download from HuggingFace if not present. Only the DiT model and other components need manual download.


🖼️ Preview

Workflow Preview

Sample Workflow

Download Workflow JSON


🏷️ Recent Changes

See CHANGELOG.md for full version history.

v1.2.7 (2025-12-23)

  • 🚀 Pre-Flight Resource Calculator with settings recommendations
  • 🎨 5 VAE options: Wan2.1, Wan2.2, LightVAE_W2.1, TAE_W2.2, LightTAE_HY1.5
  • ⬇️ Auto-download VAE models from HuggingFace
  • 🐛 Fixed black borders and video corruption
  • ⚡ Unified processing pipeline for all modes
  • 🛡️ 95% VRAM threshold for OOM recovery

🙏 Acknowledgments


📄 License

MIT License - see LICENSE for details.
