RTX 5070 (Blackwell) Compatibility: Severe Slowdown/Freeze due to DynamicSwap I/O Bottleneck #783

@cubeforeq5-oss

Hello,

I am experiencing a critical performance issue when attempting to generate videos using FramePack on the new NVIDIA RTX 5070 (Blackwell) GPU. The issue appears to be related to model swapping/loading overhead rather than GPU calculation speed.

1. Environment Details

  • GPU: NVIDIA GeForce RTX 5070 (12 GB VRAM)
  • CPU: AMD Ryzen 7 5700X
  • RAM: 64 GB
  • OS: Windows 11
  • FramePack Setup: Custom install with Python 3.12, CUDA 12.8, PyTorch nightly build (cu128), and latest sageattention.

2. Issue Description: High Usage / Low Power Freeze

Video generation does complete, but it is extremely slow because of long, repeated waits whenever a model is loaded or unloaded.

The key observation is a significant contradiction in GPU telemetry during the wait periods:

  • GPU Usage: Consistently 99%
  • GPU Power Draw: Extremely low, only ~49 Watts
  • GPU Temperature: Very low, ~50°C
  • Clock Speed: High (~2880 MHz)

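This suggests the GPU is stalled waiting for data rather than performing heavy calculation. The usage figure only reports the share of time in which some kernel was running, not how much work it did, so 99% usage at ~49 W is consistent with an I/O or memory access bottleneck, which I believe is caused by the DynamicSwap feature.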
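To make the "high usage, low power" pattern easier to capture while a stall is happening, here is a minimal sketch that polls `nvidia-smi` and flags suspicious samples. The function names and the thresholds (`util_min`, `power_max`) are my own assumptions for illustration, not part of FramePack; the thresholds in particular would need tuning for a given card.

```python
import subprocess

# Fields requested from nvidia-smi, in this order.
QUERY_FIELDS = "utilization.gpu,power.draw,temperature.gpu,clocks.sm"


def parse_sample(line):
    """Parse one CSV line produced with --format=csv,noheader,nounits."""
    # Note: this raises ValueError if a field is reported as "[N/A]".
    util, power, temp, clock = (float(x.strip()) for x in line.split(","))
    return {"util_pct": util, "power_w": power, "temp_c": temp, "sm_mhz": clock}


def looks_stalled(sample, util_min=90.0, power_max=80.0):
    """Heuristic (assumed thresholds): busy according to usage, idle according to power."""
    return sample["util_pct"] >= util_min and sample["power_w"] <= power_max


def read_sample(gpu_index=0):
    """Query the GPU once via nvidia-smi and return a parsed sample."""
    out = subprocess.run(
        [
            "nvidia-smi",
            f"--id={gpu_index}",
            f"--query-gpu={QUERY_FIELDS}",
            "--format=csv,noheader,nounits",
        ],
        capture_output=True,
        text=True,
        check=True,
    ).stdout
    return parse_sample(out.strip().splitlines()[0])
```

Calling `looks_stalled(read_sample())` in a loop once per second during generation would show whether the flagged samples line up with the load/unload messages in the log.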

3. Specific Bottleneck Log

The process frequently freezes/stalls at the model loading and unloading points, specifically with the Autoencoder and Transformer models.

Example Log of Stall:

    Unloaded DynamicSwap_HunyuanVideoTransformer3DModelPacked as complete.
    ... (Waiting for seconds/minutes)
    Loaded AutoencoderKLHunyuanVideo to cuda:0 as complete.
    ... (Processing starts: ~6.55 s/it)
    ... (Processing finishes, then stall occurs)
    Moving DynamicSwap_HunyuanVideoTransformer3DModelPacked to cuda:0 with preserved memory: 6 GB

4. Current Performance

After a full clean reinstall, the actual calculation speed is good (~6.55 s/it), but the overall generation time is poor due to the prolonged loading/offloading stalls.
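To put numbers on the stalls instead of "seconds/minutes", the load and unload steps could be wrapped in a small timer. This is only a sketch: `stopwatch` and its parameters are hypothetical names, and where exactly to place it inside FramePack's loading code is an assumption I have not verified.

```python
import time
from contextlib import contextmanager


@contextmanager
def stopwatch(label, results, sync=None):
    """Time the enclosed block and append (label, seconds) to results.

    sync is an optional zero-argument callable invoked before starting and
    before stopping the clock, so queued asynchronous work is included.
    """
    if sync is not None:
        sync()
    start = time.perf_counter()
    try:
        yield
    finally:
        if sync is not None:
            sync()
        results.append((label, time.perf_counter() - start))


def format_report(results):
    """Return one line per timed step, slowest first."""
    ordered = sorted(results, key=lambda item: item[1], reverse=True)
    return "\n".join(f"{seconds:8.2f} s  {label}" for label, seconds in ordered)
```

With PyTorch, passing `sync=torch.cuda.synchronize` matters because CUDA calls can return before the GPU has finished; without it the timer may attribute the wait to a later step. Comparing the timed load steps against the ~6.55 s/it sampling speed would show how much of the total run is spent swapping.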

I suspect this is a compatibility issue between the DynamicSwap mechanism and the RTX 50 series (Blackwell) architecture, possibly related to how the custom sageattention or memory allocation handles the new hardware.

Thank you for your attention to this critical issue.
