Fix Z-Image VAE operations not reserving working memory for OOM prevention #8
Conversation
@copilot This error is occurring in the unit test: […]

Fixed in commit e13401b. The issue was that […]

@copilot Still failures on the python tests: […]
Fix VRAM cache calculation to properly account for device_working_mem_gb
Problem Analysis
When generating with larger models (like Q8 Z-Image Turbo), users get OOM errors during VAE decoding.
Root Cause:
The `ZImageLatentsToImageInvocation` and `ZImageImageToLatentsInvocation` do not request additional working memory for VAE operations, unlike the standard SD1.5/SDXL/SD3/CogView4 invocations. This means the model cache doesn't offload enough models from VRAM before VAE operations run, leaving no room for the operation's intermediate tensors.

Comparison:
- `LatentsToImageInvocation` (SD1.5/SDXL): calls `estimate_vae_working_memory_sd15_sdxl()` and passes `working_mem_bytes` to `model_on_device()`
- `SD3LatentsToImageInvocation`: calls `estimate_vae_working_memory_sd3()` and passes `working_mem_bytes`
- `CogView4LatentsToImageInvocation`: calls `estimate_vae_working_memory_cogview4()` and passes `working_mem_bytes`
- `ZImageLatentsToImageInvocation`: didn't estimate or request working memory (NOW FIXED ✅)
- `ZImageImageToLatentsInvocation`: didn't estimate or request working memory (NOW FIXED ✅)
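All of these follow the same two-step pattern: estimate how much scratch VRAM the VAE pass will need, then hand that number to the model cache so it can evict enough weights first. The sketch below is not InvokeAI code; InvokeAI's own estimators (named above) fill this role, and the function name, scale factor, and overhead constant here are assumptions chosen only to show the kind of calculation involved.

```python
# Illustrative only: a rough upper bound on VRAM needed by a VAE decode.
# The scale factor, element size, and overhead constant are assumed values.

def estimate_vae_decode_working_memory(
    latent_height: int,
    latent_width: int,
    scale_factor: int = 8,          # latent -> pixel upscale (8x for SD-family VAEs)
    element_size: int = 4,          # fp32 activations
    overhead_factor: float = 256.0, # assumed multiplier for intermediate feature maps
) -> int:
    """Return an estimated working-memory requirement in bytes."""
    out_h = latent_height * scale_factor
    out_w = latent_width * scale_factor
    # Output image tensor (3 channels) times an assumed overhead factor
    # covering the decoder's intermediate activations.
    return int(out_h * out_w * 3 * element_size * overhead_factor)


if __name__ == "__main__":
    # A 1024x1024 image corresponds to a 128x128 latent.
    working_mem = estimate_vae_decode_working_memory(128, 128)
    print(f"reserve ~{working_mem / 2**30:.2f} GiB before decoding")  # ~3.00 GiB
```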
Changes Made

- `z_image_latents_to_image.py` and `z_image_image_to_latents.py` now estimate working memory and pass it to `model_on_device(working_mem_bytes=...)` (see the sketch under Technical Details below)
- The tests use `model_construct()` to bypass Pydantic validation for mock objects
Technical Details

The fix adds working memory estimation to both Z-Image VAE invocations:
- Detects whether the loaded VAE is a FLUX VAE (`FluxAutoEncoder`) or Diffusers (`AutoencoderKL`)
- Uses `estimate_vae_working_memory_flux()` for the FLUX VAE
- Uses `estimate_vae_working_memory_sd3()` for `AutoencoderKL`
- Passes the estimate to `model_on_device(working_mem_bytes=...)`

This ensures the model cache properly offloads models to make room for VAE operations before they run, preventing OOM errors.
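A rough structural sketch of that change, not the actual diff: the class, estimator, and `model_on_device` names are taken from this description, while the stub implementations and signatures below are assumptions for illustration.

```python
from contextlib import contextmanager


class FluxAutoEncoder:
    """Stub standing in for InvokeAI's FLUX VAE class."""


class AutoencoderKL:
    """Stub standing in for diffusers' AutoencoderKL."""


def estimate_vae_working_memory_flux(out_h: int, out_w: int) -> int:
    return out_h * out_w * 3 * 4 * 256  # placeholder formula (assumed)


def estimate_vae_working_memory_sd3(out_h: int, out_w: int) -> int:
    return out_h * out_w * 3 * 4 * 256  # placeholder formula (assumed)


class FakeVaeInfo:
    """Stand-in for the loaded-model handle returned by the model manager."""

    def __init__(self, model):
        self.model = model

    @contextmanager
    def model_on_device(self, working_mem_bytes: int = 0):
        # The real cache would offload other models until at least
        # `working_mem_bytes` of VRAM is free, then yield the loaded VAE.
        print(f"reserving {working_mem_bytes / 2**30:.2f} GiB of working memory")
        yield None, self.model


def decode_latents(vae_info: FakeVaeInfo, out_h: int, out_w: int) -> None:
    vae = vae_info.model
    # Pick the estimator that matches the wrapped VAE type, as the fix does.
    if isinstance(vae, FluxAutoEncoder):
        working_mem = estimate_vae_working_memory_flux(out_h, out_w)
    else:
        working_mem = estimate_vae_working_memory_sd3(out_h, out_w)

    with vae_info.model_on_device(working_mem_bytes=working_mem) as (_, loaded_vae):
        pass  # the real invocation runs the decode here with `loaded_vae`


decode_latents(FakeVaeInfo(FluxAutoEncoder()), 1024, 1024)
```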
Test Fixes
- A `config` attribute was being set on the `FluxAutoEncoder` mock, which doesn't have this attribute. The test now only sets `config` attributes for `AutoencoderKL` VAEs.
- The tests use `model_construct()` instead of the regular constructor to create invocation instances with mock fields, bypassing validation while still testing the core logic.
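A minimal demonstration of that testing trick, assuming Pydantic v2 and using a toy model in place of the real invocation classes:

```python
from unittest.mock import MagicMock

from pydantic import BaseModel, ValidationError


class LatentsField(BaseModel):
    latents_name: str


class FakeDecodeInvocation(BaseModel):
    latents: LatentsField
    tiled: bool = False


# The regular constructor validates field types, so a MagicMock is rejected.
try:
    FakeDecodeInvocation(latents=MagicMock())
except ValidationError as exc:
    print("constructor rejected the mock:", type(exc).__name__)

# model_construct() skips validation entirely, which lets a test assemble
# an instance from mocks and still exercise the invocation's logic.
invocation = FakeDecodeInvocation.model_construct(latents=MagicMock(), tiled=False)
print(type(invocation.latents))  # <class 'unittest.mock.MagicMock'>
```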
Files Modified

- `invokeai/app/invocations/z_image_latents_to_image.py`: Added working memory estimation for decode
- `invokeai/app/invocations/z_image_image_to_latents.py`: Added working memory estimation for encode
- `tests/app/invocations/test_z_image_working_memory.py`: Added tests to verify working memory estimation
Expected Impact

Users will no longer need to manually set `max_cache_vram_gb` to work around OOM errors. The `device_working_mem_gb` setting (default 3 GB) will now work correctly for Z-Image models, as the VAE operations will request appropriate working memory and the model cache will offload models accordingly.
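As a back-of-the-envelope illustration of why the reservation matters (numbers assumed, not taken from the actual cache algorithm):

```python
def gb_to_offload(total_vram_gb: float, cached_models_gb: float, working_mem_gb: float) -> float:
    """How much cached model weight must leave VRAM before the VAE pass runs."""
    free_gb = total_vram_gb - cached_models_gb
    return max(0.0, working_mem_gb - free_gb)


# Assumed numbers loosely matching the report: a 12 GB card mostly filled
# by model weights, with a ~3 GB working-memory requirement for the decode.
print(gb_to_offload(total_vram_gb=12, cached_models_gb=11, working_mem_gb=3))  # 2.0
```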
Original prompt

This section details the original issue to resolve.
<issue_title>[bug]: Out of Memory errors with larger models</issue_title>
<issue_description>### Is there an existing issue for this problem?
Install method: Invoke's Launcher
Operating system: Linux
GPU vendor: Nvidia (CUDA)
GPU model: RTX 4070
GPU VRAM: 12GB
Version number: 6.10.0rc2
Browser: No response
System Information: No response
What happened
When generating with the Q8 Z-Image Turbo model, I am getting out-of-memory errors during the VAE decoding phase. I can avoid the errors by setting `max_cache_vram_gb` to 4 GB, at which point I see VRAM use rise to ~4 GB. However, it doesn't seem intuitive to me that adjusting the VRAM cache should be the way to fix the error.

I also tried setting `device_working_mem_gb: 4` in my config file, but without the VRAM cache setting I again get OOM.

Here is the log from a successful generation with the VRAM cache limited to 4 GB: