
Multi-resolution dataset for SD1/SDXL#2269

Merged
kohya-ss merged 3 commits into kohya-ss:sd3 from woct0rdho:multi-reso-sd
Feb 23, 2026

Conversation

@woct0rdho
Contributor

I think multi-resolution training is something we should encourage people to do more. I'm still using SDXL as a lightweight model when I need to upscale images to 4K.

In sd-scripts, multi-resolution datasets are already documented via the `[[datasets]]` config table, where we can create multiple datasets with different resolutions and the same `image_dir`. This is already enabled for all newer models (Anima, Flux, Hunyuan, Lumina, SD3), but not for SD1/SDXL. This PR enables it.
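For reference, a config along these lines (paths, resolutions, and repeat counts are placeholders) trains one image folder at two resolutions using the documented `[[datasets]]` layout:

```toml
[general]
enable_bucket = true

# First dataset: same images, 1024px bucket base
[[datasets]]
resolution = 1024

  [[datasets.subsets]]
  image_dir = "/path/to/images"
  num_repeats = 1

# Second dataset: same image_dir, 768px bucket base
[[datasets]]
resolution = 768

  [[datasets.subsets]]
  image_dir = "/path/to/images"
  num_repeats = 1
```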

However, this is a breaking change for people who have already cached a lot of images. They may use a script to migrate the cache.

@kohya-ss
Owner

Thank you, this is great!

Sorry, despite what the documentation says, I don't think SD1/SDXL currently handles caching correctly when the image directory is the same, even if the datasets are different.

For existing caches, it would be a good idea to prepare a migration script. Alternatively, as a temporary solution, falling back to key names without resolution suffixes might be one idea.

I'll review and merge this soon, probably tomorrow.

@woct0rdho
Contributor Author

woct0rdho commented Feb 17, 2026

Falling back to key names without resolution suffixes is not always safe. For example, if a user uses multiple resolutions (768, 1024, 1280) without re-caching the latents after this PR, and we fall back to the old keys, then all three datasets will load the same latents.

Currently I do not do any fallback, so when the user starts training after this PR, all latents will be cached again. The only downside is that the old latents remain in the same npz files. If the user runs out of disk space, they can just delete the old npz files and cache the latents again.

I guess those people who already cached TBs of latents should know how to write the script and migrate it...

@kohya-ss
Owner

Hmm, that certainly could be a problem...

It might be one idea to set a guard for the fallback: if the shape of the previously saved latent differs from the expected resolution, raise an error. I think this should prevent unintended fallbacks.

@woct0rdho
Contributor Author

woct0rdho commented Feb 17, 2026

If we check the array shape using npz[key].shape, it will load the array data (rather than just the metadata) when checking the cache before training, which is fine for GBs of cache but not so fine for TBs of cache.

It's possible to only read the metadata but we need some private API of numpy. Do you think we should implement this? (BTW, it's easy to read metadata in safetensors)
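One possible way to read only the header is sketched below (the function name is hypothetical, and the PR's actual implementation may differ, e.g. by calling `numpy.lib.format._read_array_header` directly). An `.npz` file is a zip archive of `.npy` members, so we can parse just the NPY header of one member without decompressing the array data:

```python
import zipfile

import numpy as np
from numpy.lib import format as npy_format


def npz_latent_shape(npz_path: str, key: str) -> tuple:
    """Read an array's shape from an .npz file without loading its data.

    Only the NPY magic bytes and header of the named zip member are read,
    so this stays cheap even for very large caches.
    """
    with zipfile.ZipFile(npz_path) as zf:
        # np.savez stores each array under "<key>.npy" inside the zip
        with zf.open(key + ".npy") as f:
            version = npy_format.read_magic(f)
            if version == (1, 0):
                shape, fortran_order, dtype = npy_format.read_array_header_1_0(f)
            else:
                shape, fortran_order, dtype = npy_format.read_array_header_2_0(f)
            return shape
```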

@kohya-ss
Owner

Thank you, I didn't realize that fallbacks would also need to be considered when checking the cache.

It might be a good idea to release this PR at the same time as the safetensors format cache feature and provide a script for migrating the cache (adding the resolution suffix and converting to safetensors).
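The suffix-adding half of such a migration could look roughly like the sketch below (function name is illustrative; it infers pixel resolution from the latent shape, assuming the 8x spatial downscale of SD-family VAEs, and a real script would also need to skip already-migrated keys and handle non-latent entries):

```python
import numpy as np


def add_resolution_suffix(npz_path: str) -> None:
    """Hypothetical migration sketch: rewrite an npz cache so each
    latent key carries a "<width>x<height>" resolution suffix.
    """
    # Load everything into memory before overwriting the file
    with np.load(npz_path) as f:
        arrays = {k: f[k] for k in f.files}
    migrated = {}
    for key, arr in arrays.items():
        h, w = arr.shape[-2:]  # latent spatial dims; pixels are 8x larger
        migrated[f"{key}_{w * 8}x{h * 8}"] = arr
    np.savez(npz_path, **migrated)
```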

@kohya-ss
Owner

I thought about it a lot, but it seems that if the scripts stop working after an update, it will cause some confusion.
By implementing a similar fallback for cache checks, there should be no issues with existing caches except for multi-resolution datasets (which are likely used by many users).

@woct0rdho
Contributor Author

woct0rdho commented Feb 20, 2026

OK, I've implemented the fallback. After this PR, when the user starts training, it will:

  1. Check the cached latents with resolution suffix
  2. If it's not found, then check the cached latents without resolution suffix, and check the shape by only reading the header
  3. If it's not found, then compute and cache the latents

So the script will not stop working and will not take extra disk space. I hope the overhead of checking the shape is not too large.

The only downside is that we use a private API of numpy to read the header, and we will need to remove it when we support safetensors latents. I've tested it with numpy 2.2 and 2.4.
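The three-step lookup above can be sketched as follows (names are hypothetical and a plain dict stands in for the npz cache; the real code would read the npz header for the shape check in step 2):

```python
import numpy as np


def load_or_compute_latents(cache: dict, image_key: str, reso, compute_fn):
    """Illustrative sketch of the three-step fallback lookup.

    `reso` is (width, height) in pixels; SD-family VAEs downscale by 8,
    so a latent for (W, H) has spatial shape (H // 8, W // 8).
    """
    width, height = reso
    suffixed_key = f"{image_key}_{width}x{height}"
    # 1. Prefer the cache entry with the resolution suffix (new format)
    if suffixed_key in cache:
        return cache[suffixed_key]
    # 2. Fall back to the legacy key, guarded by a shape check so that
    #    multiple resolutions never share one latent by accident
    legacy = cache.get(image_key)
    if legacy is not None and legacy.shape[-2:] == (height // 8, width // 8):
        return legacy
    # 3. Otherwise compute and store under the new suffixed key
    latents = compute_fn()
    cache[suffixed_key] = latents
    return latents
```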

@kohya-ss
Owner

Thank you for the update. However, I don't think it's desirable to depend on numpy's internal API. Please note that I may change it to a simpler method after merging.

@kohya-ss kohya-ss changed the base branch from main to sd3 February 23, 2026 06:28
@kohya-ss kohya-ss merged commit 50694df into kohya-ss:sd3 Feb 23, 2026
3 checks passed
@kohya-ss
Owner

Prior to this PR, no error was raised if the shape of a cached latent differed from the expected latent shape.

This is undesired behavior, but changing it would break existing datasets. So we opened #2276 which implements the behavior for backward compatibility.

We'd appreciate it if you could take a look at the PR.

@woct0rdho woct0rdho deleted the multi-reso-sd branch February 23, 2026 07:54