
Add skip_image_resolution to deduplicate multi-resolution dataset#2273

Open
woct0rdho wants to merge 5 commits into kohya-ss:main from woct0rdho:min-max-orig-reso

Conversation

@woct0rdho
Contributor

This PR is an alternative to #2270.

I propose adding a dataset property min_orig_resolution, so we can write a multi-resolution dataset config like this:

[general]
bucket_no_upscale = true

[[datasets]]
resolution = 768
[[datasets.subsets]]
image_dir = 'path/to/image/dir'

[[datasets]]
resolution = 1024
min_orig_resolution = 768
[[datasets.subsets]]
image_dir = 'path/to/image/dir'

[[datasets]]
resolution = 1280
min_orig_resolution = 1024
[[datasets.subsets]]
image_dir = 'path/to/image/dir'

I've also added max_orig_resolution, because it seems natural to have a matching upper bound.

We filter the images by their original resolutions in BaseDataset.make_buckets and update num_train_images and num_reg_images. For DreamBoothDataset, we rebalance the number of regularization images after filtering. For ControlNetDataset, we check for missing conditioning images after filtering and ignore extra conditioning images.

There is no overhead if the user sets neither min_orig_resolution nor max_orig_resolution.
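The area-based bounds described above can be sketched as a small helper. This is an illustrative sketch, not the PR's actual code: the parameter names follow the PR's min_orig_resolution / max_orig_resolution options (assumed here to be (width, height) tuples, expanded from a scalar the same way resolution is), and the exact comparison operators are assumptions.

```python
from typing import Optional, Tuple


def passes_resolution_filter(
    size: Tuple[int, int],
    min_orig_resolution: Optional[Tuple[int, int]] = None,
    max_orig_resolution: Optional[Tuple[int, int]] = None,
) -> bool:
    """Keep an image only if its original pixel area lies within the bounds.

    Images at or below the minimum area are dropped (matching the `<=`
    comparison in the snippet discussed later in this thread); the strictness
    of the upper bound is an assumption for illustration.
    """
    area = size[0] * size[1]
    if min_orig_resolution is not None:
        if area <= min_orig_resolution[0] * min_orig_resolution[1]:
            return False
    if max_orig_resolution is not None:
        if area > max_orig_resolution[0] * max_orig_resolution[1]:
            return False
    return True
```

With the config above, a 1024x1024 source image would pass the resolution = 1024 dataset's filter (min_orig_resolution = 768) but be dropped by the resolution = 768 dataset, so each image lands in exactly one dataset.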

@kohya-ss
Owner

Thank you for this PR!

However, this option seems a bit complicated and confusing. Please tell me why #2270's skip_image_resolution is not enough.

@woct0rdho
Contributor Author

woct0rdho commented Feb 20, 2026

min_orig_resolution is exactly your skip_image_resolution, but I renamed it because I think min_orig_resolution is more self-explanatory.

I can rename it back to skip_image_resolution and remove max_orig_resolution if you think that's better.

The code is indeed more complicated than I expected at first, but this is the best way I (and the AI tools I use) could find to implement:

  1. Filtering by original resolution, which can only be done once the original resolutions are known in make_buckets
  2. Making regularization images work correctly with the filter
  3. Making conditioning images work correctly with the filter
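Point 2 above is about keeping the DreamBooth regularization set balanced against a training set that has just shrunk. A hypothetical sketch of that rebalancing, assuming the common scheme where each regularization image is repeated enough times to at least cover the training images (the helper name and the exact rounding are illustrative, not the PR's code):

```python
import math


def reg_repeats_after_filter(num_train_images: int, num_reg_images: int) -> int:
    """Repeats per regularization image so the regularization set at least
    matches the (filtered) training set; 0 if there are no reg images."""
    if num_reg_images == 0:
        return 0
    return math.ceil(num_train_images / num_reg_images)
```

For example, if resolution filtering reduces the training set to 100 images while 30 regularization images remain, each regularization image would be used 4 times.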

@kohya-ss
Owner

Thanks for the explanation, I understand now.

skip_image_resolution states explicitly that images of that resolution will not be included, whereas min_orig_resolution doesn't make clear whether images at exactly that resolution are included or not.

I'll try to find out if there's a simpler way to implement this.

@woct0rdho woct0rdho changed the title Add min_orig_resolution and max_orig_resolution to deduplicate multi-resolution dataset Add skip_image_resolution to deduplicate multi-resolution dataset Feb 20, 2026
@kohya-ss
Owner

kohya-ss commented Feb 22, 2026

Thank you for the update!

I think we could simply filter images with the following code.
Note that skip_image_resolution should be a tuple, just like resolution.

                            size_set_count += 1
                    logger.info(f"set image size from cache files: {size_set_count}/{len(img_paths)}")

            # from here
            if self.skip_image_resolution is not None:
                filtered_img_paths = []
                filtered_sizes = []
                skip_image_area = self.skip_image_resolution[0] * self.skip_image_resolution[1]
                for img_path, size in zip(img_paths, sizes):
                    if size is None:  # no latents cache file, get image size by reading image file (slow)
                        size = self.get_image_size(img_path)
                    if size[0] * size[1] <= skip_image_area:
                        continue
                    filtered_img_paths.append(img_path)
                    filtered_sizes.append(size)
                img_paths = filtered_img_paths
                sizes = filtered_sizes
                # add some logging here
            # to here

            # We want to create a training and validation split. This should be improved in the future
            # to allow a clearer distinction between training and validation. This can be seen as a

In FineTuningDataset, we can use the image size from the metadata.
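Reading the size from the metadata avoids opening each image file. A minimal sketch of that idea, assuming a metadata dict keyed by image path whose entries record a resolution; the "train_resolution" key and the helper name are assumptions for illustration, not the actual sd-scripts metadata schema:

```python
from typing import Dict, Optional, Tuple


def filter_metadata_by_skip_resolution(
    metadata: Dict[str, dict],
    skip_image_resolution: Optional[Tuple[int, int]],
) -> Dict[str, dict]:
    """Drop entries whose recorded area is at or below the threshold,
    mirroring the `<=` comparison in the snippet above."""
    if skip_image_resolution is None:
        return metadata
    skip_area = skip_image_resolution[0] * skip_image_resolution[1]
    return {
        key: entry
        for key, entry in metadata.items()
        if entry["train_resolution"][0] * entry["train_resolution"][1] > skip_area
    }
```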

@woct0rdho
Contributor Author

Yes, this makes the PR simpler. I've moved the filtering from make_buckets to __init__.

@kohya-ss
Owner

Thank you for the update! I will create a test dataset and review/test this soon.
