Add skip_image_resolution to deduplicate multi-resolution dataset #2273

woct0rdho wants to merge 5 commits into kohya-ss:main

Conversation
Thank you for this PR! However, this option seems a bit complicated and confusing. Please tell me why #2270's …
I can rename it back to … The code is indeed more complicated than I thought at first, but this is the best way I (and the AI tools I use) could find to implement it:
Thanks for the explanation, I understand now.

I'll try to find out if there's a simpler way to implement this.
The PR title was changed from "min_orig_resolution and max_orig_resolution to deduplicate multi-resolution dataset" to "skip_image_resolution to deduplicate multi-resolution dataset".
Thank you for the update! I think we could simply filter images with the following code. In `FineTuningDataset`, we can use the image size from the metadata.
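To make the idea concrete, here is a minimal sketch of this kind of size-based filter. It is illustrative only: the function name, the use of plain `(width, height)` pairs, and the comparison rule (shorter side against the minimum, longer side against the maximum) are assumptions, not sd-scripts' actual code.

```python
def filter_by_resolution(sizes, min_res=None, max_res=None):
    """Return the (width, height) pairs that pass the resolution bounds.

    An image is kept when its shorter side is at least ``min_res`` and its
    longer side is at most ``max_res``; a bound set to None is skipped.
    NOTE: the exact comparison rule here is a hypothetical choice.
    """
    kept = []
    for w, h in sizes:
        if min_res is not None and min(w, h) < min_res:
            continue  # too small for this dataset's range
        if max_res is not None and max(w, h) > max_res:
            continue  # too large for this dataset's range
        kept.append((w, h))
    return kept

# Example: with min_res=768, the 512x512 image is dropped.
sizes = [(512, 512), (1024, 1024), (768, 1152)]
print(filter_by_resolution(sizes, min_res=768))  # → [(1024, 1024), (768, 1152)]
```

In `FineTuningDataset` the sizes would come from the metadata JSON rather than from opening the image files, which is what makes this filter essentially free.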
Yes, this makes the PR simpler. I've moved the filtering from …
Thank you for the update! I will create a test dataset and review/test this soon.
This PR is an alternative to #2270.

I propose adding a dataset property `min_orig_resolution`, so we can write a multi-resolution dataset config with it. I've also added `max_orig_resolution` because it looks natural to have one.

We filter the images by their original resolutions in `BaseDataset.make_buckets`, and update `num_train_images` and `num_reg_images`. For `DreamBoothDataset`, we rebalance the number of regularization images after the filter. For `ControlNetDataset`, we check for missing conditioning images after the filter, and ignore extra conditioning images.

There is no overhead if the user does not set `min_orig_resolution` and `max_orig_resolution`.
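The original example config is not shown above; the following is a hypothetical TOML sketch of what a multi-resolution dataset config using these properties might look like. The `min_orig_resolution`/`max_orig_resolution` keys follow this PR's proposal, the surrounding structure mirrors sd-scripts' dataset config format, and the directory names and values are illustrative assumptions.

```toml
# Hypothetical config: route each source image to exactly one dataset
# based on its original resolution, so images are not trained twice.
[[datasets]]
resolution = 512
max_orig_resolution = 767   # assumed key name from this PR; value illustrative

  [[datasets.subsets]]
  image_dir = "train_images"  # illustrative path

[[datasets]]
resolution = 1024
min_orig_resolution = 768   # assumed key name from this PR; value illustrative

  [[datasets.subsets]]
  image_dir = "train_images"  # same directory, disjoint resolution range
```

Because the two datasets' resolution ranges are disjoint, each original image lands in exactly one bucket set, which is the deduplication the PR title refers to.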