Describe your use-case.
I have large-ish datasets. Over the past year, multiple at-home users have reported issues with datasets of 20k+ images, where hours of caching can be cancelled by a single invalid image.
The caching is also "all or nothing": it starts from scratch even when we only add a couple of new images and adjust some captions. I would also like to cache on my own machine and then transfer the files to a rented machine, so having the cache be portable would be extremely useful. This matters especially with SD3.5 and Flux, where offloading pushes dataloader threads down to 1; caching on a rented GPU under those conditions is a quick way to burn money.
What would you like to see as a solution?
I would like to see:
- Caching done differentially, so that only new or changed images and captions are processed
- Graceful failure in the event of an invalid image or caption
- A portable cache, enabling users to cache on their home GPU. This would lower the cost of training LoRAs and finetunes, since the slow part (caching) could be done at home.
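To illustrate the three points above, here is a minimal sketch of the behaviour I have in mind. It is not the project's actual caching code: `content_key`, `update_cache`, and the `encode` callback are hypothetical names, and the `.bin` file layout is an assumption made only for the example. Entries are keyed by a hash of the image bytes plus the caption, so unchanged pairs are reused, a bad file is recorded instead of aborting the run, and nothing in the cache depends on absolute paths.

```python
import hashlib
from pathlib import Path
from typing import Callable, Dict, List, Tuple


def content_key(image_bytes: bytes, caption: str) -> str:
    """Hash image bytes and caption together (hypothetical key scheme).

    The key depends only on content, not on where the file lives,
    so the cache directory can be copied to another machine.
    """
    h = hashlib.sha256()
    h.update(image_bytes)
    h.update(b"\0")  # separator between image bytes and caption
    h.update(caption.encode("utf-8"))
    return h.hexdigest()


def update_cache(
    items: List[Tuple[Path, str]],
    cache_dir: Path,
    encode: Callable[[bytes, str], bytes],
) -> Dict[str, List[str]]:
    """Cache only new or changed (image, caption) pairs.

    `encode` stands in for the expensive step (assumed here to return
    bytes). Failures are collected in the report instead of raising.
    """
    cache_dir.mkdir(parents=True, exist_ok=True)
    report: Dict[str, List[str]] = {"cached": [], "reused": [], "failed": []}
    for image_path, caption in items:
        try:
            data = image_path.read_bytes()
            target = cache_dir / f"{content_key(data, caption)}.bin"
            if target.exists():
                # Same image and caption as a previous run: skip the work.
                report["reused"].append(image_path.name)
                continue
            payload = encode(data, caption)
            # Write to a temp file first so an interrupted run never
            # leaves a half-written entry under the final name.
            tmp = target.with_suffix(".tmp")
            tmp.write_bytes(payload)
            tmp.replace(target)
            report["cached"].append(image_path.name)
        except Exception as exc:
            # One invalid image or caption must not cancel the whole run.
            report["failed"].append(f"{image_path.name}: {exc}")
    return report
```

With something like this, re-running after adding a couple of images would only encode the new ones, and the `failed` list could be printed at the end so the user can fix or remove the offending files.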
Have you considered alternatives? List them here.
N/A