Hello, excellent work!
I would like to ask the authors @kim-sanghwan @lilygeorgescu @sean-xr whether it is possible to replace the CLIP loss with SigLIP's sigmoid loss. Would this help reduce memory usage when training with a large batch size? Or are there other ways to optimize the code? Looking forward to your response.
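
For reference, here is a minimal sketch of the pairwise sigmoid loss I have in mind. It is framework-free pure Python for illustration only, not code from this repository. The function names and the `temperature=10.0` / `bias=-10.0` defaults are my assumptions (the defaults follow the initial values described in the SigLIP paper, as I understand it), and the embeddings are assumed to be L2-normalized already. Each image-text pair contributes an independent term, so there is no batch-wide softmax normalization.

```python
import math


def log_sigmoid(x):
    # Numerically stable log(sigmoid(x)) = min(x, 0) - log(1 + exp(-|x|)).
    return min(x, 0.0) - math.log1p(math.exp(-abs(x)))


def sigmoid_pairwise_loss(img, txt, temperature=10.0, bias=-10.0):
    """Hypothetical helper: SigLIP-style loss over all image-text pairs.

    img, txt: lists of equal length holding L2-normalized embedding vectors;
    img[i] and txt[i] are assumed to be the matching (positive) pair.
    """
    n = len(img)
    total = 0.0
    for i in range(n):
        for j in range(n):
            # Scaled, shifted dot-product similarity for the pair (i, j).
            dot = sum(a * b for a, b in zip(img[i], txt[j]))
            logit = temperature * dot + bias
            # +1 for the matching pair, -1 for every other pair.
            label = 1.0 if i == j else -1.0
            total += log_sigmoid(label * logit)
    # Sum over all pairs, averaged over the batch size.
    return -total / n


if __name__ == "__main__":
    images = [[1.0, 0.0], [0.0, 1.0]]
    aligned_texts = [[1.0, 0.0], [0.0, 1.0]]
    swapped_texts = [[0.0, 1.0], [1.0, 0.0]]
    print(sigmoid_pairwise_loss(images, aligned_texts))
    print(sigmoid_pairwise_loss(images, swapped_texts))
```

In a real training loop this would be written with tensor operations. Since the terms are independent, I assume the pair matrix could be computed in chunks instead of all at once, which is why I am asking about memory usage.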