Add training support for SigLIP #31495
Conversation
@aliencaocao Could you rebase to include the upstream changes on main? This should fix the failures on the CI runs
amyeroberts
left a comment
Thanks for adding!
The tests in test_modeling_siglip.py will also need to be updated so the training tests are no longer skipped
[experimental] enable GC training tests as it has worked for my own data
Added the training tests and also enabled the gradient checkpointing tests. I note that CLIP had issues with GC, but I have used it with SigLIP myself and did not find any issues with convergence/accuracy on a single RTX 3080 Ti with fp16 training and grad accum = 16. Will let the tests run and see how it goes.
@amyeroberts it seems this needs you to enable the slow tests?
amyeroberts
left a comment
Thanks for the continued work on this!
It shouldn't be necessary for the slow tests to be enabled to test training for this model. I've added the run-slow label, nevertheless. If you push a commit with the message [run_slow] siglip then this will trigger a run of the slow tests for this model (which I'll have to approve to set off)
[run_slow] siglip
Add skip reason for training tests for SiglipTextModel
# Conflicts:
#	tests/models/siglip/test_modeling_siglip.py
@amyeroberts now that the GC tests are properly skipped, shall we move forward with this?
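For reference, a minimal sketch of the skip pattern used when a training test does not apply to a particular model class; the class name, test names, and reason string below are illustrative stand-ins, not the actual transformers test code:

```python
import unittest


class SiglipTextModelTest(unittest.TestCase):
    """Illustrative stand-in for a transformers model test class."""

    # Training is exercised at the multimodal SiglipModel level, so a
    # text-tower-only training test is skipped with an explicit reason,
    # which shows up in the test report instead of a silent pass.
    @unittest.skip(reason="SiglipTextModel has no standalone training objective")
    def test_training(self):
        pass

    def test_forward(self):
        # Placeholder for a real forward-pass check.
        self.assertTrue(True)


suite = unittest.defaultTestLoader.loadTestsFromTestCase(SiglipTextModelTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

The runner still counts the skipped test, but records it under `result.skipped` with the given reason rather than failing.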
What does this PR do?
Add the sigmoid contrastive loss function of SigLIP from https://github.com/google-research/big_vision/blob/01edb81a4716f93a48be43b3a4af14e29cdb3a7f/big_vision/trainers/proj/image_text/siglip.py#L287
This will allow training/finetuning SigLIP models.
Already verified to work on my own dataset.
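As a rough illustration of what the sigmoid contrastive loss computes: every image/text pair in the batch gets an independent binary label (+1 for the matched diagonal pairs, -1 elsewhere), so no cross-device softmax normalization is required. The following is a simplified stand-alone NumPy sketch of that objective, not the exact code added in this PR:

```python
import numpy as np


def siglip_loss(img_emb, txt_emb, logit_scale=1.0, logit_bias=0.0):
    """Sigmoid contrastive loss over all image/text pairs in a batch.

    img_emb, txt_emb: (n, d) L2-normalized embeddings, row i of each
    being a matched pair. logit_scale / logit_bias are learned scalars
    in the real model; they are plain arguments here for illustration.
    """
    # Pairwise similarity logits between every image and every text.
    logits = logit_scale * img_emb @ txt_emb.T + logit_bias
    n = logits.shape[0]
    # +1 on the diagonal (matched pairs), -1 everywhere else.
    labels = 2.0 * np.eye(n) - 1.0
    # -log sigmoid(label * logit), written stably as log(1 + exp(-z)).
    z = labels * logits
    loss_per_pair = np.logaddexp(0.0, -z)
    return loss_per_pair.sum() / n
```

With well-separated embeddings, a large scale and negative bias push matched logits up and mismatched logits down, driving the loss toward zero, which matches the intuition that each pair is scored as an independent binary classification.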
I saw the note on using torch.distributed for the loss function and open_clip's implementation, but I'm not sure why it is needed. I ran my training with both DDP and FSDP with full sharding and it seems to work just fine, also getting the expected speedup and the ability to set a larger batch size. The only issue is #31034 when using FSDP, but I don't think it is SigLIP specific. Nonetheless, I updated the docs to mention that torch.distributed is not used, in case that ends up being important to some users. Not sure if a training test is needed.
Before submitting
- Did you read the contributor guideline, Pull Request section?
- Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
- Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
Who can review?
@amyeroberts