Add ColPali to 🤗 transformers#33736
Conversation
yonigozlan
left a comment
There was a problem hiding this comment.
Thanks so much for working on this! Excited to see ColPali in Transformers 🤗.
Main comment for now is to try and inherit directly from PaliGemmaForConditionalGeneration to make full use of modular and to avoid instantiating PaliGemmaForConditionalGeneration inside the model
e0db84b to
9c742c5
Compare
yonigozlan
left a comment
There was a problem hiding this comment.
Looks like something is going wrong with the files generated by Modular, but otherwise looks good!
For the processor and modeling tests, you can take a look at other model tests to see how they should be implemented
08caa9d to
e189393
Compare
f71f94a to
f9895a8
Compare
|
Deps used for the weight conversion ( |
|
Hi @ArthurZucker, could you review my PR for ColPali please? 😁 @yonigozlan was kind enough to do multiple reviews of the PR already, so it should almost ready to ship! However, I noticed a few bugs with modular that will need to be fixed before merging:
Any chance you could use with these please? 🙏🏼 |
This was fixed thanks to #3448! 👍🏼 |
|
Will review today! Sorry for the delay! |
28e35bd to
9f34d80
Compare
ArthurZucker
left a comment
There was a problem hiding this comment.
LGTM, I think we need to just use the paligemma config, manually add the embedding dimension or use something PreTrainedConfig.num_labels
There was a problem hiding this comment.
I think it makes more sense to flatten whatever we can rather than introducing this! 🤗 You don't even need a new config class for ColPaliGemma, you can use the PaliGemma one!
|
Almost there 🤗 with the config update we should be good to merge |
Is the last thing to do |
|
@ArthurZucker Fixed the issue with |
ArthurZucker
left a comment
There was a problem hiding this comment.
Thanks for pushing through 🔥 Let's merge!

What does this PR do?
Add ColPali support in 🤗
transformers.Who can review?
@yonigozlan 😉
@ArthurZucker
Additional details
colpali-engine==v0.3.0.colpali-enginedependency, the weights are directly loaded from an exportedstate_dictstored invidore/colpali-v1.2-merged-state_dict.vidore/colpali-v1.2-hf.vidore/document-visual-retrieval-test. I believe this should be moved tohf-internal-testing/document-visual-retrieval-test.Progress checklist
TODO
hf-internal-testing/document-visual-retrieval-test(done → [url])Wait for Add support for modifying Processor Kwargs in modular #34477 to be merged (fix for modular script)Large modular logic refactoring #34487 has been merged and fixes this problem