[WIP] Add MM Grounding DINO#37925
Conversation
Added modular implementation for MM Grounding DINO from starting point created by add-new-model-like. Added conversion script from mmdetection to huggingface. TODO: Some tests are failing so that needs to be fixed.
…ion where box and class heads were not correctly assigned to inner model
Cross att masking and cpu-gpu consistency tests are still failing however.
|
Hi @rziga ! Feel free to ping me when the PR is ready for review! |
|
Sorry, I got busy with life stuff for the last couple of weeks, so I didn't have time to finish it up. The model works, but I have to upload all the checkpoints to the hub and update the model doc before I mark it ready for review. |
|
Pinging @yonigozlan and @qubvel as requested. |
| --> | ||
|
|
||
| # MM Grounding DINO | ||
|
|
There was a problem hiding this comment.
Model Documentations now are of a different format as per this issue right here. To standardize model cards and make it easy for the user.
|
Hey @rziga, I wanted to ask you that - how do you add a new model? Huggingface provides utilities to help you get started with the boilerplate, AFAIK this is where you get started - Correct me if I'm wrong. And could you tell me your approach. 😄 |
|
Hi, I don't really know if I'm the right person to ask, but yes, I started with
Since this modular approach is quite new, I also looked at other models that used modular transformers, like this one: Hope this helps somewhat. But again I don't think I'm the right person to ask here. It is probably better to open an issue to add a model and ask the actual contributors. |
|
Hey @rziga, I'm really sorry for the long delay. I'll review it next week. The modular code looks very minimal, so I expect we can merge it quickly! Thanks a lot for working on the model. |
|
Hi @rziga, I created org for MM Grounding DINO checkpoints and invited you |
| """ | ||
| Preprocess an image or batch of images. | ||
| """ |
There was a problem hiding this comment.
hmm, that's a bit strange, we should probably merge main and make repo-consistency to avoid it
There was a problem hiding this comment.
yes, let's remove this unrelated change
ArthurZucker
left a comment
There was a problem hiding this comment.
Very nice! Thanks for making it so easy to review with modular!
| if config.decoder_cls_embed_share: | ||
| _class_embed = MMGroundingDinoContrastiveEmbedding(config) | ||
| self.class_embed = nn.ModuleList([_class_embed for _ in range(config.decoder_layers)]) | ||
| else: | ||
| module_list = [] | ||
| for _ in range(config.decoder_layers): | ||
| _class_embed = MMGroundingDinoContrastiveEmbedding(config) | ||
| module_list.append(_class_embed) | ||
| self.class_embed = nn.ModuleList(module_list) | ||
|
|
||
| if config.decoder_bbox_embed_share: | ||
| _bbox_embed = MMGroundingDinoMLPPredictionHead( | ||
| input_dim=config.d_model, hidden_dim=config.d_model, output_dim=4, num_layers=3 | ||
| ) | ||
| self.bbox_embed = nn.ModuleList([_bbox_embed for _ in range(config.decoder_layers)]) | ||
| else: | ||
| module_list = [] | ||
| for _ in range(config.decoder_layers): | ||
| _bbox_embed = MMGroundingDinoMLPPredictionHead( | ||
| input_dim=config.d_model, hidden_dim=config.d_model, output_dim=4, num_layers=3 | ||
| ) | ||
| module_list.append(_bbox_embed) | ||
| self.bbox_embed = nn.ModuleList(module_list) |
There was a problem hiding this comment.
in general we should not have so many code paths! Each case would lead to a different Embedding.... Are there a model for each case?
There was a problem hiding this comment.
Both of these are always false, yes. I remember debating what to do. Cleanest thing would be to delete both decoder_bbox_embed_share and decoder_cls_embed_share and have no branching here. The issue is that original GroundingDino requires decoder_bbox_embed_share parameter, so I was kinda stuck with using it here as well.
There was a problem hiding this comment.
@rziga if we are overriding the entire method, I suppose we can remove if/else code path here and remove these args from config (del decoder_bbox_embed_share in modular)
There was a problem hiding this comment.
I tried to do that just now. The issue is that decoder_bbox_embed_share is used in a check inside config, so it doesn't get removed during modular conversion. Do I just copy the whole config class to remove the check?
There was a problem hiding this comment.
ahh, I see! Yes, let's copy and modify the whole config class then
There was a problem hiding this comment.
Do I keep the modified config in modular file or just move it to configuration file?
There was a problem hiding this comment.
Never mind, I need to keep it in modular otherwise the conversion deletes it.
| """ | ||
| Preprocess an image or batch of images. | ||
| """ |
There was a problem hiding this comment.
yes, let's remove this unrelated change
| self.backbone = MMGroundingDinoConvModel(backbone, position_embeddings) | ||
|
|
||
| # Create input projection layers | ||
| if config.num_feature_levels > 1: |
There was a problem hiding this comment.
Same here, can we remove this if/else? I suppose num_feature_levels > 1 for all checkpoints
There was a problem hiding this comment.
Good catch. I'm not really sure why this if/else even is here. The first branch seems to produce the same thing as the second one.
|
run-slow: mm_grounding_dino |
|
This comment contains run-slow, running the specified jobs: models: ['models/mm_grounding_dino'] |
|
[For maintainers] Suggested jobs to run (before merge) run-slow: auto, grounding_dino, mm_grounding_dino |
|
run-slow: grounding_dino, mm_grounding_dino |
|
This comment contains run-slow, running the specified jobs: models: ['models/grounding_dino', 'models/mm_grounding_dino'] |
|
Huge thanks for the contribution @rziga! Great work 🤗 |
|
@qubvel here support for llmdet is only added for the inference ,we have any support for training also ??can I pls take up the issue if its planned or open for community |
* first commit Added modular implementation for MM Grounding DINO from starting point created by add-new-model-like. Added conversion script from mmdetection to huggingface. TODO: Some tests are failing so that needs to be fixed. * fixed a bug with modular definition of MMGroundingDinoForObjectDetection where box and class heads were not correctly assigned to inner model * cleaned up a hack in the conversion script * Fixed the expected values in integration tests Cross att masking and cpu-gpu consistency tests are still failing however. * changes for make style and quality * add documentation * clean up contrastive embedding * add mm grounding dino to loss mapping * add model link to config docstring * hack fix for mm grounding dino consistency tests * add special cases for unused config attr check * add all models and update docs * update model doc to the new style * Use super_kwargs for modular config * Move init to the _init_weights function * Add copied from for tests * fixup * update typehints * Fix-copies for tests * fix-copies * Fix init test * fix snippets in docs * fix consistency * fix consistency * update conversion script * fix nits in readme and remove old comments from conversion script * add license * remove unused config args * remove unnecessary if/else in model init * fix quality * Update references * fix test * fixup --------- Co-authored-by: qubvel <qubvel@gmail.com>
* first commit Added modular implementation for MM Grounding DINO from starting point created by add-new-model-like. Added conversion script from mmdetection to huggingface. TODO: Some tests are failing so that needs to be fixed. * fixed a bug with modular definition of MMGroundingDinoForObjectDetection where box and class heads were not correctly assigned to inner model * cleaned up a hack in the conversion script * Fixed the expected values in integration tests Cross att masking and cpu-gpu consistency tests are still failing however. * changes for make style and quality * add documentation * clean up contrastive embedding * add mm grounding dino to loss mapping * add model link to config docstring * hack fix for mm grounding dino consistency tests * add special cases for unused config attr check * add all models and update docs * update model doc to the new style * Use super_kwargs for modular config * Move init to the _init_weights function * Add copied from for tests * fixup * update typehints * Fix-copies for tests * fix-copies * Fix init test * fix snippets in docs * fix consistency * fix consistency * update conversion script * fix nits in readme and remove old comments from conversion script * add license * remove unused config args * remove unnecessary if/else in model init * fix quality * Update references * fix test * fixup --------- Co-authored-by: qubvel <qubvel@gmail.com>
* first commit Added modular implementation for MM Grounding DINO from starting point created by add-new-model-like. Added conversion script from mmdetection to huggingface. TODO: Some tests are failing so that needs to be fixed. * fixed a bug with modular definition of MMGroundingDinoForObjectDetection where box and class heads were not correctly assigned to inner model * cleaned up a hack in the conversion script * Fixed the expected values in integration tests Cross att masking and cpu-gpu consistency tests are still failing however. * changes for make style and quality * add documentation * clean up contrastive embedding * add mm grounding dino to loss mapping * add model link to config docstring * hack fix for mm grounding dino consistency tests * add special cases for unused config attr check * add all models and update docs * update model doc to the new style * Use super_kwargs for modular config * Move init to the _init_weights function * Add copied from for tests * fixup * update typehints * Fix-copies for tests * fix-copies * Fix init test * fix snippets in docs * fix consistency * fix consistency * update conversion script * fix nits in readme and remove old comments from conversion script * add license * remove unused config args * remove unnecessary if/else in model init * fix quality * Update references * fix test * fixup --------- Co-authored-by: qubvel <qubvel@gmail.com>
* first commit Added modular implementation for MM Grounding DINO from starting point created by add-new-model-like. Added conversion script from mmdetection to huggingface. TODO: Some tests are failing so that needs to be fixed. * fixed a bug with modular definition of MMGroundingDinoForObjectDetection where box and class heads were not correctly assigned to inner model * cleaned up a hack in the conversion script * Fixed the expected values in integration tests Cross att masking and cpu-gpu consistency tests are still failing however. * changes for make style and quality * add documentation * clean up contrastive embedding * add mm grounding dino to loss mapping * add model link to config docstring * hack fix for mm grounding dino consistency tests * add special cases for unused config attr check * add all models and update docs * update model doc to the new style * Use super_kwargs for modular config * Move init to the _init_weights function * Add copied from for tests * fixup * update typehints * Fix-copies for tests * fix-copies * Fix init test * fix snippets in docs * fix consistency * fix consistency * update conversion script * fix nits in readme and remove old comments from conversion script * add license * remove unused config args * remove unnecessary if/else in model init * fix quality * Update references * fix test * fixup --------- Co-authored-by: qubvel <qubvel@gmail.com>
* first commit Added modular implementation for MM Grounding DINO from starting point created by add-new-model-like. Added conversion script from mmdetection to huggingface. TODO: Some tests are failing so that needs to be fixed. * fixed a bug with modular definition of MMGroundingDinoForObjectDetection where box and class heads were not correctly assigned to inner model * cleaned up a hack in the conversion script * Fixed the expected values in integration tests Cross att masking and cpu-gpu consistency tests are still failing however. * changes for make style and quality * add documentation * clean up contrastive embedding * add mm grounding dino to loss mapping * add model link to config docstring * hack fix for mm grounding dino consistency tests * add special cases for unused config attr check * add all models and update docs * update model doc to the new style * Use super_kwargs for modular config * Move init to the _init_weights function * Add copied from for tests * fixup * update typehints * Fix-copies for tests * fix-copies * Fix init test * fix snippets in docs * fix consistency * fix consistency * update conversion script * fix nits in readme and remove old comments from conversion script * add license * remove unused config args * remove unnecessary if/else in model init * fix quality * Update references * fix test * fixup --------- Co-authored-by: qubvel <qubvel@gmail.com>
* first commit Added modular implementation for MM Grounding DINO from starting point created by add-new-model-like. Added conversion script from mmdetection to huggingface. TODO: Some tests are failing so that needs to be fixed. * fixed a bug with modular definition of MMGroundingDinoForObjectDetection where box and class heads were not correctly assigned to inner model * cleaned up a hack in the conversion script * Fixed the expected values in integration tests Cross att masking and cpu-gpu consistency tests are still failing however. * changes for make style and quality * add documentation * clean up contrastive embedding * add mm grounding dino to loss mapping * add model link to config docstring * hack fix for mm grounding dino consistency tests * add special cases for unused config attr check * add all models and update docs * update model doc to the new style * Use super_kwargs for modular config * Move init to the _init_weights function * Add copied from for tests * fixup * update typehints * Fix-copies for tests * fix-copies * Fix init test * fix snippets in docs * fix consistency * fix consistency * update conversion script * fix nits in readme and remove old comments from conversion script * add license * remove unused config args * remove unnecessary if/else in model init * fix quality * Update references * fix test * fixup --------- Co-authored-by: qubvel <qubvel@gmail.com>
* first commit Added modular implementation for MM Grounding DINO from starting point created by add-new-model-like. Added conversion script from mmdetection to huggingface. TODO: Some tests are failing so that needs to be fixed. * fixed a bug with modular definition of MMGroundingDinoForObjectDetection where box and class heads were not correctly assigned to inner model * cleaned up a hack in the conversion script * Fixed the expected values in integration tests Cross att masking and cpu-gpu consistency tests are still failing however. * changes for make style and quality * add documentation * clean up contrastive embedding * add mm grounding dino to loss mapping * add model link to config docstring * hack fix for mm grounding dino consistency tests * add special cases for unused config attr check * add all models and update docs * update model doc to the new style * Use super_kwargs for modular config * Move init to the _init_weights function * Add copied from for tests * fixup * update typehints * Fix-copies for tests * fix-copies * Fix init test * fix snippets in docs * fix consistency * fix consistency * update conversion script * fix nits in readme and remove old comments from conversion script * add license * remove unused config args * remove unnecessary if/else in model init * fix quality * Update references * fix test * fixup --------- Co-authored-by: qubvel <qubvel@gmail.com>
What does this PR do?
Fixes #37744.
It adds support for MM Grounding DINO and LLMDet (inference only).
TODO
[ ] fix failing tests
Tagging @qubvel