Add LWDetr model #40991
Conversation
Force-pushed from 6a4faac to 15aa78c.
@yonigozlan @qubvel ready for a first review
Force-pushed from f198fbc to 03e3d83.
yonigozlan left a comment:
Thank you for working on this @sbucaille, very clean PR! There are a few things to change, but they're mostly formatting related.
I haven't checked the docs or the tests thoroughly yet, but it would be great to have some integration tests as well.
```python
logger = logging.get_logger(__name__)


class LwDetrImageProcessor(DeformableDetrImageProcessor):
```
No need to add new image processors if they are exactly the same as the deformable detr ones, let's just use the existing ones.
Also, as we move away from slow image processors for v5, let's have LWDetr support only the deformable detr fast image processor in the auto file.
I removed the LwDetrImageProcessors in 70ae6c7.
But I also removed the mentions of lw_detr from the image_processing_auto file, was that the right call?
Thanks! We still need to have an auto mapping to DeformableDetrImageProcessorFast in image_processing_auto though.
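For illustration only (this is not the PR's actual diff), the kind of entry being discussed lives in `src/transformers/models/auto/image_processing_auto.py`; the exact tuple layout and the `None` slow slot for a fast-only model are assumptions here:

```python
# Hypothetical sketch of the auto-mapping entry: LW-DETR reuses Deformable DETR's
# fast image processor instead of defining its own slow/fast pair.
from collections import OrderedDict

IMAGE_PROCESSOR_MAPPING_NAMES = OrderedDict(
    [
        # ... existing entries ...
        ("deformable_detr", ("DeformableDetrImageProcessor", "DeformableDetrImageProcessorFast")),
        # assumed entry: no slow processor, only the Deformable DETR fast one
        ("lw_detr", (None, "DeformableDetrImageProcessorFast")),
        # ... existing entries ...
    ]
)
```

With an entry along these lines, `AutoImageProcessor` would resolve an `lw_detr` checkpoint to `DeformableDetrImageProcessorFast` without any LW-DETR-specific image processor code.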
```
        The dropout ratio for activations inside the fully connected layer.
    position_embedding_type (`str`, *optional*, defaults to `"sine"`):
        Type of position embeddings to be used on top of the image features. One of `"sine"` or `"learned"`.
    two_stage (`bool`, *optional*, defaults to `True`):
```
Is this ever not True? Otherwise, let's remove it and all the related logic paths to reduce complexity
```python
attribute_map = {
    "hidden_size": "d_model",
    "num_attention_heads": "decoder_self_attention_heads",
    "num_key_value_heads": "decoder_self_attention_heads",
```
Why have num_key_value_heads in the attribute map and properties at all?
This is because LlamaAttention requires these attributes to be properly initialized, in these two lines:

```python
self.head_dim = getattr(config, "head_dim", config.hidden_size // config.num_attention_heads)
self.num_key_value_groups = config.num_attention_heads // config.num_key_value_heads
```
Ok, let's keep this for now, and once the vision models refactoring PR is merged we can inherit from a more appropriate attention module.
No, let's not do that, way too complicated! We instead redefine Llama's init if needed! But we don't want to pollute the config!
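As a rough sketch of what redefining the init could look like (hypothetical, not the PR's code): derive `head_dim` and `num_key_value_groups` from the config fields LwDetr already has, instead of adding `num_key_value_heads` to the config or the attribute map. The class name and projection layout below are assumptions:

```python
import torch.nn as nn


class LwDetrDecoderSelfAttention(nn.Module):
    """Illustrative only: a Llama-style init that does not pollute the config."""

    def __init__(self, config, layer_idx=None):
        super().__init__()
        self.config = config
        self.layer_idx = layer_idx
        # use the existing LwDetr config fields directly
        self.num_attention_heads = config.decoder_self_attention_heads
        self.head_dim = config.d_model // self.num_attention_heads
        # plain multi-head self-attention: queries and key/values share the same head count
        self.num_key_value_groups = 1
        self.scaling = self.head_dim**-0.5
        self.is_causal = False  # object-query self-attention is bidirectional
        self.q_proj = nn.Linear(config.d_model, config.d_model)
        self.k_proj = nn.Linear(config.d_model, config.d_model)
        self.v_proj = nn.Linear(config.d_model, config.d_model)
        self.o_proj = nn.Linear(config.d_model, config.d_model)
```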
```python
        return y


class LwDetrCSPRepLayer(nn.Module):
```
From the paper, this seems to be a Cross-Stage Partial with 2F connections (C2F) and not a CSP. Let's rename it :)
```python
    def with_pos_embed(self, tensor: torch.Tensor, position_embeddings: Optional[torch.Tensor]):
        return tensor if position_embeddings is None else tensor + position_embeddings
```
I know it's present in the deformable attention of existing modeling files, but there's really no need to have this in a separate function.
```
        many stages the model has). If unset and `out_features` is set, will default to the corresponding stages.
        If unset and `out_features` is unset, will default to the last stage. Must be in the
        same order as defined in the `stage_names` attribute.
    use_cae (`bool`, *optional*, defaults to `True`):
```
In the checkpoints provided by the authors it is indeed always true, so I removed the logic in b6e08e9.
```python
    def forward(
        self,
        hidden_states: torch.Tensor,
        head_mask: Optional[torch.Tensor] = None,
        **kwargs: Unpack[TransformersKwargs],
    ) -> tuple[torch.Tensor, torch.Tensor]:
        batch_size = hidden_states.shape[0]
        new_shape = batch_size, -1, self.num_attention_heads, self.attention_head_size

        key_layer = self.key(hidden_states).view(*new_shape).transpose(1, 2)
        value_layer = self.value(hidden_states).view(*new_shape).transpose(1, 2)
        query_layer = self.query(hidden_states).view(*new_shape).transpose(1, 2)

        attention_interface: Callable = eager_attention_forward
        if self.config._attn_implementation != "eager":
            attention_interface = ALL_ATTENTION_FUNCTIONS[self.config._attn_implementation]

        context_layer, attention_probs = attention_interface(
            self,
            query_layer,
            key_layer,
            value_layer,
            head_mask,
            is_causal=self.is_causal,
            scaling=self.scaling,
            dropout=0.0 if not self.training else self.dropout_prob,
            **kwargs,
        )

        new_context_layer_shape = context_layer.size()[:-2] + (self.all_head_size,)
        context_layer = context_layer.reshape(new_context_layer_shape)

        return context_layer, attention_probs
```
Let's update ViTSelfAttention forward instead :)
Update: I'll also do that in the global refactor, no need to do it here
```python
        self.window = layer_idx in config.window_block_indices
        self.num_windows = config.num_windows
        self.num_windows_side = int(math.sqrt(self.num_windows))
```
self.num_windows_side is not used I think?
```python
        list_hidden_states = []
        list_hidden_states.append(hidden_states)
```

Suggested change:

```python
        list_hidden_states = [hidden_states]
```
```python
        "attentions": LwDetrViTSelfAttention,
    }

    def _init_weights(self, module: Union[nn.Linear, nn.Conv2d, nn.LayerNorm]) -> None:
```
Easier and more accurate to not type this
```python
    def _init_weights(self, module: Union[nn.Linear, nn.Conv2d, nn.LayerNorm]) -> None:
```

Suggested change:

```python
    def _init_weights(self, module):
```
Hey @sbucaille! Just checking in to see if I should make another pass at this. Don't hesitate to reach out if you need any help!
Force-pushed from 0c1f23b to f35fb75.
Hey @yonigozlan, thanks for the follow-up! I finally got some time to put into the PR again. I've addressed most of your comments, but I'd like to reply here to some of them regarding the attention implementations you suggested rewriting. Also, here is what I get when running the eager-vs-SDPA equivalence tests:

```
tests/models/lw_detr/test_modeling_lw_detr.py::LwDetrModelTest::test_eager_matches_sdpa_inference_00_fp16_pad_left_sdpa_kernels PASSED [ 11%]
tests/models/lw_detr/test_modeling_lw_detr.py::LwDetrModelTest::test_eager_matches_sdpa_inference_01_fp16_pad_left PASSED [ 11%]
tests/models/lw_detr/test_modeling_lw_detr.py::LwDetrModelTest::test_eager_matches_sdpa_inference_02_fp16_pad_left_no_attn_mask_sdpa_kernels PASSED [ 12%]
tests/models/lw_detr/test_modeling_lw_detr.py::LwDetrModelTest::test_eager_matches_sdpa_inference_03_fp16_pad_left_no_attn_mask PASSED [ 12%]
tests/models/lw_detr/test_modeling_lw_detr.py::LwDetrModelTest::test_eager_matches_sdpa_inference_04_fp16_pad_right_sdpa_kernels PASSED [ 13%]
tests/models/lw_detr/test_modeling_lw_detr.py::LwDetrModelTest::test_eager_matches_sdpa_inference_05_fp16_pad_right PASSED [ 13%]
tests/models/lw_detr/test_modeling_lw_detr.py::LwDetrModelTest::test_eager_matches_sdpa_inference_06_fp16_pad_right_no_attn_mask_sdpa_kernels PASSED [ 14%]
tests/models/lw_detr/test_modeling_lw_detr.py::LwDetrModelTest::test_eager_matches_sdpa_inference_07_fp16_pad_right_no_attn_mask PASSED [ 15%]
tests/models/lw_detr/test_modeling_lw_detr.py::LwDetrModelTest::test_eager_matches_sdpa_inference_08_fp32_pad_left_sdpa_kernels FAILED [ 15%]
tests/models/lw_detr/test_modeling_lw_detr.py::LwDetrModelTest::test_eager_matches_sdpa_inference_09_fp32_pad_left FAILED [ 16%]
tests/models/lw_detr/test_modeling_lw_detr.py::LwDetrModelTest::test_eager_matches_sdpa_inference_10_fp32_pad_left_no_attn_mask_sdpa_kernels FAILED [ 16%]
tests/models/lw_detr/test_modeling_lw_detr.py::LwDetrModelTest::test_eager_matches_sdpa_inference_11_fp32_pad_left_no_attn_mask FAILED [ 17%]
tests/models/lw_detr/test_modeling_lw_detr.py::LwDetrModelTest::test_eager_matches_sdpa_inference_12_fp32_pad_right_sdpa_kernels FAILED [ 17%]
tests/models/lw_detr/test_modeling_lw_detr.py::LwDetrModelTest::test_eager_matches_sdpa_inference_13_fp32_pad_right FAILED [ 18%]
tests/models/lw_detr/test_modeling_lw_detr.py::LwDetrModelTest::test_eager_matches_sdpa_inference_14_fp32_pad_right_no_attn_mask_sdpa_kernels FAILED [ 18%]
tests/models/lw_detr/test_modeling_lw_detr.py::LwDetrModelTest::test_eager_matches_sdpa_inference_15_fp32_pad_right_no_attn_mask FAILED [ 19%]
tests/models/lw_detr/test_modeling_lw_detr.py::LwDetrModelTest::test_eager_matches_sdpa_inference_16_bf16_pad_left_sdpa_kernels PASSED [ 20%]
tests/models/lw_detr/test_modeling_lw_detr.py::LwDetrModelTest::test_eager_matches_sdpa_inference_17_bf16_pad_left PASSED [ 20%]
tests/models/lw_detr/test_modeling_lw_detr.py::LwDetrModelTest::test_eager_matches_sdpa_inference_18_bf16_pad_left_no_attn_mask_sdpa_kernels PASSED [ 21%]
tests/models/lw_detr/test_modeling_lw_detr.py::LwDetrModelTest::test_eager_matches_sdpa_inference_19_bf16_pad_left_no_attn_mask PASSED [ 21%]
tests/models/lw_detr/test_modeling_lw_detr.py::LwDetrModelTest::test_eager_matches_sdpa_inference_20_bf16_pad_right_sdpa_kernels PASSED [ 22%]
tests/models/lw_detr/test_modeling_lw_detr.py::LwDetrModelTest::test_eager_matches_sdpa_inference_21_bf16_pad_right PASSED [ 22%]
tests/models/lw_detr/test_modeling_lw_detr.py::LwDetrModelTest::test_eager_matches_sdpa_inference_22_bf16_pad_right_no_attn_mask_sdpa_kernels PASSED [ 23%]
tests/models/lw_detr/test_modeling_lw_detr.py::LwDetrModelTest::test_eager_matches_sdpa_inference_23_bf16_pad_right_no_attn_mask PASSED [ 24%]
tests/models/lw_detr/test_modeling_lw_detr.py::LwDetrModelTest::test_eager_matches_sdpa_inference_24_fp32_pad_left_output_attentions FAILED [ 24%]
```

Notice how it fails only on the fp32 tests. Have you had a similar problem in the past? What do you think a potential solution would be? That being said, you can have another pass on the PR if you have some time! 😃
yonigozlan left a comment:
Hey @sbucaille, thank you for iterating on this! We're getting closer, but there are still some things to address that I had missed in my first review, and some other changes that need to be made to follow the recent deprecations introduced for v5.
Also, I'm getting some errors on test_sdpa_can_compile_dynamic and test_sdpa_can_dispatch_on_flash that look like they could be due to the loss implementation. Can you check on your side?
I thought about this when dealing with ViT as well as DeformableDetr (and the other Detr models); that is indeed why there are a lot of workarounds in this PR for now, but I felt that rewriting those models' attentions does not belong in this PR. I'd be happy to contribute to updating their attention implementations to follow the Llama standards, but I think that belongs in another PR. Let me know what you think.
Completely agree, I'm currently refactoring the vision models to update to the same standards we have in language models, and did not realize the magnitude of the task, so don't worry too much about this here!
Finally, can you please rebase/merge with main? There have been some important breaking changes since :)
```python
        drop_path_rate = config.drop_path_rates[layer_idx]
        self.drop_path = LwDetrViTDropPath(drop_path_rate) if drop_path_rate > 0.0 else nn.Identity()
```
No need to add drop path rate support here, it seems like the field is moving away from using this, and it should make the code a bit cleaner
You are right, drop_path_rate is 0.0 by default anyway, I removed it in 1a49c2a
```python
    def prune_heads(self, heads: set[int]):
        if len(heads) == 0:
            return
        heads, index = find_pruneable_heads_and_indices(
            heads, self.attention.num_attention_heads, self.attention.attention_head_size, self.pruned_heads
        )

        # Prune linear layers
        self.attention.query = prune_linear_layer(self.attention.query, index)
        self.attention.key = prune_linear_layer(self.attention.key, index)
        self.attention.value = prune_linear_layer(self.attention.value, index)
        self.output.dense = prune_linear_layer(self.output.dense, index, dim=1)

        # Update hyper params and store pruned heads
        self.attention.num_attention_heads = self.attention.num_attention_heads - len(heads)
        self.attention.all_head_size = self.attention.attention_head_size * self.attention.num_attention_heads
        self.pruned_heads = self.pruned_heads.union(heads)
```
Same for pruning heads in attention, we're deprecating this everywhere for v5
Same as above, this is copied from ViT using modular
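For readers unfamiliar with the modular system, this is roughly why the method shows up in the generated file: the modular definition inherits from ViT, and the converter copies the parent's methods (including prune_heads) into the generated modeling file. A minimal, illustrative sketch, with the class name assumed from this PR's naming scheme and `ViTAttention` assumed as the relevant parent in the targeted transformers version:

```python
# modular_lw_detr.py (sketch): inheriting pulls ViT's code, prune_heads included,
# into the auto-generated modeling_lw_detr.py when the modular converter runs.
from transformers.models.vit.modeling_vit import ViTAttention


class LwDetrViTAttention(ViTAttention):
    pass
```

The generated modeling file then contains the copied (and renamed) ViT code, which is why the pruning logic appears there even though it isn't LW-DETR-specific.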
```python
class LwDetrMultiscaleDeformableAttention(DeformableDetrMultiscaleDeformableAttention):
    def forward(
        self,
        hidden_states: torch.Tensor,
        attention_mask: Optional[torch.Tensor] = None,
        encoder_hidden_states=None,
        encoder_attention_mask=None,
        position_embeddings: Optional[torch.Tensor] = None,
        reference_points=None,
        spatial_shapes=None,
        spatial_shapes_list=None,
        level_start_index=None,
        **kwargs: Unpack[TransformersKwargs],
    ):
        return super().forward(
            hidden_states=hidden_states,
            attention_mask=attention_mask,
            encoder_hidden_states=encoder_hidden_states,
            encoder_attention_mask=encoder_attention_mask,
            position_embeddings=position_embeddings,
            reference_points=reference_points,
            spatial_shapes=spatial_shapes,
            spatial_shapes_list=spatial_shapes_list,
            level_start_index=level_start_index,
            **kwargs,
        )
```
Update: I'm currently refactoring a lot of vision models, so maybe it's better to do this as part of this refactoring than here :)
```python
        self.activation_fn = ACT2FN[config.decoder_activation_function]
        self.fc1 = nn.Linear(config.d_model, config.decoder_ffn_dim)
        self.fc2 = nn.Linear(config.decoder_ffn_dim, config.d_model)
        self.layer_norm = nn.LayerNorm(config.d_model)
```
Let's put the layer norm outside the MLP module
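A hedged sketch of the suggested split, reusing the config names from the quoted diff; the decoder-layer wrapper shown here is illustrative, not the PR's actual implementation:

```python
import torch
import torch.nn as nn

from transformers.activations import ACT2FN


class LwDetrMLP(nn.Module):
    """Sketch: the MLP keeps only the two linear layers and the activation."""

    def __init__(self, config):
        super().__init__()
        self.activation_fn = ACT2FN[config.decoder_activation_function]
        self.fc1 = nn.Linear(config.d_model, config.decoder_ffn_dim)
        self.fc2 = nn.Linear(config.decoder_ffn_dim, config.d_model)

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        return self.fc2(self.activation_fn(self.fc1(hidden_states)))


class LwDetrDecoderLayer(nn.Module):
    """Sketch: the LayerNorm lives in the decoder layer, applied after the residual."""

    def __init__(self, config):
        super().__init__()
        self.mlp = LwDetrMLP(config)
        self.final_layer_norm = nn.LayerNorm(config.d_model)

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        residual = hidden_states
        hidden_states = self.mlp(hidden_states)
        return self.final_layer_norm(residual + hidden_states)
```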
```python
    def forward(
        self,
        hidden_states: torch.Tensor,
        head_mask: Optional[torch.Tensor] = None,
        **kwargs: Unpack[TransformersKwargs],
    ) -> tuple[torch.Tensor, torch.Tensor]:
        batch_size = hidden_states.shape[0]
        new_shape = batch_size, -1, self.num_attention_heads, self.attention_head_size

        key_layer = self.key(hidden_states).view(*new_shape).transpose(1, 2)
        value_layer = self.value(hidden_states).view(*new_shape).transpose(1, 2)
        query_layer = self.query(hidden_states).view(*new_shape).transpose(1, 2)

        attention_interface: Callable = eager_attention_forward
        if self.config._attn_implementation != "eager":
            attention_interface = ALL_ATTENTION_FUNCTIONS[self.config._attn_implementation]

        context_layer, attention_probs = attention_interface(
            self,
            query_layer,
            key_layer,
            value_layer,
            head_mask,
            is_causal=self.is_causal,
            scaling=self.scaling,
            dropout=0.0 if not self.training else self.dropout_prob,
            **kwargs,
        )

        new_context_layer_shape = context_layer.size()[:-2] + (self.all_head_size,)
        context_layer = context_layer.reshape(new_context_layer_shape)

        return context_layer, attention_probs
```
Update: I'll also do that in the global refactor, no need to do it here
```python
    @unittest.skip(reason="RTDetr does not use inputs_embeds")
    def test_inputs_embeds(self):
        pass

    @unittest.skip(reason="RTDetr does not use test_inputs_embeds_matches_input_ids")
    def test_inputs_embeds_matches_input_ids(self):
        pass

    @unittest.skip(reason="RTDetr does not support input and output embeddings")
    def test_model_get_set_embeddings(self):
        pass

    @unittest.skip(reason="RTDetr does not support input and output embeddings")
    def test_model_common_attributes(self):
        pass

    @unittest.skip(reason="RTDetr does not use token embeddings")
    def test_resize_tokens_embeddings(self):
        pass

    @unittest.skip(reason="Feed forward chunking is not implemented")
    def test_feed_forward_chunking(self):
        pass
```
Let's update the model names
```python
        self.num_key_value_groups = config.num_attention_heads // config.num_key_value_heads
        self.scaling = self.head_dim**-0.5
        self.attention_dropout = config.attention_dropout
        self.is_causal = True
```
Should be False here! Let's override it in the modular file. It's probably why you're getting errors in the eager-vs-SDPA inference tests.
Ahhh yes thank you, not the first time I get tricked by this line 😅 Fixed in 4716a12
```
rendered properly in your Markdown viewer.

-->
*This model was released on {release_date} and added to Hugging Face Transformers on 2025-08-28.*
```
Let's find a release date for this model :).
Running `make repo-consistency` or `python utils/add_dates.py` after merging with main should generate one, and update the transformers release date as well.
Force-pushed from f35fb75 to c4f2893.
Hey @yonigozlan, thanks for the comments, I addressed them. Regarding the vision models refactor, do you have a branch somewhere so that I can anticipate the changes on my side?
Managed to fix the
yonigozlan left a comment:
Hey @sbucaille, sorry for the delay! Some very small things left to address, but overall it's looking almost ready to merge for me! Pinging @ArthurZucker and @Cyrilvallez for core maintainer review
```python
attribute_map = {
    "hidden_size": "d_model",
    "num_attention_heads": "decoder_self_attention_heads",
    "num_key_value_heads": "decoder_self_attention_heads",
```
Ok, let's keep this for now, and once the vision models refactoring PR is merged we can inherit from a more appropriate attention module.
```python
        """
        batch_size = enc_output.shape[0]
        proposals = []
        _cur = 0
```
Ok I see, adding it to the need-refactoring list then 😅
```python
        self.model.enc_out_bbox_embed = _get_clones(self.bbox_embed, config.group_detr)
        self.model.enc_out_class_embed = _get_clones(self.class_embed, config.group_detr)
```
On a second look, why is this needed at all? These modules are already instantiated in `self.model`.
Leftovers from initial implementation, it does not make sense to keep it indeed, removed in 8bd0b33
```python
                model_batched_output[key] = model_batched_output[key][1:]
                model_row_output[key] = model_row_output[key][1:]
            recursive_check(model_batched_output[key], model_row_output[key], model_name, key)
```
Unclear why some of the basic tests above need to be overridden. Could you add comments explaining why? It makes it easier to maintain
Actually, I forgot to add the _prepare_for_class method like in the other ObjectDetection test modeling files; it's added now, and I removed this batching_equivalence overwrite, see d2c8df8.
Btw, I don't mind doing the refactoring of this prepare_for_class "special case for head models" that is present in a lot of object detection test modeling files, in another PR
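For context, the "special case for head models" being referred to looks roughly like this in existing DETR-family test files; the class name, target sizes, and tester attributes below are assumptions adapted to this model, not the PR's actual test code:

```python
import copy

import torch

from transformers.testing_utils import torch_device


class LwDetrModelTest:  # sketch: in practice this method lives on the ModelTesterMixin subclass
    def _prepare_for_class(self, inputs_dict, model_class, return_labels=False):
        inputs_dict = copy.deepcopy(inputs_dict)
        # head models need `labels`: one dict of targets per image so the loss can be computed
        if return_labels and model_class.__name__ == "LwDetrForObjectDetection":
            labels = []
            for _ in range(self.model_tester.batch_size):
                target = {
                    "class_labels": torch.ones(self.model_tester.n_targets, device=torch_device, dtype=torch.long),
                    "boxes": torch.ones(self.model_tester.n_targets, 4, device=torch_device, dtype=torch.float),
                }
                labels.append(target)
            inputs_dict["labels"] = labels
        return inputs_dict
```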
@yonigozlan addressed the few comments, also added an additional test to cover large models which have differences in the
Force-pushed from 284b51c to 47c1cb4.
Hey @Cyrilvallez, happy new year! I addressed most of your comments. When you removed
run-slow: lw_detr
This comment contains models: ["models/lw_detr"]
CI Results
Model CI Report: ❌ Failed tests
[For maintainers] Suggested jobs to run (before merge): run-slow: auto, lw_detr
Cyrilvallez left a comment:
All right! Thanks for bearing with us! Nice job!
I've pushed the final change, this is now ready to be merged 🤗🚀
Congrats again!
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
View the CircleCI Test Summary for this PR: https://huggingface.co/spaces/transformers-community/circle-ci-viz?pr=40991&sha=131245
* feat: add LWDetr model
* fix: changed LwDetrVit base classes from VitDet to ViT
* tests: added tests for LWDetr
* refactor: fix all issues and created docs
* tests: added missing lw_detr_vit tests
* docs: add lwdetr docs
* fix: fixed implementation error and associated tests
* chore: removed testing lib in imports
* refactor: replace LwDetrImageProcessor with DeformableDetrImageProcessor
* refactor: remove two-stage detection and bounding box reparameterization parameters from LwDetrConfig
* refactor: rename LwDetrCSPRepLayer to LwDetrC2FLayer
* refactor: introduce LwDetrMLP for feedforward layers in decoder
* refactor: replace build_position_encoding with LwDetrSinePositionEmbedding
* refactor: remove use_cae parameter and related logic from configuration and modeling files in LwDetrVit
* refactor: remove unused variables and simplify certain instructions
* refactor: removed unnecessary one line instruction method with_pos_embed
* refactor: use llama attention formatting for hidden shape
* docs: add comments about group detr
* fix: removed wrong sigmoid and fixed init for class_embed
* refactor: removed unused positional embeddings classes and weights from backbone
* chore: removed unused import
* chore: make style and repo-consistency after positional embeddings removal
* refactor: removed unused drop path rate
* fix: ingest latest changes from rebase
* fix: attn_implementation setter
* fix: is causal set to False
* refactor: renamed ffn to mlp and moved layer norm out of mlp
* fix: check model inputs
* fix: moved super init call in LwDetrConfig
* fix: super class in GradientCheckpointingLayer
* fix: replaced RTDetr occurences by LwDetr in test modeling file
* refactor: removed head_mask from LwDetrViT
* docs: added release date in docs
* fix: added missing attention mask argument
* chore: make style & repo-consistency
* fix: ensure tensor dtype consistency in loss calculations and test cases
* docs: fixed model release date
* refactor: removed unnecessary module cloning
* tests: added missing _prepare_for_class method and removed batching_equivalence overwrite
* tests: added xlarge integration test
* chore: added lw_detr reference in image processing auto
* chore: removed unnecessary properties from LwDetrConfig
* fix: fix for latest main changes
* fix: apply modular changes from mail
* docs: update model doc and docstrings
* fix: style
* fix: update output values in convert script
* feat: added proper last_hidden_states in LwDetrDecoderOutput and separated logits and pred_boxes from outputs_class and outputs_coord
* fix: guard accelerate imports
* fix: removed LWDetrConfig attribute map and changed LwDetrAttention init to reflect
* fix: parameterize amap based on config
* fix: remove redundant decorator
* chore: moved LwDetrViT to LwDetr single modular file
* fix: remove unnecessary attribute_map in LwDetrViT
* chore: simplified LwDetr modules methods with proper hidden_states return tuple
* fix: replaced hardcoded value by variable
* tests: added VitDet and attention tests
* fix: modular conversion
* tests: moved LwDetrViT tests to test_modeling_lw_detr file
* docs: add lwdetr advances in docs
* refactor: removed arguments to classes as much as possible and rely on config
* reapply style, remove LlamaAttention inheritance to remove decorator
* chore: updated licence and year
* fix: removed torch.nn.functional from modular
* docs: removed redundant docstring arguments covered by autodocstring decorator
* refactor: removed backbone api statements
* fix: added back num_key_value_groups in LwDetrAttention
* chore: removed unnecessary copied from statement
* chore: moved LwDetrViT modules above LwDetr modules
* tests: removed unnecessary overwrite and “test_” attributes
* docs: added missing docs
* style: remove unnecessary parentheses
* docs: added back logits docstring
* docs: added docs dates
* style details
* unessecary utf8
* might as well skip all config checks
* embeddings are large, increase model_split_percents
* fix device issue
* update logits
* set device in expectations
* add to toctree
---------
Co-authored-by: Cyril Vallez <cyril.vallez@huggingface.co>
Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>
What does this PR do?
Adds LWDetr model.
In #36895 I started working on adding RFDetr, but after putting in some work I realized that it relies a LOT on LWDetr.
Adding RFDetr will essentially amount to replacing the ViT encoder with Dino, so the biggest part of the work is the implementation of LWDetr, which could also be a good alternative for people to use for their own use cases.
Who can review?
Still a work in progress, but since @yonigozlan asked for an update, here it is.
All the inference code is implemented. A lot of refactoring/renaming is still needed, and I'm writing the tests to be able to do that safely. In the meantime you can check the code and let me know if you have comments.
@qubvel