Add LWDetr model #40991
Conversation
Force-pushed from 6a4faac to 15aa78c.
@yonigozlan @qubvel ready for a first review
Force-pushed from f198fbc to 03e3d83.
yonigozlan left a comment:
Thank you for working on this @sbucaille, very clean PR! There are a few things to change, but they're mostly formatting related.
I haven't checked the docs or the tests thoroughly yet, but it would be great to have some integration tests as well.
```python
logger = logging.get_logger(__name__)


class LwDetrImageProcessor(DeformableDetrImageProcessor):
```
No need to add new image processors if they are exactly the same as the deformable detr ones, let's just use the existing ones.
Also, as we move away from slow image processors for v5, let's have LWDetr support only the deformable detr fast image processor in the auto file.
I removed the LwDetrImageProcessors in 70ae6c7.
But I also removed the mentions of lw_detr from the image_processing_auto file, was that the right call?
Thanks! We still need to have an auto mapping to DeformableDetrImageProcessorFast in image_processing_auto though.
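For illustration only (this is not the PR's actual diff), the kind of entry being discussed lives in `src/transformers/models/auto/image_processing_auto.py`; the exact tuple layout and the `None` slow slot for a fast-only model are assumptions here:

```python
# Hypothetical sketch of the auto-mapping entry: LW-DETR reuses Deformable DETR's
# fast image processor instead of defining its own slow/fast pair.
from collections import OrderedDict

IMAGE_PROCESSOR_MAPPING_NAMES = OrderedDict(
    [
        # ... existing entries ...
        ("deformable_detr", ("DeformableDetrImageProcessor", "DeformableDetrImageProcessorFast")),
        # assumed entry: no slow processor, only the Deformable DETR fast one
        ("lw_detr", (None, "DeformableDetrImageProcessorFast")),
        # ... existing entries ...
    ]
)
```

With an entry along these lines, `AutoImageProcessor` would resolve an `lw_detr` checkpoint to `DeformableDetrImageProcessorFast` without any LW-DETR-specific image processor code.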
```
        The dropout ratio for activations inside the fully connected layer.
    position_embedding_type (`str`, *optional*, defaults to `"sine"`):
        Type of position embeddings to be used on top of the image features. One of `"sine"` or `"learned"`.
    two_stage (`bool`, *optional*, defaults to `True`):
```
Is this ever not True? Otherwise, let's remove it and all the related logic paths to reduce complexity
```python
attribute_map = {
    "hidden_size": "d_model",
    "num_attention_heads": "decoder_self_attention_heads",
    "num_key_value_heads": "decoder_self_attention_heads",
```
Why have num_key_value_heads in the attribute map and properties at all?
This is because LlamaAttention requires these attributes to be properly initialized, in these two lines:

```python
self.head_dim = getattr(config, "head_dim", config.hidden_size // config.num_attention_heads)
self.num_key_value_groups = config.num_attention_heads // config.num_key_value_heads
```
Ok, let's keep this for now, and once the vision models refactoring PR is merged we can inherit from a more appropriate attention module.
No, let's not do that, way too complicated! We instead redefine Llama's init if needed! But we don't want to pollute the config!
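As a rough sketch of what redefining the init could look like (hypothetical, not the PR's code): derive `head_dim` and `num_key_value_groups` from the config fields LwDetr already has, instead of adding `num_key_value_heads` to the config or the attribute map. The class name and projection layout below are assumptions:

```python
import torch.nn as nn


class LwDetrDecoderSelfAttention(nn.Module):
    """Illustrative only: a Llama-style init that does not pollute the config."""

    def __init__(self, config, layer_idx=None):
        super().__init__()
        self.config = config
        self.layer_idx = layer_idx
        # use the existing LwDetr config fields directly
        self.num_attention_heads = config.decoder_self_attention_heads
        self.head_dim = config.d_model // self.num_attention_heads
        # plain multi-head self-attention: queries and key/values share the same head count
        self.num_key_value_groups = 1
        self.scaling = self.head_dim**-0.5
        self.is_causal = False  # object-query self-attention is bidirectional
        self.q_proj = nn.Linear(config.d_model, config.d_model)
        self.k_proj = nn.Linear(config.d_model, config.d_model)
        self.v_proj = nn.Linear(config.d_model, config.d_model)
        self.o_proj = nn.Linear(config.d_model, config.d_model)
```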
```python
        return y


class LwDetrCSPRepLayer(nn.Module):
```
From the paper, this seems to be a Cross-Stage Partial with 2F connections (C2F) and not a CSP. Let's rename it :)
```python
    def with_pos_embed(self, tensor: torch.Tensor, position_embeddings: Optional[torch.Tensor]):
        return tensor if position_embeddings is None else tensor + position_embeddings
```
I know it's present in the deformable attention of existing modeling files, but there's really no need to have this in a separate function.
```
        many stages the model has). If unset and `out_features` is set, will default to the corresponding stages.
        If unset and `out_features` is unset, will default to the last stage. Must be in the
        same order as defined in the `stage_names` attribute.
    use_cae (`bool`, *optional*, defaults to `True`):
```
In the checkpoints provided by the authors it is indeed always true, so I removed the logic in b6e08e9.
```python
    def forward(
        self,
        hidden_states: torch.Tensor,
        head_mask: Optional[torch.Tensor] = None,
        **kwargs: Unpack[TransformersKwargs],
    ) -> tuple[torch.Tensor, torch.Tensor]:
        batch_size = hidden_states.shape[0]
        new_shape = batch_size, -1, self.num_attention_heads, self.attention_head_size

        key_layer = self.key(hidden_states).view(*new_shape).transpose(1, 2)
        value_layer = self.value(hidden_states).view(*new_shape).transpose(1, 2)
        query_layer = self.query(hidden_states).view(*new_shape).transpose(1, 2)

        attention_interface: Callable = eager_attention_forward
        if self.config._attn_implementation != "eager":
            attention_interface = ALL_ATTENTION_FUNCTIONS[self.config._attn_implementation]

        context_layer, attention_probs = attention_interface(
            self,
            query_layer,
            key_layer,
            value_layer,
            head_mask,
            is_causal=self.is_causal,
            scaling=self.scaling,
            dropout=0.0 if not self.training else self.dropout_prob,
            **kwargs,
        )

        new_context_layer_shape = context_layer.size()[:-2] + (self.all_head_size,)
        context_layer = context_layer.reshape(new_context_layer_shape)

        return context_layer, attention_probs
```
Let's update ViTSelfAttention forward instead :)
Update: I'll also do that in the global refactor, no need to do it here
```python
        self.window = layer_idx in config.window_block_indices
        self.num_windows = config.num_windows
        self.num_windows_side = int(math.sqrt(self.num_windows))
```
self.num_windows_side is not used I think?
```python
        list_hidden_states = []
        list_hidden_states.append(hidden_states)
```

Suggested change:

```python
        list_hidden_states = [hidden_states]
```
```python
        "attentions": LwDetrViTSelfAttention,
    }

    def _init_weights(self, module: Union[nn.Linear, nn.Conv2d, nn.LayerNorm]) -> None:
```
Easier and more accurate to not type this
```python
    def _init_weights(self, module: Union[nn.Linear, nn.Conv2d, nn.LayerNorm]) -> None:
```

Suggested change:

```python
    def _init_weights(self, module):
```
Hey @sbucaille! Just checking in to see if I should make another pass at this. Don't hesitate to reach out if you need any help!
Force-pushed from 0c1f23b to f35fb75.
Hey @yonigozlan, thanks for the follow-up! I finally got some time to put into the PR again. I've addressed most of your comments, but I'd like to reply here to some of them regarding the attention implementations you suggested rewriting. Also, here is what I get when running the eager-vs-SDPA equivalence tests:

```
tests/models/lw_detr/test_modeling_lw_detr.py::LwDetrModelTest::test_eager_matches_sdpa_inference_00_fp16_pad_left_sdpa_kernels PASSED [ 11%]
tests/models/lw_detr/test_modeling_lw_detr.py::LwDetrModelTest::test_eager_matches_sdpa_inference_01_fp16_pad_left PASSED [ 11%]
tests/models/lw_detr/test_modeling_lw_detr.py::LwDetrModelTest::test_eager_matches_sdpa_inference_02_fp16_pad_left_no_attn_mask_sdpa_kernels PASSED [ 12%]
tests/models/lw_detr/test_modeling_lw_detr.py::LwDetrModelTest::test_eager_matches_sdpa_inference_03_fp16_pad_left_no_attn_mask PASSED [ 12%]
tests/models/lw_detr/test_modeling_lw_detr.py::LwDetrModelTest::test_eager_matches_sdpa_inference_04_fp16_pad_right_sdpa_kernels PASSED [ 13%]
tests/models/lw_detr/test_modeling_lw_detr.py::LwDetrModelTest::test_eager_matches_sdpa_inference_05_fp16_pad_right PASSED [ 13%]
tests/models/lw_detr/test_modeling_lw_detr.py::LwDetrModelTest::test_eager_matches_sdpa_inference_06_fp16_pad_right_no_attn_mask_sdpa_kernels PASSED [ 14%]
tests/models/lw_detr/test_modeling_lw_detr.py::LwDetrModelTest::test_eager_matches_sdpa_inference_07_fp16_pad_right_no_attn_mask PASSED [ 15%]
tests/models/lw_detr/test_modeling_lw_detr.py::LwDetrModelTest::test_eager_matches_sdpa_inference_08_fp32_pad_left_sdpa_kernels FAILED [ 15%]
tests/models/lw_detr/test_modeling_lw_detr.py::LwDetrModelTest::test_eager_matches_sdpa_inference_09_fp32_pad_left FAILED [ 16%]
tests/models/lw_detr/test_modeling_lw_detr.py::LwDetrModelTest::test_eager_matches_sdpa_inference_10_fp32_pad_left_no_attn_mask_sdpa_kernels FAILED [ 16%]
tests/models/lw_detr/test_modeling_lw_detr.py::LwDetrModelTest::test_eager_matches_sdpa_inference_11_fp32_pad_left_no_attn_mask FAILED [ 17%]
tests/models/lw_detr/test_modeling_lw_detr.py::LwDetrModelTest::test_eager_matches_sdpa_inference_12_fp32_pad_right_sdpa_kernels FAILED [ 17%]
tests/models/lw_detr/test_modeling_lw_detr.py::LwDetrModelTest::test_eager_matches_sdpa_inference_13_fp32_pad_right FAILED [ 18%]
tests/models/lw_detr/test_modeling_lw_detr.py::LwDetrModelTest::test_eager_matches_sdpa_inference_14_fp32_pad_right_no_attn_mask_sdpa_kernels FAILED [ 18%]
tests/models/lw_detr/test_modeling_lw_detr.py::LwDetrModelTest::test_eager_matches_sdpa_inference_15_fp32_pad_right_no_attn_mask FAILED [ 19%]
tests/models/lw_detr/test_modeling_lw_detr.py::LwDetrModelTest::test_eager_matches_sdpa_inference_16_bf16_pad_left_sdpa_kernels PASSED [ 20%]
tests/models/lw_detr/test_modeling_lw_detr.py::LwDetrModelTest::test_eager_matches_sdpa_inference_17_bf16_pad_left PASSED [ 20%]
tests/models/lw_detr/test_modeling_lw_detr.py::LwDetrModelTest::test_eager_matches_sdpa_inference_18_bf16_pad_left_no_attn_mask_sdpa_kernels PASSED [ 21%]
tests/models/lw_detr/test_modeling_lw_detr.py::LwDetrModelTest::test_eager_matches_sdpa_inference_19_bf16_pad_left_no_attn_mask PASSED [ 21%]
tests/models/lw_detr/test_modeling_lw_detr.py::LwDetrModelTest::test_eager_matches_sdpa_inference_20_bf16_pad_right_sdpa_kernels PASSED [ 22%]
tests/models/lw_detr/test_modeling_lw_detr.py::LwDetrModelTest::test_eager_matches_sdpa_inference_21_bf16_pad_right PASSED [ 22%]
tests/models/lw_detr/test_modeling_lw_detr.py::LwDetrModelTest::test_eager_matches_sdpa_inference_22_bf16_pad_right_no_attn_mask_sdpa_kernels PASSED [ 23%]
tests/models/lw_detr/test_modeling_lw_detr.py::LwDetrModelTest::test_eager_matches_sdpa_inference_23_bf16_pad_right_no_attn_mask PASSED [ 24%]
tests/models/lw_detr/test_modeling_lw_detr.py::LwDetrModelTest::test_eager_matches_sdpa_inference_24_fp32_pad_left_output_attentions FAILED [ 24%]
```

Notice how it fails only on the fp32 tests. Have you had a similar problem in the past? What do you think a potential solution would be? That being said, you can have another pass on the PR if you have some time! 😃
yonigozlan left a comment:
Hey @sbucaille, thank you for iterating on this! We're getting closer, but there are still some things to address that I had missed in my first review, and some other changes that need to be made to follow the recent deprecations introduced for v5.
Also, I'm getting some errors on test_sdpa_can_compile_dynamic and test_sdpa_can_dispatch_on_flash that look like they could be due to the loss implementation. Can you check on your side?
I thought about this when dealing with ViT as well as DeformableDetr (and the other Detr models); that is indeed why there are a lot of workarounds in this PR for now, but I felt that rewriting those models' attentions does not belong in this PR. I'd be happy to contribute to updating their attention implementations to follow the Llama standards, but I think that belongs in another PR. Let me know what you think.
Completely agree, I'm currently refactoring the vision models to update to the same standards we have in language models, and did not realize the magnitude of the task, so don't worry too much about this here!
Finally, can you please rebase/merge with main? There have been some important breaking changes since :)
```python
        drop_path_rate = config.drop_path_rates[layer_idx]
        self.drop_path = LwDetrViTDropPath(drop_path_rate) if drop_path_rate > 0.0 else nn.Identity()
```
No need to add drop path rate support here, it seems like the field is moving away from using this, and it should make the code a bit cleaner
You are right, drop_path_rate is 0.0 by default anyway, I removed it in 1a49c2a
```python
    def prune_heads(self, heads: set[int]):
        if len(heads) == 0:
            return
        heads, index = find_pruneable_heads_and_indices(
            heads, self.attention.num_attention_heads, self.attention.attention_head_size, self.pruned_heads
        )

        # Prune linear layers
        self.attention.query = prune_linear_layer(self.attention.query, index)
        self.attention.key = prune_linear_layer(self.attention.key, index)
        self.attention.value = prune_linear_layer(self.attention.value, index)
        self.output.dense = prune_linear_layer(self.output.dense, index, dim=1)

        # Update hyper params and store pruned heads
        self.attention.num_attention_heads = self.attention.num_attention_heads - len(heads)
        self.attention.all_head_size = self.attention.attention_head_size * self.attention.num_attention_heads
        self.pruned_heads = self.pruned_heads.union(heads)
```
Same for pruning heads in attention, we're deprecating this everywhere for v5
Same as above, this is copied from ViT using modular
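For readers unfamiliar with the modular system, this is roughly why the method shows up in the generated file: the modular definition inherits from ViT, and the converter copies the parent's methods (including prune_heads) into the generated modeling file. A minimal, illustrative sketch, with the class name assumed from this PR's naming scheme and `ViTAttention` assumed as the relevant parent in the targeted transformers version:

```python
# modular_lw_detr.py (sketch): inheriting pulls ViT's code, prune_heads included,
# into the auto-generated modeling_lw_detr.py when the modular converter runs.
from transformers.models.vit.modeling_vit import ViTAttention


class LwDetrViTAttention(ViTAttention):
    pass
```

The generated modeling file then contains the copied (and renamed) ViT code, which is why the pruning logic appears there even though it isn't LW-DETR-specific.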
```python
class LwDetrMultiscaleDeformableAttention(DeformableDetrMultiscaleDeformableAttention):
    def forward(
        self,
        hidden_states: torch.Tensor,
        attention_mask: Optional[torch.Tensor] = None,
        encoder_hidden_states=None,
        encoder_attention_mask=None,
        position_embeddings: Optional[torch.Tensor] = None,
        reference_points=None,
        spatial_shapes=None,
        spatial_shapes_list=None,
        level_start_index=None,
        **kwargs: Unpack[TransformersKwargs],
    ):
        return super().forward(
            hidden_states=hidden_states,
            attention_mask=attention_mask,
            encoder_hidden_states=encoder_hidden_states,
            encoder_attention_mask=encoder_attention_mask,
            position_embeddings=position_embeddings,
            reference_points=reference_points,
            spatial_shapes=spatial_shapes,
            spatial_shapes_list=spatial_shapes_list,
            level_start_index=level_start_index,
            **kwargs,
        )
```
Update: I'm currently refactoring a lot of vision models, so maybe it's better to do this as part of this refactoring than here :)
```python
        self.activation_fn = ACT2FN[config.decoder_activation_function]
        self.fc1 = nn.Linear(config.d_model, config.decoder_ffn_dim)
        self.fc2 = nn.Linear(config.decoder_ffn_dim, config.d_model)
        self.layer_norm = nn.LayerNorm(config.d_model)
```
Let's put the layer norm outside the MLP module
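A hedged sketch of the suggested split, reusing the config names from the quoted diff; the decoder-layer wrapper shown here is illustrative, not the PR's actual implementation:

```python
import torch
import torch.nn as nn

from transformers.activations import ACT2FN


class LwDetrMLP(nn.Module):
    """Sketch: the MLP keeps only the two linear layers and the activation."""

    def __init__(self, config):
        super().__init__()
        self.activation_fn = ACT2FN[config.decoder_activation_function]
        self.fc1 = nn.Linear(config.d_model, config.decoder_ffn_dim)
        self.fc2 = nn.Linear(config.decoder_ffn_dim, config.d_model)

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        return self.fc2(self.activation_fn(self.fc1(hidden_states)))


class LwDetrDecoderLayer(nn.Module):
    """Sketch: the LayerNorm lives in the decoder layer, applied after the residual."""

    def __init__(self, config):
        super().__init__()
        self.mlp = LwDetrMLP(config)
        self.final_layer_norm = nn.LayerNorm(config.d_model)

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        residual = hidden_states
        hidden_states = self.mlp(hidden_states)
        return self.final_layer_norm(residual + hidden_states)
```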
```python
    def forward(
        self,
        hidden_states: torch.Tensor,
        head_mask: Optional[torch.Tensor] = None,
        **kwargs: Unpack[TransformersKwargs],
    ) -> tuple[torch.Tensor, torch.Tensor]:
        batch_size = hidden_states.shape[0]
        new_shape = batch_size, -1, self.num_attention_heads, self.attention_head_size

        key_layer = self.key(hidden_states).view(*new_shape).transpose(1, 2)
        value_layer = self.value(hidden_states).view(*new_shape).transpose(1, 2)
        query_layer = self.query(hidden_states).view(*new_shape).transpose(1, 2)

        attention_interface: Callable = eager_attention_forward
        if self.config._attn_implementation != "eager":
            attention_interface = ALL_ATTENTION_FUNCTIONS[self.config._attn_implementation]

        context_layer, attention_probs = attention_interface(
            self,
            query_layer,
            key_layer,
            value_layer,
            head_mask,
            is_causal=self.is_causal,
            scaling=self.scaling,
            dropout=0.0 if not self.training else self.dropout_prob,
            **kwargs,
        )

        new_context_layer_shape = context_layer.size()[:-2] + (self.all_head_size,)
        context_layer = context_layer.reshape(new_context_layer_shape)

        return context_layer, attention_probs
```
Update: I'll also do that in the global refactor, no need to do it here
```python
    @unittest.skip(reason="RTDetr does not use inputs_embeds")
    def test_inputs_embeds(self):
        pass

    @unittest.skip(reason="RTDetr does not use test_inputs_embeds_matches_input_ids")
    def test_inputs_embeds_matches_input_ids(self):
        pass

    @unittest.skip(reason="RTDetr does not support input and output embeddings")
    def test_model_get_set_embeddings(self):
        pass

    @unittest.skip(reason="RTDetr does not support input and output embeddings")
    def test_model_common_attributes(self):
        pass

    @unittest.skip(reason="RTDetr does not use token embeddings")
    def test_resize_tokens_embeddings(self):
        pass

    @unittest.skip(reason="Feed forward chunking is not implemented")
    def test_feed_forward_chunking(self):
        pass
```
Let's update the model names
```python
        self.num_key_value_groups = config.num_attention_heads // config.num_key_value_heads
        self.scaling = self.head_dim**-0.5
        self.attention_dropout = config.attention_dropout
        self.is_causal = True
```
Should be False here! Let's override it in the modular file. It's probably why you're getting errors in the eager-vs-SDPA inference tests.
Ahhh yes thank you, not the first time I get tricked by this line 😅 Fixed in 4716a12
```
rendered properly in your Markdown viewer.

-->
*This model was released on {release_date} and added to Hugging Face Transformers on 2025-08-28.*
```
Let's find a release date for this model :).
Running `make repo-consistency` or `python utils/add_dates.py` after merging with main should generate one, and update the transformers release date as well.
Force-pushed from f35fb75 to c4f2893.
Hey @yonigozlan, thanks for the comments, I addressed them. Regarding the vision models refactor, do you have a branch somewhere so that I can anticipate the changes on my side?
Managed to fix the
yonigozlan left a comment:
Hey @sbucaille, sorry for the delay! Some very small things left to address, but overall it's looking almost ready to merge for me! Pinging @ArthurZucker and @Cyrilvallez for core maintainer review
```python
attribute_map = {
    "hidden_size": "d_model",
    "num_attention_heads": "decoder_self_attention_heads",
    "num_key_value_heads": "decoder_self_attention_heads",
```
Ok, let's keep this for now, and once the vision models refactoring PR is merged we can inherit from a more appropriate attention module.
```python
        """
        batch_size = enc_output.shape[0]
        proposals = []
        _cur = 0
```
Ok I see, adding it to the need-refactoring list then 😅
```python
        self.model.enc_out_bbox_embed = _get_clones(self.bbox_embed, config.group_detr)
        self.model.enc_out_class_embed = _get_clones(self.class_embed, config.group_detr)
```
On a second look, why is this needed at all? These modules are already instantiated in `self.model`.
Leftovers from initial implementation, it does not make sense to keep it indeed, removed in 8bd0b33
```python
                model_batched_output[key] = model_batched_output[key][1:]
                model_row_output[key] = model_row_output[key][1:]
            recursive_check(model_batched_output[key], model_row_output[key], model_name, key)
```
Unclear why some of the basic tests above need to be overridden. Could you add comments explaining why? It makes it easier to maintain
Actually, I forgot to add the _prepare_for_class method like in the other ObjectDetection test modeling files; it's added now, and I removed this batching_equivalence overwrite, see d2c8df8.
Btw, I don't mind doing the refactoring of this prepare_for_class "special case for head models" that is present in a lot of object detection test modeling files, in another PR
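For context, the "special case for head models" being referred to looks roughly like this in existing DETR-family test files; the class name, target sizes, and tester attributes below are assumptions adapted to this model, not the PR's actual test code:

```python
import copy

import torch

from transformers.testing_utils import torch_device


class LwDetrModelTest:  # sketch: in practice this method lives on the ModelTesterMixin subclass
    def _prepare_for_class(self, inputs_dict, model_class, return_labels=False):
        inputs_dict = copy.deepcopy(inputs_dict)
        # head models need `labels`: one dict of targets per image so the loss can be computed
        if return_labels and model_class.__name__ == "LwDetrForObjectDetection":
            labels = []
            for _ in range(self.model_tester.batch_size):
                target = {
                    "class_labels": torch.ones(self.model_tester.n_targets, device=torch_device, dtype=torch.long),
                    "boxes": torch.ones(self.model_tester.n_targets, 4, device=torch_device, dtype=torch.float),
                }
                labels.append(target)
            inputs_dict["labels"] = labels
        return inputs_dict
```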
@yonigozlan addressed the few comments, also added an additional test to cover large models which have differences in the
Force-pushed from 284b51c to 47c1cb4.
Hey @Cyrilvallez, happy new year! I addressed most of your comments. When you removed
run-slow: lw_detr
This comment contains models: ["models/lw_detr"]
CI Results
Model CI Report: ❌ Failed tests
[For maintainers] Suggested jobs to run (before merge): run-slow: auto, lw_detr
Cyrilvallez left a comment:
All right! Thanks for bearing with us! Nice job!
I've pushed the final change, this is now ready to be merged 🤗🚀
Congrats again!
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
View the CircleCI Test Summary for this PR: https://huggingface.co/spaces/transformers-community/circle-ci-viz?pr=40991&sha=131245
* feat: add LWDetr model
* fix: changed LwDetrVit base classes from VitDet to ViT
* tests: added tests for LWDetr
* refactor: fix all issues and created docs
* tests: added missing lw_detr_vit tests
* docs: add lwdetr docs
* fix: fixed implementation error and associated tests
* chore: removed testing lib in imports
* refactor: replace LwDetrImageProcessor with DeformableDetrImageProcessor
* refactor: remove two-stage detection and bounding box reparameterization parameters from LwDetrConfig
* refactor: rename LwDetrCSPRepLayer to LwDetrC2FLayer
* refactor: introduce LwDetrMLP for feedforward layers in decoder
* refactor: replace build_position_encoding with LwDetrSinePositionEmbedding
* refactor: remove use_cae parameter and related logic from configuration and modeling files in LwDetrVit
* refactor: remove unused variables and simplify certain instructions
* refactor: removed unnecessary one line instruction method with_pos_embed
* refactor: use llama attention formatting for hidden shape
* docs: add comments about group detr
* fix: removed wrong sigmoid and fixed init for class_embed
* refactor: removed unused positional embeddings classes and weights from backbone
* chore: removed unused import
* chore: make style and repo-consistency after positional embeddings removal
* refactor: removed unused drop path rate
* fix: ingest latest changes from rebase
* fix: attn_implementation setter
* fix: is causal set to False
* refactor: renamed ffn to mlp and moved layer norm out of mlp
* fix: check model inputs
* fix: moved super init call in LwDetrConfig
* fix: super class in GradientCheckpointingLayer
* fix: replaced RTDetr occurences by LwDetr in test modeling file
* refactor: removed head_mask from LwDetrViT
* docs: added release date in docs
* fix: added missing attention mask argument
* chore: make style & repo-consistency
* fix: ensure tensor dtype consistency in loss calculations and test cases
* docs: fixed model release date
* refactor: removed unnecessary module cloning
* tests: added missing _prepare_for_class method and removed batching_equivalence overwrite
* tests: added xlarge integration test
* chore: added lw_detr reference in image processing auto
* chore: removed unnecessary properties from LwDetrConfig
* fix: fix for latest main changes
* fix: apply modular changes from mail
* docs: update model doc and docstrings
* fix: style
* fix: update output values in convert script
* feat: added proper last_hidden_states in LwDetrDecoderOutput and separated logits and pred_boxes from outputs_class and outputs_coord
* fix: guard accelerate imports
* fix: removed LWDetrConfig attribute map and changed LwDetrAttention init to reflect
* fix: parameterize amap based on config
* fix: remove redundant decorator
* chore: moved LwDetrViT to LwDetr single modular file
* fix: remove unnecessary attribute_map in LwDetrViT
* chore: simplified LwDetr modules methods with proper hidden_states return tuple
* fix: replaced hardcoded value by variable
* tests: added VitDet and attention tests
* fix: modular conversion
* tests: moved LwDetrViT tests to test_modeling_lw_detr file
* docs: add lwdetr advances in docs
* refactor: removed arguments to classes as much as possible and rely on config
* reapply style, remove LlamaAttention inheritance to remove decorator
* chore: updated licence and year
* fix: removed torch.nn.functional from modular
* docs: removed redundant docstring arguments covered by autodocstring decorator
* refactor: removed backbone api statements
* fix: added back num_key_value_groups in LwDetrAttention
* chore: removed unnecessary copied from statement
* chore: moved LwDetrViT modules above LwDetr modules
* tests: removed unnecessary overwrite and “test_” attributes
* docs: added missing docs
* style: remove unnecessary parentheses
* docs: added back logits docstring
* docs: added docs dates
* style details
* unessecary utf8
* might as well skip all config checks
* embeddings are large, increase model_split_percents
* fix device issue
* update logits
* set device in expectations
* add to toctree
---------
Co-authored-by: Cyril Vallez <cyril.vallez@huggingface.co>
Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>
What does this PR do?
Adds LWDetr model.
In #36895 I started working on adding RFDetr, but after putting in some work I realized that it relies a LOT on LWDetr.
Adding RFDetr will essentially amount to replacing the ViT encoder with Dino, so the biggest part of the work is the implementation of LWDetr, which could also be a good alternative for people to use for their own use cases.
Who can review?
Still a work in progress, but since @yonigozlan asked for an update, here it is.
All the inference code is implemented. A lot of refactoring/renaming is still needed, and I'm writing the tests to be able to do that safely. In the meantime you can check the code and let me know if you have comments.
@qubvel