[Model] Add SLANet Model Support #45532
Conversation
vasqu
left a comment
Some initial comments from my side; it already looks pretty good! It's mostly smaller details and a new way we cache images for our tests.
| <hfoption id="AutoModel"> | ||
| ```py | ||
| import requests |
Might have missed these in the last models, but it would be nice if we could use httpx instead - we cleaned our code base to use httpx internally, so it would be nice to have it in the examples as well.
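E.g. something like this (just a sketch of the swap, reusing the demo image URL from the test further down):

```py
from io import BytesIO

import httpx
from PIL import Image

url = "https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/table_recognition.jpg"
image = Image.open(BytesIO(httpx.get(url).content))
```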
Neat, already used the auto mappings :D
| out_channels: int = 50 | ||
| hidden_size: int = 256 | ||
| max_text_length: int = 500 |
| out_channels: int = 50 | |
| hidden_size: int = 256 | |
| max_text_length: int = 500 | |
| hidden_size: int = 256 |
These should be inherited instead; they have the same values if I see it correctly.
| hidden_act: str = "hardswish" | ||
| csp_kernel_size: int = 5 | ||
| csp_blocks_num: int = 1 |
| csp_blocks_num: int = 1 | |
| csp_num_blocks: int = 1 |
nit: naming to be consistent with the layer kwarg
| csp_kernel_size (`int`, *optional*, defaults to 5): | ||
| The kernel size of the CSP layer. | ||
| csp_blocks_num (`int`, *optional*, defaults to 1): | ||
| Number of the CSP layer. |
I think in the docs we should at least also spell out the full name of what CSP stands for.
| ) | ||
| ) | ||
| def forward(self, hidden_states): |
| class SLANetModel(SLANetPreTrainedModel): | ||
| def __init__(self, config: SLANetConfig): | ||
| super().__init__(config) | ||
| self.backbone = load_backbone(config) | ||
| self.neck = SLANetCSPPAN(self.backbone.num_features[2:], config) | ||
| self.post_init() | ||
| @merge_with_config_defaults | ||
| @capture_outputs | ||
| def forward( | ||
| self, hidden_states: torch.FloatTensor, **kwargs: Unpack[TransformersKwargs] | ||
| ) -> tuple[torch.FloatTensor] | SLANetForTableRecognitionOutput: | ||
| outputs = self.backbone(hidden_states, **kwargs) | ||
| hidden_states = self.neck(outputs.feature_maps) | ||
| return BaseModelOutputWithNoAttention( | ||
| last_hidden_state=hidden_states, | ||
| hidden_states=outputs.hidden_states, | ||
| ) |
Imo, this should be SLANetBackbone instead, similar to SLANeXtBackbone. It doesn't need the same naming, but structurally it should be the same:
```py
class SLANetBackbone(SLANetPreTrainedModel):
    def __init__(self, config: SLANetConfig):
        super().__init__(config)
        self.vision_backbone = load_backbone(config)
        self.post_csp_pan = SLANetCSPPAN(self.vision_backbone.num_features[2:], config)
        self.post_init()

    @can_return_tuple
    @auto_docstring
    def forward(
        self, hidden_states: torch.FloatTensor, **kwargs: Unpack[TransformersKwargs]
    ) -> tuple[torch.FloatTensor] | SLANetForTableRecognitionOutput:
        outputs = self.vision_backbone(hidden_states, **kwargs)
        hidden_states = self.post_csp_pan(outputs.feature_maps)
        return BaseModelOutputWithNoAttention(
            last_hidden_state=hidden_states,
            hidden_states=outputs.hidden_states,
        )
```
Just as an example
| class SLANetForTableRecognition(SLANetPreTrainedModel): | ||
| _keys_to_ignore_on_load_missing = ["num_batches_tracked"] | ||
| def __init__(self, config: SLANetConfig): | ||
| super().__init__(config) | ||
| self.model = SLANetModel(config=config) | ||
| self.head = SLANetSLAHead(config=config) | ||
| self.post_init() | ||
| @can_return_tuple | ||
| @auto_docstring | ||
| def forward( | ||
| self, pixel_values: torch.FloatTensor, **kwargs: Unpack[TransformersKwargs] | ||
| ) -> tuple[torch.FloatTensor] | SLANetForTableRecognitionOutput: | ||
| outputs = self.model(pixel_values, **kwargs) | ||
| head_outputs = self.head(outputs.last_hidden_state, **kwargs) | ||
| return SLANetForTableRecognitionOutput( | ||
| last_hidden_state=head_outputs.last_hidden_state, | ||
| hidden_states=outputs.hidden_states, | ||
| head_hidden_states=head_outputs.hidden_states, | ||
| head_attentions=head_outputs.attentions, | ||
| ) |
| class SLANetForTableRecognition(SLANetPreTrainedModel): | |
| _keys_to_ignore_on_load_missing = ["num_batches_tracked"] | |
| def __init__(self, config: SLANetConfig): | |
| super().__init__(config) | |
| self.model = SLANetModel(config=config) | |
| self.head = SLANetSLAHead(config=config) | |
| self.post_init() | |
| @can_return_tuple | |
| @auto_docstring | |
| def forward( | |
| self, pixel_values: torch.FloatTensor, **kwargs: Unpack[TransformersKwargs] | |
| ) -> tuple[torch.FloatTensor] | SLANetForTableRecognitionOutput: | |
| outputs = self.model(pixel_values, **kwargs) | |
| head_outputs = self.head(outputs.last_hidden_state, **kwargs) | |
| return SLANetForTableRecognitionOutput( | |
| last_hidden_state=head_outputs.last_hidden_state, | |
| hidden_states=outputs.hidden_states, | |
| head_hidden_states=head_outputs.hidden_states, | |
| head_attentions=head_outputs.attentions, | |
| ) | |
| class SLANetForTableRecognition(SLANeXtForTableRecognition): | |
| _keys_to_ignore_on_load_missing = ["num_batches_tracked"] | |
| @can_return_tuple | |
| @auto_docstring | |
| def forward( | |
| self, pixel_values: torch.FloatTensor, **kwargs: Unpack[TransformersKwargs] | |
| ) -> tuple[torch.FloatTensor] | SLANetForTableRecognitionOutput: | |
| outputs = self.model(pixel_values, **kwargs) | |
| head_outputs = self.head(outputs.last_hidden_state, **kwargs) | |
| # Key difference: no attentions in its vision model | |
| return SLANetForTableRecognitionOutput( | |
| last_hidden_state=head_outputs.last_hidden_state, | |
| hidden_states=outputs.hidden_states, | |
| head_hidden_states=head_outputs.hidden_states, | |
| head_attentions=head_outputs.attentions, | |
| ) |
Then we also don't override base_model_prefix (i.e. base_model_prefix = "backbone") and use the same naming.
| expected_arg_names = ["pixel_values"] | ||
| self.assertListEqual(arg_names[:1], expected_arg_names) | ||
| def test_hidden_states_output(self): |
Let's add a small docstring explaining why we override.
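Something along these lines (placeholder wording only; the actual reason should be filled in):

```py
def test_hidden_states_output(self):
    """Overridden from the common tests: <short note on the SLANet-specific reason>."""
    ...
```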
| url = "https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/table_recognition.jpg" | ||
| self.image = Image.open(requests.get(url, stream=True).raw) |
We changed the logic a bit to cache more, can you try to refactor:
- Add the image to the list of images
- Then use `url_to_local_path` to get the appropriate (cached) image path

See `transformers/tests/models/gemma4/test_modeling_gemma4.py`, lines 431 to 433 in ef97a75.

Tbh, might be better to do this for all PP models that have been added (in a separate PR).
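Roughly something like this, as a sketch (assuming `url_to_local_path` is the cached-download test helper referenced above; see the gemma4 test file for the actual pattern and import):

```py
from PIL import Image

# `url_to_local_path` downloads the URL once and returns the cached local file path
url = "https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/table_recognition.jpg"
self.image = Image.open(url_to_local_path(url))
```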
Done. If this change looks good to you, I’ll open a new PR to update the previous models as well.
vasqu
left a comment
Looks super good! Just some last nits, but nothing big
Re: the image caching, yes, it would be nice if you could open another PR for that for the other models 🤗
I think we need to add the slanext image processor to the auto mappings in image_processing_auto (under MISSING_IMAGE_PROCESSOR_MAPPING_NAMES).
Seems like the auto mappings didn't pick it up.
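Roughly (just a sketch; the exact entry format and the processor class name should follow the neighbouring entries in that file):

```py
# In image_processing_auto.py (sketch only; entry format and class name assumed)
from collections import OrderedDict

MISSING_IMAGE_PROCESSOR_MAPPING_NAMES = OrderedDict(
    [
        # ... existing entries ...
        ("slanext", "SLANextImageProcessor"),
    ]
)
```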
| csp_kernel_size (`int`, *optional*, defaults to 5): | ||
| The kernel size of the Cross Stage Partial (CSP) layer. | ||
| csp_num_blocks (`int`, *optional*, defaults to 1): | ||
| Number of the Cross Stage Partial (CSP) layer. |
| Number of the Cross Stage Partial (CSP) layer. | |
| Number of blocks within the Cross Stage Partial (CSP) layer. |
| def __init__(self, config: SLANetConfig): | ||
| super().__init__(config) | ||
| self.vision_backbone = load_backbone(config) | ||
| self.post_csp_pan = SLANetCSPPAN(self.vision_backbone.num_features[2:], config) |
Super nit: the order is a bit weird; I would change it to have the config as the 1st arg.
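E.g. (assuming SLANetCSPPAN's own signature is flipped to match):

```py
self.post_csp_pan = SLANetCSPPAN(config, self.vision_backbone.num_features[2:])
```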
| @merge_with_config_defaults | ||
| @capture_outputs | ||
| def forward( | ||
| self, hidden_states: torch.FloatTensor, **kwargs: Unpack[TransformersKwargs] | ||
| ) -> tuple[torch.FloatTensor] | SLANetForTableRecognitionOutput: | ||
| outputs = self.vision_backbone(hidden_states, **kwargs) | ||
| hidden_states = self.post_csp_pan(outputs.feature_maps) | ||
| return BaseModelOutputWithNoAttention( | ||
| last_hidden_state=hidden_states, | ||
| hidden_states=outputs.hidden_states, | ||
| ) |
| @merge_with_config_defaults | |
| @capture_outputs | |
| def forward( | |
| self, hidden_states: torch.FloatTensor, **kwargs: Unpack[TransformersKwargs] | |
| ) -> tuple[torch.FloatTensor] | SLANetForTableRecognitionOutput: | |
| outputs = self.vision_backbone(hidden_states, **kwargs) | |
| hidden_states = self.post_csp_pan(outputs.feature_maps) | |
| return BaseModelOutputWithNoAttention( | |
| last_hidden_state=hidden_states, | |
| hidden_states=outputs.hidden_states, | |
| ) | |
| @can_return_tuple | |
| @auto_docstring | |
| def forward( | |
| self, hidden_states: torch.FloatTensor, **kwargs: Unpack[TransformersKwargs] | |
| ) -> tuple[torch.FloatTensor] | SLANetForTableRecognitionOutput: | |
| outputs = self.vision_backbone(hidden_states, **kwargs) | |
| hidden_states = self.post_csp_pan(outputs.feature_maps) | |
| return BaseModelOutputWithNoAttention( | |
| last_hidden_state=hidden_states, | |
| hidden_states=outputs.hidden_states, | |
| ) |
I think we don't need those decorators; we rely on the backbone to collect the hidden states, and it does so by itself.
| class SLANetForTableRecognitionOutput(SLANeXtForTableRecognitionOutput): | ||
| pass |
I think we should not inherit here, but inherit from BaseModelOutputWithNoAttention instead with the same additions as SLANeXt, just to avoid confusion about attentions being in the output.
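Roughly (just a sketch; the extra field names are taken from the head outputs above, and the exact typing/docstrings should mirror the SLANeXt output):

```py
from dataclasses import dataclass
from typing import Optional

import torch

from transformers.modeling_outputs import BaseModelOutputWithNoAttention


@dataclass
class SLANetForTableRecognitionOutput(BaseModelOutputWithNoAttention):
    # Head-specific outputs on top of the base fields; no backbone attentions
    head_hidden_states: Optional[tuple[torch.FloatTensor, ...]] = None
    head_attentions: Optional[tuple[torch.FloatTensor, ...]] = None
```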
[For maintainers] Suggested jobs to run (before merge): run-slow: auto, slanet
#45562 PTAL. 🤗
run-slow: slanet
This comment contains models: ["models/slanet"]
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update. |
No description provided.