
[Model] Add SLANet Model Support #45532

Merged

vasqu merged 6 commits into huggingface:main from zhang-prog:feat/slanet on Apr 22, 2026

Conversation

@zhang-prog
Contributor

No description provided.

Contributor

@vasqu vasqu left a comment


Some initial comments from my side; this already looks pretty good! It's mostly smaller details, plus a new way we cache images for our tests.

docs/source/en/model_doc/slanet.md (Outdated)

````md
<hfoption id="AutoModel">

```py
import requests
````
Contributor

Might have missed these in the last models, but it would be nice if we could use httpx instead. We cleaned our code base to use httpx internally, so it would be nice to have it in the examples as well.
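For illustration, a minimal sketch of the swap (the URL is the test image used later in this PR, not necessarily the final docs wording):

```py
# Hypothetical sketch: loading the docs example image with httpx instead of requests.
import httpx
from io import BytesIO
from PIL import Image

url = "https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/table_recognition.jpg"
image = Image.open(BytesIO(httpx.get(url, follow_redirects=True).content))
```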

Contributor Author

Done

Contributor

Neat, already used the auto mappings :D

Contributor Author

Done

Comment on lines +71 to +73

```py
    out_channels: int = 50
    hidden_size: int = 256
    max_text_length: int = 500
```
Contributor

Suggested change

```diff
-    out_channels: int = 50
-    hidden_size: int = 256
-    max_text_length: int = 500
+    hidden_size: int = 256
```

These should be inherited instead; they have the same values if I see it correctly.

Contributor Author

Done


```py
    hidden_act: str = "hardswish"
    csp_kernel_size: int = 5
    csp_blocks_num: int = 1
```
Contributor

Suggested change

```diff
-    csp_blocks_num: int = 1
+    csp_num_blocks: int = 1
```

nit: naming to be consistent with the layer kwarg

Contributor Author

Done

```py
    csp_kernel_size (`int`, *optional*, defaults to 5):
        The kernel size of the CSP layer.
    csp_blocks_num (`int`, *optional*, defaults to 1):
        Number of the CSP layer.
```
Contributor

I think in the docs we should at least also include the full name behind what CSP means.

Contributor Author

Done

```py
            )
        )

    def forward(self, hidden_states):
```
Contributor

typing (add type hints)
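Presumably something like this (a sketch; the return annotation is assumed from the surrounding forwards):

```py
    def forward(self, hidden_states: torch.FloatTensor) -> torch.FloatTensor:
```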

Contributor Author

Done

Comment on lines +347 to +365

```py
class SLANetModel(SLANetPreTrainedModel):
    def __init__(self, config: SLANetConfig):
        super().__init__(config)
        self.backbone = load_backbone(config)
        self.neck = SLANetCSPPAN(self.backbone.num_features[2:], config)

        self.post_init()

    @merge_with_config_defaults
    @capture_outputs
    def forward(
        self, hidden_states: torch.FloatTensor, **kwargs: Unpack[TransformersKwargs]
    ) -> tuple[torch.FloatTensor] | SLANetForTableRecognitionOutput:
        outputs = self.backbone(hidden_states, **kwargs)
        hidden_states = self.neck(outputs.feature_maps)
        return BaseModelOutputWithNoAttention(
            last_hidden_state=hidden_states,
            hidden_states=outputs.hidden_states,
        )
```
Contributor

Imo, this should be SLANetBackbone instead, similar to SLANeXtBackbone. It doesn't need the same naming, but structurally it should be the same:

```py
class SLANetBackbone(SLANetPreTrainedModel):
    def __init__(self, config: SLANetConfig):
        super().__init__(config)
        self.vision_backbone = load_backbone(config)
        # num_features comes from the vision backbone loaded above
        self.post_cs_pan = SLANetCSPPAN(self.vision_backbone.num_features[2:], config)

        self.post_init()

    @can_return_tuple
    @auto_docstring
    def forward(
        self, hidden_states: torch.FloatTensor, **kwargs: Unpack[TransformersKwargs]
    ) -> tuple[torch.FloatTensor] | SLANetForTableRecognitionOutput:
        outputs = self.vision_backbone(hidden_states, **kwargs)
        hidden_states = self.post_cs_pan(outputs.feature_maps)
        return BaseModelOutputWithNoAttention(
            last_hidden_state=hidden_states,
            hidden_states=outputs.hidden_states,
        )
```

Just as an example

Contributor Author

oh, I see

Comment on lines +374 to +395

```py
class SLANetForTableRecognition(SLANetPreTrainedModel):
    _keys_to_ignore_on_load_missing = ["num_batches_tracked"]

    def __init__(self, config: SLANetConfig):
        super().__init__(config)
        self.model = SLANetModel(config=config)
        self.head = SLANetSLAHead(config=config)
        self.post_init()

    @can_return_tuple
    @auto_docstring
    def forward(
        self, pixel_values: torch.FloatTensor, **kwargs: Unpack[TransformersKwargs]
    ) -> tuple[torch.FloatTensor] | SLANetForTableRecognitionOutput:
        outputs = self.model(pixel_values, **kwargs)
        head_outputs = self.head(outputs.last_hidden_state, **kwargs)
        return SLANetForTableRecognitionOutput(
            last_hidden_state=head_outputs.last_hidden_state,
            hidden_states=outputs.hidden_states,
            head_hidden_states=head_outputs.hidden_states,
            head_attentions=head_outputs.attentions,
        )
```
Contributor

Suggested change

```diff
-class SLANetForTableRecognition(SLANetPreTrainedModel):
-    _keys_to_ignore_on_load_missing = ["num_batches_tracked"]
-
-    def __init__(self, config: SLANetConfig):
-        super().__init__(config)
-        self.model = SLANetModel(config=config)
-        self.head = SLANetSLAHead(config=config)
-        self.post_init()
-
-    @can_return_tuple
-    @auto_docstring
-    def forward(
-        self, pixel_values: torch.FloatTensor, **kwargs: Unpack[TransformersKwargs]
-    ) -> tuple[torch.FloatTensor] | SLANetForTableRecognitionOutput:
-        outputs = self.model(pixel_values, **kwargs)
-        head_outputs = self.head(outputs.last_hidden_state, **kwargs)
-        return SLANetForTableRecognitionOutput(
-            last_hidden_state=head_outputs.last_hidden_state,
-            hidden_states=outputs.hidden_states,
-            head_hidden_states=head_outputs.hidden_states,
-            head_attentions=head_outputs.attentions,
-        )
+class SLANetForTableRecognition(SLANeXtForTableRecognition):
+    _keys_to_ignore_on_load_missing = ["num_batches_tracked"]
+
+    @can_return_tuple
+    @auto_docstring
+    def forward(
+        self, pixel_values: torch.FloatTensor, **kwargs: Unpack[TransformersKwargs]
+    ) -> tuple[torch.FloatTensor] | SLANetForTableRecognitionOutput:
+        outputs = self.model(pixel_values, **kwargs)
+        head_outputs = self.head(outputs.last_hidden_state, **kwargs)
+        # Key difference: no attentions in its vision model
+        return SLANetForTableRecognitionOutput(
+            last_hidden_state=head_outputs.last_hidden_state,
+            hidden_states=outputs.hidden_states,
+            head_hidden_states=head_outputs.hidden_states,
+            head_attentions=head_outputs.attentions,
+        )
```

Contributor

Then we also don't override base_model_prefix (i.e. `base_model_prefix = "backbone"`) and use the same namings.

Contributor Author

Done

```py
        expected_arg_names = ["pixel_values"]
        self.assertListEqual(arg_names[:1], expected_arg_names)

    def test_hidden_states_output(self):
```
Contributor

Let's add a small docstring explaining why we override.

Contributor Author

Done

Comment on lines +203 to +204
```py
        url = "https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/table_recognition.jpg"
        self.image = Image.open(requests.get(url, stream=True).raw)
```
Contributor

We changed the logic a bit to cache more, can you try to refactor:

1. Add the URL to the list of images:

   ```py
   URLS_FOR_TESTING_DATA = [
   ```

2. Then use `url_to_local_path` to get the appropriate (cached) image path:

   ```py
   self.url1 = url_to_local_path(
       "https://huggingface.co/datasets/hf-internal-testing/fixtures-captioning/resolve/main/cow_beach_1.png"
   )
   ```

Tbh, it might be better to do this for all PP models that have been added (in a separate PR).
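Put together for this test, it could look roughly like this (a sketch under the scheme above; `URLS_FOR_TESTING_DATA` and `url_to_local_path` are the test utilities referenced in this comment):

```py
# Sketch: register the test image so it gets cached, then load it locally.
URLS_FOR_TESTING_DATA = [
    # ...existing entries...
    "https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/table_recognition.jpg",
]

# in setUp():
self.image = Image.open(
    url_to_local_path(
        "https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/table_recognition.jpg"
    )
)
```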

Contributor Author

Done. If this change looks good to you, I’ll open a new PR to update the previous models as well.

Contributor

@vasqu vasqu left a comment


Looks super good! Just some last nits, but nothing big.

Re: the image caching, yes, it would be nice if you could open another PR for that for the other models 🤗

Contributor

I think we need to add the slanext image processor to the auto mappings in image_processing_auto (under MISSING_IMAGE_PROCESSOR_MAPPING_NAMES). It seems the auto mappings didn't pick it up.
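For illustration only, and assuming the mapping is a plain list of model-type strings (the actual container shape in image_processing_auto.py should be checked):

```py
# Hypothetical sketch of the entry in image_processing_auto.py.
MISSING_IMAGE_PROCESSOR_MAPPING_NAMES = [
    # ...
    "slanet",
]
```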

Contributor Author

Done

```py
    csp_kernel_size (`int`, *optional*, defaults to 5):
        The kernel size of the Cross Stage Partial (CSP) layer.
    csp_num_blocks (`int`, *optional*, defaults to 1):
        Number of the Cross Stage Partial (CSP) layer.
```
Contributor

Suggested change

```diff
-        Number of the Cross Stage Partial (CSP) layer.
+        Number of blocks within the Cross Stage Partial (CSP) layer.
```

Contributor Author

Done

```py
    def __init__(self, config: SLANetConfig):
        super().__init__(config)
        self.vision_backbone = load_backbone(config)
        self.post_csp_pan = SLANetCSPPAN(self.vision_backbone.num_features[2:], config)
```
Contributor

Super nit: the order is a bit weird; I would change it to have the config as the 1st arg.
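i.e. something like this (a sketch; `SLANetCSPPAN.__init__` would need the matching swap):

```py
self.post_csp_pan = SLANetCSPPAN(config, self.vision_backbone.num_features[2:])
```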

Contributor Author

Done

Comment on lines +326 to +336
```py
    @merge_with_config_defaults
    @capture_outputs
    def forward(
        self, hidden_states: torch.FloatTensor, **kwargs: Unpack[TransformersKwargs]
    ) -> tuple[torch.FloatTensor] | SLANetForTableRecognitionOutput:
        outputs = self.vision_backbone(hidden_states, **kwargs)
        hidden_states = self.post_csp_pan(outputs.feature_maps)
        return BaseModelOutputWithNoAttention(
            last_hidden_state=hidden_states,
            hidden_states=outputs.hidden_states,
        )
```
Contributor

Suggested change

```diff
-    @merge_with_config_defaults
-    @capture_outputs
+    @can_return_tuple
+    @auto_docstring
     def forward(
         self, hidden_states: torch.FloatTensor, **kwargs: Unpack[TransformersKwargs]
     ) -> tuple[torch.FloatTensor] | SLANetForTableRecognitionOutput:
         outputs = self.vision_backbone(hidden_states, **kwargs)
         hidden_states = self.post_csp_pan(outputs.feature_maps)
         return BaseModelOutputWithNoAttention(
             last_hidden_state=hidden_states,
             hidden_states=outputs.hidden_states,
         )
```

I think we don't need those decorators; we rely on the backbone to collect the hidden states, and it does so by itself.

Comment on lines +123 to +124
```py
class SLANetForTableRecognitionOutput(SLANeXtForTableRecognitionOutput):
    pass
```
Contributor

I think we should not inherit here and should inherit from BaseModelOutputWithNoAttention instead, but with the same additions as SLANeXt, just to avoid confusion about attentions being in the output.
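Roughly something like this (a sketch; the head fields are taken from the `forward` shown earlier in this thread, docstrings omitted):

```py
# Sketch: inherit from BaseModelOutputWithNoAttention so no vision attentions
# appear in the output, but keep SLANeXt's head-specific additions.
@dataclass
class SLANetForTableRecognitionOutput(BaseModelOutputWithNoAttention):
    head_hidden_states: tuple[torch.FloatTensor, ...] | None = None
    head_attentions: tuple[torch.FloatTensor, ...] | None = None
```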

Contributor Author

u r right, done

@github-actions
Contributor

[For maintainers] Suggested jobs to run (before merge)

run-slow: auto, slanet

@zhang-prog
Contributor Author

> Re: the image caching, yes, it would be nice if you could open another PR for that for the other models 🤗

#45562 PTAL.🤗

@vasqu
Contributor

vasqu commented Apr 22, 2026

run-slow: slanet

@github-actions
Contributor

Workflow Run ⚙️

This comment contains run-slow, running the specified jobs:

```
models: ["models/slanet"]
quantizations: []
```

@github-actions
Contributor

CI Results

Workflow Run ⚙️

Commit Info

| Context | Commit | Description |
| --- | --- | --- |
| RUN | a90c5a81 | workflow commit (merge commit) |
| PR | 682a5b16 | branch commit (from PR) |
| main | f048e845 | base commit (on main) |

✅ No failing test specific to this PR 🎉 👏 !

Contributor

@vasqu vasqu left a comment


Nice work, merging now

@vasqu vasqu enabled auto-merge April 22, 2026 11:06
@vasqu vasqu added this pull request to the merge queue Apr 22, 2026
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Merged via the queue into huggingface:main with commit 3d77972 Apr 22, 2026
29 checks passed