Allow kernel modules to declare their preferred mask function #44680
Conversation
`load_and_register_attn_kernel` hardcodes the mask function to `flash_attention_2` for all custom attention kernels. This is incorrect for kernels that need a different mask type (e.g., SDPA-style masks). Add support for a `MASK_FUNCTION` module-level attribute on kernel packages. If present, it specifies which mask type to use (e.g., "sdpa", "eager"). Falls back to "flash_attention_2" for backward compatibility when the attribute is absent. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
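For illustration, a kernel package opting into SDPA-style masks would only need to export the attribute at module level. This is a minimal sketch; the attention function name, signature, and body are placeholders and not part of this PR:

```python
# Hypothetical kernel package __init__.py (function name and body are illustrative).
# Declaring MASK_FUNCTION tells the registration code which mask factory to pair
# with this kernel instead of the default flash_attention_2 mask behavior.
MASK_FUNCTION = "sdpa"

def attention(module, query, key, value, attention_mask, **kwargs):
    # Custom attention implementation that expects an SDPA-style mask.
    ...
```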
Pull request overview
Adds a way for hub-loaded custom attention kernels to choose which attention-mask factory function Transformers should use, instead of always defaulting to the flash_attention_2 mask behavior.
Changes:
- Update hub kernel registration to read an optional `MASK_FUNCTION` attribute from the kernel module, defaulting to `"flash_attention_2"`.
- Add tests covering the default fallback behavior and the custom `MASK_FUNCTION = "sdpa"` dispatch.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| `src/transformers/integrations/hub_kernels.py` | Uses kernel-provided `MASK_FUNCTION` to select the mask interface at registration time. |
| `tests/kernels/test_kernels.py` | Adds unit tests ensuring the new dispatch behavior works and remains backward compatible. |
```python
ALL_MASK_ATTENTION_FUNCTIONS.register(attn_implementation, ALL_MASK_ATTENTION_FUNCTIONS[mask_type])
```
This looks overly cautious: the pull request just replicates the way the `kernel_function` is registered.
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
vasqu left a comment
Some initial comments from my side. I'm unsure if we really want to save the mask function within the kernel; maybe an optional kwarg would do the job in a better way.
```python
try:
    ALL_ATTENTION_FUNCTIONS.pop(attn_impl, None)
    ALL_MASK_ATTENTION_FUNCTIONS.pop(attn_impl, None)
except Exception as e:
```
Can we clean up as in the other tests (transformers/tests/kernels/test_kernels.py, lines 412 to 420 in 16a5b09)? Just for consistency's sake.
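For reference, one way to match that pattern, assuming the referenced tests restore the global registries via `addCleanup` (this is a sketch of the suggestion, not the code at those lines):

```python
# Assumed cleanup pattern mirroring the rest of the file: schedule removal of
# the temporary registrations so the global registries are restored even if
# the assertions fail.
self.addCleanup(ALL_ATTENTION_FUNCTIONS.pop, attn_impl, None)
self.addCleanup(ALL_MASK_ATTENTION_FUNCTIONS.pop, attn_impl, None)
```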
```python
    ALL_MASK_ATTENTION_FUNCTIONS[attn_impl],
    ALL_MASK_ATTENTION_FUNCTIONS["sdpa"],
)
try:
```
```python
# Allow the kernel module to declare its preferred mask function (e.g., MASK_FUNCTION = "sdpa").
# Falls back to "flash_attention_2" for backward compatibility with existing kernels.
mask_type = getattr(kernel, "MASK_FUNCTION", "flash_attention_2")
```
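Putting this hunk together with the registration call quoted earlier, the dispatch path inside `load_and_register_attn_kernel` reads roughly as follows (a sketch assembled from the two diff excerpts; the surrounding function body is omitted):

```python
# Sketch assembled from the diff excerpts above; `kernel` and `attn_implementation`
# come from the surrounding load_and_register_attn_kernel body in hub_kernels.py.
mask_type = getattr(kernel, "MASK_FUNCTION", "flash_attention_2")
ALL_MASK_ATTENTION_FUNCTIONS.register(attn_implementation, ALL_MASK_ATTENTION_FUNCTIONS[mask_type])
```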
If `MASK_FUNCTION` is not defined, it will default to `"flash_attention_2"`, so it is compatible with existing kernels. But maybe you meant something else?
The fallback is fine. The issue is how we let kernels "register" their mask. The way this is currently done, the declaration is expected to be integrated within the kernel itself, i.e. in the `__init__` of the kernel with the exact constant `MASK_FUNCTION = "your_preferred_attn_mask_type"`.
This seems a bit extreme to me; maybe the proper way is to allow a kwarg (within this function) instead and register a new mask that way.
@dacorvo Imo, the current solution is too reliant on the kernel to have the mask included as a constant. Would it be possible to rewrite this to use an optional kwarg instead?
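For context, the kwarg-based alternative the reviewers describe could look something like the sketch below. The signature is hypothetical and only illustrates passing the mask type at call time instead of reading it from the kernel module:

```python
# Hypothetical shape of the kwarg-based alternative (not this PR's implementation):
# the caller picks the mask type when registering the kernel.
def load_and_register_attn_kernel(attn_implementation, mask_type="flash_attention_2"):
    kernel_function = ...  # load the hub kernel as before (omitted)
    ALL_ATTENTION_FUNCTIONS.register(attn_implementation, kernel_function)
    ALL_MASK_ATTENTION_FUNCTIONS.register(attn_implementation, ALL_MASK_ATTENTION_FUNCTIONS[mask_type])
```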
Fixes #44679
Summary
- `load_and_register_attn_kernel` currently gets a hardcoded `flash_attention_2` mask dispatch, which produces 2D or `None` masks
- Add support for a `MASK_FUNCTION` module-level attribute on kernel packages; falls back to `"flash_attention_2"` for backward compatibility

Changes
- `src/transformers/integrations/hub_kernels.py`: check `getattr(kernel, "MASK_FUNCTION", "flash_attention_2")` instead of hardcoding `"flash_attention_2"`
- `tests/kernels/test_kernels.py`: 2 new tests covering the default fallback and the custom `MASK_FUNCTION = "sdpa"` dispatch

Test plan
- `python -m pytest tests/kernels/test_kernels.py::TestAttentionKernelRegistration -xvs`: all 5 tests pass (3 existing + 2 new)

AI disclosure
This PR was developed with AI assistance (Claude). All changes reviewed and validated by a human contributor.
🤖 Generated with Claude Code