Conversation
Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
---
Failing test should be unrelated
---
I think this commit broke something:

```python
import torch
from transformers import AutoProcessor, AutoModelForImageTextToText

device = "cuda"
dtype = torch.bfloat16
repo = "HuggingFaceTB/SmolVLM2-2.2B-Instruct"

processor = AutoProcessor.from_pretrained(repo)
model = AutoModelForImageTextToText.from_pretrained(
    repo,
    torch_dtype=dtype,
)
```

The above fails with:
---
I'm facing this problem as well. Any updates?
---
@WelkinYang @geronimi73 I think some incompatible typing was unintentionally added in this PR; it should be fixed in #36661.
---
The typing issue is fixed on main!
---
```python
@torch.compiler.disable(recursive=False)
def compile_friendly_flex_attention(
    query: torch.Tensor,
    key: torch.Tensor,
    value: torch.Tensor,
    **kwargs,
) -> torch.Tensor:
    # The first call initialises the singleton wrapper object; the second call
    # invokes the object's method to return the compiled flex attention.
    flex_attention_compiled = WrappedFlexAttention()()
    return flex_attention_compiled(
        query,
        key,
        value,
        **kwargs,
    )
```
Thanks for your great work! Just curious: why don't we just return a pre-compiled flex_attention, like torchtune does (https://github.com/pytorch/torchtune/blob/main/torchtune/modules/attention_utils.py#L44-L57)?
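For context, the linked torchtune approach boils down to creating the compiled wrapper once at module import time. A simplified sketch of that pattern (not torchtune's verbatim code):

```python
import torch
from torch.nn.attention.flex_attention import flex_attention

# The compiled wrapper is created as soon as this module is imported,
# for every user on a flex-capable torch, whether or not it is ever called.
flex_attention_compiled = torch.compile(flex_attention, dynamic=False)
```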
The idea is to precompile only once, the first time we actually use flex attention. See the respective singleton class WrappedFlexAttention in the integrations file.
Otherwise you would always compile (once) when you have torch 2.5.x or higher. We shouldn't force that on users who might not even use it 👀
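For illustration, here is a minimal sketch of such a lazy-compile singleton, assuming torch >= 2.5; the real WrappedFlexAttention in the integrations file may differ in detail:

```python
import torch
from torch.nn.attention.flex_attention import flex_attention

class WrappedFlexAttention:
    """Compile flex_attention once, on first use, and reuse it afterwards."""

    _instance = None
    _compiled_flex_attention = None

    def __new__(cls):
        # First call creates the singleton; later calls return the same object.
        if cls._instance is None:
            cls._instance = super().__new__(cls)
        return cls._instance

    def __call__(self):
        # torch.compile is reached only on the very first invocation, so
        # users who never use flex attention never pay the compile cost.
        if type(self)._compiled_flex_attention is None:
            type(self)._compiled_flex_attention = torch.compile(flex_attention)
        return self._compiled_flex_attention
```

A caller like compile_friendly_flex_attention above then does WrappedFlexAttention()() to obtain the compiled kernel, so only the first real use triggers compilation.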
---
What does this PR do?
Properly updates flex attention.