
Proper_flex #36643

Merged
ArthurZucker merged 24 commits into main from proper_flex on Mar 11, 2025

Conversation

@ArthurZucker
Collaborator

What does this PR do?

Update proper flex

@ArthurZucker marked this pull request as ready for review March 11, 2025 08:59
@github-actions bot requested a review from Rocketknight1 March 11, 2025 08:59
@ArthurZucker
Collaborator Author

Failing test should be unrelated

@ArthurZucker merged commit d126f35 into main Mar 11, 2025
@ArthurZucker deleted the proper_flex branch March 11, 2025 09:24
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@geronimi73

I think this commit broke something: SmolVLM2-2.2B doesn't load anymore.

import torch
from transformers import AutoProcessor, AutoModelForImageTextToText

device = "cuda"
dtype = torch.bfloat16
repo = "HuggingFaceTB/SmolVLM2-2.2B-Instruct"

processor = AutoProcessor.from_pretrained(repo)
model = AutoModelForImageTextToText.from_pretrained(
    repo,
    torch_dtype=dtype,
).to(device)

above fails with RuntimeError: Failed to import transformers.models.smolvlm.modeling_smolvlm because of the following error (look up to see its traceback): name 'BlockMask' is not defined

  • tried torch 2.1 and torch 2.4; both fail
  • cloning transformers and resetting to the commit before this PR was merged fixes it
NameError                                 Traceback (most recent call last)
File ~/.local/lib/python3.10/site-packages/transformers/utils/import_utils.py:1885, in _LazyModule._get_module(self, module_name)
   1884 try:
-> 1885     return importlib.import_module("." + module_name, self.__name__)
   1886 except Exception as e:

File /usr/lib/python3.10/importlib/__init__.py:126, in import_module(name, package)
    125         level += 1
--> 126 return _bootstrap._gcd_import(name[level:], package, level)

File <frozen importlib._bootstrap>:1050, in _gcd_import(name, package, level)

File <frozen importlib._bootstrap>:1027, in _find_and_load(name, import_)

File <frozen importlib._bootstrap>:1006, in _find_and_load_unlocked(name, import_)

File <frozen importlib._bootstrap>:688, in _load_unlocked(spec)

File <frozen importlib._bootstrap_external>:883, in exec_module(self, module)

File <frozen importlib._bootstrap>:241, in _call_with_frames_removed(f, *args, **kwds)

File ~/.local/lib/python3.10/site-packages/transformers/models/smolvlm/modeling_smolvlm.py:34
     33 from ...modeling_outputs import BaseModelOutput, ModelOutput
---> 34 from ...modeling_utils import PreTrainedModel
     35 from ...utils import (
     36     add_start_docstrings,
     37     add_start_docstrings_to_model_forward,
   (...)
     41     replace_return_docstrings,
     42 )

File ~/.local/lib/python3.10/site-packages/transformers/modeling_utils.py:55
     54 from .integrations.flash_attention import flash_attention_forward
---> 55 from .integrations.flex_attention import flex_attention_forward
     56 from .integrations.sdpa_attention import sdpa_attention_forward

File ~/.local/lib/python3.10/site-packages/transformers/integrations/flex_attention.py:74
     71         return self._compiled_flex_attention
---> 74 def make_flex_block_causal_mask(attention_mask_2d: torch.Tensor) -> BlockMask:
     75     """
     76     Create a block causal document mask for a batch of sequences, both packed and unpacked.
     77     Create Block causal logic and passing it into :func:`torch.nn.attention.flex_attention.create_block_mask`.
   (...)
     95         BlockMask
     96     """

NameError: name 'BlockMask' is not defined

The above exception was the direct cause of the following exception:

RuntimeError                              Traceback (most recent call last)
Cell In[4], line 9
      6 repo = "HuggingFaceTB/SmolVLM2-2.2B-Instruct"
      8 processor = AutoProcessor.from_pretrained(repo)
----> 9 model = AutoModelForImageTextToText.from_pretrained(
     10     repo,
     11     torch_dtype=dtype,
     12     _attn_implementation="eager"
     13 ).to(device)

File ~/.local/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py:563, in _BaseAutoModelClass.from_pretrained(cls, pretrained_model_name_or_path, *model_args, **kwargs)
    559     return model_class.from_pretrained(
    560         pretrained_model_name_or_path, *model_args, config=config, **hub_kwargs, **kwargs
    561     )
    562 elif type(config) in cls._model_mapping.keys():
--> 563     model_class = _get_model_class(config, cls._model_mapping)
    564     return model_class.from_pretrained(
    565         pretrained_model_name_or_path, *model_args, config=config, **hub_kwargs, **kwargs
    566     )
    567 raise ValueError(
    568     f"Unrecognized configuration class {config.__class__} for this kind of AutoModel: {cls.__name__}.\n"
    569     f"Model type should be one of {', '.join(c.__name__ for c in cls._model_mapping.keys())}."
    570 )

File ~/.local/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py:388, in _get_model_class(config, model_mapping)
    387 def _get_model_class(config, model_mapping):
--> 388     supported_models = model_mapping[type(config)]
    389     if not isinstance(supported_models, (list, tuple)):
    390         return supported_models

File ~/.local/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py:763, in _LazyAutoMapping.__getitem__(self, key)
    761 if model_type in self._model_mapping:
    762     model_name = self._model_mapping[model_type]
--> 763     return self._load_attr_from_module(model_type, model_name)
    765 # Maybe there was several model types associated with this config.
    766 model_types = [k for k, v in self._config_mapping.items() if v == key.__name__]

File ~/.local/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py:777, in _LazyAutoMapping._load_attr_from_module(self, model_type, attr)
    775 if module_name not in self._modules:
    776     self._modules[module_name] = importlib.import_module(f".{module_name}", "transformers.models")
--> 777 return getattribute_from_module(self._modules[module_name], attr)

File ~/.local/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py:693, in getattribute_from_module(module, attr)
    691 if isinstance(attr, tuple):
    692     return tuple(getattribute_from_module(module, a) for a in attr)
--> 693 if hasattr(module, attr):
    694     return getattr(module, attr)
    695 # Some of the mappings have entries model_type -> object of another model type. In that case we try to grab the
    696 # object at the top level.

File ~/.local/lib/python3.10/site-packages/transformers/utils/import_utils.py:1873, in _LazyModule.__getattr__(self, name)
   1871     value = Placeholder
   1872 elif name in self._class_to_module.keys():
-> 1873     module = self._get_module(self._class_to_module[name])
   1874     value = getattr(module, name)
   1875 elif name in self._modules:

File ~/.local/lib/python3.10/site-packages/transformers/utils/import_utils.py:1887, in _LazyModule._get_module(self, module_name)
   1885     return importlib.import_module("." + module_name, self.__name__)
   1886 except Exception as e:
-> 1887     raise RuntimeError(
   1888         f"Failed to import {self.__name__}.{module_name} because of the following error (look up to see its"
   1889         f" traceback):\n{e}"
   1890     ) from e

RuntimeError: Failed to import transformers.models.smolvlm.modeling_smolvlm because of the following error (look up to see its traceback):
name 'BlockMask' is not defined

@WelkinYang

> I think this commit broke something: SmolVLM2-2.2B doesn't load anymore.
> […] (full report quoted from @geronimi73's comment above)

I face this problem as well, any updates?

@vasqu
Contributor

vasqu commented Mar 12, 2025

@WelkinYang @geronimi73 I think some incompatible typing was unintentionally added in this PR; it should be fixed in #36661.
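
For context on the failure mode: BlockMask only exists in torch >= 2.5, but it appeared in a return annotation that Python evaluates eagerly when the module is imported, hence the NameError on older versions. A minimal sketch of an import-safe pattern (an illustrative guess at the shape of such a fix, not necessarily what #36661 actually does):

from typing import TYPE_CHECKING

import torch

if TYPE_CHECKING:
    # Imported for static type checkers only; absent at runtime on torch < 2.5.
    from torch.nn.attention.flex_attention import BlockMask


def make_flex_block_causal_mask(attention_mask_2d: torch.Tensor) -> "BlockMask":
    # The string annotation is never evaluated at import time, so importing
    # this module no longer raises NameError on torch builds without BlockMask.
    ...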

@ArthurZucker
Collaborator Author

Typing issue is fixed on main!
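To pick up the fix before the next release, installing transformers from source should work, e.g. pip install git+https://github.com/huggingface/transformers.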

Comment on lines +130 to +144
@torch.compiler.disable(recursive=False)
def compile_friendly_flex_attention(
    query: torch.Tensor,
    key: torch.Tensor,
    value: torch.Tensor,
    **kwargs,
) -> torch.Tensor:
    # The first call initialises the singleton wrapper object; the second call
    # invokes the object's method to return the compiled flex attention.
    flex_attention_compiled = WrappedFlexAttention()()
    return flex_attention_compiled(
        query,
        key,
        value,
        **kwargs,
    )

Thanks for your great work! Just curious: why don't we just return a pre-compiled flex_attention, like torchtune does (https://github.com/pytorch/torchtune/blob/main/torchtune/modules/attention_utils.py#L44-L57)?
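
(For reference, the torchtune approach linked above amounts to compiling at import time, roughly like the following sketch; names here are illustrative, see the linked file for the real code.)

import torch
from torch.nn.attention.flex_attention import flex_attention

# Module-level compile: the cost is paid once at import time,
# whether or not flex attention is ever actually used.
flex_attention_compiled = torch.compile(flex_attention, dynamic=False)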

@vasqu (Contributor) commented Apr 16, 2025

The idea is to compile only once, the first time we actually use flex attention. See the respective singleton class WrappedFlexAttention in the integrations file.

Otherwise you would always compile (once) on torch 2.5.x or higher. We shouldn't force that on users who might not even use it 👀
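
A minimal sketch of that lazy-singleton idea (class and attribute names here are illustrative; see WrappedFlexAttention in transformers/integrations/flex_attention.py for the real implementation):

import torch
from torch.nn.attention.flex_attention import flex_attention


class LazyCompiledFlexAttention:
    # Singleton: a single shared instance owns the compiled kernel.
    _instance = None

    def __new__(cls):
        if cls._instance is None:
            cls._instance = super().__new__(cls)
            cls._instance._compiled = None
        return cls._instance

    def __call__(self):
        if self._compiled is None:
            # Compiled exactly once, on the first real use of flex attention,
            # instead of unconditionally at import time.
            self._compiled = torch.compile(flex_attention, dynamic=False)
        return self._compiled

With this shape, the compile cost is only ever paid by users who actually run flex attention.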
