Conversation
Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
---
Failing test should be unrelated
---
I think this commit broke something:

```python
import torch
from transformers import AutoProcessor, AutoModelForImageTextToText

device = "cuda"
dtype = torch.bfloat16
repo = "HuggingFaceTB/SmolVLM2-2.2B-Instruct"

processor = AutoProcessor.from_pretrained(repo)
model = AutoModelForImageTextToText.from_pretrained(
    repo,
    torch_dtype=dtype,
)
```

The above fails with:
---
I'm facing this problem as well. Any updates?
---
@WelkinYang @geronimi73 I think some incompatible typing was unintentionally added in this PR; it should be fixed in #36661.
---
The typing issue is fixed on main!
---
```python
@torch.compiler.disable(recursive=False)
def compile_friendly_flex_attention(
    query: torch.Tensor,
    key: torch.Tensor,
    value: torch.Tensor,
    **kwargs,
) -> torch.Tensor:
    # The first call initialises the singleton wrapper object; the second call
    # invokes the object's method to return the compiled flex attention.
    flex_attention_compiled = WrappedFlexAttention()()
    return flex_attention_compiled(
        query,
        key,
        value,
        **kwargs,
    )
```
Thanks for your great work! Just curious: why don't we just return a pre-compiled flex_attention, like torchtune does (https://github.com/pytorch/torchtune/blob/main/torchtune/modules/attention_utils.py#L44-L57)?
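For context, the linked torchtune approach boils down to creating the compiled wrapper once at module import time. A simplified sketch of that pattern (not torchtune's verbatim code):

```python
import torch
from torch.nn.attention.flex_attention import flex_attention

# The compiled wrapper is created as soon as this module is imported,
# for every user on a flex-capable torch, whether or not it is ever called.
flex_attention_compiled = torch.compile(flex_attention, dynamic=False)
```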
The idea is to precompile only once, the first time we actually use flex attention. See the respective singleton class WrappedFlexAttention in the integrations file.
Otherwise you would always compile (once) when you have torch 2.5.x or higher. We shouldn't force that on users who might not even use it 👀
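For illustration, here is a minimal sketch of such a lazy-compile singleton, assuming torch >= 2.5; the real WrappedFlexAttention in the integrations file may differ in detail:

```python
import torch
from torch.nn.attention.flex_attention import flex_attention

class WrappedFlexAttention:
    """Compile flex_attention once, on first use, and reuse it afterwards."""

    _instance = None
    _compiled_flex_attention = None

    def __new__(cls):
        # First call creates the singleton; later calls return the same object.
        if cls._instance is None:
            cls._instance = super().__new__(cls)
        return cls._instance

    def __call__(self):
        # torch.compile is reached only on the very first invocation, so
        # users who never use flex attention never pay the compile cost.
        if type(self)._compiled_flex_attention is None:
            type(self)._compiled_flex_attention = torch.compile(flex_attention)
        return self._compiled_flex_attention
```

A caller like compile_friendly_flex_attention above then does WrappedFlexAttention()() to obtain the compiled kernel, so only the first real use triggers compilation.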
---
What does this PR do?
Properly updates flex attention.