[Gemma2] Support FA2 softcapping#31887
Conversation
LysandreJik
left a comment
OK, looks good to me! flash-attn 2.6.0 was released 3 hours ago, let's go
amyeroberts
left a comment
LGTM - thanks for adding!
Good to see this. Can we use it for model fine-tuning, or is it just for inference? Google recommends fine-tuning in 'eager' mode. |
You can now use it for fine-tuning as well, as long as you have the correct version of FA2. I'm not sure fine-tuning "requires" it, though.
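For anyone landing here later, a minimal sketch of what "correct version" means in practice; the 2.6.0 threshold comes from this thread, and the checkpoint name is just an example:

```python
import importlib.metadata

import torch
from packaging import version
from transformers import AutoModelForCausalLM

# Softcapping support landed in flash-attn 2.6.0, so check the installed version first.
fa_version = version.parse(importlib.metadata.version("flash_attn"))
assert fa_version >= version.parse("2.6.0"), "FA2 softcapping needs flash-attn >= 2.6.0"

model = AutoModelForCausalLM.from_pretrained(
    "google/gemma-2-9b",  # example checkpoint
    torch_dtype=torch.bfloat16,
    attn_implementation="flash_attention_2",
)
```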
Great! Any plans for SDPA support as well?
SDPA is a bit more complicated: we would need to use flex attention, and I did not have time to implement it. Do you want to open a PR?
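For reference, a rough sketch of what the flex-attention route could look like, assuming PyTorch 2.5+ and its `score_mod` hook. This is an illustration of soft-capped attention scores, not what transformers ships:

```python
import torch
from torch.nn.attention.flex_attention import flex_attention

SOFTCAP = 50.0  # Gemma 2 uses attn_logit_softcapping = 50.0 for its attention layers


def softcap_score_mod(score, batch, head, q_idx, kv_idx):
    # Squash each raw attention logit into (-SOFTCAP, SOFTCAP) before softmax.
    return SOFTCAP * torch.tanh(score / SOFTCAP)


# Shapes are (batch, num_heads, seq_len, head_dim).
q = torch.randn(1, 8, 128, 64)
k = torch.randn(1, 8, 128, 64)
v = torch.randn(1, 8, 128, 64)
out = flex_attention(q, k, v, score_mod=softcap_score_mod)
```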
Hi @ArthurZucker, should we also add the sliding window and soft-capping to `src/transformers/models/gemma2/modeling_gemma2.py` (lines 466 to 469 at fc35907), just like in `src/transformers/models/mistral/modeling_mistral.py` (lines 519 to 528 at fc35907)?
It should be here on main: https://github.com/huggingface/transformers/blob/main/src/transformers/models/gemma2/modeling_gemma2.py#L361 (we updated the whole FA2 integration). On the release branch it was already there, AFAIK.
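Paraphrasing the eager-path soft-capping that the linked code implements (a sketch, not a verbatim copy of `modeling_gemma2.py`; the helper name is made up here):

```python
import torch


def softcap_logits(attn_weights: torch.Tensor, softcap: float) -> torch.Tensor:
    # Divide, tanh, rescale: logits saturate smoothly at +/- softcap
    # instead of growing without bound before the softmax.
    attn_weights = attn_weights / softcap
    attn_weights = torch.tanh(attn_weights)
    return attn_weights * softcap
```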
Got it, thanks for replying!
What does this PR do?
Adds support for the new FA2 softcapping following Dao-AILab/flash-attention#1025
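A minimal usage sketch of the kernel-side feature this PR wires up, assuming flash-attn >= 2.6.0 with the `softcap` keyword added in Dao-AILab/flash-attention#1025:

```python
import torch
from flash_attn import flash_attn_func

# flash_attn_func expects (batch, seqlen, num_heads, head_dim) in half precision on GPU.
q = torch.randn(1, 128, 8, 64, dtype=torch.float16, device="cuda")
k = torch.randn(1, 128, 8, 64, dtype=torch.float16, device="cuda")
v = torch.randn(1, 128, 8, 64, dtype=torch.float16, device="cuda")

# softcap=50.0 applies the tanh soft-capping to attention logits inside the kernel,
# so the eager-style cap no longer has to be done in Python.
out = flash_attn_func(q, k, v, causal=True, softcap=50.0)
```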