Gemma2: eager attention by default #32865
Conversation
```python
config._attn_implementation = "sdpa"
model = Gemma2Model(config)
```
Here we should check that the attention can be set in the canonical way, rather than by overriding the private attribute:
```diff
- config._attn_implementation = "sdpa"
- model = Gemma2Model(config)
+ model = Gemma2Model(config, attn_implementation="sdpa")
```
@amyeroberts hehe I had the same idea, but this actually doesn't work :D
The API you're suggesting is for `.from_pretrained()`, not for `__init__()`
we can pass `_attn_implementation` I think, no?
> The API you're suggesting is for `.from_pretrained()`, not for `__init__()`
Oh, true!
> we can pass `_attn_implementation` I think, no?
I don't think we can for the model `__init__`, as it just accepts `config` as an input arg
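To summarize the two working paths from the thread, here's a minimal pure-Python sketch. This is not the actual transformers code: `Config`, `Model`, and this `from_pretrained` are simplified stand-ins illustrating why the kwarg works for `from_pretrained()` but not for `__init__()`.

```python
class Config:
    def __init__(self):
        # Private attribute selecting the attention backend
        self._attn_implementation = "eager"


class Model:
    def __init__(self, config):
        # __init__ only accepts a config, so attn_implementation
        # cannot be passed here directly
        self.config = config

    @classmethod
    def from_pretrained(cls, config, attn_implementation=None):
        # from_pretrained-style API: the kwarg is written onto the
        # config before the model is built
        if attn_implementation is not None:
            config._attn_implementation = attn_implementation
        return cls(config)


# Path 1: override the private attribute before constructing the model
config = Config()
config._attn_implementation = "sdpa"
model = Model(config)
print(model.config._attn_implementation)  # sdpa

# Path 2: pass the kwarg through the from_pretrained-style API
model = Model.from_pretrained(Config(), attn_implementation="sdpa")
print(model.config._attn_implementation)  # sdpa
```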
What does this PR do?
See title :)
We know that SDPA yields inferior modeling results for Gemma 2, so we should use `eager` by default. This has been the source of some model-quality GH issues, e.g. #32848.

Slow tests for Gemma 2 ran, no regressions.
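The behavioral change can be sketched as a small default-resolution function. This is a hypothetical illustration, not the actual diff: the real change lives in the Gemma 2 model/config code.

```python
def resolve_attn_implementation(requested=None):
    # Hypothetical resolver. Before this PR, Gemma 2 would pick "sdpa"
    # whenever it was available; after it, "eager" is the default and
    # SDPA is used only when explicitly requested.
    if requested is not None:
        return requested
    return "eager"


print(resolve_attn_implementation())        # eager
print(resolve_attn_implementation("sdpa"))  # sdpa
```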