Gemma2: eager attention by default #32865
Conversation
```python
config._attn_implementation = "sdpa"
model = Gemma2Model(config)
```
Here we should check that the attention can be set in the canonical way, rather than by overriding the private attribute:
```diff
- config._attn_implementation = "sdpa"
- model = Gemma2Model(config)
+ model = Gemma2Model(config, attn_implementation="sdpa")
```
@amyeroberts hehe I had the same idea, but this actually doesn't work :D
The API you're suggesting is for `.from_pretrained()`, not for `__init__()`
we can pass `_attn_implementation` I think, no?
> The API you're suggesting is for `.from_pretrained()`, not for `__init__()`
Oh, true!
> we can pass `_attn_implementation` I think, no?
I don't think we can for the model `__init__`, as it just accepts `config` as an input arg
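To summarize the two working paths from the thread, here's a minimal pure-Python sketch. This is not the actual transformers code: `Config`, `Model`, and this `from_pretrained` are simplified stand-ins illustrating why the kwarg works for `from_pretrained()` but not for `__init__()`.

```python
class Config:
    def __init__(self):
        # Private attribute selecting the attention backend
        self._attn_implementation = "eager"


class Model:
    def __init__(self, config):
        # __init__ only accepts a config, so attn_implementation
        # cannot be passed here directly
        self.config = config

    @classmethod
    def from_pretrained(cls, config, attn_implementation=None):
        # from_pretrained-style API: the kwarg is written onto the
        # config before the model is built
        if attn_implementation is not None:
            config._attn_implementation = attn_implementation
        return cls(config)


# Path 1: override the private attribute before constructing the model
config = Config()
config._attn_implementation = "sdpa"
model = Model(config)
print(model.config._attn_implementation)  # sdpa

# Path 2: pass the kwarg through the from_pretrained-style API
model = Model.from_pretrained(Config(), attn_implementation="sdpa")
print(model.config._attn_implementation)  # sdpa
```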
What does this PR do?
See title :)
We know that SDPA yields inferior modeling results for Gemma 2, so we should use `eager` by default. This has been the source of some model-quality GH issues, e.g. #32848.

Slow tests for Gemma 2 ran, no regressions.
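The behavioral change can be sketched as a small default-resolution function. This is a hypothetical illustration, not the actual diff: the real change lives in the Gemma 2 model/config code.

```python
def resolve_attn_implementation(requested=None):
    # Hypothetical resolver. Before this PR, Gemma 2 would pick "sdpa"
    # whenever it was available; after it, "eager" is the default and
    # SDPA is used only when explicitly requested.
    if requested is not None:
        return requested
    return "eager"


print(resolve_attn_implementation())        # eager
print(resolve_attn_implementation("sdpa"))  # sdpa
```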