[GPTNeoX] Flex Attention + Refactor #34896

Merged
ArthurZucker merged 20 commits into huggingface:main from vasqu:flex-gptneox
Dec 4, 2024
@vasqu commented Nov 23, 2024

What does this PR do?

Adds flex attention and the refactor according to #34809.

However, I discovered several issues in the current version of gemma2 (#34282):

  • Flex attention seems to need a transpose afterwards, like sdpa
  • Loading flex attention via from_pretrained didn't work, so the current tests use another attention implementation (eager or sdpa, I'm not sure which)
  • Tests could benefit from general tests similar to the sdpa ones :D For now it's a bit of a hassle to add an integration test every time when a more general test could cover all subsequent models
  • I'm not familiar with BetterTransformer or the limitations of flex attention --> added some TODOs in case we need to check back in
  • Flex attention doesn't support dropout (or maybe I've overlooked something)
  • Setting model.config._attn_implementation = ... should be tracked somewhere and sanity-checked the same way as the first time; for now it silently overwrites and could cause some ugly errors (tested by switching to flash attention 2 without having fa2 installed)
  • Documentation should be added somewhere (probably in the perf docs or somewhere else)
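
To illustrate the transpose point from the list above, here is a minimal sketch using PyTorch's `scaled_dot_product_attention` as a stand-in, on the assumption (taken from the first bullet) that flex attention returns the same `(batch, heads, seq_len, head_dim)` layout. The shapes and variable names are made up for the example and are not taken from the GPTNeoX code.

```python
import torch
import torch.nn.functional as F

# Illustrative sizes only (assumptions, not GPTNeoX defaults).
batch, heads, seq_len, head_dim = 2, 4, 8, 16

# SDPA takes and returns tensors laid out as (batch, heads, seq_len, head_dim).
query = torch.randn(batch, heads, seq_len, head_dim)
key = torch.randn(batch, heads, seq_len, head_dim)
value = torch.randn(batch, heads, seq_len, head_dim)

attn_output = F.scaled_dot_product_attention(query, key, value, is_causal=True)
per_head = attn_output  # keep a handle on the pre-transpose layout

# Move the head axis next to head_dim before merging heads. Reshaping without
# this transpose would mix values from different tokens into one hidden vector.
attn_output = attn_output.transpose(1, 2).contiguous()
merged = attn_output.reshape(batch, seq_len, heads * head_dim)
```

After the transpose, the first `head_dim` entries of `merged[b, t]` are exactly head 0's output for token `t`, which is the layout a following output projection typically expects.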

So tbh, I'm not sure whether to split this PR into several ones (e.g. a gemma fix, general loading, general tests, docs, and then subsequent models) or not.
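
For the "general loading" part, one possible shape of the sanity check on `_attn_implementation` could look like the sketch below. Everything here is hypothetical: `REQUIRED_PACKAGE`, `check_attn_implementation`, and `ToyConfig` are illustrative names, not the transformers API, and version checks (e.g. the torch version flex attention needs) are left out for brevity.

```python
import importlib.util

# Hypothetical mapping from implementation name to the extra package it needs
# (None means no extra dependency). This is not the transformers registry.
REQUIRED_PACKAGE = {
    "eager": None,
    "sdpa": None,
    "flex_attention": None,  # simplified: a real check would also verify the torch version
    "flash_attention_2": "flash_attn",
}


def check_attn_implementation(name, required_package=REQUIRED_PACKAGE):
    """Raise early if `name` is unknown or its dependency is not installed."""
    if name not in required_package:
        raise ValueError(f"Unknown attention implementation: {name!r}")
    package = required_package[name]
    if package is not None and importlib.util.find_spec(package) is None:
        raise ImportError(f"{name!r} requires the {package!r} package, which is not installed")
    return name


class ToyConfig:
    """Hypothetical config that validates on every assignment, not only at load time."""

    def __init__(self, attn_implementation="eager"):
        self.attn_implementation = attn_implementation  # goes through the setter

    @property
    def attn_implementation(self):
        return self._attn_implementation

    @attn_implementation.setter
    def attn_implementation(self, name):
        # Validate first so a failed assignment leaves the old value in place.
        self._attn_implementation = check_attn_implementation(name)
```

With something along these lines, switching to flash attention 2 without fa2 installed would fail loudly at assignment time rather than later inside the forward pass.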

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a Github issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@ArthurZucker
