Add Flash Attention 2 to Persimmon #27685
Conversation
younesbelkada
left a comment
Looks very nice, thanks a lot!
Some changes in the PR seem unrelated (e.g. changes on Phi); I think you need to install `ruff==0.1.5` and run `make style` again.
I'll also run the benchmarks later with FA2-Phi and update in this PR!
There are also some strange CI failures; can you try to rebase on main again?
Force-pushed from 1222223 to 5590ead
Re-based, installed `ruff==0.1.5`, and re-ran `make style`.
xhluca
left a comment
I left some comments regarding the `target_dtype` inference.
```python
if hasattr(self.config, "_pre_quantization_dtype"):
    target_dtype = self.config._pre_quantization_dtype
else:
    target_dtype = self.q_proj.weight.dtype
```
This line will give you an error because `self.q_proj` was never defined here (it is defined in Llama's `__init__`, which is why it worked there). I am not sure exactly what this is trying to achieve, but you might try some other module that is defined in the `__init__` of the `PersimmonAttention` class.
Yes, you should use `self.query_key_value` here.
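For clarity, a minimal sketch of the suggested fix (an illustrative fragment, not the final diff; `self` is the `PersimmonAttention` module, and `query_key_value` is the fused projection defined in its `__init__`):

```python
# Infer the dtype to cast hidden states to for Flash Attention 2 from a
# module that actually exists on PersimmonAttention (`query_key_value`),
# rather than Llama's `q_proj`, which is never defined on this class.
if hasattr(self.config, "_pre_quantization_dtype"):
    # Quantized models record the dtype the weights had before quantization.
    target_dtype = self.config._pre_quantization_dtype
else:
    target_dtype = self.query_key_value.weight.dtype
```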
cc @molbap as younes is offline
ArthurZucker
left a comment
Thanks for the update! Let's make sure to rebase on main and only include changes for persimmon!
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread. Please note that issues that do not follow the contributing guidelines are likely to be ignored.
What does this PR do?
Integrates FA2 into Persimmon per #26350 and #27052 (the former branch was messed up after trying to rebase, so this PR is from a new branch).
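For context, a minimal usage sketch (not part of the diff itself), assuming `flash-attn` is installed, a CUDA device is available, and `adept/persimmon-8b-base` as an example checkpoint:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "adept/persimmon-8b-base"  # example checkpoint, swap in your own

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,                # FA2 requires fp16 or bf16 weights
    attn_implementation="flash_attention_2",  # older versions used use_flash_attention_2=True
    device_map="auto",
)

inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```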
Before submitting
- Did you read the contributor guideline, Pull Request section?
- Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
Who can review?
@younesbelkada @ArthurZucker
Notes
- Rebased on main.
- Followed [FA-2] Add Flash Attention to Phi #27661 for the `generate_padding_right` test. However, `Persimmon` tokenizer configs do not have either `eos` or `pad` tokens (both are set to `null`, see here), so simply copying the `LlamaModelTest` `generate_padding_right` test override does not work.
- Tried `dummy inputs` on the full pretrained model for the `generate_padding_right` test, no luck either -- this is left as the current implementation in `test_persimmon_modeling.py`.
- Looked at the `generate_padding_test` for other models for FA2 -- see comments.
- Marked the `generate_padding_right` test as `skip` for now (see the sketch after this list).
- Only files related to `persimmon` were changed in this PR due to fixes from running `make {quality, style, fixup}`.
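A hypothetical sketch of the skip described above; the test name here follows the common `transformers` FA2 test pattern and is an assumption, not taken from the diff:

```python
import unittest


class PersimmonFlashAttention2Test(unittest.TestCase):
    # Persimmon's tokenizer config sets both `eos` and `pad` tokens to null,
    # so the Llama-style right-padding generation test cannot be reused.
    @unittest.skip("Persimmon tokenizer has no eos/pad tokens; right-padding generation cannot be tested")
    def test_flash_attn_2_generate_padding_right(self):
        pass


if __name__ == "__main__":
    unittest.main()
```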