
Add Flash Attention 2 to Persimmon #27685

Closed

jeromeku wants to merge 8 commits into huggingface:main from jeromeku:persimmon-flash-attn2

Conversation

@jeromeku

What does this PR do?

Integrates Flash Attention 2 (FA2) into Persimmon per #26350 and #27052 (the former branch was messed up after an attempted rebase, so this PR is opened from a new branch).

Before submitting

Who can review?

@younesbelkada @ArthurZucker

Notes

  • Addressed review comments from Persimmon fa2 attention4d #27052. Opening a new PR because the former branch was messed up after an attempted rebase on main.
  • Tried making the changes suggested in [FA-2] Add Flash Attention to Phi #27661 for the generate_padding_right test. However, the Persimmon tokenizer config defines neither an eos nor a pad token (both are set to null, see here), so simply copying the LlamaModelTest generate_padding_right test override does not work.
    • Also tried running dummy inputs through the full pretrained model for the generate_padding_right test, with no luck either -- this is left as the current implementation in test_persimmon_modeling.py.
    • Ran some additional generate_padding_right experiments on other FA2-enabled models -- see comments.
    • Marking the generate_padding_right test as skip for now.
  • Files other than those related to Persimmon were changed in this PR due to fixes from running make {quality, style, fixup}.
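The "mark as skip for now" step above can be sketched with the standard unittest skip decorator. This is a hypothetical, minimal stand-in for the real test class in test_persimmon_modeling.py (class and method names are assumptions, not the repository's actual code):

```python
import unittest


class PersimmonModelTest(unittest.TestCase):
    # Hypothetical test mirroring the padding-right generate test discussed
    # above; skipped because the Persimmon tokenizer config sets neither an
    # eos nor a pad token.
    @unittest.skip("Persimmon tokenizer config has neither eos nor pad token")
    def test_flash_attn_2_generate_padding_right(self):
        pass


suite = unittest.TestLoader().loadTestsFromTestCase(PersimmonModelTest)
outcome = unittest.TextTestRunner(verbosity=0).run(suite)
print(len(outcome.skipped))  # 1
```

The decorator records the skip reason in the test report, so reviewers can see at a glance why the test is disabled rather than silently deleted.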

Contributor

@younesbelkada younesbelkada left a comment


Looks very nice, thanks a lot!
Some changes in the PR seem unrelated (e.g. the changes to Phi); I think you need to install ruff==0.1.5 and run make style again.
I'll also run the benchmarks later with FA2-Phi and update them in this PR!
There are also some strange CI failures -- can you try rebasing on main again?

@jeromeku force-pushed the persimmon-flash-attn2 branch from 1222223 to 5590ead on November 24, 2023 at 21:43
@jeromeku
Author

@younesbelkada

Rebased, installed ruff==0.1.5, and re-ran make style; still getting a test failure for PhiModelTest.test_pipeline_text_generation.

Contributor

@xhluca xhluca left a comment


I left some comments regarding the target_dtype inference.

if hasattr(self.config, "_pre_quantization_dtype"):
    target_dtype = self.config._pre_quantization_dtype
else:
    target_dtype = self.q_proj.weight.dtype
Contributor


This line will raise an AttributeError because self.q_proj is never defined here (it is defined in Llama's __init__, which is why it worked there). I am not sure exactly what this is trying to achieve, but you might try some other module that is defined in the __init__ of the PersimmonAttention class.

Contributor


Yes, you should use self.query_key_value here.
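The suggested fix, sketched in isolation: prefer the pre-quantization dtype recorded on the config, and otherwise fall back to the weight dtype of the fused query_key_value projection (PersimmonAttention has no q_proj). The SimpleNamespace stand-ins below are hypothetical placeholders for the real config and attention module, used only to make the fallback logic runnable without transformers or torch:

```python
from types import SimpleNamespace


def infer_target_dtype(module):
    # Quantized models record the original dtype on the config; use it
    # when present so FA2 inputs can be cast back to that dtype.
    if hasattr(module.config, "_pre_quantization_dtype"):
        return module.config._pre_quantization_dtype
    # Otherwise fall back to a weight that PersimmonAttention actually
    # defines: the fused query_key_value projection (not q_proj).
    return module.query_key_value.weight.dtype


# Hypothetical stand-ins for a quantized and a plain attention module.
quantized = SimpleNamespace(
    config=SimpleNamespace(_pre_quantization_dtype="float16"),
    query_key_value=SimpleNamespace(weight=SimpleNamespace(dtype="int8")),
)
plain = SimpleNamespace(
    config=SimpleNamespace(),
    query_key_value=SimpleNamespace(weight=SimpleNamespace(dtype="float32")),
)

print(infer_target_dtype(quantized))  # float16
print(infer_target_dtype(plain))      # float32
```

In the real module the same hasattr check applies unchanged; only the fallback attribute changes from self.q_proj to self.query_key_value.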

@ArthurZucker
Collaborator

cc @molbap as younes is offline

@huggingface huggingface deleted a comment from github-actions Bot Jan 2, 2024
Collaborator

@ArthurZucker ArthurZucker left a comment


Thanks for the update! Let's make sure to rebase on main and only include changes for persimmon!

@huggingface huggingface deleted a comment from github-actions Bot Jan 30, 2024
@github-actions
Contributor

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

@github-actions github-actions Bot closed this Mar 3, 2024

5 participants