
Update flash_attention to version 2.0.1#4323

Closed
kkk55596 wants to merge 1 commit into hpcaitech:main from kkk55596:update_flashattn

Conversation


@kkk55596 commented Jul 25, 2023

📌 Checklist before creating the PR

  • I have created an issue for this PR for traceability
  • The title follows the standard format: [doc/gemini/tensor/...]: A concise description
  • I have added relevant tags if possible for us to better distinguish different PRs

🚨 Issue number

Link this PR to your issue with words like fixed to automatically close the linked issue upon merge

e.g. fixed #1234, closed #1234, resolved #1234

fixed #4322

📝 What does this PR do?

Summarize your work here.
If you have any plots/diagrams/screenshots/tables, please attach them here.

Following upstream Flash Attention, I modified flash_attention.py and requirements-test.txt to support flash-attn v2.0.1.
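
For context, a minimal sketch of the kind of compatibility shim such an update might involve, assuming the v1 -> v2 rename of the unpadded entry points (illustrative only, not the PR's actual diff; requirements-test.txt would presumably pin flash-attn>=2.0.1 accordingly):

```python
# Hedged sketch: flash-attn 2.0 renamed the flash_attn_unpadded_* entry
# points to flash_attn_varlen_*; an aliased import keeps old call sites
# working across both major versions. Not the PR's actual diff.
try:
    # flash-attn >= 2.0
    from flash_attn.flash_attn_interface import (
        flash_attn_varlen_func as flash_attn_unpadded_func,
    )
except ImportError:
    # flash-attn 1.x keeps the original name
    from flash_attn.flash_attn_interface import flash_attn_unpadded_func
```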

💥 Checklist before requesting a review

  • I have linked my PR to an issue (instruction)
  • My issue clearly describes the problem/feature/proposal, with diagrams/charts/table/code if possible
  • I have performed a self-review of my code
  • I have added thorough tests.
  • I have added docstrings for all the functions/methods I implemented

⭐️ Do you enjoy contributing to Colossal-AI?

  • 🌝 Yes, I do.
  • 🌚 No, I don't.

Tell us more if you don't enjoy contributing to Colossal-AI.


@clyang commented Jul 25, 2023

Wow. According to Flash Attention's website:

FlashAttention-2 is about 2x faster than its previous version, reaching up to 230 TFLOPs/s on A100 GPUs (FP16/BF16).

It would be great if this PR could pass review and be accepted.

@kurisusnowdeng (Contributor)

@kkk55596 Thank you so much for your contribution. We will provide the general ColoAttention interface and deprecate the other ones very soon. Thus, could you please try to replace the xformers part that we currently use with Flash Attention 2?

Here are a few tips for you (sketches of tips 1-3 appear below):

  1. We could use flash_attn_func for attention with no padding.
  2. Attention with padding will need get_seq_info_from_mask, unpad, and repad in order to work with flash_attn_varlen_func.
  3. Flash Attention 2 only supports fp16/bf16 on Ampere or newer GPUs. For other precisions or hardware, we still need xformers to accelerate attention.
  4. (Optional) Flash Attention's CUDA version does not support attention bias, while its Triton version does. We would really appreciate it if you are able to help ColoAttention support attention bias.
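
A minimal sketch combining tips 1 and 3, assuming flash-attn >= 2.0 and xformers are installed; attention_forward and use_flash_attn_2 are illustrative names, not the actual ColoAttention interface:

```python
import torch
import xformers.ops as xops
from flash_attn import flash_attn_func  # flash-attn >= 2.0


def use_flash_attn_2(q: torch.Tensor) -> bool:
    # Tip 3: flash-attn 2 only supports fp16/bf16 on Ampere (SM80) or newer.
    return (
        q.is_cuda
        and q.dtype in (torch.float16, torch.bfloat16)
        and torch.cuda.get_device_capability(q.device)[0] >= 8
    )


def attention_forward(q, k, v, dropout_p=0.0, causal=False):
    # q, k, v: (batch, seqlen, num_heads, head_dim), no padding (tip 1).
    if use_flash_attn_2(q):
        return flash_attn_func(q, k, v, dropout_p=dropout_p, causal=causal)
    # Tip 3 fallback: xformers covers other precisions and older hardware.
    bias = xops.LowerTriangularMask() if causal else None
    return xops.memory_efficient_attention(q, k, v, attn_bias=bias, p=dropout_p)
```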

flash_attn_unpadded_func,
flash_attn_unpadded_kvpacked_func,
flash_attn_unpadded_qkvpacked_func,
flash_attn_varlen_func,

Contributor

As commented in conversations
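
For reference, a hedged sketch of tip 2's unpad -> attend -> repad flow around flash_attn_varlen_func from the import list above; the inline sequence-info computation is a stand-in for ColoAttention's get_seq_info_from_mask/unpad/repad helpers, whose real signatures may differ:

```python
import torch
import torch.nn.functional as F
from flash_attn import flash_attn_varlen_func  # flash-attn >= 2.0


def padded_attention(q, k, v, mask, dropout_p=0.0, causal=False):
    # q, k, v: (batch, seqlen, num_heads, head_dim); mask: (batch, seqlen) bool
    batch, seqlen, num_heads, head_dim = q.shape

    # Sequence info from the mask: per-sample lengths and cumulative offsets
    # (stand-in for get_seq_info_from_mask). flash-attn expects int32 offsets.
    seqlens = mask.sum(dim=1, dtype=torch.int32)
    cu_seqlens = F.pad(torch.cumsum(seqlens, dim=0, dtype=torch.int32), (1, 0))
    max_seqlen = int(seqlens.max())
    indices = torch.nonzero(mask.flatten(), as_tuple=False).flatten()

    def unpad(t):
        # Drop padded positions: (total_tokens, num_heads, head_dim).
        return t.reshape(batch * seqlen, num_heads, head_dim)[indices]

    out_unpad = flash_attn_varlen_func(
        unpad(q), unpad(k), unpad(v),
        cu_seqlens_q=cu_seqlens, cu_seqlens_k=cu_seqlens,
        max_seqlen_q=max_seqlen, max_seqlen_k=max_seqlen,
        dropout_p=dropout_p, causal=causal,
    )

    # Repad: scatter outputs back into a zero-filled padded tensor.
    out = torch.zeros(batch * seqlen, num_heads, head_dim,
                      dtype=out_unpad.dtype, device=out_unpad.device)
    out[indices] = out_unpad
    return out.reshape(batch, seqlen, num_heads, head_dim)
```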

@github-actions

The code coverage for the changed files is 11%.

Name                                               Stmts   Miss  Cover
----------------------------------------------------------------------
colossalai/kernel/cuda_native/flash_attention.py     292    261    11%
----------------------------------------------------------------------
TOTAL                                                292    261    11%


@kurisusnowdeng (Contributor) commented Aug 4, 2023

Closed since the same feature has been completed by #4347.



Development

Successfully merging this pull request may close these issues.

[flashattn] Support Flash Attention to v2.0.1
