
[coloattention] coloattention support flash attention 2 #4347

Merged
kurisusnowdeng merged 1 commit into hpcaitech:main from flybird11111:update-coloattention
Aug 4, 2023

Conversation

@flybird11111
Contributor

@flybird11111 flybird11111 commented Jul 28, 2023

📌 Checklist before creating the PR

  • I have created an issue for this PR for traceability
  • The title follows the standard format: [doc/gemini/tensor/...]: A concise description
  • I have added relevant tags if possible for us to better distinguish different PRs

🚨 Issue number

Link this PR to your issue with words like fixed to automatically close the linked issue upon merge

e.g. fixed #1234, closed #1234, resolved #1234

#4322

📝 What does this PR do?

Summarize your work here.
If you have any plots/diagrams/screenshots/tables, please attach them here.

Improved ColoAttention to utilize the latest Flash Attention 2 (a simplified sketch of the dispatch follows this list):

  1. flash_attn_func is used for attention without padding.
  2. Attention with padding uses SeqLenInfo, unpad, and repad in order to work with flash_attn_varlen_func.
  3. Flash Attention 2 only supports fp16/bf16 on Ampere or newer GPUs. For other precisions or hardware, we still use xformers to accelerate attention.
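
A minimal sketch of this dispatch, assuming the public flash_attn_func / flash_attn_varlen_func entry points of the flash-attn 2 package; flash_attn_supported and the inline unpad/repad logic here are hypothetical simplifications of the PR's SeqLenInfo, unpad, and repad helpers, not the actual implementation:

```python
import torch
import torch.nn.functional as F
from flash_attn import flash_attn_func, flash_attn_varlen_func


def flash_attn_supported(t: torch.Tensor) -> bool:
    # Hypothetical capability check: Flash Attention 2 needs fp16/bf16 and an
    # Ampere-or-newer GPU (SM80+); anything else would fall back to xformers.
    return t.is_cuda and t.dtype in (torch.float16, torch.bfloat16) \
        and torch.cuda.get_device_capability(t.device)[0] >= 8


def attention(q, k, v, attention_mask=None, causal=False):
    # q, k, v: (batch, seqlen, num_heads, head_dim)
    # attention_mask: optional (batch, seqlen) 0/1 padding mask (assumption).
    if attention_mask is None:
        # No padding: use the fixed-length fast path.
        return flash_attn_func(q, k, v, causal=causal)

    # Padded input: pack the valid tokens into one (total_tokens, ...) tensor
    # ("unpad") and describe sequence boundaries with cumulative lengths.
    b, s, h, d = q.shape
    valid = attention_mask.to(torch.bool).flatten()              # (b * s,)
    seqlens = attention_mask.sum(dim=-1, dtype=torch.int32)      # tokens per sequence
    cu_seqlens = F.pad(seqlens.cumsum(0, dtype=torch.int32), (1, 0))
    max_seqlen = int(seqlens.max())

    q_p, k_p, v_p = (t.reshape(b * s, h, d)[valid] for t in (q, k, v))
    out = flash_attn_varlen_func(
        q_p, k_p, v_p, cu_seqlens, cu_seqlens, max_seqlen, max_seqlen, causal=causal
    )

    # "Repad": scatter the packed output back into the padded layout.
    padded = torch.zeros(b * s, h, d, dtype=out.dtype, device=out.device)
    padded[valid] = out
    return padded.reshape(b, s, h, d)
```

The varlen path packs all valid tokens into a single total-tokens dimension and passes cumulative sequence lengths (cu_seqlens), which is the layout flash_attn_varlen_func expects; the zero-filled scatter at the end restores the padded (batch, seqlen, heads, head_dim) shape.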

💥 Checklist before requesting a review

  • I have linked my PR to an issue (instruction)
  • My issue clearly describes the problem/feature/proposal, with diagrams/charts/table/code if possible
  • I have performed a self-review of my code
  • I have added thorough tests.
  • I have added docstrings for all the functions/methods I implemented

⭐️ Do you enjoy contributing to Colossal-AI?

  • 🌝 Yes, I do.
  • 🌚 No, I don't.

Tell us more if you don't enjoy contributing to Colossal-AI.

@github-actions
Contributor

The code coverage for the changed files is 15%.

Complete report:
Name                                                       Stmts   Miss  Cover
------------------------------------------------------------------------------
colossalai/kernel/cuda_native/flash_attention.py             298    298     0%
colossalai/kernel/cuda_native/flash_attn/flash_attn_2.py      36     19    47%
colossalai/kernel/cuda_native/flash_attn/mem_eff_attn.py      33     25    24%
colossalai/kernel/cuda_native/scaled_softmax.py               96     65    32%
tests/test_utils/test_flash_attention.py                      92     62    33%
------------------------------------------------------------------------------
TOTAL                                                        555    469    15%

@flybird11111 flybird11111 force-pushed the update-coloattention branch 2 times, most recently from 765bc08 to f161d8a on August 1, 2023 07:17
@flybird11111 flybird11111 force-pushed the update-coloattention branch from f161d8a to 5187c96 on August 1, 2023 07:34
@flybird11111 flybird11111 reopened this Aug 1, 2023
@flybird11111 flybird11111 force-pushed the update-coloattention branch from 9d28850 to e3dccfe on August 1, 2023 07:48
@github-actions
Contributor

github-actions Bot commented Aug 1, 2023

The code coverage for the changed files is 33%.

Complete report:
Name                                                 Stmts   Miss  Cover
------------------------------------------------------------------------
colossalai/kernel/cuda_native/fmha/flash_attn_2.py      36     19    47%
colossalai/kernel/cuda_native/fmha/mem_eff_attn.py      33     25    24%
colossalai/kernel/cuda_native/scaled_softmax.py         96     65    32%
tests/test_utils/test_flash_attention.py                92     62    33%
------------------------------------------------------------------------
TOTAL                                                  257    171    33%

Comment thread colossalai/kernel/cuda_native/fmha/fmha.py Outdated
Comment thread colossalai/kernel/cuda_native/fmha/fmha.py Outdated
Comment thread colossalai/kernel/cuda_native/fmha/fmha.py Outdated
Comment thread colossalai/kernel/cuda_native/fmha/utils.py Outdated
Comment thread colossalai/kernel/cuda_native/fmha/utils.py Outdated
Comment thread colossalai/kernel/cuda_native/fmha/flash_attn_2.py Outdated
Comment thread colossalai/kernel/cuda_native/fmha/mem_eff_attn.py Outdated
@flybird11111 flybird11111 force-pushed the update-coloattention branch from e3dccfe to 4b8df44 on August 2, 2023 08:01
Comment thread colossalai/kernel/cuda_native/mha/mem_eff_attn.py Outdated
@flybird11111 flybird11111 force-pushed the update-coloattention branch from 4b8df44 to 91f57e6 on August 2, 2023 10:05
Comment thread colossalai/kernel/cuda_native/mha/mha.py Outdated
Comment thread colossalai/kernel/cuda_native/mha/mha.py Outdated
Comment thread colossalai/kernel/cuda_native/mha/flash_attn_2.py
Comment thread colossalai/kernel/cuda_native/mha/utils.py Outdated
Comment thread colossalai/kernel/cuda_native/mha/mha.py Outdated
Comment thread colossalai/kernel/cuda_native/mha/flash_attn_2.py Outdated
@flybird11111 flybird11111 force-pushed the update-coloattention branch 5 times, most recently from 63fbeda to 9478c96 on August 4, 2023 03:47
[shardformer] coloattention support flash attention 2
@flybird11111 flybird11111 force-pushed the update-coloattention branch from 9478c96 to d604dd6 on August 4, 2023 03:56
@flybird11111
Contributor Author

[screenshot: CI results] All tests have passed.
