
[sp] : fix the attention kernel for sp #6061

Merged
wangbluo merged 11 commits into hpcaitech:main from wangbluo:sp_fix
Sep 14, 2024
Conversation

@wangbluo
Contributor

📝 What does this PR do?

For cases where s_q * s_kv * element_size >= 10 GB, dispatch only to the FlashAttentionDaoLoader kernel and use an empty tensor as a placeholder for the attention_mask. In this regime, only causal and padded causal mask modes are supported.
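Below is a minimal sketch (not the actual ColossalAI implementation) of the dispatch rule described above, assuming query/key tensors laid out as `(batch, num_heads, seq_len, head_dim)`. Names such as `MEMORY_BOUND` and `_dispatch_kernel` are illustrative placeholders, not identifiers from this PR.

```python
# Illustrative sketch of the size-based kernel dispatch described in the PR.
import torch

MEMORY_BOUND = 10 * 1024**3  # 10 GB bound on the full attention-score matrix


def _dispatch_kernel(q: torch.Tensor, k: torch.Tensor, attention_mask_type: str):
    """Pick an attention kernel based on the estimated score-matrix size.

    q, k: tensors of shape (batch, num_heads, seq_len, head_dim).
    """
    s_q, s_kv = q.size(2), k.size(2)
    # Estimated memory of the dense (s_q x s_kv) attention-score matrix.
    score_bytes = s_q * s_kv * q.element_size()

    if score_bytes >= MEMORY_BOUND:
        # Materializing a dense mask of this size is too expensive, so only
        # the FlashAttention (Dao) kernel is used and the attention_mask is
        # replaced by an empty placeholder tensor.
        if attention_mask_type not in ("causal", "padded_causal"):
            raise ValueError(
                "Only causal and padded causal masks are supported when the "
                "score matrix exceeds the memory bound."
            )
        attention_mask = torch.empty(0, dtype=q.dtype, device=q.device)
        return "flash_attention_dao", attention_mask

    # Below the bound, any registered kernel may be selected and a dense mask
    # can be built as usual (construction omitted in this sketch).
    return "default", None
```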

@wangbluo wangbluo requested a review from a team as a code owner September 13, 2024 02:36
8 review comment threads on colossalai/shardformer/layer/attn.py (all marked Outdated)
@wangbluo wangbluo merged commit 37e3523 into hpcaitech:main Sep 14, 2024
@wangbluo wangbluo deleted the sp_fix branch September 26, 2024 10:06