
Improve and fix bugs about fused softmax layer#133

Closed
hyunwoongko wants to merge 5 commits into NVIDIA:main from hyunwoongko:main

Conversation

hyunwoongko (Contributor) commented Aug 12, 2021

  1. Fix bugs with ELEMENTS_PER_LDG_STG (reported in "Error in fused softmax kernel result" #132)
  2. Add test code for all fused CUDA kernels, using Hugging Face Transformers
  3. Add the constraint 0 <= length_key <= 2048 (originally in the header file as a TORCH_INTERNAL_ASSERT)
  4. Add a constraint on batch_per_block (originally in the header file as a TORCH_INTERNAL_ASSERT)
  5. Refactor the Python fused scale mask softmax layer code
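Taken together, points 3–5 amount to an availability check on the Python side. A minimal sketch of such a check (the function and argument names here are illustrative, not the PR's actual code; it folds in the corrected key-length and divisibility constraints discussed later in this thread):

```python
def fused_softmax_available(input_in_float16, mask, sq, sk, attn_batches):
    """Illustrative availability check for the fused scale-mask-softmax
    kernel, combining the constraints discussed in this PR."""
    return (
        input_in_float16           # kernel supports fp16 input only
        and mask is not None       # a mask tensor must be provided
        and 16 < sk <= 2048        # key sequence length constraint
        and sq % 4 == 0            # query sequence length: multiple of 4
        and attn_batches % 4 == 0  # batch_per_block constraint
    )
```

When any condition fails, the layer is expected to fall back to the unfused PyTorch softmax path rather than raise.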

hyunwoongko (Contributor Author):

[Screenshot: 2021-08-13, 6:45:18 AM]

Everything works well.
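The added tests compare the fused kernels' output against an unfused reference. A pure-Python sketch of such a reference scale-mask-softmax for a single attention row (illustrative only; the PR's actual tests use PyTorch and Hugging Face Transformers):

```python
import math

def scale_mask_softmax_reference(scores, mask, scale):
    """Unfused reference: scale the scores, set masked positions to
    -inf, then take a numerically stable softmax.
    scores: list of floats; mask: list of bools (True = masked out)."""
    scaled = [-math.inf if m else s * scale for s, m in zip(scores, mask)]
    peak = max(scaled)  # subtract the max before exp for stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]
```

Masked positions come out exactly zero and the remaining probabilities sum to one, which is the behavior a fused kernel's output can be checked against.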

int softmax_elements_stride,
int attn_batches)
{
TORCH_INTERNAL_ASSERT(softmax_elements >= 0 && softmax_elements <= 2048);
Contributor:

I think we should retain these asserts in case someone wants to use the CUDA code directly.

Contributor Author:

OK, I see. I agree with you.

Comment thread on megatron/model/fused_softmax.py (outdated):
self.scaled_masked_softmax_fusion # user want to fuse
and self.input_in_float16 # input must be fp16
and mask is not None # mask tensor must not be None
and 16 < sq <= 2048 # sq must be 16 ~ 2048
Contributor:

It should be 16 < sk <= 2048 and sq % 4 == 0.

Contributor Author:

custom_kernel_constraint = key_seq_len > 16 and key_seq_len <= 2048 and \
            query_seq_len % 4 == 0 and attn_batch_size % 4 == 0

Yes, you are right. It was my mistake; I will change these things.

hyunwoongko (Contributor Author):

@kvareddy I fixed the code. :)

jaredcasper added a commit that referenced this pull request Sep 1, 2021
Fused softmax checks and additions from Github (#133)

See merge request ADLR/megatron-lm!312
jaredcasper (Contributor):

These changes should all be merged in now. Thanks again for the PR!

@jaredcasper jaredcasper closed this Sep 1, 2021
itlamp pushed a commit to itlamp/Megatron-LM-comms that referenced this pull request Apr 7, 2025
* [SW-212054] W/A for dtype mismatch with CAG

* Narrow wa for CAG enabled only

* add TODO remove comment

* Reduced impact of workaround to mixtral failing case only

* Revert not needed changes

* Remove empty spaces

* Remove one more empty space

* More generic local usage of get_args inside LinearWithGradAccumulationAndAsyncCommunication's backward

* Fix local import of get_args

* Remove usage of get_args in megatron/core

* Remove not needed empty line

* Style fixes

* Reorder import in layers.py

* Add/modify headers regarding 2025 year

* Reorder copyright headers, link todo with jira ticket
andresnowak pushed a commit to andresnowak/SwissAi-Megatron-LM that referenced this pull request Oct 28, 2025
* [feat] add yi & llama sparse upcycling model

* add run script

* tokenizer may be empty for megatron

3 participants