
Enable Megatron-LM workload on ROCm #68

Merged: tjruwase merged 4 commits into deepspeedai:main from ROCm:rocm_microsoft on Aug 12, 2022

Conversation

@rraminen

This PR contains changes to:

  • Enable the Megatron workload on ROCm
  • Add extra_include_paths so that header files are hipified
  • Resolve the "call to rsqrtf() __device__ function from __host__ function" error on ROCm
  • Convert auto to int on ROCm as a workaround for a hipify error

* Enable Megatron workload on ROCm

* Added ds_pretrain_gpt_350M_dense_pipeclean.sh

* Removed a file

* Removed an extra line

* Fix to resolve the rsqrtf() error below on ROCm

/root/Megatron-DeepSpeed/megatron/fused_kernels/layer_norm_hip_kernel.hip:298:10: error: no matching function for call to 'rsqrtf'
  return rsqrtf(v);
         ^~~~~~
/opt/rocm-5.2.0/llvm/lib/clang/14.0.0/include/__clang_hip_math.h:521:7: note: candidate function not viable: call to __device__ function from __host__ function
float rsqrtf(float __x) { return __ocml_rsqrt_f32(__x); }
      ^
@rraminen rraminen changed the title Enable Megatron-LM workload on ROCm (#1) Enable Megatron-LM workload on ROCm Jul 26, 2022

rraminen commented Aug 2, 2022

@jeffra, could you please review this PR?

Comment thread on megatron/fused_kernels/__init__.py
@jithunnair-amd

@jeffra @tjruwase Please let us know if you have any other comments.

@tjruwase

> @jeffra @tjruwase Please let us know if you have any other comments.

Apologies for the delay. Looks good to me.

@tjruwase tjruwase merged commit b4d4a0e into deepspeedai:main Aug 12, 2022
saforem2 added a commit to saforem2/Megatron-DeepSpeed that referenced this pull request Nov 15, 2024

4 participants