
Enable Megatron-LM workload on ROCm #68

Merged: tjruwase merged 4 commits into deepspeedai:main from ROCm:rocm_microsoft on Aug 12, 2022

Conversation

@rraminen

This PR contains changes to:

  • Enable the Megatron workload on ROCm
  • Add extra_include_paths so that header files are hipified
  • Resolve the "call to rsqrtf() __device__ function from __host__ function" error on ROCm
  • Convert auto to int on ROCm as a workaround for a hipify error

* Enable Megatron workload on ROCm

* Added ds_pretrain_gpt_350M_dense_pipeclean.sh

* Removed a file

* Removed an extra line

* Fix to resolve the rsqrtf() error below on ROCm

/root/Megatron-DeepSpeed/megatron/fused_kernels/layer_norm_hip_kernel.hip:298:10: error: no matching function for call to 'rsqrtf'
  return rsqrtf(v);
         ^~~~~~
/opt/rocm-5.2.0/llvm/lib/clang/14.0.0/include/__clang_hip_math.h:521:7: note: candidate function not viable: call to __device__ function from __host__ function
float rsqrtf(float __x) { return __ocml_rsqrt_f32(__x); }
      ^
@rraminen rraminen changed the title Enable Megatron-LM workload on ROCm (#1) Enable Megatron-LM workload on ROCm Jul 26, 2022

rraminen commented Aug 2, 2022

@jeffra, could you please review this PR?

Comment thread on megatron/fused_kernels/__init__.py
@jithunnair-amd

@jeffra @tjruwase Please let us know if you have any other comments.

@tjruwase

> @jeffra @tjruwase Please let us know if you have any other comments.

Apologies for the delay. Looks good to me.

@tjruwase tjruwase merged commit b4d4a0e into deepspeedai:main Aug 12, 2022
saforem2 added a commit to saforem2/Megatron-DeepSpeed that referenced this pull request Nov 15, 2024

4 participants