Conversation

Contributor

@Solaryee Solaryee commented Dec 20, 2023

This PR adds LLVM codegen changes for the SPIRV backend. It contains the SPIRV target and related intrinsics.

// for those dimensions. This helps LLVM generate vectorized code
// in those cases.
llvm::Value* row_index = nullptr;
if (!launch_config_.row_vectorized) {
Member

As far as I understand, these changes are just an NFC refactoring and are not related to the real changes being done in this PR? If so, maybe split them into their own PR?
I would say the new code is not necessarily easier to read. I agree with the changes that swap the if/else blocks and get rid of the negation, but I would prefer if the loop still started from 1 and we handled the i = 0 case outside the loop.

Contributor Author

Yes, this is an NFC refactoring; I can put it in another PR.

Contributor Author

Done. This PR only contains the SPIR target/intrinsics now.

Contributor

@cheshire cheshire left a comment

Is it possible to test those changes? How do we plan to trigger Intel codegen codepath?

Member

@penpornk penpornk left a comment

Thank you for the PR!

Comment on lines +58 to +65
return {
llvm::Intrinsic::nvvm_read_ptx_sreg_tid_x,
llvm::Intrinsic::amdgcn_workitem_id_x,
[](llvm::IRBuilder<>* b_) -> llvm::CallInst* {
return EmitDeviceFunctionCall(
"_Z32__spirv_BuiltInLocalInvocationIdi", {b_->getInt32(0)},
{U32}, U64, {b_->getContext()}, b_);
},
Member

I think this can be refactored into a templated struct because we only need the code for one device backend at a time. But I can look into that after this PR.

Contributor

+1, there's quite a bit of duplication across those EmitDeviceFunctionCall calls; maybe just extracting it into a function would be better?

Member

That would make it cleaner too :)

@penpornk
Member

Re: @cheshire:

Is it possible to test those changes? How do we plan to trigger Intel codegen codepath?

@ShengYang1 Do you already have a CI running? Are the results publicly available now? If so, could you please share the CI link here for the time being? (We can look into adding the CI link to our github check dashboard later.)

Member

@penpornk penpornk left a comment

More feedback from our code check. (For these functions, we prefer the std version.)

@Zantares

Zantares commented Dec 22, 2023

Re: @cheshire:

Is it possible to test those changes? How do we plan to trigger Intel codegen codepath?

@ShengYang1 Do you already have a CI running? Are the results publicly available now? If so, could you please share the CI link here for the time being? (We can look into adding the CI link to our github check dashboard later.)

According to the suggestion in the RFC, we have an initial CI plan here and welcome your suggestions:

  1. We plan to enable post-submit CI like AMD's, but we would also like to support more capabilities (triggering GitHub Actions on demand) if OpenXLA is interested :)
  2. The current public Intel CI can't be used to track the OpenXLA main branch because it is based on the Extension with a fixed OpenXLA commit. Considering the rebase effort, we hope to fully enable the CI after the Runtime PR (the largest and most important one) is merged; then we can directly build the Intel GPU target and check it without the Extension. Alternatively, the pending codegen PRs could wait until the Runtime PR is submitted and be tested together.
  3. We will make sure these PRs won't impact existing targets (checked by the current OpenXLA pre-submit CI)

Member

@penpornk penpornk left a comment

Thank you for the changes!
@cheshire Are you okay with us going ahead and merging the PR now?

copybara-service bot pushed a commit that referenced this pull request Jan 5, 2024
Imported from GitHub PR #7940

This PR adds LLVM codegen changes for the SPIRV backend. It contains the SPIRV target and related intrinsics.
Copybara import of the project:

--
db35320 by Sheng, Yang <yang.sheng@intel.com>:

SPIR changes

Merging this change closes #7940

FUTURE_COPYBARA_INTEGRATE_REVIEW=#7940 from Intel-tensorflow:yang/spir db35320
PiperOrigin-RevId: 595934626
@cheshire
Copy link
Contributor

cheshire commented Jan 5, 2024

I think we should not merge this just yet; without tests and an integration plan the diff by itself makes little sense.

Is there a working branch where all this code works end-to-end? How many tests are passing now? What is the desired end state? What are the CI plans?

copybara-service bot pushed a commit that referenced this pull request Jan 5, 2024
Imported from GitHub PR #7980

This PR was originally submitted as part of #7940.
It aims to make the load/store logic more general so that it can be optimized into a vector load/store pattern. It changes the IR from
```llvm
%linear_index_plus_base = add nuw nsw i32 %linear_index_base, %loop.indvar
%linear_index1 = add nuw nsw i32 %linear_index_plus_base, 1
%linear_index2 = add nuw nsw i32 %linear_index_plus_base, 2
%linear_index3 = add nuw nsw i32 %linear_index_plus_base, 3

%21 = getelementptr inbounds float, ptr %0, i32 %linear_index_plus_base
%22 = load float, ptr %21, align 4, !invariant.load !4
%26 = getelementptr inbounds float, ptr %0, i32 %linear_index1
%27 = load float, ptr %26, align 4, !invariant.load !4
%31 = getelementptr inbounds float, ptr %0, i32 %linear_index2
%32 = load float, ptr %31, align 4, !invariant.load !4
%36 = getelementptr inbounds float, ptr %0, i32 %linear_index3
%37 = load float, ptr %36, align 4, !invariant.load !4
```
to
```llvm
%linear_index_plus_base = add nuw nsw i32 %linear_index_base, %loop.indvar

%21 = getelementptr float, ptr %0, i32 %linear_index_plus_base
%22 = getelementptr inbounds float, ptr %21, i32 0
%23 = load float, ptr %22, align 4, !invariant.load !4
%29 = getelementptr float, ptr %0, i32 %linear_index_plus_base
%30 = getelementptr inbounds float, ptr %29, i32 1
%31 = load float, ptr %30, align 4, !invariant.load !4
%37 = getelementptr float, ptr %0, i32 %linear_index_plus_base
%38 = getelementptr inbounds float, ptr %37, i32 2
%39 = load float, ptr %38, align 4, !invariant.load !4
%45 = getelementptr float, ptr %0, i32 %linear_index_plus_base
%46 = getelementptr inbounds float, ptr %45, i32 3
%47 = load float, ptr %46, align 4, !invariant.load !4
```
The former does not always work across backends, since it needs an additional pass to handle the GEP pattern.

There are only ~20 lines of core code changes; the rest are unit-test changes.
Copybara import of the project:

--
df1286a by Sheng, Yang <yang.sheng@intel.com>:

Make vector load/store logic more general

--
66f0782 by Sheng, Yang <yang.sheng@intel.com>:

fix IR in UTs

--
b944ee0 by Sheng, Yang <yang.sheng@intel.com>:

Add comments

Merging this change closes #7980

COPYBARA_INTEGRATE_REVIEW=#7980 from Intel-tensorflow:yang/vec b944ee0
PiperOrigin-RevId: 595953368
copybara-service bot pushed a commit to tensorflow/tensorflow that referenced this pull request Jan 5, 2024
Imported from GitHub PR openxla/xla#7980 (same description as the copybara import above).
Copybara import of the project:

--
df1286a48461dd6337c841d4a353b694ce60bf86 by Sheng, Yang <yang.sheng@intel.com>:

Make vector load/store logic more general

--
66f0782ad62e90e01fd8de7b41d20d607f974844 by Sheng, Yang <yang.sheng@intel.com>:

fix IR in UTs

--
b944ee015c177d4180f9cba6b564cf72e1c80bbc by Sheng, Yang <yang.sheng@intel.com>:

Add comments

Merging this change closes #7980

PiperOrigin-RevId: 595953368
@Zantares

Zantares commented Jan 8, 2024

Hi @cheshire, I hope the info below answers your questions:

  1. The integration plan is listed in the RFC: https://github.com/openxla/community/blob/49541f1e8b0bdba6ca003797a5ae01f4a2f8cbb7/rfcs/20231102-intel-gpu.md#overview. The basic functionality can be enabled with PR1 & PR3, which is why we suggest testing it after these 2 PRs are merged.

    • SPIRV target (this PR)
    • OneDNN lib call. This part will follow the third-party library call approach in the CPU RFC ([RFC] OpenXLA CPU Strategy community#96), so it will be deferred until the related CPU changes are ready
    • SYCL runtime (to be submitted)
  2. The CI plan is also roughly listed in the RFC: https://github.com/openxla/community/blob/49541f1e8b0bdba6ca003797a5ae01f4a2f8cbb7/rfcs/20231102-intel-gpu.md#engineering-impact. Furthermore, we have enabled such a public CI based on Intel® Extension for OpenXLA* and shared it in the OpenXLA community meeting. Right now it only works with a specific OpenXLA commit and the Intel Extension; once the related PRs are merged, it can work with the OpenXLA main branch.

  3. Is there a working branch where all this code works end-to-end? How many tests are passing now?
    Yes, as mentioned above, we have all these changes at a specific OpenXLA commit (working with the Intel Extension), and we pass 95 JAX native UTs from the public JAX repo and several popular models here.

  4. What is the desired end state?
    Any hardware based on generic SPIRV/SYCL can be functionally enabled after all PRs are merged. To get advanced performance, a plug-in/extension is still needed for critical operations.

@cheshire
Contributor

OK!

Contributor

@cheshire cheshire left a comment

Two nits.

Comment on lines +58 to +65
return {
llvm::Intrinsic::nvvm_read_ptx_sreg_tid_x,
llvm::Intrinsic::amdgcn_workitem_id_x,
[](llvm::IRBuilder<>* b_) -> llvm::CallInst* {
return EmitDeviceFunctionCall(
"_Z32__spirv_BuiltInLocalInvocationIdi", {b_->getInt32(0)},
{U32}, U64, {b_->getContext()}, b_);
},
Contributor

+1, there's quite a bit of duplication across those EmitDeviceFunctionCall calls; maybe just extracting it into a function would be better?

gpu_root_names.spir_root == "_Z15__spirv_ocl_pow" ||
gpu_root_names.spir_root == "_Z17__spirv_ocl_atan2" ||
gpu_root_names.spir_root == "_Z16__spirv_ocl_fmod")
return StrCat(gpu_root_names.spir_root, "ff");
Contributor

Nit: XLA-preferred style is to use braces everywhere, even for single lines.

Contributor Author

Done.

// Helper function to emit call to SPIR shfl_down intrinsic.
llvm::Value* EmitSPIRShflDown(llvm::Value* value, llvm::Value* offset,
llvm::IRBuilder<>* b) {
llvm::Module* module = b->GetInsertBlock()->getModule();
Member

Internally the build is broken due to this unused variable and the one below. Can you please remove these two lines?

Contributor Author

Done.

Member

@akuegel akuegel left a comment

Thank you for the quick fix :)

copybara-service bot pushed a commit to tensorflow/tensorflow that referenced this pull request Jan 15, 2024
Imported from GitHub PR openxla/xla#7940

This PR adds LLVM codegen changes for the SPIRV backend. It contains the SPIRV target and related intrinsics.
Copybara import of the project:

--
5e2116a277b01777cd319ecb484106b4848a920b by Sheng, Yang <yang.sheng@intel.com>:

SPIR changes

--
3695b57a08c8e2f353c670596b691a9479ed2012 by Sheng, Yang <yang.sheng@intel.com>:

Remove unused variable

Merging this change closes #7940

PiperOrigin-RevId: 598587122