-
Notifications
You must be signed in to change notification settings - Fork 719
LLVM codegen changes for SPIR backend #7940
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
| // for that dimensions. This helps LLVM generate vectorized codes | ||
| // in that cases. | ||
| llvm::Value* row_index = nullptr; | ||
| if (!launch_config_.row_vectorized) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As far as I understand, these changes are just a NFC refactoring, and are not related to the real changes being done in this PR? If yes, maybe split them into their own PR?
I would say the new code is not necessarily more easy to read. I agree with the changes to swap if/else blocks and get rid of the negation, but I would prefer if the loop still starts from 1 and we handle the i = 0 case outside the loop.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this is a NFC refactoring, I can put it in another PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. This PR only contains SPIR target/intrinsic now.
cheshire
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it possible to test those changes? How do we plan to trigger Intel codegen codepath?
penpornk
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the PR!
| return { | ||
| llvm::Intrinsic::nvvm_read_ptx_sreg_tid_x, | ||
| llvm::Intrinsic::amdgcn_workitem_id_x, | ||
| [](llvm::IRBuilder<>* b_) -> llvm::CallInst* { | ||
| return EmitDeviceFunctionCall( | ||
| "_Z32__spirv_BuiltInLocalInvocationIdi", {b_->getInt32(0)}, | ||
| {U32}, U64, {b_->getContext()}, b_); | ||
| }, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this can be refactored to a templated struct because we only need the code for one device backend at the same time. But I can look into that after this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1, there's quite a bit of duplication across those EmitDeviceFunctionCall, maybe just extracting it to a function would be better?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That would make it cleaner too :)
|
Re: @cheshire:
@ShengYang1 Do you already have a CI running? Are the results publicly available now? If so, could you please share the CI link here for the time being? (We can look into adding the CI link to our github check dashboard later.) |
penpornk
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
More feedback from our code check. (For these functions, we prefer the std version.)
According to the suggestion in RFC, we have initial CI plan here and welcome your suggestions:
|
penpornk
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the changes!
@cheshire Are you okay with us going ahead and merging the PR now?
Imported from GitHub PR #7940 This PR aims to add LLVM codegen changes for SPIRV backend. It contains SPIRV target and related intrinsic. Copybara import of the project: -- db35320 by Sheng, Yang <yang.sheng@intel.com>: SPIR changes Merging this change closes #7940 FUTURE_COPYBARA_INTEGRATE_REVIEW=#7940 from Intel-tensorflow:yang/spir db35320 PiperOrigin-RevId: 595934626
|
I think we should not merge this just yet; without tests and an integration plan the diff by itself makes little sense. Is there a working branch where all this code works end-to-end? How many tests are passing now? What is the desired end state? What are the CI plans? |
Imported from GitHub PR #7980 This PR is originally submited as part of #7940. It aims to make load/store logic more general so that it can be optimized to vector load/store pattern. It changes IR from ```llvm %linear_index_plus_base = add nuw nsw i32 %linear_index_base, %loop.indvar %linear_index1 = add nuw nsw i32 %linear_index_plus_base, 1 %linear_index2 = add nuw nsw i32 %linear_index_plus_base, 2 %linear_index3 = add nuw nsw i32 %linear_index_plus_base, 3 %21 = getelementptr inbounds float, ptr %0, i32 %linear_index_plus_base %22 = load float, ptr %21, align 4, !invariant.load !4 %26 = getelementptr inbounds float, ptr %0, i32 %linear_index1 %27 = load float, ptr %26, align 4, !invariant.load !4 %31 = getelementptr inbounds float, ptr %0, i32 %linear_index2 %32 = load float, ptr %31, align 4, !invariant.load !4 %36 = getelementptr inbounds float, ptr %0, i32 %linear_index3 %37 = load float, ptr %36, align 4, !invariant.load !4 ``` to ```llvm %linear_index_plus_base = add nuw nsw i32 %linear_index_base, %loop.indvar %21 = getelementptr float, ptr %0, i32 %linear_index_plus_base %22 = getelementptr inbounds float, ptr %21, i32 0 %23 = load float, ptr %22, align 4, !invariant.load !4 %29 = getelementptr float, ptr %0, i32 %linear_index_plus_base %30 = getelementptr inbounds float, ptr %29, i32 1 %31 = load float, ptr %30, align 4, !invariant.load !4 %37 = getelementptr float, ptr %0, i32 %linear_index_plus_base %38 = getelementptr inbounds float, ptr %37, i32 2 %39 = load float, ptr %38, align 4, !invariant.load !4 %45 = getelementptr float, ptr %0, i32 %linear_index_plus_base %46 = getelementptr inbounds float, ptr %45, i32 3 %47 = load float, ptr %46, align 4, !invariant.load !4 ``` The former one does not always work for different backends since it needs additional pass to handle GEP pattern. There are only ~20 lines of core code changes, the others are all UTs changes. Copybara import of the project: -- df1286a by Sheng, Yang <yang.sheng@intel.com>: Make vector load/store logic more general -- 66f0782 by Sheng, Yang <yang.sheng@intel.com>: fix IR in UTs -- b944ee0 by Sheng, Yang <yang.sheng@intel.com>: Add comments Merging this change closes #7980 COPYBARA_INTEGRATE_REVIEW=#7980 from Intel-tensorflow:yang/vec b944ee0 PiperOrigin-RevId: 595953368
Imported from GitHub PR openxla/xla#7980 This PR is originally submited as part of openxla/xla#7940. It aims to make load/store logic more general so that it can be optimized to vector load/store pattern. It changes IR from ```llvm %linear_index_plus_base = add nuw nsw i32 %linear_index_base, %loop.indvar %linear_index1 = add nuw nsw i32 %linear_index_plus_base, 1 %linear_index2 = add nuw nsw i32 %linear_index_plus_base, 2 %linear_index3 = add nuw nsw i32 %linear_index_plus_base, 3 %21 = getelementptr inbounds float, ptr %0, i32 %linear_index_plus_base %22 = load float, ptr %21, align 4, !invariant.load !4 %26 = getelementptr inbounds float, ptr %0, i32 %linear_index1 %27 = load float, ptr %26, align 4, !invariant.load !4 %31 = getelementptr inbounds float, ptr %0, i32 %linear_index2 %32 = load float, ptr %31, align 4, !invariant.load !4 %36 = getelementptr inbounds float, ptr %0, i32 %linear_index3 %37 = load float, ptr %36, align 4, !invariant.load !4 ``` to ```llvm %linear_index_plus_base = add nuw nsw i32 %linear_index_base, %loop.indvar %21 = getelementptr float, ptr %0, i32 %linear_index_plus_base %22 = getelementptr inbounds float, ptr %21, i32 0 %23 = load float, ptr %22, align 4, !invariant.load !4 %29 = getelementptr float, ptr %0, i32 %linear_index_plus_base %30 = getelementptr inbounds float, ptr %29, i32 1 %31 = load float, ptr %30, align 4, !invariant.load !4 %37 = getelementptr float, ptr %0, i32 %linear_index_plus_base %38 = getelementptr inbounds float, ptr %37, i32 2 %39 = load float, ptr %38, align 4, !invariant.load !4 %45 = getelementptr float, ptr %0, i32 %linear_index_plus_base %46 = getelementptr inbounds float, ptr %45, i32 3 %47 = load float, ptr %46, align 4, !invariant.load !4 ``` The former one does not always work for different backends since it needs additional pass to handle GEP pattern. There are only ~20 lines of core code changes, the others are all UTs changes. Copybara import of the project: -- df1286a48461dd6337c841d4a353b694ce60bf86 by Sheng, Yang <yang.sheng@intel.com>: Make vector load/store logic more general -- 66f0782ad62e90e01fd8de7b41d20d607f974844 by Sheng, Yang <yang.sheng@intel.com>: fix IR in UTs -- b944ee015c177d4180f9cba6b564cf72e1c80bbc by Sheng, Yang <yang.sheng@intel.com>: Add comments Merging this change closes #7980 PiperOrigin-RevId: 595953368
|
Hi @cheshire , hope below info can answer your question:
|
|
OK! |
cheshire
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Two nits.
| return { | ||
| llvm::Intrinsic::nvvm_read_ptx_sreg_tid_x, | ||
| llvm::Intrinsic::amdgcn_workitem_id_x, | ||
| [](llvm::IRBuilder<>* b_) -> llvm::CallInst* { | ||
| return EmitDeviceFunctionCall( | ||
| "_Z32__spirv_BuiltInLocalInvocationIdi", {b_->getInt32(0)}, | ||
| {U32}, U64, {b_->getContext()}, b_); | ||
| }, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1, there's quite a bit of duplication across those EmitDeviceFunctionCall, maybe just extracting it to a function would be better?
| gpu_root_names.spir_root == "_Z15__spirv_ocl_pow" || | ||
| gpu_root_names.spir_root == "_Z17__spirv_ocl_atan2" || | ||
| gpu_root_names.spir_root == "_Z16__spirv_ocl_fmod") | ||
| return StrCat(gpu_root_names.spir_root, "ff"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: XLA-preferred style is to use braces everywhere, even for single lines.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
xla/service/gpu/ir_emission_utils.cc
Outdated
| // Helper function to emit call to SPIR shfl_down intrinsic. | ||
| llvm::Value* EmitSPIRShflDown(llvm::Value* value, llvm::Value* offset, | ||
| llvm::IRBuilder<>* b) { | ||
| llvm::Module* module = b->GetInsertBlock()->getModule(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Internally the build is broken due to this unused variable and the one below. Can you please remove these two lines?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
akuegel
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the quick fix :)
Imported from GitHub PR openxla/xla#7940 This PR aims to add LLVM codegen changes for SPIRV backend. It contains SPIRV target and related intrinsic. Copybara import of the project: -- 5e2116a277b01777cd319ecb484106b4848a920b by Sheng, Yang <yang.sheng@intel.com>: SPIR changes -- 3695b57a08c8e2f353c670596b691a9479ed2012 by Sheng, Yang <yang.sheng@intel.com>: Remove unused variable Merging this change closes #7940 PiperOrigin-RevId: 598587122
This PR aims to add LLVM codegen changes for SPIRV backend. It contains SPIRV target and related intrinsic.