[TIR] Introduce Pass InjectPTXLDG32 #13973

andy-yang-1 · 2023-02-13T11:57:12Z

This PR introduces a new pass, InjectPTXLDG32, that rewrites an if_then_else call node into a ptx_pred_ldg32 call node. The rewrite applies when the store buffer is in local scope and the loaded value comes from a global buffer; in that case the if_then_else pattern is replaced by a predicated PTX load.

Test the pass by enabling it in the pass config:

import tvm

# `f` is the TIR function (or module) you want to compile
with tvm.transform.PassContext(config={"tir.ptx_pred_ldg32": True}):
    mod = tvm.build(f, target="cuda")
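The value semantics of the rewrite can be sketched in plain C++. This is a hypothetical reference model, not TVM code: the function names are made up for illustration, and it assumes the false branch of the if_then_else is zero, which is what the `mov.b32 %0, 0` in the codegen diff reviewed later in this thread suggests.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical model of the pattern before the pass:
//   local[i] = if_then_else(guard, global[j], 0.0f)
inline float IfThenElseLoad(const float* global_buf, std::size_t j, bool guard) {
  return guard ? global_buf[j] : 0.0f;
}

// Hypothetical model of the predicated 32-bit load after the pass.
// The comments map each step to the PTX emitted by the codegen diff.
inline float PredLdg32Model(const float* global_buf, std::size_t j, int guard) {
  assert(global_buf != nullptr);
  float reg = 0.0f;        // @!p mov.b32 %0, 0;
  if (guard != 0) {        // setp.ne.b32 p, %2, 0;
    reg = global_buf[j];   // @p ld.global.nc.f32 %0, [%1];
  }
  return reg;
}
```

Both functions return the same value for any guard, and neither touches memory when the guard is false, so an out-of-bounds index is harmless in that case.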

tvm-bot · 2023-02-13T11:57:16Z

Thanks for contributing to TVM! Please refer to the contributing guidelines https://tvm.apache.org/docs/contribute/ for useful information and tips. Please request code reviews from Reviewers by @-ing them in a comment.

cc @Hzfengsy, @junrushao, @quic-sanirudh, @shingjan (See #10317 for details)

Generated by tvm-bot

Hzfengsy · 2023-02-13T14:06:54Z

cc @spectrometerHBH

masahi · 2023-02-13T21:10:20Z

I hope TIR BufferLoad would natively support predication, rather than relying on intrinsics. See also https://discuss.tvm.apache.org/t/huge-pr-affecting-buffer-access-semantics-landed/12261/10. cc @wrongtest-intellif @vinx13 @junrushao

junrushao · 2023-02-14T05:40:40Z

src/target/source/codegen_cuda.cc

+    this->stream << "asm volatile (\n" ;
+    this->PrintIndent();
+    stream << "\"{.reg .pred p;\\n\"\n" ;
+    this->PrintIndent();
+    stream << "\" setp.ne.b32 p, %2, 0;\\n\"\n" ;
+    this->PrintIndent();
+    stream << "\" @!p mov.b32 %0, 0;\\n\"\n";
+    this->PrintIndent();
+    stream << "\" @p ld.global.nc.f32 %0, [%1];}\\n\"\n" ;
+    // stream << "\" @p ld.global.nc.L2::128B.f32 %0, [%1];}\\n\"\n" ;
+    this->PrintIndent();
+    stream << ": \"=f\"(" << reg << "[" << local_addr << "]" << ")\n" ; 
+    this->PrintIndent();
+    stream << ": \"l\"((void*)(" << global_buffer << "+" << global_addr << ")), \"r\"((int)" << guard << ")\n" ;
+    this->PrintIndent();
+    stream << ");\n" ;


nit: you may use a multi-line string literal in C++
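One way to follow this suggestion is a C++11 raw string literal. The sketch below is an assumption about what the reviewer has in mind, not the code that was merged: `EmitPredLdg32Asm` is a hypothetical standalone helper that writes to a `std::ostringstream` instead of the codegen's `stream`, and it omits indentation.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical helper: builds the same inline-asm text as the reviewed snippet,
// using a raw string literal for the fixed part of the template.
std::string EmitPredLdg32Asm(const std::string& reg, const std::string& local_addr,
                             const std::string& global_buffer,
                             const std::string& global_addr, const std::string& guard) {
  assert(!reg.empty() && !global_buffer.empty());
  std::ostringstream os;
  // Inside R"ptx(...)ptx" a backslash is literal, so \n reaches the generated
  // CUDA source as the two characters '\' and 'n', which the PTX string needs.
  os << R"ptx(asm volatile (
"{.reg .pred p;\n"
" setp.ne.b32 p, %2, 0;\n"
" @!p mov.b32 %0, 0;\n"
" @p ld.global.nc.f32 %0, [%1];}\n"
)ptx";
  // The operand lists depend on runtime names, so they are still streamed.
  os << ": \"=f\"(" << reg << "[" << local_addr << "])\n";
  os << ": \"l\"((void*)(" << global_buffer << "+" << global_addr << ")), \"r\"((int)"
     << guard << ")\n";
  os << ");\n";
  return os.str();
}
```

The fixed PTX body now reads the way it will appear in the generated source, and only the operand lines still need escaped quotes.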

junrushao · 2023-02-14T05:41:19Z

src/tir/transforms/inject_ptx_ldg32.cc

+
+// The pass can now be invoked via the pass infrastructure, but we also add a Python binding for it
+TVM_REGISTER_GLOBAL("tir.transform.InjectPTXLDG32").set_body_typed(InjectPTXLDG32);
+


nit: you may want to run clang-format to tidy up the file's formatting

junrushao · 2023-02-14T05:43:54Z

src/driver/driver_api.cc

 TVM_REGISTER_PASS_CONFIG_OPTION("tir.merge_async_commit_queue_scope", Bool);
 TVM_REGISTER_PASS_CONFIG_OPTION("tir.instrument_lwp", Bool);
 TVM_REGISTER_PASS_CONFIG_OPTION("tir.vtcm_capacity", Integer);
+TVM_REGISTER_PASS_CONFIG_OPTION("tir.ptx_pred_ldg32", Bool);


The name is a bit confusing. Can you discuss it with @rainy-memory and together figure out something more comprehensible?

Our key objective is that users should need to set at most one flag (ideally zero) to get the best GEMM performance out of the box.

junrushao · 2023-02-15T19:08:43Z

Let's fix the lint and merge it in ASAP. If you disagree with pylint's complaints about variable naming, just do:

# pylint: disable=invalid-name
your code
# pylint: enable=invalid-name

tqchen · 2023-02-16T18:23:06Z

include/tvm/tir/builtin.h

+ * \brief tvm intrinsic for ptx predicate load with 32-bit data type.
+ *
+ */
+TVM_DLL const Op& inject_ptx_ldg32();


naming: we do not need the inject prefix; it can just be ptx_ldg32

junrushao · 2023-02-17T06:55:34Z

src/target/source/codegen_cuda.cc

+    this->stream << "asm volatile (\n";
+    this->stream << "\"{.reg .pred p;\\n\"\n";
+    this->stream << "\" setp.ne.b32 p, %2, 0;\\n\"\n";
+    this->stream << "\" @!p mov.b32 %0, 0;\\n\"\n";
+    this->stream << "\" @p ld.global.nc.f32 %0, [%1];}\\n\"\n";
+    // stream << "\" @p ld.global.nc.L2::128B.f32 %0, [%1];}\\n\"\n" ;
+    stream << ": \"=f\"(" << reg << "[" << local_addr << "]"
+           << ")\n";
+    stream << ": \"l\"((void*)(" << global_buffer << "+" << global_addr << ")), \"r\"((int)"
+           << guard << ")\n";
+    stream << ");\n";


Perhaps it would be clearer to write it this way:

https://github.com/apache/tvm/pull/13966/files#diff-28ce493acf6a737cd561f3996bd897c8c14edc056f9125503344453dcf390d49R668-R690

tqchen

My comments have been addressed; I will let @junrushao handle this.

junrushao · 2023-02-17T16:30:04Z

@andy-yang-1 please fix the unit tests and we are good to go

add ptx load global 32bit

0850d2b

Update inject_ptx_ldg32.cc

38c64df

junrushao reviewed Feb 14, 2023

andy-yang-1 added 2 commits February 16, 2023 14:16

test use clang-format

690db54

remove printIndent

78141e3

tqchen requested changes Feb 16, 2023

change inject_ptx_ldg32 to ptx_ldg32

bb7a70a

junrushao self-assigned this Feb 17, 2023

junrushao force-pushed the main branch from 2af333b to 36f9ad9 Compare February 17, 2023 06:42

andy-yang-1 added 2 commits February 17, 2023 14:54

Merge branch 'main' of https://github.com/andy-yang-1/tvm

89d2a7d

junrushao force-pushed the main branch from 36f9ad9 to 3b04bf3 Compare February 17, 2023 06:54

junrushao reviewed Feb 17, 2023

andy-yang-1 added 2 commits February 17, 2023 15:29

Merge branch 'main' of https://github.com/andy-yang-1/tvm

2259063

junrushao force-pushed the main branch from 3b04bf3 to a458707 Compare February 17, 2023 07:47

tqchen approved these changes Feb 17, 2023

junrushao assigned cyx-6 Feb 18, 2023

andy-yang-1 and others added 4 commits February 18, 2023 10:18

Update test_inject_ptx_ldg32.py

d86324a

Merge branch 'main' of https://github.com/andy-yang-1/tvm

baacf71

Update builtin.h

b72d775

Merge branch 'apache:main' into main

f47b518

junrushao merged commit 87bb8b1 into apache:main Feb 18, 2023

ysh329 mentioned this pull request Apr 17, 2023

[Release] v0.12.0 Release Candidate Notes #14645

Closed

7 participants

