[initial build up] `mbarrier`: arrive wait barrier on smem by zasdfgbnm · Pull Request #995 · NVIDIA/Fuser

zasdfgbnm · 2023-09-29T19:43:09Z

Fixes: #992 Required by: #993

This PR introduces mbarrier, an arrive-wait barrier on shared memory. The code for mbarrier itself is ready to use; however, no pass in our lowering currently uses this barrier. In a future PR, I will explore replacing our block syncs with mbarrier where it makes sense.

In this PR, a new test, MBarrierTest.Simple, is added. The test is a simple gmem->smem->gmem copy kernel. The fusion is scheduled so that a block sync is needed, and the test modifies the lowered kernel to replace the block sync with mbarrier. Because no lowering pass uses mbarrier yet, the test is written in a hacky way: it first lowers to a kernel and then modifies the lowered kernel.
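To make the "lower first, then patch the kernel" idea concrete, here is a minimal host-side sketch. It is not nvFuser code: `ToyKernel`, `PostLoweringHook`, `replaceBlockSyncWithMBarrier`, and `lowerAndApply` are hypothetical names invented for illustration, and expressions are modeled as plain strings rather than kir nodes.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Hypothetical stand-in for a lowered kernel: an ordered list of
// top-level expression names. The real kir::Kernel is far richer.
struct ToyKernel {
  std::vector<std::string> exprs;
};

// Hypothetical hook type: called with the kernel after "lowering".
using PostLoweringHook = std::function<void(ToyKernel&)>;

// Replaces every "BlockSync" with an arrive/wait pair. If anything was
// replaced, the body is bracketed with barrier init and invalidate.
inline void replaceBlockSyncWithMBarrier(ToyKernel& kernel) {
  std::vector<std::string> rewritten;
  bool replaced = false;
  for (const std::string& e : kernel.exprs) {
    if (e == "BlockSync") {
      rewritten.push_back("MBarrierArrive");
      rewritten.push_back("MBarrierWait");
      replaced = true;
    } else {
      rewritten.push_back(e);
    }
  }
  if (replaced) {
    rewritten.insert(rewritten.begin(), "MBarrierInit");
    rewritten.push_back("MBarrierInvalidate");
  }
  kernel.exprs = rewritten;
}

// Hypothetical "executor": produces a fixed lowered kernel, then runs
// every registered hook on it in order.
inline ToyKernel lowerAndApply(const std::vector<PostLoweringHook>& hooks) {
  ToyKernel k;
  k.exprs = {"LoadGmemToSmem", "BlockSync", "StoreSmemToGmem"};
  for (const auto& hook : hooks) {
    assert(hook && "hooks must be callable");
    hook(k);
  }
  return k;
}
```

Running `lowerAndApply` with `replaceBlockSyncWithMBarrier` as the only hook yields the sequence init, load, arrive, wait, store, invalidate, which mirrors the order of operations in the modified kernel posted in this thread.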

zasdfgbnm · 2023-09-30T06:18:45Z

test/test_mbarrier.cpp

+
+  FusionExecutor fe;
+
+  fe.registerPostLoweringHook([](kir::Kernel* kernel) {


Kernel after modification:

__global__ void kernel1(Tensor<float, 2, 2> T0, Tensor<float, 2, 2> T2) {
  alignas(16) extern __shared__ char array[];
  const unsigned smem_offset = 0;
  nvfuser_index_t i0;
  i0 = ((nvfuser_index_t)threadIdx.y) + (32 * ((nvfuser_index_t)threadIdx.x));
  nvfuser_index_t i1;
  i1 = ((nvfuser_index_t)threadIdx.x) + (32 * ((nvfuser_index_t)threadIdx.y));
  float* T1 = reinterpret_cast<float*>(array + smem_offset + 0);
  uint64_t* T3 = reinterpret_cast<uint64_t*>(array + smem_offset + 4096);
  mbarrier::init(toSmem(T3), 1024);
  T1[i0] = T0[i0];
  uint64_t i2;
  i2 = mbarrier::arrive(toSmem(T3));
  mbarrier::wait(toSmem(T3), i2);
  T2[i1] = T1[i1];
  mbarrier::inval(toSmem(T3));
}
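The kernel above initializes the barrier with 1024 expected arrivals, has each thread arrive and keep the returned state in `i2`, and then waits on that state. A toy, single-threaded C++ model of that protocol may help when reading it. This is only a sketch: `ToyMBarrier` is a hypothetical class, it counts phases with a plain integer, and it makes no claim about how the hardware encodes the state value.

```cpp
#include <cassert>
#include <cstdint>

// Toy model of an arrive-wait barrier. It is neither the hardware
// mbarrier object nor nvFuser's runtime code.
class ToyMBarrier {
 public:
  // Sets the number of arrivals needed to complete each phase.
  void init(uint32_t expected_arrivals) {
    assert(expected_arrivals > 0);
    expected_ = expected_arrivals;
    pending_ = expected_arrivals;
    phase_ = 0;
  }

  // Records one arrival and returns a token naming the phase that this
  // arrival belongs to. The last arrival of a phase completes it and
  // re-arms the barrier for the next phase.
  uint64_t arrive() {
    assert(pending_ > 0 && "barrier must be initialized first");
    const uint64_t token = phase_;
    if (--pending_ == 0) {
      ++phase_;
      pending_ = expected_;
    }
    return token;
  }

  // Returns true once the phase named by `token` has completed. A real
  // wait would block or spin until this holds.
  bool testWait(uint64_t token) const { return phase_ > token; }

 private:
  uint32_t expected_ = 0;
  uint32_t pending_ = 0;
  uint64_t phase_ = 0;
};
```

In this model, a token obtained from `arrive()` stays unsatisfied until all expected arrivals for that phase have been recorded, which is the property the copy kernel relies on before reading `T1[i1]`.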

Thanks for this. This is really a nice way of testing initial build-out features

zasdfgbnm · 2023-09-30T06:22:31Z

!build

zasdfgbnm · 2023-09-30T06:39:17Z

runtime/mbarrier.cu

+      "{\n"
+      ".reg .pred                P1;\n"
+      "LAB_WAIT:\n"
+      "mbarrier.try_wait.shared.b64 P1, [%0], %1;\n"


Try wait is only available on SM90

jacobhinkle

This is a great step! I will have a look at updating the smem allocator to recognize/use these. One question: I think we support sm75, so will these new kernel nodes work in that case, falling back to a synchronous barrier?

test/test_mbarrier.cpp

jacobhinkle · 2023-09-30T22:26:23Z

csrc/type.h

-  struct DataTypeToNativeType<data_type> {                       \
-    using type = native_type;                                    \
-  };                                                             \


Was this unused? Could we use this in the switch statement below in getPrimDataTypeSize?

They were used. They were just a copy-paste of DEFINE_DATATYPE_TO_NATIVE_TYPE, so I replaced the copy-pasted code with DEFINE_DATATYPE_TO_NATIVE_TYPE.

This cannot be used in primDataTypeSize either, because it requires the data type to be a compile-time constant, which is not the case for primDataTypeSize.
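The distinction here is between a compile-time type mapping and a runtime lookup. A reduced sketch, assuming a hypothetical three-value enum `ToyDataType` and toy names throughout (the real macro and enum in csrc/type.h cover many more types):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <type_traits>

// Hypothetical, reduced enum used only for this illustration.
enum class ToyDataType { Float, Int, Bool };

// Compile-time mapping: the enum value must be a template argument,
// so it has to be known when the code is compiled.
template <ToyDataType DT>
struct ToyDataTypeToNativeType;

#define TOY_DEFINE_DATATYPE_TO_NATIVE_TYPE(data_type, native_type) \
  template <>                                                      \
  struct ToyDataTypeToNativeType<data_type> {                      \
    using type = native_type;                                      \
  }

TOY_DEFINE_DATATYPE_TO_NATIVE_TYPE(ToyDataType::Float, float);
TOY_DEFINE_DATATYPE_TO_NATIVE_TYPE(ToyDataType::Int, std::int64_t);
TOY_DEFINE_DATATYPE_TO_NATIVE_TYPE(ToyDataType::Bool, bool);

static_assert(
    std::is_same<ToyDataTypeToNativeType<ToyDataType::Int>::type,
                 std::int64_t>::value,
    "the trait resolves at compile time");

// Runtime lookup: `dt` is an ordinary function argument, so it cannot
// be passed as a template argument to the trait above. A switch over
// the runtime value is needed instead.
inline std::size_t toyPrimDataTypeSize(ToyDataType dt) {
  switch (dt) {
    case ToyDataType::Float:
      return sizeof(float);
    case ToyDataType::Int:
      return sizeof(std::int64_t);
    case ToyDataType::Bool:
      return sizeof(bool);
  }
  assert(false && "unknown data type");
  return 0;
}
```

Writing `ToyDataTypeToNativeType<dt>` inside `toyPrimDataTypeSize` would not compile, because `dt` is not a constant expression there.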

zasdfgbnm · 2023-10-01T23:21:01Z

so will these new kernel nodes work in that case and they fall back to a synchronous barrier?

On sm < 80, we should not lower into code that uses mbarrier. It must use sync threads.
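The architecture constraints raised in this thread could be captured by a small selection function like the one below. This is a sketch only: `SyncStrategy` and `chooseSyncStrategy` are hypothetical names, no such pass exists in this PR, and the middle tier (polling with `mbarrier.test_wait` on sm_80 through sm_89) is an assumption about how a future lowering pass might handle devices that have mbarrier but lack `try_wait`.

```cpp
#include <cassert>

// Hypothetical strategies a lowering pass might pick from.
enum class SyncStrategy {
  BlockSync,         // plain block-wide sync, no mbarrier
  MBarrierTestWait,  // mbarrier, waiting by polling test_wait (assumed)
  MBarrierTryWait    // mbarrier, waiting with try_wait
};

// Picks a strategy from the device compute capability, e.g. (7, 5)
// for sm_75 or (9, 0) for sm_90.
inline SyncStrategy chooseSyncStrategy(int major, int minor) {
  assert(major >= 0 && minor >= 0 && minor < 10);
  const int sm = major * 10 + minor;
  if (sm < 80) {
    // Per this thread: never lower to mbarrier below sm_80.
    return SyncStrategy::BlockSync;
  }
  if (sm < 90) {
    // try_wait is not available here, so fall back to polling.
    return SyncStrategy::MBarrierTestWait;
  }
  return SyncStrategy::MBarrierTryWait;
}
```

For sm_75, the case asked about above, this returns `SyncStrategy::BlockSync`.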

liqiangxl

LGTM. Just 2 minor comments.

liqiangxl · 2023-10-03T21:09:52Z

runtime/mbarrier.cu

+}
+
+__device__ inline void wait(uint32_t smem_barrier_ptr, uint64_t state) {
+#if (defined(__CUDA_ARCH__) && (__CUDA_ARCH__ >= 900))


The sample code from the doc seems simpler:

Yeah, agree. Changed that.

liqiangxl · 2023-10-03T21:25:52Z

test/test_mbarrier.cpp

+  for (auto expr : fe.kernel()->topLevelExprs()) {
+    remaining_mbarrier_exprs.erase(&typeid(*expr));
+  }
+  EXPECT_TRUE(remaining_mbarrier_exprs.empty());


What's the purpose of this part? Does it ensure that all MBarrier expressions are correctly integrated into the kir? I saw that other test cases check the kernel string directly, e.g. FusionCodegenAllocatedScalars_CUDA.

Yes, that's it. Directly checking the kernel string would also work.
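The check discussed here starts from the set of expected expression types and erases each type that shows up among the kernel's top-level expressions; an empty set at the end means every expected type was found. A self-contained sketch of that pattern follows. The `Toy*` classes and `containsAllMBarrierExprs` are hypothetical, and the sketch keys the set on `std::type_index` rather than on `&typeid(...)` pointers as in the diff.

```cpp
#include <cassert>
#include <memory>
#include <typeindex>
#include <typeinfo>
#include <unordered_set>
#include <vector>

// Hypothetical polymorphic expression hierarchy for illustration.
struct ToyExpr {
  virtual ~ToyExpr() = default;
};
struct ToyMBarrierInit : ToyExpr {};
struct ToyMBarrierArrive : ToyExpr {};
struct ToyMBarrierWait : ToyExpr {};
struct ToyMBarrierInvalidate : ToyExpr {};
struct ToyLoadStore : ToyExpr {};

// Returns true if every mbarrier expression type appears at least once.
inline bool containsAllMBarrierExprs(
    const std::vector<std::unique_ptr<ToyExpr>>& exprs) {
  std::unordered_set<std::type_index> remaining;
  remaining.insert(std::type_index(typeid(ToyMBarrierInit)));
  remaining.insert(std::type_index(typeid(ToyMBarrierArrive)));
  remaining.insert(std::type_index(typeid(ToyMBarrierWait)));
  remaining.insert(std::type_index(typeid(ToyMBarrierInvalidate)));

  for (const auto& expr : exprs) {
    assert(expr != nullptr);
    // typeid on a reference to a polymorphic type gives the dynamic type.
    const ToyExpr& e = *expr;
    remaining.erase(std::type_index(typeid(e)));
  }
  return remaining.empty();
}
```

Unlike a kernel-string comparison, this only asserts that each node type is present; it says nothing about order or operands.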

jacobhinkle · 2023-10-04T08:32:54Z

@zasdfgbnm I merged #996 so you might want to retry a !build and check the code diff output.

xwang233 · 2023-10-05T18:33:38Z

!build

xwang233 · 2023-10-05T22:06:26Z

nvfuser-ci/job-70932017: codegen_diff_4/9

http://nv/e5M/nvfuser_github_ci/codegen_diff_p10125137_j70932017_1696543064832465742_codediff_48bda6c_5c3d61f_custom_command_20231005_140551.html

It seems the codegen diff script wrote too much output to stdout, which exceeded the CI log size limit. I've fixed this in the CI. If it's a concern to you, feel free to start a new build.

zasdfgbnm · 2023-10-05T23:28:56Z

!build

xwang233 · 2023-10-06T02:29:41Z

!build

xwang233 · 2023-10-06T03:47:28Z

!build

xwang233 · 2023-10-06T05:19:35Z

!build

naoyam

LGTM

csrc/kernel_ir.cpp

naoyam · 2023-10-09T23:52:33Z

test/test_mbarrier.cpp

+
+  FusionExecutor fe;
+
+  fe.registerPostLoweringHook([](kir::Kernel* kernel) {


Thanks for this. This is really a nice way of testing initial build-out features

drzejan2 · 2023-10-10T11:22:34Z

LGTM

zasdfgbnm added 7 commits September 29, 2023 12:42

mbarrier cu

96bcfbe

save

f40ea0d

save

0caca40

mbarrier

0e60a31

fix asm

ba156cf

save

407636f

save

5bd2b33


test

dc253c2

zasdfgbnm changed the title ~~[Not ready] mbarrier: arrive wait barrier on smem~~ [initial build up] mbarrier: arrive wait barrier on smem Sep 30, 2023

zasdfgbnm added 2 commits September 29, 2023 23:22

format

e0118ed

Merge branch 'main' into mbarrier

c033bb8

zasdfgbnm marked this pull request as ready for review September 30, 2023 06:23

zasdfgbnm requested review from drzejan2, jacobhinkle, liqiangxl, mmigdal-nv and naoyam September 30, 2023 06:36

zasdfgbnm changed the title ~~[initial build up] mbarrier: arrive wait barrier on smem~~ [initial build up] mbarrier: arrive wait barrier on smem Sep 30, 2023


jacobhinkle reviewed Sep 30, 2023


zasdfgbnm added 2 commits October 2, 2023 14:12

Merge branch 'main' into mbarrier

ca69766

test expr found

4e4b3aa

zasdfgbnm mentioned this pull request Oct 2, 2023

Should the codegen diff tool only check the kernel itself? #1007

Closed

liqiangxl approved these changes Oct 3, 2023


zasdfgbnm added 2 commits October 5, 2023 10:11

Merge branch 'main' into mbarrier

ef5a4a8

sm9

650656b

zasdfgbnm added 2 commits October 5, 2023 10:18

save

ffd4bd0

tidy

17ad354

Merge branch 'main' into mbarrier

e11b9a1

Merge branch 'main' into mbarrier

ffc0b35

naoyam approved these changes Oct 9, 2023


drzejan2 approved these changes Oct 10, 2023


error message

a8ca210

zasdfgbnm merged commit 877edeb into main Oct 10, 2023

zasdfgbnm deleted the mbarrier branch October 10, 2023 19:17

