Separate the lowering and the execution of a communication.#2172

Merged
wujingyue merged 14 commits into main from wjy/comm on May 9, 2024

Conversation

@wujingyue
Collaborator

@wujingyue wujingyue commented May 1, 2024

This PR prepares for integrating Communication/Communicator into FusionExecutor. The lowering of a communication will be done by FusionExecutor::compileFusion, which runs at most once per shape and where sometimes only meta tensors are available. The execution will be done by ::runFusion, which is on the critical path.
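The split described above can be sketched roughly as follows. `CommPlan`, `lower`, and `post` are illustrative stand-ins, not nvFuser's actual API; the point is only that all analysis happens once at lowering time, while execution merely consumes the precomputed plan.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Stand-in for a lowered communication plan: everything the execution
// path needs, precomputed so the runFusion side stays cheap.
struct CommPlan {
  std::vector<int64_t> ranks;  // which devices participate
  int64_t nbytes = 0;          // buffer size, derivable from meta tensors
};

class Communication {
 public:
  // Lowering: called from compileFusion, at most once per shape. Only
  // shape/rank information is needed, so meta tensors suffice.
  void lower(const std::vector<int64_t>& ranks, int64_t nbytes) {
    plan_ = {ranks, nbytes};
    lowered_ = true;
  }

  // Execution: called from runFusion on the critical path. Consumes the
  // precomputed plan and does no analysis of its own. Returns the total
  // bytes moved, just to have something observable in this sketch.
  int64_t post() const {
    assert(lowered_ && "post() requires lower() to have run");
    return plan_.nbytes * static_cast<int64_t>(plan_.ranks.size());
  }

 private:
  CommPlan plan_;
  bool lowered_ = false;
};
```

Note that `post()` can be called repeatedly without re-lowering, which is the property the PR is after.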

@wujingyue
Collaborator Author

!build --dist

@wujingyue wujingyue force-pushed the wjy/comm branch 2 times, most recently from bb876a8 to 0fec8fc on May 1, 2024 18:11
@wujingyue wujingyue requested review from cowanmeg and samnordmann May 1, 2024 18:12
@wujingyue
Collaborator Author

!build --dist

4 similar comments

doLocalCopy(params_.dst_bufs.at(0), params_.src_bufs.at(0));
if (params_.is_root_in_mesh) {
  // Do a local copy and the subsequent broadcast will be in place.
  doLocalCopy(output_tensor, input_tensor);
Collaborator

Not for this PR, but in the future all these local copies should be inserted into the fusion as IR during lowering, so they can potentially be optimized.

std::vector<at::Tensor> src_bufs;
std::vector<at::Tensor> dst_bufs;
Team team; // should not have duplicate
bool is_root_in_mesh = true;
Collaborator

This shouldn't be a settable field in params; it is entirely deduced from the root and the team members. There shouldn't be such a member in the struct.
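The deduction the reviewer has in mind could look like this one-liner; plain `int64_t` ranks and a `std::vector` stand in for nvFuser's device-index and team/mesh types.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Deduce whether the root rank is a member of the mesh instead of
// storing the answer as a settable field that could go stale.
bool isRootInMesh(int64_t root, const std::vector<int64_t>& mesh) {
  return std::find(mesh.begin(), mesh.end(), root) != mesh.end();
}
```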

Collaborator Author

> this field is entirely deduced from root and team members

team always contains root IIUC. Do you mean to let params store root and mesh instead?

Collaborator

Sorry my comment was unclear/confused.

> team always contains root IIUC

Correct, my mistake.

> Do you mean to let params store root and mesh instead?

Yes, that would be great! Or even better: params could store a pointer to the resharding Expr and/or to the I/O TVs.
Wdyt?

Collaborator Author

@wujingyue commented May 6, 2024

> Or even better: params could store a pointer to the resharding Expr and/or to the I/O TVs.

I'll think about it. One benefit of separating compilation from execution (and thus of this PR) is to avoid doing too much at runFusion time, e.g., having to analyze TVs and extract information needed for execution.

Collaborator

This is a good point. While it doesn't look nice to store these pre-computed values like scattered_axis and is_root_in_mesh, it will cut down on compute during execution. Personally, I'm ok with having them stored as parameters even if it's not the cleanest approach.

Collaborator

> I'll think about it. One benefit of separating compilation from execution (and thus PR) is to avoid doing too much at runFusion time, e.g., having to analyze TVs and extract information needed for execution.

Agreed, and let's stick with that for this PR!

On the other hand (for the future), I also see motivation for storing the TV and Expr pointers:

  • they contain the full info, which is 1) useful for printing, e.g., I/O tensors, and 2) possibly needed in the future when the lowering becomes more fleshed out
  • they are a cleaner/more concise/structured way to encode the symbolic Communication

To get the best of both worlds, we could store (non-mutable) pointers to the I/O TensorViews and the Expr, and precompute as private class data what's needed at instantiation.
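A sketch of that "best of both worlds" idea: store non-mutable pointers to the symbolic IR and cache the derived scalars at construction time. `TensorView` and `Expr` here are minimal hypothetical stand-ins, not nvFuser's real IR classes.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Minimal stand-ins for nvFuser IR nodes.
struct TensorView {
  std::vector<int64_t> mesh;  // device mesh the tensor is sharded over
};
struct Expr {
  const TensorView* in;
  const TensorView* out;
};

class Communication {
 public:
  // Keep the full symbolic info (useful for printing and for richer
  // lowering later) and precompute what execution needs, once.
  Communication(const Expr* expr, int64_t root) : expr_(expr), root_(root) {
    for (int64_t d : expr_->in->mesh) {
      if (d == root_) is_root_in_mesh_ = true;
    }
  }

  const Expr* expr() const { return expr_; }              // full info
  bool isRootInMesh() const { return is_root_in_mesh_; }  // precomputed

 private:
  const Expr* expr_;
  int64_t root_;
  bool is_root_in_mesh_ = false;
};
```

The precomputation keeps runFusion-time work constant, while the stored `Expr*` keeps the symbolic description available for debugging or re-lowering.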

Collaborator Author

With #2185, Communication will be an Expr, which automatically contains the input and output TensorViews. Would that address your concern?

Collaborator Author

I gave up on the team=>mesh change for this PR. It would cause even more divergence from #2185, so we'll do that in a separate PR if needed.

Collaborator

> With #2185, Communication will be an Expr which automatically contains input and output TensorView. That should address your concern?

Yes! That would be a good way of doing it. If we follow this path, I think we can completely get rid of the struct CommunicationParams (we then only need to pass the ReductionOpType, or a pointer to the set/reduction Expr*, to the constructor).

@wujingyue wujingyue requested a review from samnordmann May 6, 2024 06:53
Collaborator

@samnordmann left a comment

Looks good, thx! I only left minor comments


Team team; // should not have duplicates and should contain both the root and
           // the mesh
c10d::ReduceOp::RedOpType redOp = c10d::ReduceOp::RedOpType::UNUSED;
int64_t scattered_axis = -1;
Collaborator

> I had the same question, but I chose not to pull too many changes into one PR :)

I understand, np.

If we decide to store a pointer to the TensorViews in params, then the info stored in scattered_axis can be inferred later when needed. Imo it's a cleaner solution. However, we can leave it to a future PR if you prefer.
Anyway, this will probably have to be fixed when we go to 2D.

Collaborator

@cowanmeg left a comment

Overall, I think the PR moves in the right direction, separating compile-time and run-time logic. We should discuss offline allocation optimizations, especially whether allocation can be pre-computed and how it relates to NCCL user buffers, etc., but that is well beyond the scope of the PR!

I think besides cleaning up the fixme and adding the buffer size checks to post, I am good with these changes!


@cowanmeg
Collaborator

cowanmeg commented May 6, 2024

I'm also unsure which PR is easier to merge first, this one or #2185.

@wujingyue
Collaborator Author

!build --dist

@wujingyue wujingyue merged commit 66b2a0a into main May 9, 2024
@wujingyue wujingyue deleted the wjy/comm branch May 9, 2024 05:18
@samnordmann samnordmann mentioned this pull request May 14, 2024
wujingyue added a commit that referenced this pull request Jun 3, 2024
As a follow-up to #2172. `mesh` will eventually go away after the input
and output TensorViews of a communication are embedded in the base Expr.
Anyhow, `is_root_in_mesh` can be computed from other parameters, so it is
removed.
zasdfgbnm pushed a commit that referenced this pull request Jun 5, 2024