Conversation
!build --dist

bb876a8 to 0fec8fc
    doLocalCopy(params_.dst_bufs.at(0), params_.src_bufs.at(0));

    if (params_.is_root_in_mesh) {
      // Do a local copy and the subsequent broadcast will be in place.
      doLocalCopy(output_tensor, input_tensor);
Not for this PR, but in the future all these local copies should be inserted into the fusion as IR during lowering, so they can potentially be optimized.
    std::vector<at::Tensor> src_bufs;
    std::vector<at::Tensor> dst_bufs;
    Team team; // should not have duplicates
    bool is_root_in_mesh = true;
This shouldn't be a settable field in params: it is entirely deduced from the root and the team members, so there shouldn't be such a member in the struct.
> this field is entirely deduced from root and team members

team always contains root, IIUC. Do you mean to let params store root and mesh instead?
Sorry, my comment was unclear/confusing.

> team always contains root IIUC

Correct, my mistake.

> Do you mean to let params store root and mesh instead?

Yes, that would be great! Or even better: params could store a pointer to the resharding Expr and/or to the I/O TVs. Wdyt?
> Or even better: params could store a pointer to the resharding Expr and/or to the I/O TVs.

I'll think about it. One benefit of separating compilation from execution (and thus of this PR) is to avoid doing too much at runFusion time, e.g., having to analyze TVs and extract the information needed for execution.
This is a good point. While it doesn't look nice to store pre-computed values like scattered_axis and is_root_in_mesh, doing so cuts down on compute during execution. Personally, I'm OK with having them stored as parameters even if it's not the cleanest approach.
> I'll think about it. One benefit of separating compilation from execution (and thus this PR) is to avoid doing too much at runFusion time, e.g., having to analyze TVs and extract information needed for execution.

Agreed. And let's stick with that for this PR!

On the other hand (for the future) I also see motivation for storing the TV and Expr pointers:
- they contain the full info, which 1) is useful for printing e.g. I/O tensors, and 2) may be needed in the future when the lowering becomes more fleshed out;
- they are a cleaner, more concise, and more structured way to encode the symbolic Communication.

What we could do to get the best of both worlds is to store (non-mutable) pointers to the I/O TensorViews and the Expr, and precompute what's needed as private class data at instantiation.
With #2185, Communication will be an Expr, which automatically contains the input and output TensorViews. Would that address your concern?
I gave up on the team=>mesh change for this PR. It would cause even more divergence from #2185, so we'll do that in a separate PR if needed.
> With #2185, Communication will be an Expr which automatically contains input and output TensorView. That should address your concern?

Yes! That would be a good way of doing it. If we follow this path, I think we can completely get rid of the CommunicationParams struct (we would then only need to pass the ReductionOpType, or a pointer to the set/reduction Expr*, to the constructor).
samnordmann
left a comment
Looks good, thx! I only left minor comments
    Team team; // should not have duplicates and should contain both the root
               // and the mesh
    c10d::ReduceOp::RedOpType redOp = c10d::ReduceOp::RedOpType::UNUSED;
    int64_t scattered_axis = -1;
I had the same question, but I chose not to pull too many changes into one PR :)

I understand, np. If we decide to store a pointer to the TensorViews in params, then the info stored in scattered_axis can be inferred later when needed. IMO that's a cleaner solution; however, we can leave it to a future PR if you prefer. Anyway, this will probably have to be fixed when we go to 2D.
cowanmeg
left a comment
Overall, I think the PR moves in the right direction by separating compile-time and run-time logic. We should discuss allocation optimizations offline, especially whether they can be pre-computed and how they relate to NCCL user buffers, etc., but that is well beyond the scope of this PR!

Besides cleaning up the FIXME and adding the buffer-size checks to post, I am good with these changes!
I am also unsure which PR is easier to merge first, this one or #2185.
As a follow-up to #2172. `mesh` will eventually go away once the input and output TensorViews of a communication are embedded in the base Expr. In any case, `is_root_in_mesh` can be computed from other parameters, so it is removed.
This PR prepares for integrating Communication/Communicator into FusionExecutor. The lowering of a communication will be done by FusionExecutor::compileFusion, which runs at most once per shape and where sometimes only meta tensors are available. The execution will be done by ::runFusion, which is on the critical path.