Make Communications IRs inheriting from Expr.#2185
Conversation
3d79609 to
3e6fddf
Compare
3e6fddf to
4560e21
Compare
wujingyue
left a comment
There was a problem hiding this comment.
LGTM in general! I made a few comments to address before I merge this.
| } else { | ||
| assertBufferCount(params_.dst_bufs, 0); | ||
| // TODO add checking symbolic representation of src and dst buffers | ||
| bool Communication::sameAs(const Statement* other) const { |
There was a problem hiding this comment.
I failed to see why we couldn't reuse the "default" implementation:
Line 342 in 4e83569
There was a problem hiding this comment.
Two Communications could be "Expr::sameAs" without being the same, for example if one is Allgather and the other Allreduce. Right?
My goal here is to mimick what's done for other IRs, such as
Line 2674 in 4e83569
But I don't actually use sameAs -- I simply thought implementing it was necessary for matching the IR specs.
So I'm open to any suggestion about this implementation, including removing it
There was a problem hiding this comment.
Expr::sameAs calls Expr::sameOp, which checks equality of all attributes. I believe the communication type is one of the attributes.
There was a problem hiding this comment.
Correct me if Im wrong, but in the current implementation it is not an attribute. The reason is that, IIUC, all attributes need to be Statement*s, and CommunicationParams is not.
There was a problem hiding this comment.
There is addDataAttribute as used in
Lines 2423 to 2424 in 730abb5
cowanmeg
left a comment
There was a problem hiding this comment.
LGTM assuming that the post and validate logic is just moved around! If there is changes in those let me know so I can look more closely.
Also it is great to see Send/Recv finally removed from communicator!
|
!build --dist |
wujingyue
left a comment
There was a problem hiding this comment.
LGTM! I'll try to merge this. I may need to split this up into multiple PRs, but will let you know.
|
I'm in the process of resolving conflicts. I should be able to merge this tomorrow or Monday. |
Thank you! I'm waiting for this one to be merge before going on with host ir dev. Let me know if I can help! |
3ebd7d4 to
0f7fa53
Compare
|
!build --dist |
This PR makes the "Communication" class a proper IR inheriting from Expr. This patch is needed for implementing Host Irs. It is also one step towards making the Communications (and more generally the multidevice module) fully symbolic.
By the way, we proceed to a couple of refactoring, and remove
Communicator::sendRecvmethod.Remarks:
Communicationand one derived class per collective type (Allgather, Allreduce, Broadcast, etc.). Now, there is only the classCommunication, and the collective type is encoded through an enum classCollectiveTypeadded to the parameter memberCommParamsCommunication::postmethod was replaced by a standalone functionpostCollective. The motivation is to scoop out the runtime execution from the symbolic representation of the collective.Collectiveis instantiated with concrete device Idx and concrete at::Tensor, while it should be instantiated with symbolic representations and binded to actual device indices and Aten buffers at runtime. This will be added in a future PR.CI: https://gitlab-master.nvidia.com/dl/pytorch/fuser-gh-mirror/-/pipelines/14923305