[MXNET-331] Single machine All Reduce Topology-aware Communication #11357
ctcyang wants to merge 27 commits into apache:master from ctcyang:feature_multirootv9
Conversation
```cpp
}

 private:
```
nit: should have no extra space here.
Thanks, I fixed lint errors.
```cpp
explicit KVStoreLocal(bool use_device_comm) : KVStore() {
  if (use_device_comm) {
    comm_ = new CommDevice();
    bool tree = dmlc::GetEnv("MXNET_KVSTORE_USETREE", 0);
```
Can we also have python gpu kvstore test with MXNET_KVSTORE_USETREE set?
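One way such a test could toggle the flag is sketched below in pure Python. The `get_env` helper is a hypothetical stand-in for `dmlc::GetEnv` (the real check happens in C++), and the actual `mx.kv.create('device')` push/pull verification is elided since it needs GPUs:

```python
import os

def get_env(name, default=0):
    # Hypothetical Python stand-in for dmlc::GetEnv: read an
    # integer-valued env var, falling back to `default` when unset/empty.
    val = os.environ.get(name)
    return default if val in (None, "") else int(val)

# Run the same kvstore checks with tree reduce off and on:
for env_val in ["", "1"]:
    os.environ["MXNET_KVSTORE_USETREE"] = env_val
    use_tree = bool(get_env("MXNET_KVSTORE_USETREE", 0))
    # ... create mx.kv.create('device') and verify push/pull results here ...
    assert use_tree == (env_val == "1")
```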
```cpp
/// \brief the small buffer for compressed data in receiver
std::vector<NDArray> compressed_recv_buf;
/// \brief size of allocation in case we do not actually allocate merged
TShape merged_size;
```
```cpp
// w = w * alpha*u
template <typename T>
inline void ewisemult(const std::vector<int>& u,
                      T alpha,
```
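The quoted signature is truncated; a hedged Python sketch of what the helper appears to compute, assuming the comment `w = w * alpha*u` means an elementwise update where each `w[i]` is scaled by `alpha * u[i]`:

```python
def ewisemult(u, alpha, w):
    # Sketch only, not the C++ implementation: scale each element of w
    # by alpha times the matching element of u.
    return [wi * alpha * ui for ui, wi in zip(u, w)]
```

For example, `ewisemult([1, 2, 3], 2.0, [4.0, 5.0, 6.0])` gives `[8.0, 20.0, 36.0]`.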
```python
kvstore = mx.kv.create('device')
copy = mx.nd.random_normal(shape=(4,4), ctx=mx.gpu(0))
grad = copy.tostype("row_sparse")
envs = ["", "1"]
```
minor suggestion: we could add a util class like https://github.com/apache/incubator-mxnet/blob/master/python/mxnet/autograd.py#L93-L119 that manages the scope of such an env var: it sets the variable to a given value when entering the scope and restores the previous value when exiting the scope.
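A minimal sketch of such a scope manager (the name `env_var_scope` is made up for illustration; any real version would live in a test util module):

```python
import os
from contextlib import contextmanager

@contextmanager
def env_var_scope(name, value):
    # Set `name` to `value` on entry; restore the previous value
    # (or unset the variable) on exit, even if the body raises.
    old = os.environ.get(name)
    os.environ[name] = value
    try:
        yield
    finally:
        if old is None:
            os.environ.pop(name, None)
        else:
            os.environ[name] = old
```

Usage would then read `with env_var_scope("MXNET_KVSTORE_USETREE", "1"): ...` instead of setting and manually resetting the variable around each test case.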
…se PCI-E as fallback for GPUs that are not linked by NVLink
Recreated repo, which unfortunately has detached the branch from this PR. The code is at: https://github.com/ctcyang/incubator-mxnet/tree/feature_multirootv9
```cpp
using KeyAttrs = std::tuple<int, TShape, int>;
// try to allocate buff on device evenly
void InitMergeBuffer(const std::vector<Context>& devs) {
```
Just to confirm, did you make any change to this function? Asking because the move makes it hard to see the diff for this.
Nope. I made no changes to the existing --kv-store device.
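The balancing idea behind the quoted comment ("try to allocate buff on device evenly") can be sketched in a few lines. This is a deliberate simplification, not the actual `InitMergeBuffer` logic: it greedily places each buffer on the device with the fewest elements assigned so far.

```python
def assign_buffers_evenly(key_sizes, devs):
    # key_sizes: list of (key, element_count) pairs; devs: device labels.
    # Greedy balancing sketch only; the real C++ code may differ.
    load = {d: 0 for d in devs}
    assignment = {}
    for key, size in key_sizes:
        dev = min(devs, key=lambda d: load[d])  # least-loaded device
        assignment[key] = dev
        load[dev] += size
    return assignment
```

With equal-sized keys this degenerates to round-robin: three keys of size 4 over two GPUs land as gpu0, gpu1, gpu0.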
```cpp
// track of each key's shape within BufferEntry
// -this information is required for inherited Reduce- and
// BroadcastRowSparse
InitMergeBuffer(devs_);
```
Why do we need the regular merge buffer too?
ReduceRowSparse and BroadcastRowSparse will be implemented using topology-aware communication in the future. For now, the regular merge buffer is needed so that we can fall back to the existing --kv-store device behaviour for ReduceRowSparse and BroadcastRowSparse. This fallback is tested in the changed unittest tests/python/gpu/test_kvstore_gpu.py.
Due to the delay_alloc functionality, this does not cost any actual memory allocation if we don't end up using the InitMergeBuffer temporary memories.
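The delay_alloc point can be illustrated with a toy model (pure Python, not the NDArray implementation): declaring a buffer costs nothing, and backing storage only appears on first access.

```python
class LazyBuffer:
    # Toy model of NDArray's delay_alloc behaviour: storage is not
    # created until the buffer is first touched.
    def __init__(self, size):
        self.size = size
        self._data = None

    @property
    def allocated(self):
        return self._data is not None

    def data(self):
        if self._data is None:          # allocate on first access
            self._data = [0.0] * self.size
        return self._data

buf = LazyBuffer(1024)
assert not buf.allocated  # declared, but no memory cost yet
buf.data()
assert buf.allocated      # memory appears only once the buffer is used
```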
Closed this PR. I deleted my old repo, and made a new one. See the new PR with code here: #11591
Description
Single machine All Reduce Topology-aware Communication
Checklist
Essentials
Changes
Comments