Improve sparse pull performance for gluon trainer#11429
Improve sparse pull performance for gluon trainer#11429eric-haibin-lin merged 18 commits intoapache:masterfrom
Conversation
|
@junrushao1994 pls help review engine code. Thanks! |
|
The engine code looks good to me |
| update_on_kvstore = False | ||
| if 'dist' in kvstore.type: | ||
| # kv.pull(row_sparse_grad) is not supported for dist kvstore | ||
| update_on_kvstore = self._contains_sparse_weight or self._contains_sparse_grad |
There was a problem hiding this comment.
from the comment I'm guessing you meant not self._contains_sparse_weight and not self._contains_sparse_grad
There was a problem hiding this comment.
This is intended. kv.pull(row_sparse_grad) is not supported for dist kvstore, so we want to set update_on_kvstore = True if there's sparse grad.
|
how does the performance look? @eric-haibin-lin @leezu |
|
@szha For dense weight with sparse grad, this PR pulls sparse grad instead of dense weight. Reduces gpu2gpu copy time from 60ms to less than 1ms. |
| NDArrayHandle* vals, | ||
| int priority); | ||
| int priority, | ||
| bool ignore_sparse = true); |
There was a problem hiding this comment.
C API doesn't support default value
| NDArrayHandle* vals, | ||
| int priority); | ||
| int priority, | ||
| bool ignore_sparse = true); |
There was a problem hiding this comment.
Added extra CAPIs instead of adding default value to this one
| /*! \brief Prioritized sync operation on CPU */ | ||
| kCPUPrioritized, | ||
| /*! \brief Prioritized sync operation on GPU */ | ||
| kGPUPrioritized, |
There was a problem hiding this comment.
Moved to the end.
| raise RuntimeError("Cannot set update_on_kvstore to False when sparse " | ||
| "gradients and/or sparse weights are present for " | ||
| "Parameter '%s'."%param.name) | ||
| raise RuntimeError("Cannot set update_on_kvstore to False when sparse weights " |
There was a problem hiding this comment.
Does this have to be a error, or can it be a warning and automatically use update_on_kvstore ?
There was a problem hiding this comment.
Also, shouldn't this be outside the if contains_sparse_weight condition?
There was a problem hiding this comment.
By default update_on_kvstore is None. It's only set if user provides a value on purpose. I think an explicit err is better, since we cannot satisfy user's original intent.
If user set update_on_kvstore to False and the model contains no sparse weight, it's totally fine. Why should this be outside the if condition?
| For `RowSparseNDArray` values, this call is ignored, | ||
| please use ``row_sparse_pull`` instead. | ||
| pull with `RowSparseNDArray` is not supported for dist kvstore. | ||
| Please use ``row_sparse_pull`` instead. |
There was a problem hiding this comment.
Should ignore_sparse be defaulted to false to be consistent with previous behavior?
There was a problem hiding this comment.
Previous behavior is to always ignore sparse. So it's consistent
|
@piiswrong @rahul003 pls review again, thanks. |
* clip sparse grad. fix _reduce for rowsparse param * fix kvstore init for local kv * trigger * pull with ignore sparse * rsp pull with priority * add doc; * fix bug in sparse kvstore * +kvstore test * add dist kvstore test * enhance dist kv test * fix lint * fix lint * CR comments
Description
@leezu @rahul003
Checklist
Essentials
Please feel free to remove inapplicable items for your PR.
Changes
Comments