use nd for accuracy calculation #9583
```diff
@@ -380,23 +380,27 @@ def update(self, labels, preds):
         Parameters
         ----------
         labels : list of `NDArray`
-            The labels of the data.
+            The labels of the data with class indices as values, one per sample.

         preds : list of `NDArray`
-            Predicted values.
+            Prediction values for samples. Each prediction value can either be the class index,
+            or a vector of likelihoods for all classes.
         """
         check_label_shapes(labels, preds)

         for label, pred_label in zip(labels, preds):
             if pred_label.shape != label.shape:
                 pred_label = ndarray.argmax(pred_label, axis=self.axis)
-            pred_label = pred_label.asnumpy().astype('int32')
-            label = label.asnumpy().astype('int32')
+            pred_label = pred_label.astype('int32')
+            label = label.astype('int32')

             check_label_shapes(label, pred_label)

-            self.sum_metric += (pred_label.flat == label.flat).sum()
-            self.num_inst += len(pred_label.flat)
+            if pred_label.context != label.context:
+                pred_label = pred_label.as_in_context(label.context)
+
+            self.sum_metric += (pred_label.flatten() == label.flatten()).sum().asscalar()
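For reference, the updated accuracy logic can be sketched with NumPy alone. This is a minimal stand-in for the `NDArray` version in the diff above: `ndarray.argmax`, `flatten`, and `astype` are replaced by their NumPy equivalents, and the class/context handling is omitted, so it only illustrates the arithmetic, not the MXNet execution model.

```python
import numpy as np

def accuracy_update(labels, preds, axis=1):
    """Sketch of Accuracy.update: returns (correct, total) for one call."""
    correct, total = 0, 0
    for label, pred_label in zip(labels, preds):
        if pred_label.shape != label.shape:
            # Predictions are per-class likelihoods: reduce to class indices.
            pred_label = np.argmax(pred_label, axis=axis)
        pred_label = pred_label.astype('int32')
        label = label.astype('int32')
        correct += int((pred_label.flatten() == label.flatten()).sum())
        total += int(np.prod(pred_label.shape))
    return correct, total
```

For example, with labels `[0, 1, 1]` and a 3x2 likelihood matrix whose row-wise argmax is `[0, 1, 0]`, the call returns 2 correct out of 3 instances.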
Contributor

`asscalar` is equivalent to `asnumpy()[0]`. This PR did not solve the problem of the metric being computed in the numpy world.
Member (Author)

Computation happens before `asnumpy()`, so nothing is happening in the numpy world other than passing out a scalar value. May I ask what your interest is in this PR? Do you have a use case that benefits from using ndarray for the metric?
Contributor

Maybe my understanding is flawed, but in order to bring this value back to the CPU into `self.sum_metric`, we will need to do a wait_on_read on `pred_label`? We will then wait for `pred_label` to be computed before we load the next batch on the GPU. I have run several experiments where computing the loss with `.asscalar()` on every loop slows training by 2-25%.
Contributor

@szha, to be precise about what I mean: I don't think using numpy is slower in itself; it is that having a blocking operation in the loop limits the parallelization happening in the backend and leads to GPU starvation.
Member

However, completely non-blocking logic will cause some other problems. To be more specific, the allocated NDArrays cannot be reused and will finally cause an OOM. We should use
Contributor

Thanks for the precision @sxjscience, but why is that so? Once processed in the computation graph, aren't they garbage collected?
Member

I think it's because the ops are pushed at a much faster speed than the real computation. The graph will keep expanding, and the allocation ops will be executed to allocate new space (even before the actual computation is performed). We have to call a blocking operator at some point to make sure that the current calculation in the graph has been completed. CC @piiswrong for this.
Contributor

Thanks, yes, that's my understanding. However, I think it should be left to the user to decide when to block, since it depends highly on their GPU and model size (e.g. every 100 batches or every epoch). Also, is there a reason why the accuracy is stored on the CPU rather than on a specific context? My measurements showed great improvements when storing the accuracy on the GPU. Maybe, if you don't mind, we can continue the discussion there: #9571
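The trade-off discussed above, blocking on every batch versus blocking once when the value is read, can be sketched in plain Python. `DeferredAccuracy` is a hypothetical metric (not MXNet's API); NumPy arrays stand in for on-device `NDArray`s, and the `asscalar`-style conversion is deferred to `get()` so that `update()` itself introduces no synchronization point.

```python
import numpy as np

class DeferredAccuracy:
    """Hypothetical metric that avoids a blocking read on every update."""

    def __init__(self):
        # Running correct-count kept as an array (would live on the GPU
        # in MXNet) instead of a Python float, so updates stay in the
        # array world and need no immediate device-to-host copy.
        self.sum_metric = np.zeros(1)
        self.num_inst = 0

    def update(self, label, pred_label):
        # Array arithmetic only: no scalar conversion here.
        self.sum_metric += (pred_label.flatten() == label.flatten()).sum()
        self.num_inst += int(np.prod(pred_label.shape))

    def get(self):
        # The single blocking conversion happens here, once per read,
        # e.g. at the end of an epoch rather than on every batch.
        return float(self.sum_metric[0]) / max(self.num_inst, 1)
```

This leaves the decision of when to synchronize to the caller, which is the point raised in the comment above.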
```diff
+            self.num_inst += numpy.prod(pred_label.shape)


 @register
```
int64 is better?
This requires larger space, which can show when the number of prediction classes is large (such as in NLP applications). Should I make it an option?
should be good in most cases.
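The space difference weighed in the exchange above is easy to quantify with NumPy: casting to `int64` doubles the memory of the label/prediction arrays, while `int32` already covers class indices up to 2**31 - 1, far beyond any realistic class count. A small illustration (the array size here is an arbitrary example, not from the PR):

```python
import numpy as np

n = 1_000_000  # e.g. one million predicted class indices
as32 = np.zeros(n, dtype='int32')
as64 = np.zeros(n, dtype='int64')

print(as32.nbytes)  # 4 bytes per element
print(as64.nbytes)  # 8 bytes per element: twice the space for the same indices

# Largest class index representable in int32:
print(np.iinfo(np.int32).max)  # 2147483647
```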