Optimization of metric evaluation #13471
Conversation
@mxnet-label-bot add [Metric, pr-awaiting-review]

Thanks @ptrendx, I'll have a look
vandanavk
left a comment
Overall LGTM. Just a few comments.
Also, could you test the following:
- image classification examples (train_mnist.py, train_imagenet.py)
- Speedometer's auto_reset=True and auto_reset=False
- tools/parse_log.py
| if "has_global_stats" in kwargs: | ||
| self._has_global_stats = kwargs["has_global_stats"] | ||
| else: | ||
| self._has_global_stats = False |
self._has_global_stats = kwargs.get("has_global_stats", False) ?
Done (with pop, so the has_global_stats key is not kept in kwargs and doesn't mess with deserialization later).
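For reference, a minimal sketch of what the pop-based version presumably looks like (the class skeleton here is illustrative, not the exact merged code):

    class EvalMetric:
        def __init__(self, name, **kwargs):
            self.name = name
            # pop (rather than get) removes the key from kwargs, so it is not
            # carried along into the serialized config and does not break
            # deserialization later
            self._has_global_stats = kwargs.pop("has_global_stats", False)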
        else:
    -       self.sum_metric = self.metrics.fscore * self.metrics.total_examples
    +       self.sum_metric = fscore * self.metrics.total_examples
    +       self.global_sum_metric = fscore * self.metrics.total_examples
self.sum_metric = self.global_sum_metric = ?
    self.sum_metric = fscore * self.metrics.total_examples
    self.global_sum_metric = fscore * self.metrics.total_examples
    self.num_inst = self.metrics.total_examples
    self.global_num_inst = self.metrics.total_examples

and similarly in the MCC metric:

    self.sum_metric = matthewscc * self._metrics.total_examples
    self.global_sum_metric = matthewscc * self._metrics.total_examples
    self.num_inst = self._metrics.total_examples
    self.global_num_inst = self._metrics.total_examples
I tested train_imagenet.py, tools/parse_log.py, and both auto_reset values for Speedometer.
@ptrendx thanks for the PR. Do you mind elaborating a bit more on how this PR avoids the GIL / speeds up metric evaluation?
It does not avoid the GIL; I just do less work in Python.
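As an illustration of the "less work in Python" point (a minimal sketch with hypothetical names mirroring the diff, not the actual MXNet code): each update computes its statistic once in NumPy and feeds both the local and the global accumulators, so per-batch and per-epoch values come from a single computation.

    import numpy as np

    class AccuracySketch:
        def __init__(self):
            self.sum_metric = 0.0         # local, reset between log intervals
            self.num_inst = 0
            self.global_sum_metric = 0.0  # global, kept for the whole epoch
            self.global_num_inst = 0

        def update(self, labels, preds):
            pred_label = preds.argmax(axis=1)
            num_correct = (pred_label == labels).sum()  # computed once
            self.sum_metric += num_correct
            self.global_sum_metric += num_correct
            self.num_inst += len(labels)
            self.global_num_inst += len(labels)

    labels = np.array([0, 1, 2])
    preds = np.eye(3, dtype='float32')  # perfect predictions
    m = AccuracySketch()
    m.update(labels, preds)
    print(m.sum_metric / m.num_inst)  # 1.0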
eric-haibin-lin
left a comment
Thanks for the fix and explanation. LGTM
Thanks for the contribution @ptrendx!
sandeep-krishnamurthy
left a comment
Thanks. A few comments inline.
        else:
            return (self.name, self.global_sum_metric / self.global_num_inst)
    else:
        return self.get()
If the user specifically asks for global statistics and they are not available, shouldn't we throw an exception rather than silently return the local ones? Same in other places.
I'm not sure if that is possible - doing this would break fully custom metrics (other than subclasses of CustomMetric class, for which I added support) that did not implement global stats.
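For context, the fallback being discussed looks roughly like this (a sketch assuming the _has_global_stats flag and the counters from the diff above, not the exact merged code):

    def get_global(self):
        """Return global (per-epoch) statistics if the metric tracks them,
        otherwise silently fall back to the local statistics, so fully
        custom metrics without global counters keep working."""
        if self._has_global_stats:
            if self.global_num_inst == 0:
                return (self.name, float('nan'))  # nothing accumulated yet
            return (self.name, self.global_sum_metric / self.global_num_inst)
        return self.get()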
        for metric in self.metrics:
            metric.reset_local()
    except AttributeError:
        pass
Why is this required? When will we reach here? Can we please document it?
This is not added in this PR and I'm not sure myself why it is needed here (this function is basically a copy of the reset function, but calls reset_local on the children instead of reset).
    -   self.sum_metric += (pred_label == label).sum()
    +   num_correct = (pred_label == label).sum()
    +   self.sum_metric += num_correct
    +   self.global_sum_metric += num_correct
Sorry for the trivial question, but when will the global metrics differ from the local metrics with this logic?
They will be different when you call reset_local (which is called from Speedometer, for example): it resets the local versions of the metrics while keeping the global versions intact, which was the point of this PR, to enable computing both per-batch and per-epoch statistics using only a single computation.
That said, this comment made me think about this again and I found a bug in how I handle global statistics in F1 and MCC metrics - fix and tests incoming. Thank you!
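A small self-contained illustration of how the two sets of counters diverge (a hypothetical minimal class, mirroring the PR's naming):

    class CounterSketch:
        def __init__(self):
            self.num_inst = 0
            self.global_num_inst = 0

        def update(self, n):
            self.num_inst += n
            self.global_num_inst += n

        def reset_local(self):
            self.num_inst = 0  # the global counter is left intact

    m = CounterSketch()
    m.update(10)
    m.reset_local()  # e.g. Speedometer logging a batch window
    m.update(5)
    print(m.num_inst, m.global_num_inst)  # 5 15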
        for label, pred_label in zip(labels, preds):
            assert(len(pred_label.shape) <= 2), 'Predictions should be no more than 2 dims'
    -       pred_label = numpy.argsort(pred_label.asnumpy().astype('float32'), axis=1)
    +       pred_label = numpy.argpartition(pred_label.asnumpy().astype('float32'), -self.top_k)
This is very nice. I think it warrants a comment on why argpartition is used here (for its performance benefit).
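For readers of this thread: the performance point is that numpy.argpartition does an O(n) partial selection of the top-k entries instead of argsort's full O(n log n) sort, and top-k accuracy only needs membership in the top k, not their order. A small standalone sketch of the equivalence (illustrative, not the PR's code):

    import numpy as np

    scores = np.random.rand(4, 1000).astype('float32')
    top_k = 5

    # Full sort: orders all 1000 classes per row.
    full = np.argsort(scores, axis=1)[:, -top_k:]
    # Partial selection: only guarantees that the last top_k columns
    # contain the top-k indices, in no particular order among themselves.
    part = np.argpartition(scores, -top_k, axis=1)[:, -top_k:]

    # Same set of top-k indices per row.
    assert all(set(f) == set(p) for f, p in zip(full, part))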
    Tuple of (str, float)
        Representing name of the metric and evaluation result.
    """
    num = self.global_num_inst if self.global_num_inst > 0 else float('nan')
Oh, this change got here by accident - I will revert it. Good catch, thank you!
When I added tests, it turned out that changing this is actually necessary to avoid a floating-point division-by-zero exception (I did change it slightly differently though, to match the get function from the base class).
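To make the division-by-zero point concrete (a standalone snippet using plain variables rather than the metric's attributes):

    # Dividing by NaN yields NaN instead of raising ZeroDivisionError:
    global_sum_metric, global_num_inst = 0.0, 0
    num = global_num_inst if global_num_inst > 0 else float('nan')
    print(global_sum_metric / num)  # nan

    # The base class's get guards explicitly instead:
    if global_num_inst == 0:
        result = float('nan')
    else:
        result = global_sum_metric / global_num_inst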
added test for global stats
I made the fixes and added tests for all metrics. @sandeep-krishnamurthy please review again.
Description
Currently, metrics are mostly evaluated on the CPU using NumPy. Due to the Python GIL, they are evaluated in a single thread, sequentially, which may become a bottleneck once the number of GPUs used is large enough.
Checklist
Essentials
Please feel free to remove inapplicable items for your PR.
Changes
CC @vandanavk for comments.