Conversation

@rxin rxin commented Apr 14, 2016

## What changes were proposed in this pull request?

Part of the reason TaskMetrics and its callers are complicated is the optional metrics we collect: input, output, shuffle read, and shuffle write. I think we can always track them and just assign 0 as the initial values. It is usually very obvious whether a task is supposed to read any data or not. By always tracking them, we can remove a lot of `map`, `foreach`, `flatMap`, and `getOrElse(0L)` calls throughout Spark.

This patch also changes a few behaviors.

  1. Removes the distinction between data read/write methods (e.g. Hadoop, Memory, Network, etc.).
  2. Accumulates data reads and writes across all methods, rather than only the first method used. (Fixes SPARK-5225)
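
To make this concrete, here is a minimal, self-contained sketch of the before/after call-site shape (hypothetical, simplified classes; not the actual Spark definitions):

```scala
// Hypothetical, simplified shapes -- not the actual Spark classes.
class ShuffleReadMetrics {
  var totalBytesRead: Long = 0L // counters start at 0 instead of being absent
}

// Before: the metrics object is optional, so every reader unwraps an Option.
class TaskMetricsOld {
  var shuffleReadMetrics: Option[ShuffleReadMetrics] = None
}

// After: the metrics object always exists; call sites read it directly.
class TaskMetricsNew {
  val shuffleReadMetrics = new ShuffleReadMetrics
}

object CallSites extends App {
  val oldM = new TaskMetricsOld
  val newM = new TaskMetricsNew

  // The Option plumbing this patch removes:
  val bytesOld = oldM.shuffleReadMetrics.map(_.totalBytesRead).getOrElse(0L)

  // Direct read after the change; 0 simply means "no shuffle data read":
  val bytesNew = newM.shuffleReadMetrics.totalBytesRead

  println((bytesOld, bytesNew)) // (0,0)
}
```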

## How was this patch tested?

Existing unit tests.

@rxin rxin force-pushed the metrics-refactor branch from 5ed6eaa to f4e72ae on April 14, 2016 07:56
@rxin rxin changed the title from "[WIP] Always track ShuffleReadMetrics (i.e. not an Option)" to "[SPARK-14628][WIP] Always track ShuffleReadMetrics (i.e. not an Option)" Apr 14, 2016
SparkQA commented Apr 14, 2016

Test build #55806 has finished for PR 12388 at commit f4e72ae.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@rxin rxin force-pushed the metrics-refactor branch from f4e72ae to 6e5fbef on April 14, 2016 22:35
@rxin rxin changed the title from "[SPARK-14628][WIP] Always track ShuffleReadMetrics (i.e. not an Option)" to "[SPARK-14628][WIP] Always track read/write task metrics (not Options)" Apr 14, 2016
@rxin rxin changed the title from "[SPARK-14628][WIP] Always track read/write task metrics (not Options)" to "[SPARK-14628][WIP] Simplify task metrics by always tracking read/write metrics" Apr 14, 2016
SparkQA commented Apr 14, 2016

Test build #55862 has finished for PR 12388 at commit 6e5fbef.

  • This patch fails MiMa tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA commented Apr 15, 2016

Test build #55865 has finished for PR 12388 at commit 09ee852.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA commented Apr 15, 2016

Test build #55866 has finished for PR 12388 at commit a772493.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA commented Apr 15, 2016

Test build #55884 has finished for PR 12388 at commit a6a3604.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA commented Apr 15, 2016

Test build #55880 has finished for PR 12388 at commit ded2f21.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

```scala
// before:
(taskMetrics.shuffleReadMetrics.map(_.totalBytesRead).getOrElse(0L)
- oldMetrics.flatMap(_.shuffleReadMetrics).map(_.totalBytesRead).getOrElse(0L))

// after (the change under discussion):
taskMetrics.shuffleReadMetrics.totalBytesRead
- oldMetrics.map(_.shuffleReadMetrics.totalBytesRead).getOrElse(0L)
```
Contributor

haha, finally I understand why this is wrong: it's because

```scala
val i =
  a
    - b
```

is parsed as

```scala
val i = a
-b
```

@cloud-fan (Contributor) Apr 15, 2016

and previously it was

```scala
val i =
  (a
    -b)
```

Contributor Author

damn scala!

Contributor

oh come on... no one likes mandatory semicolons, right? 😜

Does this case not even generate a warning? I guess it doesn't know your .map is a pure expression.

Contributor Author

no warning :(
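
For reference, a self-contained sketch of the semicolon-inference gotcha discussed above, under Scala 2's parsing rules (hypothetical method names that mirror the shape of the diff, not the actual Spark code):

```scala
object SemicolonInference extends App {
  def totalBytesRead: Long = 10L
  def oldBytesRead: Option[Long] = Some(3L)

  // Intended 10 - 3 = 7, but Scala 2 infers a semicolon after the first line,
  // so this is parsed as `val broken = totalBytesRead` followed by the
  // standalone statement `-oldBytesRead.getOrElse(0L)`, whose result is
  // silently discarded (no warning: the compiler cannot prove the call pure).
  val broken =
    totalBytesRead
  - oldBytesRead.getOrElse(0L)

  // Parentheses keep the subtraction inside a single expression.
  val fixed =
    (totalBytesRead
      - oldBytesRead.getOrElse(0L))

  println(broken) // 10
  println(fixed)  // 7
}
```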

asfgit pushed a commit that referenced this pull request Apr 15, 2016
…te metrics

## What changes were proposed in this pull request?

Part of the reason TaskMetrics and its callers are complicated is the optional metrics we collect: input, output, shuffle read, and shuffle write. I think we can always track them and just assign 0 as the initial values. It is usually very obvious whether a task is supposed to read any data or not. By always tracking them, we can remove a lot of `map`, `foreach`, `flatMap`, and `getOrElse(0L)` calls throughout Spark.

This patch also changes a few behaviors.

1. Removes the distinction between data read/write methods (e.g. Hadoop, Memory, Network, etc.).
2. Accumulates data reads and writes across all methods, rather than only the first method used. (Fixes SPARK-5225)

## How was this patch tested?

Existing tests.

This is based on #12388, with more test fixes.

Author: Reynold Xin <rxin@databricks.com>
Author: Wenchen Fan <wenchen@databricks.com>

Closes #12417 from cloud-fan/metrics-refactor.
@rxin rxin closed this Apr 16, 2016
lw-lin pushed a commit to lw-lin/spark that referenced this pull request Apr 20, 2016