Conversation

@rxin rxin commented Apr 14, 2016

## What changes were proposed in this pull request?

Part of the reason TaskMetrics and its callers are complicated is the optional metrics we collect: input, output, shuffle read, and shuffle write. I think we can always track them and just assign 0 as the initial values. It is usually very obvious whether a task is supposed to read any data or not. By always tracking them, we can remove a lot of `map`, `foreach`, `flatMap`, and `getOrElse(0L)` calls throughout Spark.

This patch also changes a few behaviors.

  1. Removes the distinction between data read/write methods (e.g. Hadoop, Memory, Network, etc.).
  2. Accumulates data reads and writes across all methods, rather than only the first method used. (Fixes SPARK-5225)
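
To make this concrete, here is a minimal, self-contained sketch of the before/after call-site shape (hypothetical, simplified classes; not the actual Spark definitions):

```scala
// Hypothetical, simplified shapes -- not the actual Spark classes.
class ShuffleReadMetrics {
  var totalBytesRead: Long = 0L // counters start at 0 instead of being absent
}

// Before: the metrics object is optional, so every reader unwraps an Option.
class TaskMetricsOld {
  var shuffleReadMetrics: Option[ShuffleReadMetrics] = None
}

// After: the metrics object always exists; call sites read it directly.
class TaskMetricsNew {
  val shuffleReadMetrics = new ShuffleReadMetrics
}

object CallSites extends App {
  val oldM = new TaskMetricsOld
  val newM = new TaskMetricsNew

  // The Option plumbing this patch removes:
  val bytesOld = oldM.shuffleReadMetrics.map(_.totalBytesRead).getOrElse(0L)

  // Direct read after the change; 0 simply means "no shuffle data read":
  val bytesNew = newM.shuffleReadMetrics.totalBytesRead

  println((bytesOld, bytesNew)) // (0,0)
}
```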

## How was this patch tested?

Existing unit tests.

@rxin rxin force-pushed the metrics-refactor branch from 5ed6eaa to f4e72ae on April 14, 2016 07:56
@rxin rxin changed the title from "[WIP] Always track ShuffleReadMetrics (i.e. not an Option)" to "[SPARK-14628][WIP] Always track ShuffleReadMetrics (i.e. not an Option)" Apr 14, 2016
SparkQA commented Apr 14, 2016

Test build #55806 has finished for PR 12388 at commit f4e72ae.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@rxin rxin force-pushed the metrics-refactor branch from f4e72ae to 6e5fbef on April 14, 2016 22:35
@rxin rxin changed the title from "[SPARK-14628][WIP] Always track ShuffleReadMetrics (i.e. not an Option)" to "[SPARK-14628][WIP] Always track read/write task metrics (not Options)" Apr 14, 2016
@rxin rxin changed the title from "[SPARK-14628][WIP] Always track read/write task metrics (not Options)" to "[SPARK-14628][WIP] Simplify task metrics by always tracking read/write metrics" Apr 14, 2016
SparkQA commented Apr 14, 2016

Test build #55862 has finished for PR 12388 at commit 6e5fbef.

  • This patch fails MiMa tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA commented Apr 15, 2016

Test build #55865 has finished for PR 12388 at commit 09ee852.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA commented Apr 15, 2016

Test build #55866 has finished for PR 12388 at commit a772493.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA commented Apr 15, 2016

Test build #55884 has finished for PR 12388 at commit a6a3604.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA commented Apr 15, 2016

Test build #55880 has finished for PR 12388 at commit ded2f21.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

```scala
// before:
(taskMetrics.shuffleReadMetrics.map(_.totalBytesRead).getOrElse(0L)
- oldMetrics.flatMap(_.shuffleReadMetrics).map(_.totalBytesRead).getOrElse(0L))

// after (the change under discussion):
taskMetrics.shuffleReadMetrics.totalBytesRead
- oldMetrics.map(_.shuffleReadMetrics.totalBytesRead).getOrElse(0L)
```
Contributor

haha, finally I understand why this is wrong: it's because

```scala
val i =
  a
    - b
```

is parsed as

```scala
val i = a
-b
```

@cloud-fan (Contributor) Apr 15, 2016

and previously it was

```scala
val i =
  (a
    -b)
```

Contributor Author

damn scala!

Contributor

oh come on... no one likes mandatory semicolons, right? 😜

Does this case not even generate a warning? I guess it doesn't know your .map is a pure expression.

Contributor Author

no warning :(
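
For reference, a self-contained sketch of the semicolon-inference gotcha discussed above, under Scala 2's parsing rules (hypothetical method names that mirror the shape of the diff, not the actual Spark code):

```scala
object SemicolonInference extends App {
  def totalBytesRead: Long = 10L
  def oldBytesRead: Option[Long] = Some(3L)

  // Intended 10 - 3 = 7, but Scala 2 infers a semicolon after the first line,
  // so this is parsed as `val broken = totalBytesRead` followed by the
  // standalone statement `-oldBytesRead.getOrElse(0L)`, whose result is
  // silently discarded (no warning: the compiler cannot prove the call pure).
  val broken =
    totalBytesRead
  - oldBytesRead.getOrElse(0L)

  // Parentheses keep the subtraction inside a single expression.
  val fixed =
    (totalBytesRead
      - oldBytesRead.getOrElse(0L))

  println(broken) // 10
  println(fixed)  // 7
}
```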

asfgit pushed a commit that referenced this pull request Apr 15, 2016
…te metrics

## What changes were proposed in this pull request?

Part of the reason TaskMetrics and its callers are complicated is the optional metrics we collect: input, output, shuffle read, and shuffle write. I think we can always track them and just assign 0 as the initial values. It is usually very obvious whether a task is supposed to read any data or not. By always tracking them, we can remove a lot of `map`, `foreach`, `flatMap`, and `getOrElse(0L)` calls throughout Spark.

This patch also changes a few behaviors.

1. Removes the distinction between data read/write methods (e.g. Hadoop, Memory, Network, etc.).
2. Accumulates data reads and writes across all methods, rather than only the first method used. (Fixes SPARK-5225)

## How was this patch tested?

Existing tests.

This is based on #12388, with more test fixes.

Author: Reynold Xin <rxin@databricks.com>
Author: Wenchen Fan <wenchen@databricks.com>

Closes #12417 from cloud-fan/metrics-refactor.
@rxin rxin closed this Apr 16, 2016
lw-lin pushed a commit to lw-lin/spark that referenced this pull request Apr 20, 2016