[SPARK-14628][CORE] Simplify task metrics by always tracking read/write metrics #12417
cc @rxin

    fetchWaitTime = internal.fetchWaitTime,
    remoteBytesRead = internal.remoteBytesRead,
    totalBlocksFetched = internal.totalBlocksFetched,
    recordsRead = internal.recordsRead

It looks like the internal and external shuffle read metrics don't match: `localBytesRead` is missing.
Test build #55926 has finished for PR 12417 at commit
Test build #55931 has finished for PR 12417 at commit

    } else {
      JNothing
    }
    val shuffleWriteMetrics: JValue = if (taskMetrics.shuffleWriteMetrics.isUpdated) {

Can't we always output the metrics and just fix the JSON protocol test?
Test build #2791 has finished for PR 12417 at commit
Test build #2793 has finished for PR 12417 at commit
Going to merge this first. We can fix the test thing later.
## What changes were proposed in this pull request?

This PR is a follow-up to #12417. We now always track input/output/shuffle metrics in the Spark JSON protocol and status API.

Most of the line changes come from regenerating the gold answer for `HistoryServerSuite`, which now contains many 0 values for read/write metrics.

## How was this patch tested?

Existing tests.

Author: Wenchen Fan <wenchen@databricks.com>

Closes #12462 from cloud-fan/follow.
…te metrics

## What changes were proposed in this pull request?

Part of the reason TaskMetrics and its callers are complicated is the optional metrics we collect, including input, output, shuffle read, and shuffle write. I think we can always track them and just assign 0 as the initial values. It is usually very obvious whether a task is supposed to read any data or not. By always tracking them, we can remove a lot of map, foreach, flatMap, getOrElse(0L) calls throughout Spark.

This patch also changes a few behaviors:

1. Removed the distinction between data read/write methods (e.g. Hadoop, Memory, Network).
2. Accumulate all data reads and writes, rather than only those of the first method. (Fixes SPARK-5225)

## How was this patch tested?

Existing tests.

This is based on apache#12388, with more test fixes.

Author: Reynold Xin <rxin@databricks.com>
Author: Wenchen Fan <wenchen@databricks.com>

Closes apache#12417 from cloud-fan/metrics-refactor.
## What changes were proposed in this pull request?

Part of the reason TaskMetrics and its callers are complicated is the optional metrics we collect, including input, output, shuffle read, and shuffle write. I think we can always track them and just assign 0 as the initial values. It is usually very obvious whether a task is supposed to read any data or not. By always tracking them, we can remove a lot of map, foreach, flatMap, getOrElse(0L) calls throughout Spark.
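
To illustrate the simplification, here is a minimal sketch using hypothetical, heavily simplified classes. `TaskMetricsOld`, `TaskMetricsNew`, and the helper functions are made up for this example and are not Spark's actual definitions. It contrasts an optional metric group, which every caller has to unwrap, with one that always exists and starts at 0.

```scala
object MetricsSketch {
  // Hypothetical stand-ins for illustration only; not Spark's real classes.

  // Before: the metric group is optional, so every caller must unwrap it.
  final case class InputMetricsOld(bytesRead: Long)
  final case class TaskMetricsOld(inputMetrics: Option[InputMetricsOld])

  def totalBytesReadOld(tasks: Seq[TaskMetricsOld]): Long =
    tasks.map(_.inputMetrics.map(_.bytesRead).getOrElse(0L)).sum

  // After: the metric group always exists and defaults to zero.
  final case class InputMetricsNew(bytesRead: Long = 0L)
  final case class TaskMetricsNew(inputMetrics: InputMetricsNew = InputMetricsNew())

  def totalBytesReadNew(tasks: Seq[TaskMetricsNew]): Long =
    tasks.map(_.inputMetrics.bytesRead).sum
}
```

A task that reads nothing simply reports 0 in the second form, so aggregation code no longer needs a special case for missing metrics.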
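This patch also changes a few behaviors:

1. Removed the distinction between data read/write methods (e.g. Hadoop, Memory, Network).
2. Accumulate all data reads and writes, rather than only those of the first method. (Fixes SPARK-5225)
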
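
One of these behavior changes, accumulating all data reads and writes rather than only those of the first read method, might look roughly like the sketch below. The `InputMetrics` class and its `inc*` methods here are simplified assumptions for illustration, and the two sources in `demo` are invented; Spark's real metric classes differ.

```scala
object AccumulateSketch {
  // Hypothetical simplified metric holder; not Spark's actual InputMetrics.
  final class InputMetrics {
    private var _bytesRead: Long = 0L
    private var _recordsRead: Long = 0L

    def bytesRead: Long = _bytesRead
    def recordsRead: Long = _recordsRead

    // Every read adds to the same counters, whatever the source was.
    def incBytesRead(v: Long): Unit = _bytesRead += v
    def incRecordsRead(v: Long): Unit = _recordsRead += v
  }

  // A task that reads from two different (invented) sources.
  def demo(): InputMetrics = {
    val metrics = new InputMetrics
    metrics.incBytesRead(100L) // e.g. a read from a Hadoop file
    metrics.incRecordsRead(10L)
    metrics.incBytesRead(40L)  // e.g. a read from an in-memory block
    metrics.incRecordsRead(4L)
    metrics
  }
}
```

Because the counters are not keyed by read method, the second read is added to the first instead of being dropped.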
## How was this patch tested?

Existing tests.

This is based on #12388, with more test fixes.