[SPARK-48788][CORE][UI] Expose task peak onheap/offheap execution memory to API and Spark UI #47872

Closed
liuzqt wants to merge 3 commits into apache:master from liuzqt:SPARK-48788

liuzqt (Contributor) commented Aug 26, 2024

What changes were proposed in this pull request?

#47776 introduced task-level peak on-heap and off-heap execution memory metrics. This PR exposes these two metrics through the APIs and shows them on the Spark UI Stage page, in three relevant sections (see the screenshots below):

  • stage details
  • task metrics summary (quantiles)
  • task metrics
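
As a rough illustration of what the "task metrics summary (quantiles)" section reports, the sketch below computes quantiles over per-task peak values. It is not the code in this PR: `TaskPeakMemory`, `PeakMemorySummary`, and the sample values are hypothetical, the camel-case field names are only assumed from the metric names, and the index-based quantile rule is a simplification chosen for clarity.

```scala
// Hypothetical model of the two per-task metrics; not Spark's actual classes.
final case class TaskPeakMemory(
    taskId: Long,
    peakOnHeapExecutionMemory: Long,  // bytes (assumed field name)
    peakOffHeapExecutionMemory: Long) // bytes (assumed field name)

object PeakMemorySummary {
  // Index-based quantile: sort the values, then pick the element at
  // floor(q * n), clamped to the last index. This is an assumption made
  // for illustration, not necessarily the rule the Spark UI uses.
  def quantile(values: Seq[Long], q: Double): Long = {
    require(values.nonEmpty, "need at least one value")
    require(q >= 0.0 && q <= 1.0, "quantile must be in [0, 1]")
    val sorted = values.sorted
    val idx = math.min((q * sorted.length).toLong, sorted.length - 1L).toInt
    sorted(idx)
  }

  // Quantiles of the on-heap peak across all tasks of a stage.
  def onHeapQuantiles(tasks: Seq[TaskPeakMemory], qs: Seq[Double]): Seq[Long] =
    qs.map(q => quantile(tasks.map(_.peakOnHeapExecutionMemory), q))

  // Quantiles of the off-heap peak across all tasks of a stage.
  def offHeapQuantiles(tasks: Seq[TaskPeakMemory], qs: Seq[Double]): Seq[Long] =
    qs.map(q => quantile(tasks.map(_.peakOffHeapExecutionMemory), q))
}
```

With four tasks whose on-heap peaks are 100, 400, 200, and 300 bytes, `onHeapQuantiles(tasks, Seq(0.0, 0.5, 1.0))` under this rule picks the minimum, the third-smallest value, and the maximum.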

Why are the changes needed?

Does this PR introduce any user-facing change?

Yes. The metrics are exposed through the APIs and shown on the Spark UI Stage page.

How was this patch tested?

Existing unit tests.

Manually verified through the Spark UI:
- HTML page: Spark shell - Details for Stage 4 (Attempt 0).mhtml.zip
- Stage details: Screenshot 2024-08-29 at 10 45 20 AM
- Task metrics summary: Screenshot 2024-08-29 at 10 44 37 AM
- Task metrics: Screenshot 2024-08-29 at 10 45 06 AM

Was this patch authored or co-authored using generative AI tooling?

NO

liuzqt changed the title from "[SPARK-48788][WIP] Expose task peak onheap/offheap execution memory to API" to "[SPARK-48788][CORE][WIP] Expose task peak onheap/offheap execution memory to API" on Aug 26, 2024
liuzqt (Contributor, Author) commented Aug 29, 2024

Hi @mridulm @dongjoon-hyun, I'm working on this follow-up to #47776. It consists mostly of API and Spark UI changes, which I've manually verified in the Spark UI. Could you please help review it when you have time?

By the way, do you know how to run HistoryServerSuite.main? It seems I need to run it to regenerate the ground truth.

liuzqt changed the title from "[SPARK-48788][CORE][WIP] Expose task peak onheap/offheap execution memory to API" to "[SPARK-48788][CORE][WIP] Expose task peak onheap/offheap execution memory to API and Spark UI" on Aug 29, 2024
liuzqt changed the title from "[SPARK-48788][CORE][WIP] Expose task peak onheap/offheap execution memory to API and Spark UI" to "[SPARK-48788][CORE][UI] Expose task peak onheap/offheap execution memory to API and Spark UI" on Aug 29, 2024
mridulm (Contributor) left a comment

I took a quick look, but I am not very familiar with this side of Spark.

+CC @yaooqinn, @yanboliang as well for review.

int64 shuffle_remote_reqs_duration = 50;
int64 shuffle_merged_remote_req_duration = 51;
int64 peak_on_heap_execution_memory = 52;
int64 peak_off_heap_execution_memory = 53;
Contributor:

Add these to ExecutorStageSummary, ExecutorSummary, and ExecutorMetricsDistributions as well (here and in the other model classes).

Contributor Author:

It looks like we already have executor-level stage memory metrics:

  • ExecutorStageSummary has val peakMemoryMetrics: Option[ExecutorMetrics]
  • ExecutorSummary has val memoryMetrics: Option[MemoryMetrics]
  • ExecutorMetricsDistributions has val peakMemoryMetrics: ExecutorPeakMetricsDistributions

These are aggregated through AppStatusListener.updateStageLevelPeakExecutorMetrics.
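
To make the idea of peak aggregation concrete, here is a minimal, self-contained sketch. It assumes that a stage-level peak per executor is kept as an element-wise maximum over incoming samples; that assumption is not verified against AppStatusListener, and `PeakMetrics`, `StagePeakAggregation`, and the metric keys are hypothetical stand-ins rather than Spark's real classes.

```scala
// Hypothetical, simplified stand-in for a set of named memory metrics.
final case class PeakMetrics(values: Map[String, Long]) {
  // Element-wise maximum: a recorded peak never decreases as samples arrive.
  // A metric missing on one side is treated as 0 for the comparison.
  def mergePeak(update: PeakMetrics): PeakMetrics = {
    val keys = values.keySet ++ update.values.keySet
    PeakMetrics(keys.map { k =>
      k -> math.max(values.getOrElse(k, 0L), update.values.getOrElse(k, 0L))
    }.toMap)
  }
}

object StagePeakAggregation {
  // Fold (executorId, sample) pairs into one peak record per executor.
  def aggregate(samples: Seq[(String, PeakMetrics)]): Map[String, PeakMetrics] =
    samples.foldLeft(Map.empty[String, PeakMetrics]) {
      case (acc, (execId, sample)) =>
        val merged = acc.get(execId).map(_.mergePeak(sample)).getOrElse(sample)
        acc.updated(execId, merged)
    }
}
```

Under this sketch, two samples for the same executor with on-heap values 10 and 5 leave the recorded on-heap peak at 10, while an off-heap value that appears only in the second sample is carried over unchanged.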

mridulm (Contributor) commented Sep 4, 2024

Also +CC @gengliangwang

@liuzqt liuzqt requested a review from mridulm September 9, 2024 03:54
liuzqt (Contributor, Author) commented Sep 17, 2024

Hi @mridulm, I've fixed the broken tests and answered your questions. Do you mind taking another look when you have time? Thanks.

@github-actions

We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!

@github-actions github-actions Bot added the Stale label Dec 27, 2024
@github-actions github-actions Bot closed this Dec 28, 2024

2 participants