Skip to content

Conversation

@Devian-ua
Copy link
Contributor

What changes were proposed in this pull request?

Suggest users to increase NodeManager's heap size if External Shuffle Service is enabled as
NM can spend a lot of time doing GC resulting in shuffle operations being a bottleneck due to Shuffle Read blocked time bumped up.
Also because of GC NodeManager can use an enormous amount of CPU and cluster performance will suffer.
I have seen NodeManager using 5-13G RAM and up to 2700% CPU with spark_shuffle service on.

How was this patch tested?

Added step 5:

shuffle_service

@AmplabJenkins
Copy link

Can one of the admins verify this patch?

@rxin
Copy link
Contributor

rxin commented Nov 16, 2016

Thanks - merging in master/branch-2.1.

@asfgit asfgit closed this in 5558998 Nov 16, 2016
asfgit pushed a commit that referenced this pull request Nov 16, 2016
…Service

## What changes were proposed in this pull request?

Suggest users to increase `NodeManager's` heap size if `External Shuffle Service` is enabled as
`NM` can spend a lot of time doing GC resulting in  shuffle operations being a bottleneck due to `Shuffle Read blocked time` bumped up.
Also because of GC  `NodeManager` can use an enormous amount of CPU and cluster performance will suffer.
I have seen NodeManager using 5-13G RAM and up to 2700% CPU with `spark_shuffle` service on.

## How was this patch tested?

#### Added step 5:
![shuffle_service](https://cloud.githubusercontent.com/assets/15244468/20355499/2fec0fde-ac2a-11e6-8f8b-1c80daf71be1.png)

Author: Artur Sukhenko <artur.sukhenko@gmail.com>

Closes #15906 from Devian-ua/nmHeapSize.

(cherry picked from commit 5558998)
Signed-off-by: Reynold Xin <rxin@databricks.com>
uzadude pushed a commit to uzadude/spark that referenced this pull request Jan 27, 2017
…Service

## What changes were proposed in this pull request?

Suggest users to increase `NodeManager's` heap size if `External Shuffle Service` is enabled as
`NM` can spend a lot of time doing GC resulting in  shuffle operations being a bottleneck due to `Shuffle Read blocked time` bumped up.
Also because of GC  `NodeManager` can use an enormous amount of CPU and cluster performance will suffer.
I have seen NodeManager using 5-13G RAM and up to 2700% CPU with `spark_shuffle` service on.

## How was this patch tested?

#### Added step 5:
![shuffle_service](https://cloud.githubusercontent.com/assets/15244468/20355499/2fec0fde-ac2a-11e6-8f8b-1c80daf71be1.png)

Author: Artur Sukhenko <artur.sukhenko@gmail.com>

Closes apache#15906 from Devian-ua/nmHeapSize.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants