[SPARK-50421][CORE] Fix executor related memory config incorrect when multiple resource profiles worked #48963
zjuwangg wants to merge 8 commits into apache:master
Conversation
@tgravescs Would you help review this commit?
```scala
case (ResourceProfile.OFFHEAP_MEM, request) =>
  driverConf.set(MEMORY_OFFHEAP_SIZE.key, request.amount.toString + "m")
  if (request.amount > 0) {
    driverConf.set(MEMORY_OFFHEAP_ENABLED.key, "true")
```
this is questionable to me, we haven't supported setting confs yet so I would expect it to pick up the default config for this. I'd rather see a separate issue if we want to revisit this behavior.
Thank you very much for this PR, we just encountered this problem.
Is the logic of resetting the offheap configuration better this way?

```scala
case (ResourceProfile.OFFHEAP_MEM, request) if request.amount > 0 =>
  driverConf.set(MEMORY_OFFHEAP_SIZE.key, request.amount.toString + "m")
  driverConf.set(MEMORY_OFFHEAP_ENABLED.key, "true")
```
Makes sense to me. I'll address it later. How about introducing a config to control whether offheap is enabled or not?
@xumanbu
I think this is incorrect. Imagine a scenario where the default resource profile's offheap is 512M and the resource profile with id 2 has offheap 0; in that case the offheap config will be incorrect!
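To make the failure mode above concrete, here is a minimal sketch in plain Scala (the driver conf is modeled as a mutable map; names and helpers are illustrative, not Spark's actual API). It shows why guarding the update with `amount > 0` leaves stale default-profile values behind when a later resource profile requests 0 off-heap memory:

```scala
import scala.collection.mutable

object OffHeapGuardDemo {
  // The guarded variant suggested above: skip the conf entirely when amount == 0.
  def applyGuarded(conf: mutable.Map[String, String], amountMiB: Long): Unit = {
    if (amountMiB > 0) {
      conf("spark.memory.offHeap.size") = s"${amountMiB}m"
      conf("spark.memory.offHeap.enabled") = "true"
    }
  }

  // Alternative: always reset the size and derive `enabled` from the request.
  def applyAlways(conf: mutable.Map[String, String], amountMiB: Long): Unit = {
    conf("spark.memory.offHeap.size") = s"${amountMiB}m"
    conf("spark.memory.offHeap.enabled") = (amountMiB > 0).toString
  }

  def main(args: Array[String]): Unit = {
    // Default profile already set 512m; profile with id 2 now requests 0 off-heap.
    val guarded = mutable.Map(
      "spark.memory.offHeap.size" -> "512m",
      "spark.memory.offHeap.enabled" -> "true")
    applyGuarded(guarded, 0L)
    // Stale: the executor for profile 2 still sees 512m / enabled=true.
    assert(guarded("spark.memory.offHeap.size") == "512m")
    assert(guarded("spark.memory.offHeap.enabled") == "true")

    val always = mutable.Map(
      "spark.memory.offHeap.size" -> "512m",
      "spark.memory.offHeap.enabled" -> "true")
    applyAlways(always, 0L)
    // Reset: profile 2 correctly sees off-heap disabled.
    assert(always("spark.memory.offHeap.enabled") == "false")
    println("demo ok")
  }
}
```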
@tgravescs I just updated the commit per the comments. Please review when you have free time.
What testing have you done with this? It would be nice to have an integration test for it, but I know this would be difficult.
I have tested this in our internal Spark version and UnifiedMemoryManager worked as expected. I tried to add an integration test but found it hard, since there is no easy way to mock the driver and executor to verify the code.
Thanks! Would you help merge it?
+1, LGTM. Thank you, @zjuwangg , @tgravescs , @xumanbu .
Merged to master.
cc @LuciferYang as a release manager of Apache Spark 3.5.4
Due to the Structured Log change, this patch is not applicable to branch-3.5 as-is. Please make a backporting PR to branch-3.5, @zjuwangg.
BTW, I added you to the Apache Spark Contributor group in JIRA and assigned SPARK-50421 to you, @zjuwangg. Welcome to the Apache Spark community and thank you again.
I'll make a backporting PR to branch-3.5 soon.
… when multiple resource profiles worked

### What changes were proposed in this pull request?
Reset the executor's env memory related config when the resource profile is not the default resource profile.

### Why are the changes needed?
When multiple resource profiles exist in the same Spark application, the executor's memory related config is not overridden by the resource profile's memory size, which causes maxOffHeap in `UnifiedMemoryManager` to be incorrect. See https://issues.apache.org/jira/browse/SPARK-50421 for more details.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Tests in our internal Spark version and jobs.

### Was this patch authored or co-authored using generative AI tooling?
No

This is a backport of #48963 to branch-3.5.

Closes #49090 from zjuwangg/m35_fixConfig.

Authored-by: Terry Wang <zjuwangg@foxmail.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
```scala
    log"${MDC(LogKeys.EXECUTOR_MEMORY_OVERHEAD_SIZE, request)}")
case (ResourceProfile.CORES, request) =>
  driverConf.set(EXECUTOR_CORES.key, request.amount.toString)
  logInfo(log"Set executor cores to ${MDC(LogKeys.NUM_EXECUTOR_CORES, request)}")
```
Don't we need to do this for PYSPARK_MEM as well? (from ResourceProfile.allSupportedExecutorResources)
PYSPARK_MEM seems to have no actual use during task execution, like EXECUTOR_MEMORY_OVERHEAD, but I think it's nice to have it here for consistency.
What changes were proposed in this pull request?
Reset the executor's env memory related config when the resource profile is not the default resource profile.
Why are the changes needed?
When multiple resource profiles exist in the same Spark application, the executor's memory related config is not overridden by the resource profile's memory size, which causes maxOffHeap in `UnifiedMemoryManager` to be incorrect. See https://issues.apache.org/jira/browse/SPARK-50421 for more details.
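To illustrate why a stale conf produces a wrong maxOffHeap, here is a simplified sketch of how a UnifiedMemoryManager-style component could derive its off-heap budget from the conf the executor sees. All names (`MaxOffHeapSketch`, `parseMiB`, `maxOffHeap`) are illustrative, not Spark's actual internals, and the size parser assumes the "m" suffix used in this PR:

```scala
object MaxOffHeapSketch {
  // Parse a "<n>m" size string into MiB (assumes the "m" suffix convention).
  def parseMiB(s: String): Long = s.stripSuffix("m").toLong

  // Off-heap budget: the configured size when enabled, otherwise zero.
  def maxOffHeap(conf: Map[String, String]): Long = {
    val enabled = conf.getOrElse("spark.memory.offHeap.enabled", "false").toBoolean
    if (enabled) parseMiB(conf.getOrElse("spark.memory.offHeap.size", "0m")) else 0L
  }

  def main(args: Array[String]): Unit = {
    // Executor launched for a profile that requested no off-heap memory, but
    // whose conf was never reset from the default profile's values:
    val stale = Map(
      "spark.memory.offHeap.enabled" -> "true",
      "spark.memory.offHeap.size" -> "512m")
    assert(maxOffHeap(stale) == 512L) // wrong budget for this profile

    // With the conf reset to reflect the profile's actual request:
    val reset = Map(
      "spark.memory.offHeap.enabled" -> "false",
      "spark.memory.offHeap.size" -> "0m")
    assert(maxOffHeap(reset) == 0L)
    println("sketch ok")
  }
}
```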
Does this PR introduce any user-facing change?
No
How was this patch tested?
Tests in our internal Spark version and jobs.
Was this patch authored or co-authored using generative AI tooling?
No