
[VL] Provide a configuration option to completely turn off off-heap memory tracking with Spark memory manager#9341

Merged
zhztheplayer merged 10 commits into apache:main from zhztheplayer:wip-untracked
Apr 17, 2025

@zhztheplayer
Member

We noticed that some users unexpectedly rely on dynamic off-heap sizing to emulate a setup in which no memory allocations are tracked by Spark, for testing or PoC purposes. Because the dynamic off-heap sizing feature is itself unreliable (it miscalculates free on-heap memory), this patch provides a new option, spark.gluten.memory.untracked, which makes all native allocations untracked by Spark when set to true.

Note that the new option is likewise intended only for testing or PoC purposes. Previous uses of spark.gluten.memory.dynamic.offHeap.sizing.enabled for this purpose can be switched to the new option, because we are fixing the existing issues in the dynamic off-heap sizing feature, and those fixes may cause more OOMs to be reported when that feature is on.

@github-actions github-actions bot added CORE works for Gluten Core VELOX labels Apr 16, 2025
@github-actions

Thanks for opening a pull request!

Could you open an issue for this pull request on GitHub Issues?

https://github.com/apache/incubator-gluten/issues

Then could you also rename commit message and pull request title in the following format?

[GLUTEN-${ISSUES_ID}][COMPONENT]feat/fix: ${detailed message}

See also:

@github-actions

Run Gluten Clickhouse CI on x86

@zhztheplayer zhztheplayer changed the title from "[CORE] Provide an configuration option to completely turn off off-heap memory tracking with Spark memory manager" to "[VL] Provide an configuration option to completely turn off off-heap memory tracking with Spark memory manager" Apr 16, 2025
@zhztheplayer zhztheplayer changed the title from "[VL] Provide an configuration option to completely turn off off-heap memory tracking with Spark memory manager" to "[VL] Provide a configuration option to completely turn off off-heap memory tracking with Spark memory manager" Apr 16, 2025
Member

@zhouyuan zhouyuan left a comment

Thanks. I have one directional question: does this new memory untracking feature depend on the dynamic off-heap sizing feature?

buildConf("spark.gluten.memory.untracked")
.internal()
.doc(
"When enabled, turn all native memory allocations in Gluten into untracked. Spark " +
Member

Based on the description, should this feature take effect only when DYNAMIC_OFFHEAP_SIZING_ENABLED is set? Or do you intend to introduce it for the static off-heap case as well?

Member Author

It isn't related to the off-heap sizing feature. The ideal use case is to allow users to set

spark.memory.offHeap.enabled=false
spark.gluten.memory.untracked=true

to bypass allocation tracking from Spark memory manager.

}

if (
conf.getBoolean(COLUMNAR_MEMORY_UNTRACKED.key, COLUMNAR_MEMORY_UNTRACKED.defaultValue.get)
Member

Looking at the logic, if DYNAMIC_OFFHEAP_SIZING_ENABLED=false and COLUMNAR_MEMORY_UNTRACKED=true, it will also skip the check of the off-heap settings. Is this intended?

Member Author

Yes, as mentioned in the other comment, users are allowed to set

spark.memory.offHeap.enabled=false
spark.gluten.memory.untracked=true

at the same time.
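
For readers following this thread, the behavior under discussion can be sketched roughly as below. This is a minimal illustration, not the code in this patch: the object name `OffHeapCheckSketch`, the `validate` helper, and the error message are hypothetical, and it assumes that the untracked option defaults to false and that the off-heap check otherwise requires spark.memory.offHeap.enabled=true. Only the two configuration keys come from the comments above.

```scala
// Hypothetical sketch of the check discussed above; not the actual Gluten code.
object OffHeapCheckSketch {
  // Configuration keys quoted in this thread.
  val OffHeapEnabled = "spark.memory.offHeap.enabled"
  val Untracked = "spark.gluten.memory.untracked"

  // Reads a boolean flag; a missing key is treated as false (an assumption).
  private def flag(conf: Map[String, String], key: String): Boolean =
    conf.get(key).exists(_.trim.equalsIgnoreCase("true"))

  /** Returns Right(()) when the settings are acceptable, Left(message) otherwise. */
  def validate(conf: Map[String, String]): Either[String, Unit] =
    if (flag(conf, Untracked)) {
      // Untracked mode: native allocations bypass Spark's memory manager,
      // so the off-heap settings are not checked at all.
      Right(())
    } else if (!flag(conf, OffHeapEnabled)) {
      // Assumed shape of the pre-existing check that untracked mode skips.
      Left(s"$OffHeapEnabled must be true unless $Untracked is set")
    } else {
      Right(())
    }
}
```

With this sketch, the combination quoted above (off-heap disabled, untracked enabled) passes validation, while disabling off-heap without the untracked flag is rejected.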

Member

@zhouyuan zhouyuan left a comment

+1, works for me.
My review notes, in case someone else is also looking:

  • Originally, Gluten has two ways of managing memory: 1) static off-heap and 2) dynamic off-heap sizing. In both cases, Spark tracks the memory allocations and reports OOM issues.
  • This patch adds a new feature that "ignores" memory allocations, so Spark will not raise an "OOM" error (but the OS may still kill the application because of its high memory usage).
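
The difference between the tracked and untracked cases in these notes can be illustrated with a toy model. Everything below is hypothetical: `AllocationTarget`, `TrackedTarget`, and `UntrackedTarget` are made-up names that do not mirror Gluten or Spark classes, and the fixed byte budget only stands in for whatever limit Spark's memory manager would enforce.

```scala
// Toy model only; names are hypothetical and do not mirror Gluten or Spark classes.
sealed trait AllocationTarget {
  /** Records a native allocation of `bytes`, or reports why it was refused. */
  def reserve(bytes: Long): Either[String, Unit]
}

/** Stands in for Spark-tracked memory: allocations count against a fixed budget. */
final class TrackedTarget(budgetBytes: Long) extends AllocationTarget {
  private var used = 0L

  def reserve(bytes: Long): Either[String, Unit] =
    if (used + bytes > budgetBytes) {
      // The tracked path is where an "OOM" error would surface.
      Left(s"OOM: requested $bytes bytes, ${budgetBytes - used} bytes left in budget")
    } else {
      used += bytes
      Right(())
    }
}

/** Stands in for untracked mode: nothing is counted, so nothing is refused. */
object UntrackedTarget extends AllocationTarget {
  def reserve(bytes: Long): Either[String, Unit] = Right(())
}
```

In this model a tracked target with a 100-byte budget accepts a 60-byte reservation and refuses a second one, whereas the untracked target accepts any request. Nothing in the model limits real memory use, which matches the caveat that the OS may still kill the process.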

@zhztheplayer zhztheplayer merged commit b99fff8 into apache:main Apr 17, 2025
50 checks passed
@zhztheplayer
Member Author

  1. dynamic off-heap sizing.

There is still some messy code that needs to be sorted out for off-heap sizing. I'll open another PR for that.

After this series of work, I hope we can either continue with the off-heap sizing feature or remove it right away, depending on what we decide, because this effort makes the code more independent.

warrenzhu25 pushed a commit to warrenzhu25/gluten that referenced this pull request Jan 10, 2026
…emory tracking with Spark memory manager (apache#9341)

(cherry picked from commit b99fff8)
Change-Id: If3b9982d8391a97826bf12f3c1d4f8f4d37496c0
Reviewed-on: https://bigdataoss-internal-review.googlesource.com/c/third_party/apache/incubator-gluten/+/115778
Reviewed-by: Revanth Venkat Mikkilineni <revanthvenkat@google.com>
Reviewed-by: Preetesh Verma <preeteshverma@google.com>
Tested-by: Srinivas S T <srst@google.com>
