Backend
VL (Velox)
Bug description
I am trying to set `spark.gluten.memory.dynamic.offHeap.sizing.enabled=true`, but an OOM exception occurs.
Spark configuration:
spark.executor.memory=4g
spark.executor.memoryOverhead=1g
spark.gluten.memory.dynamic.offHeap.sizing.enabled=true
spark.memory.offHeap.enabled=true
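For context, the same settings restated in spark-submit form (this is only a restatement of the configuration above, not a recommendation; the application jar and other options are elided):

```shell
spark-submit \
  --conf spark.executor.memory=4g \
  --conf spark.executor.memoryOverhead=1g \
  --conf spark.gluten.memory.dynamic.offHeap.sizing.enabled=true \
  --conf spark.memory.offHeap.enabled=true \
  ...
```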
The Web UI shows:
spark.gluten.memory.conservative.task.offHeap.size.in.bytes=597059174
spark.gluten.memory.offHeap.size.in.bytes=2388236697
spark.gluten.memory.task.offHeap.size.in.bytes=597059174
spark.memory.offHeap.size=2388236697
And I got the OOM exception:
Reason: Operator::getOutput failed for [operator: ValueStream, plan node ID: 0]: Error during calling Java code from native code: org.apache.gluten.memory.memtarget.ThrowOnOomMemoryTarget$OutOfMemoryException: Not enough spark off-heap execution memory. Acquired: 8.0 MiB, granted: 0.0 B. Try tweaking config option spark.memory.offHeap.size to get larger space to run this application (if spark.gluten.memory.dynamic.offHeap.sizing.enabled is not enabled).
Current config settings:
spark.gluten.memory.offHeap.size.in.bytes=3.4 GiB
spark.gluten.memory.task.offHeap.size.in.bytes=876.6 MiB
spark.gluten.memory.conservative.task.offHeap.size.in.bytes=876.6 MiB
spark.memory.offHeap.enabled=true
spark.gluten.memory.dynamic.offHeap.sizing.enabled=true
Memory consumer stats:
Task.52: Current used bytes: 104.0 MiB, peak bytes: N/A
- Gluten.Tree.0: Current used bytes: 104.0 MiB, peak bytes: 112.0 MiB
- root.0: Current used bytes: 104.0 MiB, peak bytes: 112.0 MiB
+- CelebornShuffleWriter.0: Current used bytes: 48.0 MiB, peak bytes: 48.0 MiB
| - single: Current used bytes: 48.0 MiB, peak bytes: 48.0 MiB
| +- gluten::MemoryAllocator: Current used bytes: 28.8 MiB, peak bytes: 29.0 MiB
| - root: Current used bytes: 4.2 MiB, peak bytes: 15.0 MiB
| - default_leaf: Current used bytes: 4.2 MiB, peak bytes: 14.1 MiB
It may be caused by:
24/10/18 17:16:00 WARN org.apache.gluten.memory.memtarget.DynamicOffHeapSizingMemoryTarget: "Failing allocation as unified memory is OOM. Used Off-heap: 406847480, Used On-Heap: 2021017784, Free On-heap: 1796847432, Total On-heap: 3817865216, Max On-heap: 2388236697, Allocation: 8388608."
24/10/18 17:16:00 INFO org.apache.spark.memory.TaskMemoryManager: "Memory used in task 11"
24/10/18 17:16:00 INFO org.apache.spark.memory.TaskMemoryManager: "Acquired by org.apache.gluten.memory.memtarget.spark.TreeMemoryConsumer@182a8cbe: 104.0 MiB"
24/10/18 17:16:00 INFO org.apache.spark.memory.TaskMemoryManager: "0 bytes of memory were used by task 11 but are not associated with specific consumers"
24/10/18 17:16:00 INFO org.apache.spark.memory.TaskMemoryManager: "406847480 bytes of memory are used for execution and 1129714 bytes of memory are used for storage"
As shown above, only 406847480 bytes (~388 MiB) of off-heap memory are in use, while the off-heap size is configured as 2388236697 bytes (~2.2 GiB).
Why is the OOM exception thrown?
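The figures in the warning appear to explain the failure: with dynamic off-heap sizing enabled, on-heap usage seems to count against the same cap as off-heap usage. A minimal arithmetic sketch, assuming (based only on the warning text, not on the Gluten source) that the check sums the three figures from the log against the single shared cap:

```python
# Figures copied from the WARN log and the exception message above.
used_off_heap = 406_847_480    # "Used Off-heap"
used_on_heap = 2_021_017_784   # "Used On-Heap"
allocation = 8_388_608         # the failing 8 MiB request
cap = 2_388_236_697            # "Max On-heap", equal to spark.gluten.memory.offHeap.size.in.bytes

# The cap matches Spark's unified on-heap pool for a 4g executor:
# (heap - 300 MiB reserved) * spark.memory.fraction (default 0.6).
heap = 4 * 1024**3
print(int((heap - 300 * 1024**2) * 0.6))  # 2388236697

# Assumption from the warning text: the allocation fails when combined
# on-heap + off-heap usage plus the new request exceeds that shared cap.
total = used_off_heap + used_on_heap + allocation
print(total, total > cap)  # 2436253872 True
```

So even though off-heap usage alone is only ~388 MiB, the ~1.9 GiB of on-heap usage pushes the combined total past the ~2.2 GiB cap, and the 8 MiB request is denied.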
Spark version
Spark-3.2.x
Spark configurations
No response
System information
No response
Relevant logs
No response