[VL] Change isDistinct of distinct aggregateExpression to false#9364
zml1206 wants to merge 2 commits into apache:main
Thanks for opening a pull request! Could you open an issue for this pull request on GitHub Issues? https://github.com/apache/incubator-gluten/issues Then could you also rename the commit message and pull request title in the following format? See also:
Run Gluten Clickhouse CI on x86
Perhaps this change can speed up distinct aggregation processing?
I am curious why Spark keeps the distinct flag after planning the distinct aggregation. Generally, vanilla Spark doesn't need it either. Have you investigated?
Maybe Spark just doesn't process it any further; isDistinct is only used when creating the aggregate.
Yes, and it should also avoid the memory problem caused by Velox's distinct hash aggregation not supporting spill.
I am checking the code. Do we actually pass the flag to the Substrait plan? I didn't find the relevant code. Would you help highlight the code here? Or have we never passed this flag to native? cc @rui-mo
Oh, I overlooked it. Yes, we did not pass
Got it. Thank you for the attempt anyway.
What changes were proposed in this pull request?
Vanilla Spark adds an aggregate that removes duplicates before a distinct aggregate function is evaluated, so Velox does not need to process distinct itself. Moreover, Velox's distinct hash aggregation currently does not support spill.
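To illustrate the reasoning above, here is a minimal sketch in plain Python (not Spark, Gluten, or Velox code) of why a distinct count no longer needs to deduplicate once an earlier group-by has already removed duplicate (key, value) pairs. The function names and sample rows are hypothetical, and the two-stage version is a simplified model of the planned aggregation, not the actual physical plan.

```python
from collections import defaultdict


def count_distinct_direct(rows):
    """Single step: the aggregate function itself deduplicates (isDistinct = true)."""
    seen = defaultdict(set)
    for key, value in rows:
        bucket = seen[key]          # make sure every key appears in the result
        if value is not None:       # COUNT ignores NULL inputs
            bucket.add(value)
    return {key: len(values) for key, values in seen.items()}


def count_distinct_two_stage(rows):
    """Two steps: deduplicate by (key, value), then run a plain, non-distinct count."""
    # Stage 1: group by (key, value) with no aggregate functions; this only removes duplicates.
    deduplicated = set(rows)
    # Stage 2: ordinary COUNT(value) grouped by key; no distinct handling is needed here.
    counts = defaultdict(int)
    for key, value in deduplicated:
        counts[key] += 0 if value is None else 1
    return dict(counts)


# Hypothetical sample data: (grouping key, value) pairs with duplicates and a NULL.
rows = [("a", 1), ("a", 1), ("a", 2), ("b", 3), ("b", None)]
direct = count_distinct_direct(rows)
two_stage = count_distinct_two_stage(rows)
```

Both functions return the same per-key counts for this input, which is the equivalence the proposal relies on. Note that the review thread concluded the flag was never passed to native in the first place.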
How was this patch tested?
Existing UTs.