[VL] Change isDistinct of distinct aggregateExpression to false by zml1206 · Pull Request #9364 · apache/gluten

zml1206 · 2025-04-18T09:13:34Z

What changes were proposed in this pull request?

Vanilla Spark adds an aggregate that removes duplicates for a distinct aggregate function, so Velox does not need to handle distinct itself. Moreover, Velox's distinct hash aggregation currently does not support spilling.
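To illustrate the reasoning above: once an earlier aggregate has removed duplicates on (group key, distinct column), a plain non-distinct count over its output gives the same result as COUNT(DISTINCT). The sketch below is plain Python, not Spark, Gluten, or Velox code. The function names and input rows are hypothetical, and the two-phase version only loosely mirrors the plan Spark builds for a single distinct aggregate.

```python
from collections import defaultdict

# Hypothetical input rows: (group_key, value); nulls are ignored for simplicity.
rows = [("a", 1), ("a", 1), ("a", 2), ("b", 3), ("b", 3)]


def count_distinct_direct(rows):
    """One aggregate that handles DISTINCT itself by keeping a hash set per group."""
    seen = defaultdict(set)
    for key, value in rows:
        seen[key].add(value)
    return {key: len(values) for key, values in seen.items()}


def count_distinct_two_phase(rows):
    """Two aggregates: deduplicate first, then run a plain (non-distinct) count."""
    # Phase 1: group by (key, value), which removes duplicate rows.
    deduped = {(key, value) for key, value in rows}
    # Phase 2: ordinary COUNT per key; no per-group distinct state is needed here.
    counts = defaultdict(int)
    for key, _ in deduped:
        counts[key] += 1
    return dict(counts)


print(count_distinct_direct(rows))
print(count_distinct_two_phase(rows))
```

Both functions produce the same counts; the difference is that in the two-phase form the second aggregate no longer needs to track distinct values per group, which is the property the description relies on.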

How was this patch tested?

Existing UTs.

github-actions · 2025-04-18T09:13:51Z

Thanks for opening a pull request!

Could you open an issue for this pull request on GitHub Issues?

https://github.com/apache/incubator-gluten/issues

Then could you also rename commit message and pull request title in the following format?

[GLUTEN-${ISSUES_ID}][COMPONENT]feat/fix: ${detailed message}

See also:

Other pull requests

github-actions · 2025-04-18T09:14:14Z

Run Gluten Clickhouse CI on x86

github-actions · 2025-04-18T09:56:14Z

Run Gluten Clickhouse CI on x86

zhztheplayer · 2025-04-18T11:08:13Z

Perhaps this change can speed up distinct aggregation processing?

zhztheplayer · 2025-04-18T11:11:49Z

> Vanilla spark will add an aggregate to remove duplicates for the distinct aggregation function, so velox does not need to process distinct.

I am curious why Spark keeps the distinct flag after planning the distinct aggregation. Generally it's not needed by vanilla Spark as well. Have you investigated?

zml1206 · 2025-04-18T11:20:12Z

> I am curious why Spark keeps the distinct flag after planning the distinct aggregation. Generally it's not needed by vanilla Spark as well. Have you investigated?

Maybe Spark simply doesn't process it any further; isDistinct is only used when creating the aggregate.

zml1206 · 2025-04-18T11:21:49Z

> Perhaps this change can speed up distinct aggregation processing?

Yes. It should also avoid the memory problems caused by Velox's distinct hash aggregation not supporting spilling.

zhztheplayer · 2025-04-22T09:17:53Z

> > Perhaps this change can speed up distinct aggregation processing?
>
> Yes, it should also avoid the memory problem caused by velox distinct hash aggregation does not support spill.

I am checking the code. Do we actually pass the flag to the Substrait plan? I didn't find the relevant code. Could you point to it? Otherwise, perhaps we never passed this flag to native at all? cc @rui-mo

zml1206 · 2025-04-22T09:33:50Z

> I am checking the code. Do we actually pass the flag to the substrait plan? I didn't find the relevant code. Would you help highlight the code here? Or else we never passed this flag to native? cc @rui-mo

Oh, I overlooked that. You're right: we don't pass isDistinct to the Substrait plan, so this change is unnecessary. Thank you, @zhztheplayer.

zhztheplayer · 2025-04-22T09:40:06Z

Got it. Thank you for the attempt anyway.

[VL] Change isDistinct of distinct aggregateExpression to false

f009972

zml1206 marked this pull request as draft April 18, 2025 09:13

github-actions bot added the CORE works for Gluten Core label Apr 18, 2025

update golden files

d2b1607

github-actions bot added the VELOX label Apr 18, 2025

zml1206 marked this pull request as ready for review April 18, 2025 11:17

zml1206 requested a review from zhztheplayer April 22, 2025 05:53

zml1206 closed this Apr 22, 2025

zhztheplayer mentioned this pull request Apr 22, 2025

[VL] Follow up on #9384 to avoid swallowing exceptions in UT #9393

Merged

zml1206 deleted the agg_distinct branch December 9, 2025 08:12

zml1206 wants to merge 2 commits into apache:main from zml1206:agg_distinct