Skip to content

[VL] Disable FlushableHashAggreagte when aggregates contains sum/avg for floating type#8986

Merged
zhztheplayer merged 5 commits intoapache:mainfrom
kecookier:fix-double-sum
Apr 10, 2025
Merged

[VL] Disable FlushableHashAggreagte when aggregates contains sum/avg for floating type#8986
zhztheplayer merged 5 commits intoapache:mainfrom
kecookier:fix-double-sum

Conversation

@kecookier
Copy link
Copy Markdown
Contributor

What changes were proposed in this pull request?

(Fixes: #8985)

  1. Disable FlushableHashAggregate when aggregates contain sum/avg for floating types.
  2. To control PartialAgg flush easily in unit tests, add a Velox configuration s.g.s.c.b.v.maxPartialAggregationMemory to set PartialAgg memory, which has higher priority than s.g.s.c.b.v.maxPartialAggregationMemoryRatio.

How was this patch tested?

unit tests

@github-actions github-actions bot added the VELOX label Mar 13, 2025
@github-actions
Copy link
Copy Markdown

Thanks for opening a pull request!

Could you open an issue for this pull request on Github Issues?

https://github.com/apache/incubator-gluten/issues

Then could you also rename commit message and pull request title in the following format?

[GLUTEN-${ISSUES_ID}][COMPONENT]feat/fix: ${detailed message}

See also:

@kecookier kecookier requested a review from zhztheplayer March 13, 2025 12:33
@kecookier
Copy link
Copy Markdown
Contributor Author

@zhztheplayer Can you help review this PR?

Copy link
Copy Markdown
Member

@zhztheplayer zhztheplayer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some users who don't need 100% alignment with Spark may still tend to turn on flushing in this case to speed up queries.

I would suggest having an individual config option like s.g.floatingPointMode=strict/loose to control the tolerance of this kind of diffs in Gluten. While the mode is set to strict, we disable flushing for sum(float), etc.

"partial aggregation is enabled. Ignored when spark.gluten.sql.columnar.backend." +
"velox.flushablePartialAggregation=false."
)
.bytesConf(ByteUnit.BYTE)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should it be

.byteConf(...)
.createOptional

@kecookier
Copy link
Copy Markdown
Contributor Author

@zhztheplayer I've updated the code as suggested. Please take a look.

@zhztheplayer
Copy link
Copy Markdown
Member

Hi @kecookier sorry for the late response. Missed the notification.

Given the purpose is to disable flushing in some cases, do we have to add new option maxPartialAggregationMemory? Any background of that?

@kecookier
Copy link
Copy Markdown
Contributor Author

Given the purpose is to disable flushing in some cases, do we have to add new option maxPartialAggregationMemory? Any background of that?

@zhztheplayer For easier control of flushable memory during unit tests.

@zhztheplayer zhztheplayer merged commit f667e81 into apache:main Apr 10, 2025
50 checks passed
warrenzhu25 pushed a commit to warrenzhu25/gluten that referenced this pull request Jan 10, 2026
…or floating type (apache#8986)

(cherry picked from commit f667e81)
Change-Id: I74a595766972f8b561c98ae45632788a2bdd705f
Reviewed-on: https://bigdataoss-internal-review.googlesource.com/c/third_party/apache/incubator-gluten/+/115777
Reviewed-by: Preetesh Verma <preeteshverma@google.com>
Reviewed-by: Revanth Venkat Mikkilineni <revanthvenkat@google.com>
Tested-by: Srinivas S T <srst@google.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[VL] The result of sum(double) sometimes mismatches with Vanilla Spark

2 participants