[VL] Support Spark legacy statistical aggregation function behavior #9181

Merged
rui-mo merged 1 commit into apache:main from NEUpanning:stat_legacy on Apr 10, 2025

@NEUpanning
Contributor

What changes were proposed in this pull request?

To align with Spark, facebookincubator/velox#12566 introduced the spark.legacy_statistical_aggregate configuration, which controls whether statistical aggregate functions return NULL or NaN when the computation divides by zero. This PR enables that configuration in Velox when spark.sql.legacy.statisticalAggregate is set to true.
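
To illustrate the behavior this flag toggles, here is a minimal Python sketch; it is not the Velox or Spark implementation. It assumes sample variance as the example aggregate: with a single input row the divisor n - 1 is zero, so the result is NaN when the legacy flag is on and NULL (None here) otherwise. The name var_samp_sketch and its legacy parameter are hypothetical, and returning NULL for empty input regardless of the flag is an assumption of this sketch.

```python
import math
from typing import Optional, Sequence


def var_samp_sketch(values: Sequence[float], legacy: bool = False) -> Optional[float]:
    """Sample variance with a switchable divide-by-zero result.

    Hypothetical helper for illustration only; None stands in for SQL NULL.
    """
    n = len(values)
    if n == 0:
        # No input rows: assumed to yield NULL regardless of the flag.
        return None
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values)
    if n - 1 == 0:
        # The divisor is zero: NaN in legacy mode, NULL otherwise.
        return math.nan if legacy else None
    return m2 / (n - 1)
```

For example, var_samp_sketch([1.0], legacy=True) yields NaN, while the same call with legacy=False yields None.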

How was this patch tested?

integration tests

@github-actions bot added the CORE (works for Gluten Core) and VELOX labels on Mar 31, 2025
@github-actions

Thanks for opening a pull request!

Could you open an issue for this pull request on GitHub Issues?

https://github.com/apache/incubator-gluten/issues

Then could you also rename the commit message and pull request title in the following format?

[GLUTEN-${ISSUES_ID}][COMPONENT]feat/fix: ${detailed message}

See also:

@github-actions

Run Gluten Clickhouse CI on x86

@NEUpanning
Contributor Author

@rui-mo Could you help to review this PR? Thanks.

(SQLConf.CASE_SENSITIVE.key, SQLConf.CASE_SENSITIVE.defaultValueString),
(SQLConf.IGNORE_MISSING_FILES.key, SQLConf.IGNORE_MISSING_FILES.defaultValueString),
(SQLConf.LEGACY_TIME_PARSER_POLICY.key, SQLConf.LEGACY_TIME_PARSER_POLICY.defaultValueString),
(
Contributor

The default value is the same as the native backend's default, std::to_string(veloxCfg_->get<bool>(kSparkLegacyStatisticalAggregate, false)). Adding the key at L470 is enough.

Contributor Author

I think it would be better to use a default value that aligns with Spark. Maybe we could delete the default value in std::to_string(veloxCfg_->get<bool>(kSparkLegacyStatisticalAggregate, false)).

Contributor

If the user doesn't set the config, I think it's better that we don't set it either.

Contributor Author

@NEUpanning NEUpanning Apr 2, 2025

If the user doesn't set this config, SQLConf.LEGACY_STATISTICAL_AGGREGATE.defaultValueString will be used rather than a hard-coded false, although it is also false at present.
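
The positions in this thread differ in where the fallback lives. Below is a minimal Python sketch of the approach described in the comment above, assuming a list of (key, default) pairs like the one in the diff. It is not GlutenConfig's actual code: SPARK_DEFAULTS and native_conf_sketch are hypothetical names, and the "false" default is taken from the statement above that the current default is false.

```python
from typing import Dict, Mapping

# Hypothetical stand-in for the SQLConf keys and their default value strings.
SPARK_DEFAULTS: Mapping[str, str] = {
    "spark.sql.legacy.statisticalAggregate": "false",
}


def native_conf_sketch(user_conf: Mapping[str, str]) -> Dict[str, str]:
    """Build the key/value pairs forwarded to the native backend.

    Each listed key takes the user's value when set and the Spark-side
    default otherwise, so the native side never needs its own fallback.
    Keys that are not listed are not forwarded by this sketch.
    """
    return {key: user_conf.get(key, default) for key, default in SPARK_DEFAULTS.items()}
```

With this shape, a future change to the Spark default would reach the native backend without touching the native fallback value.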

@@ -509,6 +510,9 @@ object GlutenConfig {
(SQLConf.CASE_SENSITIVE.key, SQLConf.CASE_SENSITIVE.defaultValueString),
(SQLConf.IGNORE_MISSING_FILES.key, SQLConf.IGNORE_MISSING_FILES.defaultValueString),
(SQLConf.LEGACY_TIME_PARSER_POLICY.key, SQLConf.LEGACY_TIME_PARSER_POLICY.defaultValueString),
Contributor

@jinchengchenghh jinchengchenghh Apr 1, 2025

The same applies to LEGACY_TIME_PARSER_POLICY.

Contributor

@rui-mo rui-mo left a comment

It would be better to add a test in Gluten to ensure this config takes effect. Thanks.

@NEUpanning NEUpanning requested a review from rui-mo April 8, 2025 03:05
@NEUpanning
Contributor Author

@baibaichen could you help share the log of the failed ClickHouse CI?

Also, the failed CI job run-tpc-test-ubuntu-2204-celeborn seems unrelated to this PR:

25/04/07 09:45:30 ERROR CelebornShuffleReader: Exception caught when readPartition 72!
org.apache.celeborn.common.exception.CelebornIOException: createPartitionReader failed! PartitionLocation[
  id-epoch:72-0
  host-rpcPort-pushPort-fetchPort-replicatePort:172.18.0.2-41173-42161-33177-42025
  mode:PRIMARY
  peer:(empty)
  storage hint:StorageInfo{type=HDD, mountPoint='', finalResult=true, filePath=}
  mapIdBitMap:null]
	at org.apache.celeborn.client.read.CelebornInputStream$CelebornInputStreamImpl.createReaderWithRetry(CelebornInputStream.java:370)
	at org.apache.celeborn.client.read.CelebornInputStream$CelebornInputStreamImpl.moveToNextReader(CelebornInputStream.java:273)
	at org.apache.celeborn.client.read.CelebornInputStream$CelebornInputStreamImpl.<init>(CelebornInputStream.java:222)
	at org.apache.celeborn.client.read.CelebornInputStream.create(CelebornInputStream.java:72)
	at org.apache.celeborn.client.ShuffleClientImpl.readPartition(ShuffleClientImpl.java:1675)
	at org.apache.spark.shuffle.celeborn.CelebornShuffleReader$$anon$3.run(CelebornShuffleReader.scala:125)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.IOException: Exception in sendRpcSync to: /172.18.0.2:33177
	at org.apache.celeborn.common.network.client.TransportClient.sendRpcSync(TransportClient.java:324)
	at org.apache.celeborn.client.read.WorkerPartitionReader.<init>(WorkerPartitionReader.java:129)
	at org.apache.celeborn.client.read.CelebornInputStream$CelebornInputStreamImpl.createReader(CelebornInputStream.java:444)
	at org.apache.celeborn.client.read.CelebornInputStream$CelebornInputStreamImpl.createReaderWithRetry(CelebornInputStream.java:341)
	... 10 more
Caused by: java.util.concurrent.ExecutionException: java.io.IOException: org.apache.celeborn.common.exception.PartitionUnRetryAbleException: Could not find file 72-0-0 for local-1744017116595-72.
	at org.apache.celeborn.common.util.ExceptionUtils.wrapIOExceptionToUnRetryable(ExceptionUtils.java:41)
	at org.apache.celeborn.service.deploy.worker.FetchHandler.handleRpcException(FetchHandler.scala:350)
	at org.apache.celeborn.service.deploy.worker.FetchHandler.handleRpcIOException(FetchHandler.scala:342)
	at org.apache.celeborn.service.deploy.worker.FetchHandler.handleOpenStreamInternal(FetchHandler.scala:293)
	at org.apache.celeborn.service.deploy.worker.FetchHandler.handleRpcRequest(FetchHandler.scala:138)
	at org.apache.celeborn.service.deploy.worker.FetchHandler.receive(FetchHandler.scala:97)
	at org.apache.celeborn.common.network.server.TransportRequestHandler.processRpcRequest(TransportRequestHandler.java:96)
	at org.apache.celeborn.common.network.server.TransportRequestHandler.handle(TransportRequestHandler.java:84)
	at org.apache.celeborn.common.network.server.TransportChannelHandler.channelRead(TransportChannelHandler.java:156)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412)
	at io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:286)
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412)
	at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:440)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
	at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919)
	at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:166)
	at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:788)
	at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:724)
	at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:650)
	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:562)
	at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997)
	at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
	at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.FileNotFoundException: Could not find file 72-0-0 for local-1744017116595-72.
	at org.apache.celeborn.service.deploy.worker.FetchHandler.getRawFileInfo(FetchHandler.scala:88)
	at org.apache.celeborn.service.deploy.worker.FetchHandler.handleOpenStreamInternal(FetchHandler.scala:214)
	... 29 more

	at org.apache.celeborn.common.network.client.TransportResponseHandler.handle(TransportResponseHandler.java:390)
	at org.apache.celeborn.common.network.server.TransportChannelHandler.channelRead(TransportChannelHandler.java:158)
	at org.apache.celeborn.shaded.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444)
	at org.apache.celeborn.shaded.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
	at org.apache.celeborn.shaded.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412)
	at org.apache.celeborn.shaded.io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:286)
	at org.apache.celeborn.shaded.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:442)
	at org.apache.celeborn.shaded.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
	at org.apache.celeborn.shaded.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412)
	at org.apache.celeborn.common.network.util.TransportFrameDecoder.channelRead(TransportFrameDecoder.java:74)
	at org.apache.celeborn.shaded.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444)
	at org.apache.celeborn.shaded.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
	at org.apache.celeborn.shaded.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412)
	at org.apache.celeborn.shaded.io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410)
	at org.apache.celeborn.shaded.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:440)
	at org.apache.celeborn.shaded.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
	at org.apache.celeborn.shaded.io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919)
	at org.apache.celeborn.shaded.io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:166)
	at org.apache.celeborn.shaded.io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:788)
	at org.apache.celeborn.shaded.io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:724)
	at org.apache.celeborn.shaded.io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:650)
	at org.apache.celeborn.shaded.io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:562)
	at org.apache.celeborn.shaded.io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997)
	at org.apache.celeborn.shaded.io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
	at org.apache.celeborn.shaded.io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
	... 1 more

Contributor

@rui-mo rui-mo left a comment

If the workflow passes.

@NEUpanning
Contributor Author

The workflow has finally passed. Please help merge this pull request, @rui-mo.

@rui-mo
Contributor

rui-mo commented Apr 10, 2025

Thanks!

@rui-mo rui-mo merged commit 7d11bb6 into apache:main Apr 10, 2025
49 checks passed
@NEUpanning NEUpanning deleted the stat_legacy branch April 10, 2025 11:12