feat: comet native scan improvements - Dynamic Partition Pruning #3546

Open
Shekharrajak wants to merge 14 commits into apache:main from Shekharrajak:feature/comet-native-scan-improvements

Conversation

Contributor

@Shekharrajak commented Feb 18, 2026

Which issue does this PR close?

Ref #3510

Rationale for this change

CometNativeScanExec currently falls back to Spark when Dynamic Partition Pruning (DPP) is present. This limits performance for star-schema queries that rely on DPP to prune partitions at runtime based on dimension table filters.

What changes are included in this PR?

Added DPP support to CometNativeScanExec for V1 native scans
Implemented partition filter evaluation from DPP subqueries
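
As a rough illustration of what evaluating a partition filter from a DPP subquery means, here is a minimal, self-contained sketch in plain Scala. It is not the code in this PR: `DppSketch`, `Partition`, `dimensionKeys`, and `prune` are hypothetical names, and the dimension table is modeled as an in-memory sequence rather than a Spark subquery.

```scala
// Simplified model of dynamic partition pruning (illustrative only, not Comet's implementation).
object DppSketch {
  // A fact-table partition: its partition-column value plus the files it holds.
  final case class Partition(key: Int, files: Seq[String])

  // Stand-in for the DPP subquery: apply the dimension-side filter and
  // collect the distinct join keys that survive it.
  def dimensionKeys(dim: Seq[(Int, String)], region: String): Set[Int] =
    dim.collect { case (key, r) if r == region => key }.toSet

  // Runtime pruning: keep only the partitions whose key appears in the subquery result.
  def prune(partitions: Seq[Partition], keys: Set[Int]): Seq[Partition] =
    partitions.filter(p => keys.contains(p.key))
}
```

With a dimension table of `Seq(1 -> "US", 2 -> "EU", 3 -> "US")` and fact partitions keyed 1 through 5, pruning on region `"US"` would leave only partitions 1 and 3 to be scanned.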

How are these changes tested?

Added DPP benchmark comparing Spark vs Comet native scan performance
Unit tests

$ make benchmark-org.apache.spark.sql.benchmark.CometDPPBenchmark

  OpenJDK 64-Bit Server VM 17.0.13+11 on Mac OS X 26.2                                                                                                                                       
  Apple M4 Max
  Star-Schema DPP Query (5000000 rows, 50 partitions):  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative                                                       
  -----------------------------------------------------------------------------------------------------------------------------------
  Spark (JVM) with DPP                                           163            173           9         30.7          32.6       1.0X
  Spark (JVM) without DPP                                        154            164           7         32.4          30.9       1.1X
  Comet (Native) with DPP                                        103            107           3         48.7          20.5       1.6X
  Comet (Native) without DPP                                      98            101           2         50.9          19.6       1.7X
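
For readers checking the table, the derived columns can be approximately reproduced from the best times. The sketch below assumes that Rate is rows divided by best time, Per Row is best time divided by rows, and Relative is the first row's best time divided by each row's best time. `BenchmarkColumns` and its members are hypothetical names. Because the table prints rounded milliseconds, recomputed values can differ from the reported ones in the last digit.

```scala
// Recompute the derived benchmark columns from rounded best times (approximate).
object BenchmarkColumns {
  val rows: Long = 5000000L

  // Millions of rows per second: rows / (ms * 1000).
  def rateMPerSec(bestMs: Double): Double = rows / (bestMs * 1000.0)

  // Nanoseconds per row: (ms * 1e6) / rows.
  def perRowNs(bestMs: Double): Double = bestMs * 1e6 / rows

  // Speedup relative to the baseline row ("Spark (JVM) with DPP").
  def relative(baselineMs: Double, bestMs: Double): Double = baselineMs / bestMs
}
```

For example, `relative(163, 103)` is about 1.58, which matches the 1.6X reported for Comet (Native) with DPP.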

@Shekharrajak force-pushed the feature/comet-native-scan-improvements branch from 5099c01 to 00fc8ce on February 20, 2026 05:30
@Shekharrajak force-pushed the feature/comet-native-scan-improvements branch from cb700d9 to f8dd8c8 on February 23, 2026 19:34
: : : +- BroadcastHashJoin
: : : :- Filter
: : : : +- ColumnarToRow
: : : : +- Scan parquet spark_catalog.default.store_returns [COMET: Native DataFusion scan does not support subqueries/dynamic pruning]
Contributor Author

CI checks were failing, so I needed to update them.

Member

I'd suggest not changing the fallback message in this PR and improving it in a follow-on PR instead, so that this PR stays smaller and focuses on the functionality.

Another option is to add a new config to feature gate the DPP support and disable it for now in the stability suite.
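
One way such a feature gate could look is sketched below in plain Scala, modeling the configuration as a simple key/value map. The key `spark.comet.scan.dpp.enabled` and the names `DppFeatureGate`, `isEnabled`, and `scanPath` are assumptions for illustration only; the discussion does not name a config, and this is not Comet's configuration API.

```scala
// Illustrative feature gate for DPP support (hypothetical key and names).
object DppFeatureGate {
  // Hypothetical config key; the PR discussion does not specify one.
  val DppEnabledKey = "spark.comet.scan.dpp.enabled"

  // Read the boolean setting from a key/value map, using the default when unset.
  def isEnabled(conf: Map[String, String], default: Boolean = false): Boolean =
    conf.get(DppEnabledKey).map(_.trim.toBoolean).getOrElse(default)

  // Scans without DPP always go native; scans with DPP go native only when the gate is on.
  def scanPath(conf: Map[String, String], hasDpp: Boolean): String =
    if (!hasDpp || isEnabled(conf)) "native" else "fallback-to-spark"
}
```

Defaulting the gate to off would keep the existing fallback behavior, and therefore the existing expected plans in the stability suite, until the feature is explicitly enabled.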

Contributor Author

Thanks @andygrove for the review. The unit tests were failing, so I had to make the update in this PR itself to get all checks green.

val scanImpl = COMET_NATIVE_SCAN_IMPL.get()

// native_datafusion + DPP requires AQE. Without AQE, DPP subqueries aren't prepared
// before the scan tries to use their results, causing "has not finished" errors.
Member

👍
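
The AQE requirement described in the code comment above can be pictured as a small guard. This is a hedged sketch, not the PR's code: `AqeGuard`, `Decision`, `UseNativeScan`, `FallBack`, and `decide` are hypothetical names, and the inputs are plain values rather than Spark or Comet configuration objects.

```scala
// Illustrative guard: native_datafusion with DPP is only used when AQE is enabled.
object AqeGuard {
  sealed trait Decision
  case object UseNativeScan extends Decision
  final case class FallBack(reason: String) extends Decision

  // Without AQE, the DPP subquery may not be prepared before the scan needs its
  // result, so this sketch falls back to Spark in that case.
  def decide(scanImpl: String, hasDpp: Boolean, aqeEnabled: Boolean): Decision =
    if (scanImpl == "native_datafusion" && hasDpp && !aqeEnabled)
      FallBack("DPP with native_datafusion requires AQE")
    else
      UseNativeScan
}
```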

@Shekharrajak
Contributor Author

Please trigger the CI checks.

@Shekharrajak
Contributor Author

The CI check failure is due to a network issue:

 Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed

  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
100 43705  100 43705    0     0  2538k      0 --:--:-- --:--:-- --:--:-- 2667k
Downloading https://services.gradle.org/distributions/gradle-8.13-bin.zip

Error: Exception in thread "main" java.io.IOException: Server returned HTTP response code: 502 for URL: https://github.com/gradle/gradle-distributions/releases/download/v8.13.0/gradle-8.13-bin.zip
	at java.base/sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1978)
	at java.base/sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1564)
	at java.base/sun.net.www.protocol.https.HttpsURLConnectionImpl.getInputStream(HttpsURLConnectionImpl.java:250)
	at org.gradle.wrapper.Install.forceFetch(SourceFile:2)
	at org.gradle.wrapper.Install$1.call(SourceFile:8)
	at org.gradle.wrapper.GradleWrapperMain.main(SourceFile:67)
Error: Process completed with exit code 1.

How can we re-trigger those 2 failing CI workflows?
