
feat: Change default value of COMET_NATIVE_SCAN_IMPL to auto#1933

Merged
andygrove merged 22 commits into apache:main from andygrove:auto-scan
Jun 28, 2025

Conversation

@andygrove (Member) commented Jun 25, 2025

Which issue does this PR close?

Closes #1881

Rationale for this change

With this change, most end users no longer need to be aware of native_comet, native_datafusion, or native_iceberg_compat scans and what each of them supports. Comet will just pick the best scan for the job. If we hit any issues with this approach then we can still ask users to specify a specific scan to use.

What changes are included in this PR?

How are these changes tested?

@codecov-commenter commented Jun 25, 2025

Codecov Report

❌ Patch coverage is 66.66667% with 1 line in your changes missing coverage. Please review.
✅ Project coverage is 58.47%. Comparing base (f09f8af) to head (cca388d).
⚠️ Report is 1150 commits behind head on main.

Files with missing lines | Patch % | Lines
...n/scala/org/apache/comet/rules/CometScanRule.scala | 50.00% | 0 Missing and 1 partial ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##               main    #1933      +/-   ##
============================================
+ Coverage     56.12%   58.47%   +2.35%     
- Complexity      976     1144     +168     
============================================
  Files           119      131      +12     
  Lines         11743    12909    +1166     
  Branches       2251     2399     +148     
============================================
+ Hits           6591     7549     +958     
- Misses         4012     4136     +124     
- Partials       1140     1224      +84     

☔ View full report in Codecov by Sentry.


// native_iceberg_compat only supports local filesystem and S3
if (!scanExec.relation.inputFiles
    .forall(path => path.startsWith("file://") || path.startsWith("s3a://"))) {
@hsiang-c (Contributor) commented Jun 26, 2025


S3AFileSystem, used by the HadoopFileIO class in Iceberg, recognizes the s3a scheme.

However, there is also an S3FileIO Iceberg class that recognizes s3, s3a, and s3n. We might have to handle more schemes in the future.

Contributor

It also supports HDFS if the feature is enabled

Contributor

I wouldn't bother with s3:// and s3n:// urls. Those are defunct afaik.
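The check under review is plain string-prefix matching on the scan's input paths. As a language-neutral illustration (Python here, not Comet's actual Scala code; the function and constant names are invented for this sketch), the logic and where the extra schemes raised in this thread would slot in look like:

```python
# Illustrative sketch only -- not Comet's code. Mirrors the prefix check in
# CometScanRule; extra schemes (s3, s3n, hdfs) discussed in this thread would
# be added to the tuple if they were ever supported.
SUPPORTED_SCHEMES = ("file://", "s3a://")  # what this PR accepts today

def all_files_supported(input_files):
    """True only if every input path uses a scheme the scan can handle."""
    # str.startswith accepts a tuple, so adding a scheme is a one-line change.
    return all(path.startswith(SUPPORTED_SCHEMES) for path in input_files)
```

With this shape, adding hdfs:// support later (possible when the feature is enabled, per the comment above) is a data change rather than a logic change.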

Comment on lines +264 to +266
if (CometSparkSessionExtensions.isSpark40Plus) {
  fallbackReasons += s"$SCAN_NATIVE_ICEBERG_COMPAT is not implemented for Spark 4.0.0"
}
Member Author

We can revisit this after #1830 is merged

@parthchandra (Contributor)

Wondering if it is a good idea to change the default this close to a release. It might be safer to change it at the beginning of a release cycle, perhaps?

@andygrove andygrove marked this pull request as ready for review June 27, 2025 18:18
@andygrove (Member Author)

> Wondering if it is a good idea to change the default this close to a release. It might be safer to change it at the beginning of a release cycle, perhaps?

If anyone runs into issues, they can specify spark.comet.scan.impl=native_comet to revert to the previous behavior.

The benefit of enabling auto as the default is that we now get complex type support by default when reading Parquet, as well as much improved performance.
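For anyone who does hit a regression, a hedged example of pinning the previous scan at submit time (the plugin class is Comet's documented entry point; my-app.jar is a placeholder for your application):

```shell
# Revert to the pre-auto behavior by setting the scan implementation
# explicitly; "my-app.jar" stands in for your own application jar.
spark-submit \
  --conf spark.plugins=org.apache.spark.CometPlugin \
  --conf spark.comet.scan.impl=native_comet \
  my-app.jar
```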

run: |
  cd apache-spark
  rm -rf /root/.m2/repository/org/apache/parquet # somehow parquet cache requires cleanups
  ENABLE_COMET=true ENABLE_COMET_SHUFFLE=true COMET_PARQUET_SCAN_IMPL=auto build/sbt -Dsbt.log.noformat=true ${{ matrix.module.args1 }} "${{ matrix.module.args2 }}"
Contributor

What about repurposing this test for COMET_PARQUET_SCAN_IMPL=native_comet?

Member Author

Yes, that's a good idea.

Member Author

I enabled the native_comet tests in spark_sql_test.yaml, alongside the auto and native_iceberg_compat tests.

@kazuyukitanimura (Contributor) left a comment

Thanks @andygrove
pending with CI

@parthchandra (Contributor)

Okay, tried this out with a few test queries and real world data and everything worked okay, so I feel more confident that this change is safe.

@andygrove andygrove merged commit 06ed88b into apache:main Jun 28, 2025
122 of 124 checks passed
@andygrove andygrove deleted the auto-scan branch July 10, 2025 22:07
coderfender pushed a commit to coderfender/datafusion-comet that referenced this pull request Dec 13, 2025

Successfully merging this pull request may close these issues.

Comet should choose the most suitable Parquet scan implementation automatically

6 participants