perf: Add `COMET_RESPECT_PARQUET_FILTER_PUSHDOWN` config by andygrove · Pull Request #1936 · apache/datafusion-comet

andygrove · 2025-06-26T14:53:32Z

Which issue does this PR close?

N/A

Rationale for this change

The new native scans perform poorly when Parquet filter pushdown is enabled, which is the default in Spark.

See apache/datafusion#3463 for reasons why filter pushdown is not enabled in DataFusion by default yet.

What changes are included in this PR?

Add a new config that tells Comet whether to respect Spark's filter pushdown config. We need to respect the config when running Spark SQL tests, but want to ignore the config by default for best performance.
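As a rough illustration of the behaviour described above, the sketch below models the decision with a plain `Map` instead of Spark's `SQLConf`. The object name `FilterPushdownSketch` and the helper `boolConf` are hypothetical. The Comet key is taken from the test setup in this PR, and its default of `false` is an assumption inferred from "ignore the config by default". The actual change lives in Comet's Scala sources and may be structured differently.

```scala
object FilterPushdownSketch {
  // Spark's own setting; Spark enables Parquet filter pushdown by default.
  val SparkPushdownKey = "spark.sql.parquet.filterPushdown"

  // Comet setting introduced by this PR (key as used in the test setup).
  // Assumed default: false, i.e. Comet ignores Spark's setting unless asked.
  val CometRespectKey = "spark.comet.parquet.respectFilterPushdown"

  // Hypothetical helper: read a boolean from a plain Map, falling back to a default.
  def boolConf(conf: Map[String, String], key: String, default: Boolean): Boolean =
    conf.get(key).map(_.trim.toBoolean).getOrElse(default)

  // Pushdown is applied only when Spark enables it AND Comet is told to respect that.
  def effectivePushdown(conf: Map[String, String]): Boolean =
    boolConf(conf, SparkPushdownKey, default = true) &&
      boolConf(conf, CometRespectKey, default = false)
}
```

Under these assumptions, an empty configuration yields no pushdown, and pushdown is active only when the Comet setting is explicitly `true` while Spark's setting remains enabled.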

How are these changes tested?

codecov-commenter · 2025-06-26T15:24:44Z

Codecov Report

Attention: Patch coverage is 69.23077% with 4 lines in your changes missing coverage. Please review.

Project coverage is 33.43%. Comparing base (f09f8af) to head (e7b2a58).
Report is 291 commits behind head on main.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| .../scala/org/apache/comet/serde/QueryPlanSerde.scala | 57.14% | 1 Missing and 2 partials ⚠️ |
| .../apache/comet/parquet/CometParquetFileFormat.scala | 50.00% | 0 Missing and 1 partial ⚠️ |

Additional details and impacted files

@@              Coverage Diff              @@
##               main    #1936       +/-   ##
=============================================
- Coverage     56.12%   33.43%   -22.70%     
+ Complexity      976      804      -172     
=============================================
  Files           119      131       +12     
  Lines         11743    12917     +1174     
  Branches       2251     2402      +151     
=============================================
- Hits           6591     4319     -2272     
- Misses         4012     7660     +3648     
+ Partials       1140      938      -202


comphead

Thanks @andygrove, I think it's lgtm, although it might be confusing having a parameter that enables another parameter 🤔

andygrove · 2025-06-27T16:22:42Z

> Thanks @andygrove, I think it's lgtm, although it might be confusing having a parameter that enables another parameter 🤔

Yeah, I know. The alternative is to ask users to disable the Spark config, but I'm assuming that most users won't read the documentation to discover that this is needed for good performance.

parthchandra

lgtm

parthchandra · 2025-06-27T18:23:05Z

  private val datetimeRebaseModeInRead = options.datetimeRebaseModeInRead
-  private val parquetFilterPushDown = sqlConf.parquetFilterPushDown
+  private val parquetFilterPushDown = sqlConf.parquetFilterPushDown &&
+    CometConf.COMET_RESPECT_PARQUET_FILTER_PUSHDOWN.get(sqlConf)


This is not necessary right now because this is part of DSV2 support, and the new native scan implementations do not support DSV2.
We might add it later for native_iceberg_compat, though.

Thanks. I reverted this change.


kazuyukitanimura · 2025-06-27T20:04:18Z

 +      conf
 +        .set("spark.sql.extensions", "org.apache.comet.CometSparkSessionExtensions")
 +        .set("spark.comet.enabled", "true")
+        .set("spark.comet.parquet.respectFilterPushdown", "true")


Do we need to add .set("spark.comet.parquet.respectFilterPushdown", "true") at a few more locations?
E.g. TestHive.scala
Could be other locations as well

All of the Spark SQL tests are passing.

I checked, and there are no hive tests that reference PARQUET_FILTER_PUSHDOWN_ENABLED.
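For readers who want to reproduce the test setup discussed in this thread outside the Spark SQL test harness, a minimal sketch follows. It assumes `spark-core` is on the classpath. The object name `PushdownTestConf` is hypothetical, and setting `spark.sql.parquet.filterPushdown` explicitly is an addition for clarity, since Spark already enables it by default. The other three settings are the ones shown in the diff above.

```scala
import org.apache.spark.SparkConf

object PushdownTestConf {
  // Builds a configuration in which Comet honours Spark's Parquet filter pushdown setting.
  def build(): SparkConf =
    new SparkConf(false) // loadDefaults = false: do not read spark.* system properties
      .set("spark.sql.extensions", "org.apache.comet.CometSparkSessionExtensions")
      .set("spark.comet.enabled", "true")
      // Spark's own switch; already true by default, set here only to be explicit.
      .set("spark.sql.parquet.filterPushdown", "true")
      // Comet switch added by this PR; without it Comet ignores the Spark setting.
      .set("spark.comet.parquet.respectFilterPushdown", "true")
}
```

Leaving out the last setting should correspond to the default behaviour described in the PR, where Comet ignores Spark's pushdown setting.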

andygrove · 2025-06-27T20:59:25Z

Thanks for the reviews @kazuyukitanimura @parthchandra @comphead

Add new config

28b8cb6

fix

eeca033

andygrove changed the title ~~perf: Add COMET_RESPECT_PARQUET_FILTER_PUSHDOWN_ENABLED config~~ perf: Add COMET_RESPECT_PARQUET_FILTER_PUSHDOWN config Jun 26, 2025

format

18cfcec

andygrove mentioned this pull request Jun 26, 2025

Release Comet 0.9.0 (June/July 2025) #1856

Closed

2 tasks

andygrove added 5 commits June 26, 2025 13:29

diffs

349ced5

Clippy fixes for Rust 1.88

443137d

format

097dad3

more

a5cf229

xMerge branch 'clippy-1.88' into parquet-pushdown-config

a6a4a25

andygrove marked this pull request as ready for review June 26, 2025 23:07

comphead reviewed Jun 27, 2025

Comment thread common/src/main/scala/org/apache/comet/CometConf.scala Outdated

comphead reviewed Jun 27, 2025

Comment thread docs/source/user-guide/configs.md Outdated

comphead approved these changes Jun 27, 2025

parthchandra approved these changes Jun 27, 2025

andygrove and others added 6 commits June 27, 2025 12:47

Update common/src/main/scala/org/apache/comet/CometConf.scala

17ddec6

Co-authored-by: Oleks V <comphead@users.noreply.github.com>

Update docs/source/user-guide/configs.md

bcf4d88

Co-authored-by: Oleks V <comphead@users.noreply.github.com>

Merge remote-tracking branch 'apache/main' into parquet-pushdown-config

dd66984

address feedback

7d0a51f

Merge branch 'parquet-pushdown-config' of github.com:andygrove/datafu…

2de0847

…sion-comet into parquet-pushdown-config

scalastyle

e7b2a58

kazuyukitanimura reviewed Jun 27, 2025

kazuyukitanimura approved these changes Jun 27, 2025

andygrove merged commit 469ee6e into apache:main Jun 27, 2025
96 checks passed

andygrove deleted the parquet-pushdown-config branch June 27, 2025 20:59

andygrove mentioned this pull request Jun 30, 2025

Feat: Support Spark 4.0.0 part1 #1830

Merged

coderfender pushed a commit to coderfender/datafusion-comet that referenced this pull request Dec 13, 2025

perf: Add COMET_RESPECT_PARQUET_FILTER_PUSHDOWN config (apache#1936)

4550155

andygrove merged 14 commits into apache:main from andygrove:parquet-pushdown-config
