Skip to content

fix: Do not push down subquery filters on native_datafusion scan#2438

Merged
andygrove merged 1 commit intoapache:mainfrom
wForget:COMET-2424
Sep 22, 2025
Merged

fix: Do not push down subquery filters on native_datafusion scan#2438
andygrove merged 1 commit intoapache:mainfrom
wForget:COMET-2424

Conversation

@wForget
Copy link
Copy Markdown
Member

@wForget wForget commented Sep 22, 2025

Which issue does this PR close?

Closes #2424.

Rationale for this change

We should not push down subquery filters on native_datafusion scan, because subqueries have not been executed yet when we build the nativeOp for CometNativeScanExec.

What changes are included in this PR?

Do not push down subquery filters on native_datafusion scan

How are these changes tested?

After this, tpch q22 ran successfully on spark 4.0.1 with comet:

image

@codecov-commenter
Copy link
Copy Markdown

codecov-commenter commented Sep 22, 2025

Codecov Report

❌ Patch coverage is 60.00000% with 2 lines in your changes missing coverage. Please review.
✅ Project coverage is 58.04%. Comparing base (f09f8af) to head (4e01acc).
⚠️ Report is 1154 commits behind head on main.

Files with missing lines Patch % Lines
...ala/org/apache/spark/sql/comet/CometScanExec.scala 60.00% 0 Missing and 2 partials ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##               main    #2438      +/-   ##
============================================
+ Coverage     56.12%   58.04%   +1.91%     
- Complexity      976     1402     +426     
============================================
  Files           119      147      +28     
  Lines         11743    13469    +1726     
  Branches       2251     2342      +91     
============================================
+ Hits           6591     7818    +1227     
- Misses         4012     4417     +405     
- Partials       1140     1234      +94     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Copy link
Copy Markdown
Member

@andygrove andygrove left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tested locally and I can now run the benchmarks to completion 🎉

Thanks @wForget!

@andygrove andygrove merged commit caeb28c into apache:main Sep 22, 2025
180 of 188 checks passed

@transient
private lazy val pushedDownFilters = getPushedDownFilters(relation, dataFilters)
private lazy val pushedDownFilters = {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we also document it?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where should we document it?

@parthchandra
Copy link
Copy Markdown
Contributor

@wForget thank you for fixing this. One thought though - The same subquery execution would be completed if the scan were a spark scan, so some investigation is needed to determine why we see this issue with CometNativeScan.

@wForget
Copy link
Copy Markdown
Member Author

wForget commented Sep 23, 2025

@wForget thank you for fixing this. One thought though - The same subquery execution would be completed if the scan were a spark scan, so some investigation is needed to determine why we see this issue with CometNativeScan.

As commented in #2424 (comment), we called scan.rdd in the planner phase which caused pushedDownFilters to be used prematurely. If it was called in execute phase it would always wait for all subqueries to complete.

coderfender pushed a commit to coderfender/datafusion-comet that referenced this pull request Dec 13, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

5 participants