[SPARK-29277][SQL][test-hadoop3.2] Add early DSv2 filter and projection pushdown #26341
### What changes were proposed in this pull request?

This adds a new rule, `V2ScanRelationPushDown`, to push filters and projections into a new `DataSourceV2ScanRelation` in the optimizer. That scan is then used when converting to a physical scan node. The new relation correctly reports stats based on the scan.

To run scan pushdown before rules where stats are used, this adds a new optimizer override, `earlyScanPushDownRules`, and a batch for early pushdown in the optimizer, before cost-based join reordering. The other early pushdown rule, `PruneFileSourcePartitions`, is moved into the early pushdown rule set.

This also moves pushdown helper methods from `DataSourceV2Strategy` into a util class.

### Why are the changes needed?

This is needed for DSv2 sources to supply stats for cost-based rules in the optimizer.

### Does this PR introduce any user-facing change?

No.

### How was this patch tested?

This updates the implementation of stats from `DataSourceV2Relation` so tests will fail if stats are accessed before early pushdown for v2 relations.

Closes apache#25955 from rdblue/move-v2-pushdown.

Authored-by: Ryan Blue <blue@apache.org>
Signed-off-by: Ryan Blue <blue@apache.org>
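To make the ordering argument in the description concrete, here is a minimal toy sketch in plain Scala. It is not Spark code: `Plan`, `Relation`, `ScanRelation`, `Filter`, `Project`, `pushDown`, and `optimize` are all hypothetical stand-ins invented for illustration, and only the parameter name `earlyScanPushDownRules` is borrowed from the PR. The sketch assumes stats are just a row count, and it mirrors two ideas from the description: the un-pushed relation refuses to report stats, and the scan relation reports stats that reflect the pushed filter and projection.

```scala
// Toy model only: none of these types are Spark's real classes.
object EarlyPushdownSketch {
  // A row is a map from column name to integer value.
  type Row = Map[String, Int]

  sealed trait Plan { def rowCount: Long } // "stats" reduced to a row count

  // Un-pushed relation: reading stats here is treated as a bug, loosely
  // mirroring how the PR makes tests fail if stats are read too early.
  final case class Relation(columns: Seq[String], rows: Seq[Row]) extends Plan {
    def rowCount: Long =
      throw new IllegalStateException("stats requested before early pushdown")
  }

  // Scan relation: filter and projection are already applied, so the
  // reported stats are based on the scan itself.
  final case class ScanRelation(columns: Seq[String], rows: Seq[Row]) extends Plan {
    def rowCount: Long = rows.size.toLong
  }

  final case class Filter(column: String, atLeast: Int, child: Plan) extends Plan {
    def rowCount: Long = child.rowCount
  }

  final case class Project(columns: Seq[String], child: Plan) extends Plan {
    def rowCount: Long = child.rowCount
  }

  type Rule = Plan => Plan

  // Hypothetical pushdown rule. It only inspects the top of the plan,
  // which is enough for this example.
  val pushDown: Rule = {
    case Project(cols, Filter(c, n, Relation(_, rows))) =>
      val kept = rows.filter(_(c) >= n)
      ScanRelation(cols, kept.map(_.filter { case (k, _) => cols.contains(k) }))
    case Filter(c, n, Relation(cols, rows)) =>
      ScanRelation(cols, rows.filter(_(c) >= n))
    case Relation(cols, rows) => ScanRelation(cols, rows)
    case other => other
  }

  // Runs the early pushdown rules first, then a stand-in for a cost-based
  // step that reads stats from the resulting plan.
  def optimize(plan: Plan, earlyScanPushDownRules: Seq[Rule]): (Plan, Long) = {
    val pushed = earlyScanPushDownRules.foldLeft(plan)((p, rule) => rule(p))
    (pushed, pushed.rowCount)
  }

  val table: Relation = Relation(Seq("id", "price"), Seq(
    Map("id" -> 1, "price" -> 5),
    Map("id" -> 2, "price" -> 50),
    Map("id" -> 3, "price" -> 500)))

  val query: Plan = Project(Seq("id"), Filter("price", 10, table))

  def main(args: Array[String]): Unit = {
    val (plan, rows) = optimize(query, Seq(pushDown))
    println(s"plan = $plan, estimated rows = $rows")
  }
}
```

With the pushdown rule in the early batch, the stats step sees a scan relation and gets a row count that accounts for the filter. With an empty early batch, the same step hits the un-pushed relation and throws, which is the failure mode the PR's test change is meant to surface.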
|
We have 2 |
|
Test build #113011 has finished for PR 26341 at commit
|
|
@viirya @HyukjinKwon seems it's the CRAN issue again? I see this SparkR failure in many other PRs. |
|
@cloud-fan seems so, let me look into this. |
|
Thank you for bringing this back swiftly, @cloud-fan ! |
dongjoon-hyun
left a comment
+1, LGTM.
Since all tests passed with hadoop-3.2 and this CRAN issue is a known one, I'll merge this back.
* checking CRAN incoming feasibility ...Error in .check_package_CRAN_incoming(pkgdir) :
dims [product 24] do not match the length of object [0]
Thank you, @cloud-fan . Merged to master.
|
Contacted CRAN sysadmin and expect fix soon. |
|
Thank you always, @viirya ! 😄 |
|
Can someone explain what happened here? |
|
Okay, so the PR needed to be updated to pass tests in a different profile? Is this something we could have caught before merging the initial PR? Or is there something that we can do to avoid problems like this in the future? |
|
Yes, right. Technically, we need to trigger twice with the default and with |
|
For now, our PRBuilder doesn't test both of them. So, for the time being, we need to trigger the other one manually with the title string. |
Bring back #25955