-
Notifications
You must be signed in to change notification settings - Fork 305
chore: add assertion that not using comet scan but using native scan #1793
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -2406,19 +2406,19 @@ object QueryPlanSerde extends Logging with CometExprShim { | |
|
|
||
| // TODO this could be optimized more to stop walking the tree on hitting | ||
| // certain operators such as join or aggregate which will copy batches | ||
| def containsNativeCometScan(plan: SparkPlan): Boolean = { | ||
| def containsNonDataFusionScan(plan: SparkPlan): Boolean = { | ||
| plan match { | ||
| case w: CometScanWrapper => containsNativeCometScan(w.originalPlan) | ||
| case scan: CometScanExec => scan.scanImpl == CometConf.SCAN_NATIVE_COMET | ||
| case _: CometScanWrapper => true | ||
|
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Changed it to return true as native scan exec is not wrapped in CometScanWrapper
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. and when not scan datafusion so it will not use df filter when compat iceberg |
||
| case scan: CometScanExec => scan.scanImpl != CometConf.SCAN_NATIVE_DATAFUSION | ||
| case _: CometNativeScanExec => false | ||
| case _ => plan.children.exists(containsNativeCometScan) | ||
| case _ => plan.children.isEmpty || plan.children.exists(containsNonDataFusionScan) | ||
|
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. exists return false for empty children as it means that this is the data source, right? and if so it using the ScanExec of rust and not ScanExec of DataFusion
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
what does this mean?
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. we must start from a scan, right? either |
||
| } | ||
| } | ||
|
|
||
| val filterBuilder = OperatorOuterClass.Filter | ||
| .newBuilder() | ||
| .setPredicate(cond.get) | ||
| .setUseDatafusionFilter(!containsNativeCometScan(op)) | ||
| .setUseDatafusionFilter(!containsNonDataFusionScan(op)) | ||
| Some(result.setFilter(filterBuilder).build()) | ||
| } else { | ||
| withInfo(op, condition, child) | ||
|
|
||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My understanding is that we only reuse buffers when using
native_cometto read the Parquet files. Ifnative_iceberg_compatis used then we will still have aScanExecreading from JVM but it is safe to use DataFusion's FilterExec in this case.@parthchandra would you agree with this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is correct.
native_iceberg_compatdoes not reuse buffers.