[VL] Fix function input_file_name() outputs empty string in certain query plan patterns#7124
zhztheplayer merged 4 commits into apache:main
Thanks for opening a pull request! Could you open an issue for this pull request on GitHub Issues? https://github.com/apache/incubator-gluten/issues Then could you also rename the commit message and pull request title in the following format? See also:
Run Gluten Clickhouse CI
cc @zhztheplayer Thank you.
zhztheplayer left a comment:
Overall looking good to me. Thank you @zml1206.
cc @liuneng1994, if you are considering switching to this way, perhaps we need a PR for CH individually to add the rules to CH rule list after this PR gets landed.
ProjectExec(newProjectList, newChild)
}

private def rewriteExpr(expr: Expression, replacedExprs: Map[String, Alias]): Expression = {
Could be replacedExprs: mutable.Map[String, Alias]
}
}

def addMetadataCol(plan: SparkPlan, replacedExprs: Map[String, Alias]): SparkPlan = plan match {
Could be replacedExprs: mutable.Map[String, Alias]
def injectLegacy(injector: LegacyInjector): Unit = {
// Gluten columnar: Transform rules.
injector.injectTransform(_ => RemoveTransitions)
injector.injectTransform(_ => PushDownInputFileExpressionBeforeLeaf)
Let's rename the two rules, since they seem to always be used together as a rule pair.
- injector.injectTransform(_ => PushDownInputFileExpressionBeforeLeaf)
+ injector.injectTransform(_ => PushDownInputFileExpression.PreOffload)
injector.injectTransform(_ => TransformPreOverrides())
injector.injectTransform(_ => RemoveNativeWriteFilesSortAndProject())
injector.injectTransform(c => RewriteTransformer.apply(c.session))
injector.injectTransform(_ => PushDownInputFileExpressionToScan)
Similarly,
- injector.injectTransform(_ => PushDownInputFileExpressionToScan)
+ injector.injectTransform(_ => PushDownInputFileExpression.PostOffload)
What changes were proposed in this pull request?
Spark's implementations of input_file_name, input_file_block_start, and input_file_block_length use a thread-local to stash the file information, which the functions then read back. If a transformer node sits between the project that evaluates an input file function and the scan, input_file_name returns an empty string. We should therefore either push the input file function down into the transformer scan, or add a fallback project that evaluates it directly above a fallback scan.
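To illustrate the failure mode, here is a minimal toy model of the thread-local mechanism. It is only a sketch: `FileNameHolder`, `jvmScan`, `nativeScan`, and `projectWithFileName` are hypothetical names invented for this example and are not Spark or Gluten APIs. The assumption is that a scan running on the JVM sets the holder, while a scan that runs elsewhere never does.

```scala
// Toy model only. FileNameHolder is a hypothetical stand-in, not Spark's actual class.
object FileNameHolder {
  private val current = new ThreadLocal[String] {
    override def initialValue(): String = ""
  }
  def set(name: String): Unit = current.set(name)
  def get: String = current.get()
  def unset(): Unit = current.remove()
}

object InputFileNameDemo {
  // A JVM-side scan stashes the file name before handing rows downstream.
  def jvmScan(file: String, rows: Seq[Int]): Iterator[Int] = {
    FileNameHolder.set(file)
    rows.iterator
  }

  // Assumption: a scan executed outside the JVM operators never touches the holder.
  def nativeScan(file: String, rows: Seq[Int]): Iterator[Int] = rows.iterator

  // The projection evaluates the function by reading the thread-local.
  def projectWithFileName(input: Iterator[Int]): List[(Int, String)] =
    input.map(row => (row, FileNameHolder.get)).toList

  def main(args: Array[String]): Unit = {
    FileNameHolder.unset()
    println(projectWithFileName(jvmScan("part-0.parquet", Seq(1, 2))))
    FileNameHolder.unset()
    println(projectWithFileName(nativeScan("part-0.parquet", Seq(1, 2))))
  }
}
```

In this model the second projection sees only the holder's initial value, an empty string, which mirrors the symptom described above.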
The processing logic is as follows:
1. Before offload, add a new project directly above the leaf node and push the input file expression down into that new project.
2. Offload the project and scan as usual.
3. After offload, if the scan was offloaded, push the input file expression into the scan and remove the project.
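The three steps above can be sketched on a toy plan tree. This is not the code in this PR: `Plan`, `Scan`, `Filter`, `Project`, and `PushDownSketch` are hypothetical types made up for illustration, expressions are plain strings, and the offload step simply assumes that the helper project stays on the JVM while the scan is offloaded only when the backend supports it.

```scala
// Toy plan model. Every name here is hypothetical; none is a Gluten or Spark class.
sealed trait Plan
case class Scan(table: String, offloaded: Boolean = false, emitsFileName: Boolean = false) extends Plan
case class Filter(cond: String, child: Plan, offloaded: Boolean = false) extends Plan
case class Project(exprs: List[String], child: Plan, offloaded: Boolean = false) extends Plan

object PushDownSketch {
  val Fn = "input_file_name()"
  val Col = "input_file_name"
  val Helper = s"$Fn AS $Col"

  // Step 1: compute the function in a new project directly above the leaf,
  // and make the original project read the resulting column instead.
  def preOffload(plan: Plan): Plan = plan match {
    case p: Project if p.exprs.contains(Fn) =>
      p.copy(exprs = p.exprs.map(e => if (e == Fn) Col else e), child = wrapLeaf(p.child))
    case other => other
  }

  private def wrapLeaf(plan: Plan): Plan = plan match {
    case s: Scan => Project(List("*", Helper), s)
    case f: Filter => f.copy(child = wrapLeaf(f.child))
    case p: Project => p.copy(exprs = p.exprs :+ Col, child = wrapLeaf(p.child))
  }

  // Step 2: ordinary offload. Assumption for this toy: the helper project stays
  // on the JVM, and the scan is offloaded only when the backend supports it.
  def offload(plan: Plan, scanSupported: Boolean): Plan = plan match {
    case s: Scan => s.copy(offloaded = scanSupported)
    case f: Filter => f.copy(child = offload(f.child, scanSupported), offloaded = true)
    case p: Project =>
      p.copy(child = offload(p.child, scanSupported), offloaded = !p.exprs.contains(Helper))
  }

  // Step 3: if the scan was offloaded, let it emit the column and drop the helper project.
  def postOffload(plan: Plan): Plan = plan match {
    case Project(exprs, s: Scan, _) if exprs.contains(Helper) && s.offloaded =>
      s.copy(emitsFileName = true)
    case p: Project => p.copy(child = postOffload(p.child))
    case f: Filter => f.copy(child = postOffload(f.child))
    case s: Scan => s
  }

  def rewrite(plan: Plan, scanSupported: Boolean): Plan =
    postOffload(offload(preOffload(plan), scanSupported))
}
```

For a plan such as `Project(List("id", PushDownSketch.Fn), Filter("id > 0", Scan("t")))`, calling `PushDownSketch.rewrite(plan, scanSupported = true)` removes the helper project and marks the scan as emitting the file name, while `scanSupported = false` keeps the helper project directly above the fallback scan.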
How was this patch tested?
Unit tests (UT).