[VL] Collapse trivial projects generated by rule PushDownInputFileExpression#7188

Merged
zhztheplayer merged 2 commits into apache:main from zml1206:collapse_project
Sep 13, 2024
@zml1206
Contributor

@zml1206 zml1206 commented Sep 10, 2024

What changes were proposed in this pull request?

Follow-up to #7124.
In PushDownInputFileExpression, we add a new project before the leaf node and push the input file expression down into that new project. If the scan falls back and the outer project is cheap or also falls back, we can collapse the two projects to simplify the physical plan.
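
The collapse condition described above can be sketched on a toy plan model. This is only an illustration under stated assumptions: `Expr`, `Plan`, `Scan`, `Project`, `isCheap` and `CollapseSketch` are hypothetical names invented here and are not Spark or Gluten classes, and "cheap" is assumed to mean that the outer project only forwards or renames columns.

```scala
// Toy plan model. All types here are hypothetical, not Spark or Gluten classes.
sealed trait Expr
case class Col(name: String) extends Expr
case object InputFileName extends Expr // stands in for input_file_name()
case class Alias(child: Expr, name: String) extends Expr

sealed trait Plan
case class Scan(table: String, fallback: Boolean) extends Plan
case class Project(exprs: Seq[Expr], child: Plan, fallback: Boolean = false) extends Plan

object CollapseSketch {
  // Assumption: a project is "cheap" when it only forwards or renames columns.
  def isCheap(p: Project): Boolean = p.exprs.forall {
    case Col(_) | Alias(Col(_), _) => true
    case _ => false
  }

  // Inline the inner project's aliases into an outer expression.
  // Only plain column references and renames are handled in this sketch.
  private def substitute(e: Expr, aliases: Map[String, Expr]): Expr = e match {
    case Col(n) if aliases.contains(n) => Alias(aliases(n), n)
    case Alias(Col(n), out) if aliases.contains(n) => Alias(aliases(n), out)
    case other => other
  }

  // Collapse Project(Project(Scan)) only when the scan falls back and the
  // outer project is cheap or falls back itself; otherwise leave the plan alone.
  def collapse(plan: Plan): Plan = plan match {
    case outer @ Project(outerExprs, Project(innerExprs, scan @ Scan(_, true), _), outerFallback)
        if outerFallback || isCheap(outer) =>
      val aliases = innerExprs.collect { case Alias(child, name) => name -> child }.toMap
      // Assumption of this sketch: the merged project is marked as fallback,
      // since it now evaluates the input file expression next to a fallback scan.
      Project(outerExprs.map(substitute(_, aliases)), scan, fallback = true)
    case other => other
  }
}

// Usage: the inner project was added to carry the input file expression.
val before = Project(
  Seq(Col("id"), Col("file")),
  Project(Seq(Col("id"), Alias(InputFileName, "file")), Scan("t", fallback = true), fallback = true))
val after = CollapseSketch.collapse(before)
```

In this model the two stacked projects over a fallback scan become a single project that evaluates the input file expression directly, while a plan whose scan does not fall back is returned unchanged.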

How was this patch tested?

UT.

@github-actions github-actions bot added the CORE (works for Gluten Core) and VELOX labels Sep 10, 2024
@github-actions

Thanks for opening a pull request!

Could you open an issue for this pull request on Github Issues?

https://github.com/apache/incubator-gluten/issues

Then could you also rename commit message and pull request title in the following format?

[GLUTEN-${ISSUES_ID}][COMPONENT]feat/fix: ${detailed message}

See also:

@github-actions

Run Gluten Clickhouse CI

@zml1206
Contributor Author

zml1206 commented Sep 11, 2024

cc @zhztheplayer Thank you.

@zhztheplayer
Member

Hi @zml1206, why is this done in rule PushDownInputFileExpression, given that we already have individual rules like CollapseProjectExecTransformer to collapse projects? Should we have an individual rule for this feature as well?

@zml1206
Contributor Author

zml1206 commented Sep 11, 2024

> Hi @zml1206, why is this done in rule PushDownInputFileExpression?

Because this Project is produced by PushDownInputFileExpression, and we only try to collapse when the scan falls back. The collapse conditions here are special: normally there will not be two adjacent projects like this, because of Spark's CollapseProject rule. So wouldn't doing it in this rule be more appropriate?

@zhztheplayer zhztheplayer merged commit db5a2f7 into apache:main Sep 13, 2024
@zhztheplayer zhztheplayer changed the title from "[VL] Collapse project if scan is fallback and the outer project is cheap or fallback" to "[VL] Collapse trivial projects generated by rule PushDownInputFileExpression" Sep 13, 2024
@baibaichen
Contributor

baibaichen commented Jan 21, 2025

@zml1206 @zhztheplayer

IIUC, the following code could be removed after this PR and #7124, right?

GlutenWholeStageColumnarRDD::compute()
    // To support input_file_name(). According to semantic we should return
    // the exact file name a row belongs to. However in columnar engine it's
    // not easy to accomplish this. so we return a list of file(part) names
    split match {
      case FirstZippedPartitionsPartition(_, g: GlutenPartition, _) =>
        InputFileBlockHolderProxy.set(g.files.mkString(","))
      case _ =>
        InputFileBlockHolderProxy.unset()
    }
