
[Core] Don't report 'Not supported to map spark function name to substrait function name: input_file_name(), class name: InputFileName.' #8580

@baibaichen


Problem description

Both the ClickHouse (ch) and Velox backends support InputFileName, InputFileBlockStart and InputFileBlockLength.

Issue 1: We still get the following warning:

WARN org.apache.gluten.execution.ProjectExecTransformer: Validation failed with exception for plan: ProjectExecTransformer, due to: Not supported to map spark function name to substrait function name: input_file_name(), class name: InputFileName.

Issue 2: Unnecessary code

Spark supports the InputXXX expressions through a ThreadLocal, while Gluten supports them through push-down. IIUC, the following code could be removed after this PR and #7124.

GlutenWholeStageColumnarRDD::compute()
    // To support input_file_name(). According to semantic we should return
    // the exact file name a row belongs to. However in columnar engine it's
    // not easy to accomplish this. so we return a list of file(part) names
    split match {
      case FirstZippedPartitionsPartition(_, g: GlutenPartition, _) =>
        InputFileBlockHolderProxy.set(g.files.mkString(","))
      case _ =>
        InputFileBlockHolderProxy.unset()
    }
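To make the contrast in Issue 2 concrete, here is a minimal toy sketch of the two approaches. It is not Spark or Gluten code: `InputFileSketch`, `FilePart`, `Row`, and all three methods are hypothetical names invented for illustration. It assumes a row-at-a-time reader for the thread-local style and a scan that can attach the file name as a column for the push-down style.

```scala
// Toy model only: every name here is hypothetical, not a Spark or Gluten API.
object InputFileSketch {
  final case class Row(value: Int)
  final case class FilePart(path: String, rows: Seq[Row])

  // Thread-local style: the reader records the file it is currently reading,
  // and the expression reads that value back on the same thread.
  private val currentFile = new ThreadLocal[String] {
    override def initialValue(): String = ""
  }

  // Row-at-a-time: each row is evaluated while its file is still "current".
  def threadLocalStyle(parts: Seq[FilePart]): Seq[(Int, String)] =
    parts.flatMap { part =>
      currentFile.set(part.path)
      try part.rows.map(r => (r.value, currentFile.get()))
      finally currentFile.remove()
    }

  // Push-down style: the scan attaches the file name as an extra column,
  // so no per-thread state is needed.
  def pushDownStyle(parts: Seq[FilePart]): Seq[(Int, String)] =
    parts.flatMap(part => part.rows.map(r => (r.value, part.path)))

  // Partition-level fallback: one joined string for all files in the
  // partition, which cannot say which file a given row came from.
  def joinedNames(parts: Seq[FilePart]): String =
    parts.map(_.path).mkString(",")
}
```

In this toy model the first two methods agree row by row, while `joinedNames` only yields a single comma-separated string for the partition. If push-down already provides the per-row value, the partition-level fallback is arguably redundant, which is the point of Issue 2.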

Issue 3: Improve code quality

Can PushDownInputFileExpression be optimized so that it does not have to go through CollapseProject later?

System information

no

CMake log
