Problem description
Both ch and velox support InputFileName, InputFileBlockStart and InputFileBlockLength, but the following issues remain.
Issue 1: We still get the following warning:
WARN org.apache.gluten.execution.ProjectExecTransformer: Validation failed with exception for plan: ProjectExecTransformer, due to: Not supported to map spark function name to substrait function name: input_file_name(), class name: InputFileName.
Issue 2: Unnecessary code:
Spark supports InputXXX through a ThreadLocal, while Gluten supports them through push down. IIUC, the following code could be removed after this PR and #7124.
GlutenWholeStageColumnarRDD::compute()
// To support input_file_name(). According to semantic we should return
// the exact file name a row belongs to. However in columnar engine it's
// not easy to accomplish this. so we return a list of file(part) names
split match {
  case FirstZippedPartitionsPartition(_, g: GlutenPartition, _) =>
    InputFileBlockHolderProxy.set(g.files.mkString(","))
  case _ =>
    InputFileBlockHolderProxy.unset()
}
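For context, the difference between the two mechanisms can be sketched roughly as below. This is a simplified standalone model, not the actual Spark or Gluten code: `InputFileSketch`, `FileHolder`, `Row` and `scanWithPushDown` are hypothetical names made up for illustration. The thread-local variant can only expose one value per task (here a comma-joined list, like the snippet above), while the push-down variant attaches the exact file name to every row.

```scala
// Hypothetical, simplified model; not the real Spark/Gluten classes.
object InputFileSketch {

  // Thread-local style: the scan sets a task-wide value before reading,
  // and input_file_name() reads it back at evaluation time.
  object FileHolder {
    private val current = new ThreadLocal[String] {
      override def initialValue(): String = ""
    }
    def set(name: String): Unit = current.set(name)
    def unset(): Unit = current.remove()
    def get: String = current.get()
  }

  // Push-down style: the scan emits the file name as an extra column,
  // so each row carries the file it actually came from.
  final case class Row(value: Int, inputFileName: String)

  def scanWithPushDown(files: Map[String, Seq[Int]]): Seq[Row] =
    files.toSeq.sortBy(_._1).flatMap { case (file, values) =>
      values.map(v => Row(v, file))
    }

  def main(args: Array[String]): Unit = {
    val files = Map("a.parquet" -> Seq(1, 2), "b.parquet" -> Seq(3))

    // Thread-local: a single value for the whole partition.
    FileHolder.set(files.keys.toSeq.sorted.mkString(","))
    println(FileHolder.get)
    FileHolder.unset()

    // Push-down: per-row file names.
    scanWithPushDown(files).foreach(println)
  }
}
```

If the push-down path covers every case, the thread-local bookkeeping in compute() has nothing left to serve, which is the reason for proposing its removal.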
Issue 3: Improve code quality:
Can PushDownInputFileExpression be optimized without going to CollapseProject later?
System information
no
CMake log