16 changes: 8 additions & 8 deletions datafusion/optimizer/src/simplify_expressions/simplify_exprs.rs
@@ -18,8 +18,9 @@
//! Simplify expressions optimizer rule and implementation

use super::{ExprSimplifier, SimplifyContext};
+use crate::utils::merge_schema;
use crate::{OptimizerConfig, OptimizerRule};
-use datafusion_common::Result;
+use datafusion_common::{DFSchemaRef, Result};
use datafusion_expr::{logical_plan::LogicalPlan, utils::from_plan};
use datafusion_physical_expr::execution_props::ExecutionProps;

@@ -59,13 +60,12 @@ impl SimplifyExpressions {
plan: &LogicalPlan,
execution_props: &ExecutionProps,
) -> Result<LogicalPlan> {
-// We need to pass down all the schemas within the plan tree to `optimize_expr` in order
-// to evaluate expression types. For example, a projection plan's schema will only include
-// projected columns. With just the projected schema, it's not possible to infer types for
-// expressions that reference non-projected columns within the same projection plan or its
-// child plans.
-let info = plan
-.all_schemas()
+// Pass down the merged schema of the children and the plan's own schema to evaluate
+// expression types. Passing each child schema and the plan schema separately isn't enough:
+// for a plan like `t1 semi join t2 on t1.id = t2.id`, no individual schema contains all
+// the columns referenced by the expression.
+let children_merge_schema = DFSchemaRef::new(merge_schema(plan.inputs()));
Contributor

In some ways, it seems to me that we should only be using the children's schemas, as the expressions within the LogicalPlan should be in terms of the plan's inputs (the children's schemas), not the plan's output (plan.schema()).

Member Author

"It seems to me that we should only be using the children's schemas."

I couldn't agree more.

But the current code gets the attributes of some expressions from plan.schema() instead of inferring or computing them from the children's output.

For example, some tests, such as csv_query_group_by_and_having_and_where, would fail if only the children's schemas were used.

Contributor

Something else to work on over time, perhaps.
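
To make the semi-join case from the code comment concrete, here is a minimal, self-contained sketch. It does not use DataFusion's types: `Schema`, `resolves_all`, and `merge_schemas` are hypothetical stand-ins that model a schema as a plain list of qualified column names, and the sketch assumes a left semi join outputs only the left input's columns.

```rust
/// Toy stand-in for a schema: just a list of qualified column names.
/// (Hypothetical; DataFusion's real `DFSchema` also carries types and more.)
#[derive(Debug, Clone, PartialEq)]
pub struct Schema {
    pub columns: Vec<String>,
}

impl Schema {
    pub fn new(columns: &[&str]) -> Self {
        Schema {
            columns: columns.iter().map(|c| c.to_string()).collect(),
        }
    }

    /// True if every column referenced by an expression exists in this schema.
    pub fn resolves_all(&self, referenced: &[&str]) -> bool {
        referenced
            .iter()
            .all(|r| self.columns.iter().any(|c| c.as_str() == *r))
    }
}

/// Concatenate the children's schemas, skipping duplicate column names.
/// (Assumed behaviour for this sketch, not a copy of `merge_schema`.)
pub fn merge_schemas(children: &[&Schema]) -> Schema {
    let mut columns: Vec<String> = Vec::new();
    for child in children {
        for col in &child.columns {
            if !columns.contains(col) {
                columns.push(col.clone());
            }
        }
    }
    Schema { columns }
}
```

With `t1 = [t1.id, t1.name]` and `t2 = [t2.id]`, the predicate `t1.id = t2.id` resolves against neither child alone, nor against the semi join's output (assumed here to equal `t1`'s columns), but it does resolve against `merge_schemas(&[&t1, &t2])`.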

+let schemas = vec![plan.schema(), &children_merge_schema];
+let info = schemas
.into_iter()
.fold(SimplifyContext::new(execution_props), |context, schema| {
context.with_schema(schema.clone())
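
The `fold` in this hunk threads a builder-style context through a list of schemas. The following is a rough sketch of that pattern only: `Context`, `contains_column`, and `build_context` are hypothetical names, and schemas are again modelled as lists of column names rather than DataFusion's `DFSchemaRef`.

```rust
/// Hypothetical stand-in for a simplification context that accumulates schemas.
#[derive(Debug, Default)]
pub struct Context {
    schemas: Vec<Vec<String>>,
}

impl Context {
    pub fn new() -> Self {
        Context::default()
    }

    /// Builder-style: consume the context and return it with one more schema.
    pub fn with_schema(mut self, schema: Vec<String>) -> Self {
        self.schemas.push(schema);
        self
    }

    /// Look a column up in each registered schema in turn.
    pub fn contains_column(&self, name: &str) -> bool {
        self.schemas
            .iter()
            .any(|s| s.iter().any(|c| c.as_str() == name))
    }
}

/// Fold every schema into a fresh context, mirroring the shape of the diff.
pub fn build_context(schemas: Vec<&Vec<String>>) -> Context {
    schemas
        .into_iter()
        .fold(Context::new(), |context, schema| {
            context.with_schema(schema.clone())
        })
}
```

Because `with_schema` takes `self` by value and returns it, the accumulator can be passed through `fold` without any mutable state outside the closure.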