
Incorrect results for nested joins after applying EliminateCrossJoin rule #7530

@epsio-banay

Describe the bug

This bug is very similar to #4844: the EliminateCrossJoin rule discards the filter condition of an InnerJoin.
The fix for that bug (#4869) was to skip the rule when an InnerJoin with a filter is found.
Unfortunately, the fix only checks the top InnerJoin. Nested InnerJoins are not checked, so they can lose their filter condition.
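To make the gap concrete, here is a minimal sketch using a hypothetical, heavily simplified plan type. `Plan`, `top_join_has_filter`, `any_join_has_filter`, and `reproduction_plan` are illustrative names only, not DataFusion's `LogicalPlan` or the actual optimizer code. It contrasts a check that only looks at the topmost join, as described above, with a recursive check that visits every join in the tree:

```rust
// Hypothetical, heavily simplified plan type for illustration only;
// this is NOT DataFusion's LogicalPlan.
#[allow(dead_code)] // table names and predicates are kept only for readability
enum Plan {
    Scan(&'static str),
    Filter { predicate: &'static str, input: Box<Plan> },
    InnerJoin { filter: Option<&'static str>, left: Box<Plan>, right: Box<Plan> },
}

/// Only inspects the join at the top of the plan (directly or under one Filter),
/// mirroring the limitation described in this issue.
fn top_join_has_filter(plan: &Plan) -> bool {
    match plan {
        Plan::InnerJoin { filter, .. } => filter.is_some(),
        Plan::Filter { input, .. } => match input.as_ref() {
            Plan::InnerJoin { filter, .. } => filter.is_some(),
            _ => false,
        },
        Plan::Scan(_) => false,
    }
}

/// Recursively visits every node, so a filter on a nested join is also detected.
fn any_join_has_filter(plan: &Plan) -> bool {
    match plan {
        Plan::Scan(_) => false,
        Plan::Filter { input, .. } => any_join_has_filter(input),
        Plan::InnerJoin { filter, left, right } => {
            filter.is_some() || any_join_has_filter(left) || any_join_has_filter(right)
        }
    }
}

/// Same shape as the plan built in the reproduction test.
fn reproduction_plan() -> Plan {
    Plan::Filter {
        predicate: "t1.a > 15",
        input: Box::new(Plan::InnerJoin {
            filter: None, // t1.a = t2.a, no extra filter
            left: Box::new(Plan::InnerJoin {
                filter: Some("t1.a > 20"), // t1.a = t3.a plus a filter
                left: Box::new(Plan::Scan("t1")),
                right: Box::new(Plan::Scan("t3")),
            }),
            right: Box::new(Plan::Scan("t2")),
        }),
    }
}

fn main() {
    let plan = reproduction_plan();
    println!("top-level check: {}", top_join_has_filter(&plan));
    println!("recursive check: {}", any_join_has_filter(&plan));
}
```

On this plan shape the top-level check returns false, so a rule guarded by it would proceed, while the recursive check returns true and would skip the rewrite.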

To Reproduce

Add the following test to eliminate_cross_join.rs:

    #[test]
    fn eliminate_cross_not_possible_nested_inner_join_with_filter() -> Result<()> {
        let t1 = test_table_scan_with_name("t1")?;
        let t2 = test_table_scan_with_name("t2")?;
        let t3 = test_table_scan_with_name("t3")?;

        // The nested t1/t3 inner join has a filter, so the rule must leave the plan unchanged
        let plan = LogicalPlanBuilder::from(t1)
            .join(
                t3,
                JoinType::Inner,
                (vec!["t1.a"], vec!["t3.a"]),
                Some(col("t1.a").gt(lit(20u32))),
            )?
            .join(t2, JoinType::Inner, (vec!["t1.a"], vec!["t2.a"]), None)?
            .filter(col("t1.a").gt(lit(15u32)))?
            .build()?;

        let formatted = plan.display_indent_schema().to_string();
        let before_optimization: Vec<&str> = formatted.trim().lines().collect();

        assert_optimized_plan_eq(&plan, before_optimization);

        Ok(())
    }

The test will fail in assert_optimized_plan_eq with this error:

expected:

[
    "Filter: t1.a > UInt32(15) [a:UInt32, b:UInt32, c:UInt32, a:UInt32, b:UInt32, c:UInt32, a:UInt32, b:UInt32, c:UInt32]",
    "  Inner Join: t1.a = t2.a [a:UInt32, b:UInt32, c:UInt32, a:UInt32, b:UInt32, c:UInt32, a:UInt32, b:UInt32, c:UInt32]",
    "    Inner Join: t1.a = t3.a Filter: t1.a > UInt32(20) [a:UInt32, b:UInt32, c:UInt32, a:UInt32, b:UInt32, c:UInt32]",
    "      TableScan: t1 [a:UInt32, b:UInt32, c:UInt32]",
    "      TableScan: t3 [a:UInt32, b:UInt32, c:UInt32]",
    "    TableScan: t2 [a:UInt32, b:UInt32, c:UInt32]",
]
actual:

[
    "Filter: t1.a > UInt32(15) [a:UInt32, b:UInt32, c:UInt32, a:UInt32, b:UInt32, c:UInt32, a:UInt32, b:UInt32, c:UInt32]",
    "  Inner Join: t1.a = t2.a [a:UInt32, b:UInt32, c:UInt32, a:UInt32, b:UInt32, c:UInt32, a:UInt32, b:UInt32, c:UInt32]",
    "    Inner Join: t1.a = t3.a [a:UInt32, b:UInt32, c:UInt32, a:UInt32, b:UInt32, c:UInt32]",
    "      TableScan: t1 [a:UInt32, b:UInt32, c:UInt32]",
    "      TableScan: t3 [a:UInt32, b:UInt32, c:UInt32]",
    "    TableScan: t2 [a:UInt32, b:UInt32, c:UInt32]",
]

Here, expected is the plan before the optimization is applied, and actual is the plan after it is applied.
The filter on the nested inner join (t1.a > UInt32(20)) has been incorrectly dropped.
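The dropped predicate changes query results, not just the plan text. The minimal sketch below assumes hypothetical single-column tables (each row holds only the join key `a`) and uses an illustrative nested-loop evaluator, `join_results`, which is not DataFusion code. It shows that rows with 15 < t1.a <= 20 pass the top-level filter once the nested join filter is gone:

```rust
/// Hypothetical nested-loop evaluation of the reproduction plan.
/// Each table is a slice of values for column `a`; `keep_nested_filter`
/// toggles the `t1.a > 20` filter on the t1/t3 inner join.
fn join_results(t1: &[u32], t2: &[u32], t3: &[u32], keep_nested_filter: bool) -> Vec<u32> {
    let mut out = Vec::new();
    for &a1 in t1 {
        for &a3 in t3 {
            // Inner join t1/t3 on t1.a = t3.a, optionally with Filter: t1.a > 20
            if a1 != a3 || (keep_nested_filter && a1 <= 20) {
                continue;
            }
            for &a2 in t2 {
                // Inner join with t2 on t1.a = t2.a
                if a1 != a2 {
                    continue;
                }
                // Top-level Filter: t1.a > 15
                if a1 > 15 {
                    out.push(a1);
                }
            }
        }
    }
    out
}

fn main() {
    let t = [10, 18, 25];
    let before = join_results(&t, &t, &t, true); // plan before the rule
    let after = join_results(&t, &t, &t, false); // plan after the rule
    println!("before: {:?}", before);
    println!("after:  {:?}", after);
}
```

With t1, t2, and t3 all holding the values 10, 18, and 25, the original plan returns only 25, whereas the rewritten plan also returns 18.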

Expected behavior

If any InnerJoin in the plan, including a nested one, has a filter, the rule should either be skipped or rewrite the plan without losing that filter.

Additional context

#4866 will probably fix this bug, but I think a quicker fix should be applied in the meantime to restore correct query results.

Metadata

Assignees: none
Labels: bug (Something isn't working)
Type: none
Projects: none
Milestone: none
Relationships: none yet
Development: no branches or pull requests