
Incorrect results from pre_selection_scatter when RHS is scalar #16928

@mwylde

Describe the bug

When evaluating an AND expression, if fewer than 20% of the LHS values are true, the evaluator uses a "pre-selection" strategy: the right-hand side is evaluated only for those rows where the left-hand side is true.

However, there is a bug in how a scalar right-hand side is handled:
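The strategy described above can be sketched with plain Rust vectors. This is a simplified model, not the evaluator's actual code: `should_preselect`, `and_with_preselection`, and the constant name are hypothetical, nulls are ignored, and the threshold comparison follows the wording of this description ("fewer than 20%").

```rust
/// Hypothetical constant mirroring the 20% figure from the description.
const PRE_SELECTION_THRESHOLD: f64 = 0.2;

/// Returns true when fewer than 20% of the LHS values are true.
fn should_preselect(lhs: &[bool]) -> bool {
    if lhs.is_empty() {
        return false;
    }
    let true_count = lhs.iter().filter(|v| **v).count();
    (true_count as f64) / (lhs.len() as f64) < PRE_SELECTION_THRESHOLD
}

/// Computes `lhs AND rhs(row)`, calling `rhs` only for rows where LHS is true.
fn and_with_preselection<F: Fn(usize) -> bool>(lhs: &[bool], rhs: F) -> Vec<bool> {
    // Step 1: evaluate the RHS on the selected rows only.
    let selected: Vec<bool> = lhs
        .iter()
        .enumerate()
        .filter(|(_, l)| **l)
        .map(|(i, _)| rhs(i))
        .collect();

    // Step 2: scatter the filtered results back to full length.
    // Rows where LHS is false are false regardless of the RHS.
    let mut it = selected.into_iter();
    lhs.iter()
        .map(|&l| if l { it.next().unwrap_or(false) } else { false })
        .collect()
}
```

The key point is that the scatter step always produces one value per input row, with `false` in every position where the LHS was false.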

    let right_boolean_array = match &right_result {
        ColumnarValue::Array(array) => array.as_boolean(),
        ColumnarValue::Scalar(_) => return Ok(right_result),
    };

For any scalar RHS we just return the scalar, which is not generally correct.

For example, in the expression (x = 5 AND true) evaluated against [3, 5, 10, 12, 1], this will return the scalar true, i.e. true for every row ([true, true, true, true, true]), whereas the correct result is [false, true, false, false, false].
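One way to see what correct handling looks like is a small model using standard-library types only. This is a sketch under stated assumptions, not the actual fix: `RhsResult` and `scatter_and` are hypothetical stand-ins for `ColumnarValue` and the scatter step, the LHS is assumed to contain no nulls, and `None` models SQL NULL.

```rust
/// Hypothetical stand-in for the RHS result: either one value per
/// selected row, or a single scalar that applies to every selected row.
#[derive(Debug, Clone, PartialEq)]
enum RhsResult {
    Array(Vec<Option<bool>>),
    Scalar(Option<bool>),
}

/// Combines a non-null LHS mask with the RHS result computed on the
/// pre-selected rows, producing one output value per input row.
fn scatter_and(lhs: &[bool], rhs: &RhsResult) -> Vec<Option<bool>> {
    match rhs {
        RhsResult::Array(filtered) => {
            // `filtered` holds one entry per row where LHS is true.
            // If it runs short, the remaining selected rows become null.
            let mut it = filtered.iter().copied();
            lhs.iter()
                .map(|&l| if l { it.next().flatten() } else { Some(false) })
                .collect()
        }
        // A scalar must still be combined with the LHS row by row:
        // false AND anything is false, true AND s is s.
        RhsResult::Scalar(value) => lhs
            .iter()
            .map(|&l| if l { *value } else { Some(false) })
            .collect(),
    }
}
```

Applied to the example above, the mask for x = 5 over [3, 5, 10, 12, 1] combined with `RhsResult::Scalar(Some(true))` yields false, true, false, false, false, matching the expected result rather than the bare scalar.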

To Reproduce

This is somewhat hard to reproduce, as expressions that trigger the issue will generally be optimized away. However, it can be triggered by unoptimized expressions or by expressions that are constructed manually.

This test exhibits the behavior:

    #[test]
    fn test_and_true_preselection_returns_lhs() {
        let schema =
            Arc::new(Schema::new(vec![Field::new("c", DataType::Boolean, false)]));
        let c_array = Arc::new(BooleanArray::from(vec![false, true, false, false, false]))
            as ArrayRef;
        let batch = RecordBatch::try_new(Arc::clone(&schema), vec![Arc::clone(&c_array)])
            .unwrap();

        let expr = logical2physical(&logical_col("c").and(expr_lit(true)), &schema);

        let result = expr.evaluate(&batch).unwrap();
        let ColumnarValue::Array(result_arr) = result else {
            panic!("Expected ColumnarValue::Array");
        };

        let expected: Vec<_> = c_array.as_boolean().iter().collect();
        let actual: Vec<_> = result_arr.as_boolean().iter().collect();
        assert_eq!(
            expected, actual,
            "AND with TRUE must equal LHS even with PreSelection"
        );
    }

Expected behavior

No response

Additional context

No response

Labels: bug, regression