fix `NULL <op> column` evaluation, tests for same #2510

alamb · 2022-05-11T11:20:35Z

Which issue does this PR close?

Rationale for this change

While adding the reversed-argument tests suggested in #2481, I uncovered a bug that only appears when the NULL constant is the left-hand argument.

Previously we tested `column1 < NULL` but not `NULL < column1`. When I added the reversed form, the test failed with:

```
---- sql::expr::comparisons_with_null_neq_reverse stdout ----
thread 'sql::expr::comparisons_with_null_neq_reverse' panicked at 'called `Result::unwrap()` on an `Err` value: "ArrowError(ExternalError(Internal(\"Cannot convert Int64(NULL) to i64\"))) at Executing physical plan for 'select NULL != column1 from (VALUES (1, 'foo' ,2.3), (2, 'bar', 5.4)) as t': ProjectionExec { expr: [(BinaryExpr { left: TryCastExpr { expr: Literal { value: NULL }, cast_type: Int64 }, op: NotEq, right: Column { name: \"column1\", index: 0 } }, \"NULL NotEq t.column1\")], schema: Schema { fields: [Field { name: \"NULL NotEq t.column1\", data_type: Boolean, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: None }], metadata: {} }, input: ProjectionExec { expr: [(Column { name: \"column1\", index: 0 }, \"column1\")], schema: Schema { fields: [Field { name: \"column1\", data_type: Int64, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: None }], metadata: {} }, input: ValuesExec { schema: Schema { fields: [Field { name: \"column1\", data_type: Int64, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: None }, Field { name: \"column2\", data_type: Utf8, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: None }, Field { name: \"column3\", data_type: Float64, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: None }], metadata: {} }, data: [RecordBatch { schema: Schema { fields: [Field { name: \"column1\", data_type: Int64, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: None }, Field { name: \"column2\", data_type: Utf8, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: None }, Field { name: \"column3\", data_type: Float64, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: None }], metadata: {} }, columns: [PrimitiveArray<Int64>\n[\n  1,\n  2,\n], StringArray\n[\n  \"foo\",\n  \"bar\",\n], PrimitiveArray<Float64>\n[\n  2.3,\n  5.4,\n]], row_count: 2 }] }, metrics: ExecutionPlanMetricsSet { inner: Mutex { data: MetricsSet { metrics: [] } } } }, metrics: ExecutionPlanMetricsSet { inner: Mutex { data: MetricsSet { metrics: [] } } } }"', datafusion/core/tests/sql/mod.rs:563:10
```
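The panic comes from trying to turn the `Int64(NULL)` literal into a native `i64`. As an illustration only, here is a minimal sketch of that failure mode next to a null-aware alternative, assuming (hypothetically) that the old scalar path needed a native value before it could compare. `Scalar`, `to_native_i64`, `not_eq_old`, and `not_eq_null_aware` are made-up names, not DataFusion APIs; only the error text is copied from the panic above.

```rust
/// Hypothetical stand-in for a typed scalar such as `Int64(NULL)`:
/// `None` represents SQL NULL.
#[derive(Debug, Clone, Copy)]
struct Scalar(Option<i64>);

/// Extracting a native `i64` cannot represent NULL, so it must fail.
fn to_native_i64(s: Scalar) -> Result<i64, String> {
    match s.0 {
        Some(v) => Ok(v),
        None => Err("Cannot convert Int64(NULL) to i64".to_string()),
    }
}

/// `scalar != column` that first unwraps the scalar: errors for NULL.
fn not_eq_old(s: Scalar, column: &[Option<i64>]) -> Result<Vec<Option<bool>>, String> {
    let v = to_native_i64(s)?;
    Ok(column.iter().map(|c| c.map(|x| v != x)).collect())
}

/// Null-aware `scalar != column`: a NULL on either side yields NULL.
fn not_eq_null_aware(s: Scalar, column: &[Option<i64>]) -> Vec<Option<bool>> {
    column
        .iter()
        .map(|c| match (s.0, *c) {
            (Some(a), Some(b)) => Some(a != b),
            _ => None,
        })
        .collect()
}

fn main() {
    let column1 = [Some(1), Some(2)];
    let null = Scalar(None);
    println!("{:?}", not_eq_old(null, &column1)); // Err("Cannot convert Int64(NULL) to i64")
    println!("{:?}", not_eq_null_aware(null, &column1)); // [None, None]
}
```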

What changes are included in this PR?

This PR adds tests that exercise both argument orders for comparisons with NULL constants, and fixes the evaluation of `NULL <op> column`.

Are there any user-facing changes?

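Fewer internal errors: queries such as `NULL != column1` now return NULL instead of failing with `Cannot convert Int64(NULL) to i64`.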

cc @WinkerDu and @yjshen

alamb · 2022-05-11T11:20:54Z

datafusion/core/src/physical_plan/file_format/parquet.rs

```diff
-        // when a null array is generated for a statistics column,
-        assert_eq!(row_group_filter, vec![true, true]);
+
+        // bool = NULL always evaluates to NULL (and thus will not
```


This is the only change I thought was surprising.

alamb · 2022-05-11T11:21:15Z

datafusion/core/tests/sql/expr.rs

```diff
-        "+---------------------+",
+    // we expect all the following queries to yield two null values
+    let cases = vec![
+        // 1. Numeric comparison with NULL
```


I wanted to test many different combinations, so I rewrote the test as a list of cases so that I could check them all.
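The rewritten test checks every case against the same expectation: two NULL values, one per input row. A minimal, self-contained sketch of that table-driven shape, using a toy evaluator in place of DataFusion; `Operand`, `Op`, `compare`, and `eval` are hypothetical names that only illustrate why each comparison is listed in both argument orders.

```rust
/// One side of a comparison: the literal NULL or the input column.
#[derive(Debug, Clone, Copy)]
enum Operand {
    Null,
    Column,
}

#[derive(Debug, Clone, Copy)]
enum Op {
    Lt,
    LtEq,
    Gt,
    GtEq,
    Eq,
    NotEq,
}

/// SQL-style comparison: any NULL (None) input produces a NULL result.
fn compare(l: Option<i64>, op: Op, r: Option<i64>) -> Option<bool> {
    let (l, r) = (l?, r?);
    Some(match op {
        Op::Lt => l < r,
        Op::LtEq => l <= r,
        Op::Gt => l > r,
        Op::GtEq => l >= r,
        Op::Eq => l == r,
        Op::NotEq => l != r,
    })
}

/// Evaluate `left <op> right` once per row of `column`.
fn eval(left: Operand, op: Op, right: Operand, column: &[Option<i64>]) -> Vec<Option<bool>> {
    let pick = |o: Operand, row: Option<i64>| match o {
        Operand::Null => None,
        Operand::Column => row,
    };
    column
        .iter()
        .map(|&row| compare(pick(left, row), op, pick(right, row)))
        .collect()
}

fn main() {
    // Two rows, like `VALUES (1, ...), (2, ...)` in the SQL tests.
    let column1 = [Some(1), Some(2)];
    let ops = [Op::Lt, Op::LtEq, Op::Gt, Op::GtEq, Op::Eq, Op::NotEq];
    // List every operator in both orders: `column1 <op> NULL` and `NULL <op> column1`.
    let mut cases = Vec::new();
    for op in ops {
        cases.push((Operand::Column, op, Operand::Null));
        cases.push((Operand::Null, op, Operand::Column));
    }
    for (l, op, r) in cases {
        // Every case is expected to yield two NULL values.
        assert_eq!(eval(l, op, r, &column1), vec![None, None], "{:?} {:?} {:?}", l, op, r);
    }
    println!("all cases passed");
}
```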

alamb · 2022-05-11T11:21:24Z

datafusion/core/tests/sql/expr.rs

```diff
+        // ---- a different evaluation path)
+        // ----
+
+        // 1. Numeric comparison with NULL
```


The tests below this point are new; the tests above were pre-existing.

alamb · 2022-05-11T11:22:02Z

datafusion/physical-expr/src/expressions/binary.rs

```diff
         array: &dyn Array,
         scalar: &ScalarValue,
     ) -> Result<Option<Result<ArrayRef>>> {
+        let bool_type = &DataType::Boolean;
```


The changes here are only for readability: binding `&DataType::Boolean` to `bool_type` keeps each macro call short enough that `cargo fmt` puts it on a single line.

alamb · 2022-05-11T11:22:51Z

datafusion/physical-expr/src/expressions/binary.rs

```diff
-            Operator::GtEq => binary_array_op_scalar!(array, scalar.clone(), lt_eq),
-            Operator::Eq => binary_array_op_scalar!(array, scalar.clone(), eq),
+            Operator::Lt => {
+                binary_array_op_dyn_scalar!(array, scalar.clone(), gt, bool_type)
```


This is the actual fix: use `binary_array_op_dyn_scalar!`, which @WinkerDu updated to handle null constants, rather than `binary_array_op_scalar!`, which does not.

I plan to look into removing `binary_array_op_scalar!` completely in a follow-on PR.
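In this code path the scalar is on the left (`scalar <op> array`), so it is evaluated with the array on the left and the operator mirrored: the `Operator::Lt` arm calls the kernel with `gt`, and the `Operator::GtEq` arm uses `lt_eq`. A minimal sketch of that rewrite with a null-aware scalar, assuming plain `Option<i64>` slices in place of Arrow arrays; `Cmp`, `flip`, `array_op_scalar`, and `scalar_op_array` are hypothetical names, not DataFusion or Arrow APIs.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum Cmp {
    Lt,
    LtEq,
    Gt,
    GtEq,
    Eq,
    NotEq,
}

/// Mirror an operator so `s <op> a` can be evaluated as `a <flip(op)> s`.
/// Equality and inequality are symmetric and stay unchanged.
fn flip(op: Cmp) -> Cmp {
    match op {
        Cmp::Lt => Cmp::Gt,
        Cmp::LtEq => Cmp::GtEq,
        Cmp::Gt => Cmp::Lt,
        Cmp::GtEq => Cmp::LtEq,
        Cmp::Eq => Cmp::Eq,
        Cmp::NotEq => Cmp::NotEq,
    }
}

/// Null-aware `array <op> scalar`: a NULL scalar or NULL element gives NULL.
fn array_op_scalar(array: &[Option<i64>], op: Cmp, scalar: Option<i64>) -> Vec<Option<bool>> {
    array
        .iter()
        .map(|&a| -> Option<bool> {
            let (a, s) = (a?, scalar?);
            Some(match op {
                Cmp::Lt => a < s,
                Cmp::LtEq => a <= s,
                Cmp::Gt => a > s,
                Cmp::GtEq => a >= s,
                Cmp::Eq => a == s,
                Cmp::NotEq => a != s,
            })
        })
        .collect()
}

/// `scalar <op> array`, implemented by reusing the array-on-the-left kernel.
fn scalar_op_array(scalar: Option<i64>, op: Cmp, array: &[Option<i64>]) -> Vec<Option<bool>> {
    array_op_scalar(array, flip(op), scalar)
}

fn main() {
    let column1 = [Some(1), Some(2)];
    // `1 < column1` is evaluated as `column1 > 1`
    println!("{:?}", scalar_op_array(Some(1), Cmp::Lt, &column1)); // [Some(false), Some(true)]
    // `NULL < column1` yields NULL for every row instead of an error
    println!("{:?}", scalar_op_array(None, Cmp::Lt, &column1)); // [None, None]
}
```

Mirroring the operator lets one set of array-on-the-left kernels serve both argument orders, so the NULL handling only has to be correct in one place.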

WinkerDu · 2022-05-20

Sorry for the late response, love this fix ❤️

yjshen · 2022-05-17T01:15:30Z

Sorry for the late review, LGTM!

alamb · 2022-05-17T19:52:53Z

Thanks @yjshen


fix NULL <op> column evaluation, tests for same

55ec30e

github-actions bot added the datafusion label May 11, 2022

alamb commented May 11, 2022


This was referenced May 11, 2022

Numeric, String, Boolean comparisons with literal NULL #2481

Merged

Remove unused binary_array_op_scalar! in binary.rs #2512

Merged

alamb requested review from andygrove and tustvold May 16, 2022 12:25

yjshen approved these changes May 17, 2022


Merge remote-tracking branch 'apache/master' into alamb/null_arg_order

fc31961

yjshen merged commit d497cde into apache:master May 18, 2022

alamb deleted the alamb/null_arg_order branch May 18, 2022 09:16

Ted-Jiang added a commit to Ted-Jiang/arrow-datafusion that referenced this pull request May 31, 2022

Revert "fix NULL <op> column evaluation, tests for same (apache#2510)"

8dcfa8c

This reverts commit d497cde

ovr pushed a commit to cube-js/arrow-datafusion that referenced this pull request Aug 15, 2022

fix NULL <op> column evaluation, tests for same (apache#2510)

909ed9c
