Skip to content

Conversation

@alamb
Copy link
Contributor

@alamb alamb commented May 11, 2022

Which issue does this PR close?

re #1179 and #2482 .

Rationale for this change

While adding tests suggested in #2481 for reversed arguments, it uncovered a bug if the argument order was different.

Previously we tested column1 < NULL but not NULL < column1. When I did so I got

---- sql::expr::comparisons_with_null_neq_reverse stdout ----
thread 'sql::expr::comparisons_with_null_neq_reverse' panicked at 'called `Result::unwrap()` on an `Err` value: "ArrowError(ExternalError(Internal(\"Cannot convert Int64(NULL) to i64\"))) at Executing physical plan for 'select NULL != column1 from (VALUES (1, 'foo' ,2.3), (2, 'bar', 5.4)) as t': ProjectionExec { expr: [(BinaryExpr { left: TryCastExpr { expr: Literal { value: NULL }, cast_type: Int64 }, op: NotEq, right: Column { name: \"column1\", index: 0 } }, \"NULL NotEq t.column1\")], schema: Schema { fields: [Field { name: \"NULL NotEq t.column1\", data_type: Boolean, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: None }], metadata: {} }, input: ProjectionExec { expr: [(Column { name: \"column1\", index: 0 }, \"column1\")], schema: Schema { fields: [Field { name: \"column1\", data_type: Int64, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: None }], metadata: {} }, input: ValuesExec { schema: Schema { fields: [Field { name: \"column1\", data_type: Int64, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: None }, Field { name: \"column2\", data_type: Utf8, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: None }, Field { name: \"column3\", data_type: Float64, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: None }], metadata: {} }, data: [RecordBatch { schema: Schema { fields: [Field { name: \"column1\", data_type: Int64, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: None }, Field { name: \"column2\", data_type: Utf8, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: None }, Field { name: \"column3\", data_type: Float64, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: None }], metadata: {} }, columns: [PrimitiveArray<Int64>\n[\n  1,\n  2,\n], StringArray\n[\n  \"foo\",\n  \"bar\",\n], PrimitiveArray<Float64>\n[\n  2.3,\n  5.4,\n]], row_count: 2 }] }, metrics: ExecutionPlanMetricsSet { inner: Mutex { data: MetricsSet { metrics: [] } } } }, metrics: ExecutionPlanMetricsSet { inner: Mutex { data: MetricsSet { metrics: [] } } } }"', datafusion/core/tests/sql/mod.rs:563:10

What changes are included in this PR?

This PR adds a test exercising the order of expressions with null constants, and fixes evaluation of same

Are there any user-facing changes?

Fewer internal errors

cc @WinkerDu and @yjshen

// when a null array is generated for a statistics column,
assert_eq!(row_group_filter, vec![true, true]);

// bool = NULL always evaluates to NULL (and thus will not
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is the only surprising change I thought

"+---------------------+",
// we expect all the following queries to yield a two null values
let cases = vec![
// 1. Numeric comparison with NULL
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wanted to test a bunch of different combinations, so I rewrote the test so I could check them all

// ---- a different evaluation path)
// ----

// 1. Numeric comparison with NULL
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tests below here are new. The tests above were pre-existing

array: &dyn Array,
scalar: &ScalarValue,
) -> Result<Option<Result<ArrayRef>>> {
let bool_type = &DataType::Boolean;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changes here were to make the code more readable (so cargo fmt put it on the same line)

Operator::GtEq => binary_array_op_scalar!(array, scalar.clone(), lt_eq),
Operator::Eq => binary_array_op_scalar!(array, scalar.clone(), eq),
Operator::Lt => {
binary_array_op_dyn_scalar!(array, scalar.clone(), gt, bool_type)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the actual fix -- to use binary_array_op_dyn_scalar which @WinkerDu updated to handle null constants rather than binary_array_op_scalar which does not.

I plan to look into removing binary_array_op_scalar completely as a follow on PR.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for late response, love this fix ❤️

@yjshen
Copy link
Member

yjshen commented May 17, 2022

Sorry for the late review, LGTM!

@alamb
Copy link
Contributor Author

alamb commented May 17, 2022

Thanks @yjshen

@yjshen yjshen merged commit d497cde into apache:master May 18, 2022
@alamb alamb deleted the alamb/null_arg_order branch May 18, 2022 09:16
Ted-Jiang added a commit to Ted-Jiang/arrow-datafusion that referenced this pull request May 31, 2022
ovr pushed a commit to cube-js/arrow-datafusion that referenced this pull request Aug 15, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants