set druid.expressions.useStrictBooleans to true by default#14734
set druid.expressions.useStrictBooleans to true by default#14734clintropolis merged 7 commits intoapache:masterfrom
Conversation
| |---|-----------|-------| | ||
| |`druid.expressions.useStrictBooleans`|Controls the behavior of Druid boolean operators and functions, if set to `true` all boolean values will be either a `1` or `0`. See [expression documentation](../querying/math-expr.md#logical-operator-modes)|false| | ||
| |`druid.expressions.allowNestedArrays`|If enabled, Druid array expressions can create nested arrays.|false| | ||
| |`druid.expressions.useStrictBooleans`|Controls the behavior of Druid boolean operators and functions, if set to `true` all boolean values will be either a `1` or `0`. See [expression documentation](../querying/math-expr.md#logical-operator-modes)|true| |
There was a problem hiding this comment.
| |`druid.expressions.useStrictBooleans`|Controls the behavior of Druid boolean operators and functions, if set to `true` all boolean values will be either a `1` or `0`. See [expression documentation](../querying/math-expr.md#logical-operator-modes)|true| | |
| |`druid.expressions.useStrictBooleans`|Controls the behavior of Druid boolean operators and functions. If enabled, all boolean values are either `1` or `0`. See [expression documentation](../querying/math-expr.md#logical-operator-modes) for more information.|true| |
| native boolean types, Druid ingests these values as strings if `druid.expressions.useStrictBooleans` is set to `false` | ||
| (the default), or longs if set to `true` (for more SQL compatible behavior). Array typed columns can be queried using | ||
| native boolean types, Druid ingests these values as longs if `druid.expressions.useStrictBooleans` is set to `true` | ||
| (the default), or strings if set to `false`. Array typed columns can be queried using |
There was a problem hiding this comment.
| (the default), or strings if set to `false`. Array typed columns can be queried using | |
| (the default) or strings if set to `false`. Array typed columns can be queried using |
| * `cast` is supported for numeric and string types | ||
| * math operators: `+`,`-`,`*`,`/`,`%`,`^` are supported for numeric types | ||
| * logical operators: `!`, `&&`, `||`, are supported for string and numeric types (if `druid.expressions.useStrictBooleans=true`) | ||
| * logical operators: `!`, `&&`, `||`, are supported for string and numeric types (if `druid.expressions.useStrictBooleans=true`, the default) |
There was a problem hiding this comment.
| * logical operators: `!`, `&&`, `||`, are supported for string and numeric types (if `druid.expressions.useStrictBooleans=true`, the default) | |
| * logical operators: `!`, `&&`, `||`, are supported for string and numeric types by default |
| Prior to the 0.23 release of Apache Druid, boolean function expressions have inconsistent handling of true and false values, and the logical 'and' and 'or' operators behave in a manner that is incompatible with SQL, even if SQL compatible null handling mode (`druid.generic.useDefaultValueForNull=false`) is enabled. Logical operators also pass through their input values similar to many scripting languages, and treat `null` as false, which can result in some rather strange behavior. Other boolean operations, such as comparisons and equality, retain their input types (e.g. `DOUBLE` comparison would produce `1.0` for true and `0.0` for false), while many other boolean functions strictly produce `LONG` typed values of `1` for true and `0` for false. | ||
|
|
||
| After 0.23, while the inconsistent legacy behavior is still the default, it can be optionally be changed by setting `druid.expressions.useStrictBooleans=true`, so that these operations will allow correctly treating `null` values as "unknown" for SQL compatible behavior, and _all boolean output functions_ will output 'homogeneous' `LONG` typed boolean values of `1` for `true` and `0` for `false`. Additionally, | ||
| In Druid 28.0 and later, `druid.expressions.useStrictBooleans=true` is set by default, so logical operations will allow correctly treating `null` values as "unknown" for SQL compatible behavior, and _all boolean output functions_ will output 'homogeneous' `LONG` typed boolean values of `1` for `true` and `0` for `false`. |
There was a problem hiding this comment.
| In Druid 28.0 and later, `druid.expressions.useStrictBooleans=true` is set by default, so logical operations will allow correctly treating `null` values as "unknown" for SQL compatible behavior, and _all boolean output functions_ will output 'homogeneous' `LONG` typed boolean values of `1` for `true` and `0` for `false`. | |
| In Druid 28.0 and later, `druid.expressions.useStrictBooleans=true` is set by default. Logical operations treat `null` values as "unknown" for SQL compatible behavior. _all boolean output functions_ output homogeneous `LONG` typed boolean values of `1` for `true` and `0` for `false`. |
| * `100 && 0` -> `0` | ||
| * `'troo' && 'true'` -> `'troo'` | ||
| * `'troo' || 'true'` -> `'true'` | ||
| * `STRING` - the value `'true'` (case insensitive) is considered `true`, everything else is `false`. |
There was a problem hiding this comment.
| * `STRING` - the value `'true'` (case insensitive) is considered `true`, everything else is `false`. | |
| * `STRING`: the value `'true'` (case insensitive) is considered `true`, everything else is `false`. |
| @@ -321,14 +319,7 @@ For the "and" operator: | |||
|
|
|||
| Druid currently still retains implicit conversion of `LONG`, `DOUBLE`, and `STRING` types into boolean values in both modes: | |||
| * `LONG` or `DOUBLE` - any value greater than 0 is considered `true`, else `false` | |||
There was a problem hiding this comment.
| * `LONG` or `DOUBLE` - any value greater than 0 is considered `true`, else `false` | |
| * `LONG` or `DOUBLE`: any value greater than 0 is considered `true`, else `false`. |
| * `'troo' && 'true'` -> `0` | ||
| * `'troo' || 'true'` -> `1` | ||
|
|
||
| Prior to the 28 release of Apache Druid, boolean function expressions had inconsistent handling of true and false values, and the logical 'and' and 'or' operators behave in a manner that is incompatible with SQL, even if SQL compatible null handling mode (`druid.generic.useDefaultValueForNull=false`) is enabled. Logical operators also pass through their input values similar to many scripting languages, and treat `null` as false, which can result in some rather strange behavior. Other boolean operations, such as comparisons and equality, retain their input types (e.g. `DOUBLE` comparison would produce `1.0` for true and `0.0` for false), while many other boolean functions strictly produce `LONG` typed values of `1` for true and `0` for false. This legacy mode can be enabled by setting `druid.expressions.useStrictBooleans=false`. |
There was a problem hiding this comment.
This paragraph is confusing because you use past and present tense to refer to the past behavior. I think we should rewrite it so that it follows the same format.
ektravel
left a comment
There was a problem hiding this comment.
Left some suggestions for the docs.
|
didn't review the code line by line but +1 on setting this default to true makes sense. So we avoid issues such as https://www.linen.dev/s/apache-druid/t/10978927/hi-all-i-think-i-find-found-a-calcite-bug-or-something-if-i- |
gianm
left a comment
There was a problem hiding this comment.
Reviewed just the docs. I haven't look at the production code since others have already reviewed that.
| * `'troo' && 'true'` -> `0` | ||
| * `'troo' || 'true'` -> `1` | ||
|
|
||
| Prior to the 28 release of Apache Druid, boolean function expressions had inconsistent handling of true and false values. The logical 'and' and 'or' operators behaved in a manner that was incompatible with SQL, even if SQL compatible null handling mode (`druid.generic.useDefaultValueForNull=false`) was enabled. Logical operators would also pass through their input values similar to many scripting languages, and treated `null` as false, which would result in some rather strange behavior. Other boolean operations, such as comparisons and equality, retained their input types (e.g. `DOUBLE` comparison would produce `1.0` for true and `0.0` for false), while many other boolean functions strictly produced `LONG` typed values of `1` for true and `0` for false. This legacy mode can be enabled by setting `druid.expressions.useStrictBooleans=false`. |
There was a problem hiding this comment.
"Prior to Druid 28.0.0" is more consistent with how we write this kind of stuff elsewhere.
Also, I think this paragraph would read better if switched into the present tense, and written as if it's describing druid.expressions.useStrictBooleans = false, not as if it's describing Druid prior to 28.0.0. (Still include a mention that the default changed in 28.0.0 though.)
Description
Part of #14154, this PR sets 'strict booleans' to be on by default, which homogenizes Druid boolean handling to consistently use
LONGtypes internally and behave in a more SQL compliant manner, particularly with how nulls are handled in correct tri-state logic. This config was introduced in #11184.Release note
druid.expressions.useStrictBooleansis now enabled by default in the code in Druid 28. If this setting is not explicitly configured in runtime.properties, upon upgrade clusters will take on new behavior with how Druid handles boolean types during ingestion and query processing, consistently usingLONGtypes internally for any boolean values which are ingested and as the output of boolean functions for ingestion transformations and query time operations.Previous behavior:
100 && 11->110.7 || 0.3->0.7100 && 0->0'troo' && 'true'->'troo''troo' || 'true'->'true'New behavior:
100 && 11->10.7 || 0.3->1100 && 0->0'troo' && 'true'->0'troo' || 'true'->1Logical operators will also behave in an SQL compliant manner, treating nulls as 'unknown' instead of false.
For the "or" operator:
true || null,null || true->1false || null,null || false,null || null->nullFor the "and" operator:
true && null,null && true,null && null->nullfalse && null,null && false->0To revert to the default behavior present in Druid 27 and older,
druid.expressions.useStrictBooleansmay be set tofalsein the runtime.properties of cluster configuration.This PR has: