Skip to content

Math expression null handling #3645

@gianm

Description

@gianm

Math expressions can run into nulls in a few situations:

  1. Missing columns in expressions used for aggregations like SUM(log(x + y)). Some segments may have column x but not y.
  2. Missing columns in expressions used for grouping or filtering like WHERE sin(x + y) = 1.
  3. Null values in string columns, i.e. does strlen(x) return 0 or null if x is null. This can influence the value of aggregations like min(strlen(x)).
  4. Missing string columns, i.e. does strlen(x) return 0 or null if the column x is missing in a segment.

We should nail down how to handle this and document that.

For (1) I think it makes sense to treat the whole expression as null if any component is null. Then the aggregation function should do something reasonable with that (such as implemented in #3627). If users want something other than this behavior, they can use nvl or if to assign default values to specific identifiers.

For (2) I think we can also treat the whole lhs expression as null and basically have the = always be false.

For (3) and (4) I think it makes sense for them to have the same behavior, and for that behavior to be that null string values don't actually exist. The identifiers should behave like empty strings. This is different from how identifiers we expect to be numeric work (which behave like nulls). That may or may not be fine?

Metadata

Metadata

Assignees

No one assigned

    Labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions