Math expressions can run into nulls in a few situations:
1. Missing columns in expressions used for aggregations, like `SUM(log(x + y))`. Some segments may have column x but not y.
2. Missing columns in expressions used for grouping or filtering, like `WHERE sin(x + y) = 1`.
3. Null values in string columns: does `strlen(x)` return 0 or null if x is null? This can influence the value of aggregations like `min(strlen(x))`.
4. Missing string columns: does `strlen(x)` return 0 or null if the column x is missing in a segment?
We should nail down how to handle this and document that.
For (1) I think it makes sense to treat the whole expression as null if any component is null. The aggregation function should then do something reasonable with that null (as implemented in #3627). Users who want different behavior can use `nvl` or `if` to assign default values to specific identifiers.
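As a rough illustration of the proposal for (1), here is a minimal Python sketch. It is not the actual expression engine: the helpers `add`, `log`, `nvl`, and `sum_skipping_nulls` are hypothetical stand-ins, and "skip null inputs" is only one assumed reading of "something reasonable", not necessarily what #3627 does.

```python
import math

def add(a, b):
    """Null-propagating addition: the result is None if either operand is None."""
    if a is None or b is None:
        return None
    return a + b

def log(v):
    """Null-propagating natural log."""
    return None if v is None else math.log(v)

def nvl(v, default):
    """Return default when v is None, otherwise v."""
    return default if v is None else v

def sum_skipping_nulls(values):
    """Hypothetical aggregator that ignores null inputs (an assumption)."""
    return sum(v for v in values if v is not None)

# The second row stands for a segment that has column x but not y.
rows = [{"x": 1.0, "y": 2.0}, {"x": 4.0}]

# Default behavior: a null y makes the whole expression null for that row.
propagated = [log(add(r.get("x"), r.get("y"))) for r in rows]

# Opt-out: the user assigns a default to y with nvl.
defaulted = [log(add(r.get("x"), nvl(r.get("y"), 0.0))) for r in rows]
```

With propagation, the second row contributes nothing to the sum; with `nvl(y, 0)`, it contributes `log(4)`.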
For (2) I think we can also treat the whole left-hand side expression as null, so that the `=` comparison is always false.
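The filtering case can be sketched the same way. This is a hedged Python illustration with hypothetical helpers (`add`, `sin`, `equals`), not the real filter implementation; it compares against 0 rather than 1 only so the floating-point result is exact.

```python
import math

def add(a, b):
    """Null-propagating addition."""
    if a is None or b is None:
        return None
    return a + b

def sin(v):
    """Null-propagating sine."""
    return None if v is None else math.sin(v)

def equals(lhs, rhs):
    """Equality that is always false when the left-hand side is null."""
    return lhs is not None and lhs == rhs

# The second row stands for a segment that is missing column y.
rows = [{"x": 0.0, "y": 0.0}, {"x": 0.0}]

# Equivalent of WHERE sin(x + y) = 0 under the proposed semantics.
matched = [r for r in rows if equals(sin(add(r.get("x"), r.get("y"))), 0.0)]
```

Only the first row matches; the row with the missing column is filtered out because its left-hand side is null.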
For (3) and (4) I think it makes sense for them to have the same behavior, and for that behavior to be that null string values don't actually exist: the identifiers should behave like empty strings. This differs from identifiers we expect to be numeric, which behave like nulls. That may or may not be fine?
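To make the consequence for `min(strlen(x))` concrete, here is a small Python sketch of the two candidate behaviors. Both `strlen_empty` and `strlen_null` are hypothetical names for illustration, and skipping nulls in the second `min` is an assumption about how the aggregator would treat them.

```python
def strlen_empty(value):
    """Proposed behavior: a null or missing string acts like the empty string."""
    return len("" if value is None else value)

def strlen_null(value):
    """Alternative behavior: null in, null out."""
    return None if value is None else len(value)

# One real value, one explicit null (case 3), one missing column (case 4).
rows = [{"x": "abc"}, {"x": None}, {}]

# Proposed: nulls count as length 0 and pull the minimum down to 0.
min_with_empty = min(strlen_empty(r.get("x")) for r in rows)

# Alternative: nulls propagate and a null-skipping min ignores them.
lengths = [strlen_null(r.get("x")) for r in rows]
min_skipping_nulls = min(v for v in lengths if v is not None)
```

Under the proposal, cases (3) and (4) behave identically and the minimum is 0; under the null-propagating alternative the minimum would be 3.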