Fix bugs related to SQL type inference return type nullability #11327
gianm merged 5 commits into apache:master from
Conversation
gianm left a comment
Generally LGTM. Could you take a look at ConcatOperatorConversion too? I think it should also be "cascade nullable".
Btw, surveying what else is out there:
Today, our ConcatFunc is implemented to return null if any argument is null, and we use that ConcatFunc for both CONCAT and |
I've updated As far as the docs go, I think it would be worth trying to fill out the null handling of all functions, but I would maybe rather do this as a follow-up since there are a lot of them, if that is ok. I imagine something similar to what #11188 added for aggregator functions, but I'm not yet sure if that format makes the most sense for the other functions, or if there is a more concise way to describe general behavior unless otherwise specified. I need to think about it a bit.
I think the way PostgreSQL does it is reasonable for a single-page doc style: https://www.postgresql.org/docs/13/functions-string.html. It describes null handling behavior only for functions where it's non-obvious (like concat) or where there's something special going on (like quote_literal and quote_nullable). It keeps the noise down. One day, we might want to consider a multi-page doc style, though, with a whole page for each function. Then we can go into all the details, give usage examples, etc. Microsoft docs are like that: https://docs.microsoft.com/en-us/sql/t-sql/functions/concat-transact-sql?view=sql-server-ver15. But I wouldn't worry about that as part of the null handling documentation stuff.
Description
This PR fixes the return type inference of several SQL operators that incorrectly reported their return type as not nullable, when in fact nullability depends on the arguments to the operator. A new method, returnTypeCascadeNullable, has been added to the OperatorConversions builder type. It allows defining operators whose return type is nullable only if any of the operands are nullable, which is the case for many of them. I only evaluated the callers of returnTypeNonNull, and didn't look closely at whether some of the returnTypeNullable callers should be switched to this new method.

List of impacted functions:
- ARRAY_LENGTH/MV_LENGTH
- ARRAY_OFFSET_OF/MV_OFFSET_OF
- ARRAY_ORDINAL_OF/MV_ORDINAL_OF
- ARRAY_TO_STRING/MV_TO_STRING
- BTRIM
- DATE_TRUNC
- LPAD
- LTRIM
- LEFT
- MILLIS_TO_TIMESTAMP
- PARSE_LONG
- RPAD
- RTRIM
- REPEAT
- REVERSE
- RIGHT
- STRPOS
- TEXTCAT
- TIME_CEIL
- TIME_EXTRACT
- TIME_FLOOR
- TIME_FORMAT
- TIME_SHIFT
- TIMESTAMP_TO_MILLIS

This fixes bugs like the one in the test case added to CalciteQueryTests, which covers some of the fixed functions and illustrates the fix (since these operators were previously marked as non-null, the count aggregator would effectively be translated to count(*)). I am unsure what other bugs might have been happening besides this, but it is quite possible this PR fixes a handful of other issues with null handling in SQL compatible mode, since some inappropriate optimizations might have been happening.

Along the way, I also fixed null handling bugs in repeat, which would have thrown a null pointer exception, and timestamp_shift, which would ignore numeric nulls in SQL compatible mode and just shift.

This PR has:
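As a rough illustration of the cascade-nullable rule and the count(*) symptom described above, here is a minimal, self-contained Java sketch. It does not use Druid's or Calcite's actual classes: SketchType, cascadeNullable, and count are hypothetical names invented for this example, and the "planner shortcut" is a simplified assumption about how a non-null declaration lets COUNT(expr) be treated as COUNT(*).

```java
import java.util.Arrays;
import java.util.List;
import java.util.Objects;

final class CascadeNullableSketch {
    // Hypothetical stand-in for a SQL type: a type name plus a nullability flag.
    static final class SketchType {
        final String name;
        final boolean nullable;

        SketchType(String name, boolean nullable) {
            this.name = name;
            this.nullable = nullable;
        }
    }

    // Cascade-nullable rule: the return type is nullable only if
    // at least one operand type is nullable.
    static SketchType cascadeNullable(String returnTypeName, List<SketchType> operands) {
        boolean anyNullable = operands.stream().anyMatch(t -> t.nullable);
        return new SketchType(returnTypeName, anyNullable);
    }

    // COUNT(expr) counts only non-null values. If the expression is declared
    // non-null, this sketch takes the COUNT(*) shortcut and counts every row.
    static long count(List<String> exprValues, boolean exprDeclaredNullable) {
        if (!exprDeclaredNullable) {
            return exprValues.size(); // assumed COUNT(*) shortcut
        }
        return exprValues.stream().filter(Objects::nonNull).count();
    }

    public static void main(String[] args) {
        // Results of some string function applied to a nullable column;
        // the middle row is null.
        List<String> results = Arrays.asList("cba", null, "zyx");

        SketchType nullableInput = new SketchType("VARCHAR", true);
        SketchType inferred = cascadeNullable("VARCHAR", Arrays.asList(nullableInput));

        // Declared non-null (the old, incorrect inference): all 3 rows counted.
        System.out.println(count(results, false));
        // Cascade-nullable inference: only the 2 non-null rows counted.
        System.out.println(count(results, inferred.nullable));
    }
}
```

With a nullable operand the inferred type is nullable, so the null row is excluded from the count; with only non-null operands the same rule still allows the cheaper count of all rows.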