Skip to content

fix a bugs related to SQL type inference return type nullability#11327

Merged
gianm merged 5 commits intoapache:masterfrom
clintropolis:fix-sql-null-handling-return-types
Jun 15, 2021
Merged

fix a bugs related to SQL type inference return type nullability#11327
gianm merged 5 commits intoapache:masterfrom
clintropolis:fix-sql-null-handling-return-types

Conversation

@clintropolis
Copy link
Copy Markdown
Member

@clintropolis clintropolis commented Jun 3, 2021

Description

This PR fixes the return type inference of several SQL operators which were incorrectly reporting their return type as not nullable, when in fact this was dependent on the arguments to the operator. A new method, returnTypeCascadeNullable, has been added to OperatorConversions builder type, which allows defining operators with a return type that is only nullable if any of the operands are nullable, which is many of them. I only evaluated all callers of returnTypeNonNull, and didn't look closely if some of the returnTypeNullable should be switched to this new method.

List of impacted functions:

  • ARRAY_LENGTH/MV_LENGTH
  • ARRAY_OFFSET_OF/MV_OFFSET_OF
  • ARRAY_ORDINAL_OF/MV_ORDINAL_OF
  • ARRAY_TO_STRING/MV_TO_STRING
  • BTRIM
  • DATE_TRUNC
  • LPAD
  • LTRIM
  • LEFT
  • MILLIS_TO_TIMESTAMP
  • PARSE_LONG
  • RPAD
  • RTRIM
  • REPEAT
  • REVERSE
  • RIGHT
  • STRPOS
  • TEXTCAT
  • TIME_CEIL
  • TIME_EXTRACT
  • TIME_FLOOR
  • TIME_FORMAT
  • TIME_SHIFT
  • TIMESTAMP_TO_MILLIS

This fixes bugs like in the test case added to CalciteQueryTests, which covers some of the functions which have been fixed and illustrates the fix (since these operators were previously marked as non-null, the count aggregator would be effectively translated to count(*)). I am unsure of what other bugs might have been happening besides this, but it is quite possible this PR fixes a handful of other issues with null handing in SQL compatible mode since some inappropriate optimizations might have been happening.

Along the way, I also fixed null handling bugs in repeat which would've had a null pointer exception, and timestamp_shift which would ignore numeric nulls in SQL compatible mode and just shift.


This PR has:

  • been self-reviewed.
  • added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • been tested in a test Druid cluster.

Copy link
Copy Markdown
Contributor

@gianm gianm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally LGTM. Could you take a look at ConcatOperatorConversion too? I think it should also be "cascade nullable".

@gianm
Copy link
Copy Markdown
Contributor

gianm commented Jun 14, 2021

Generally LGTM. Could you take a look at ConcatOperatorConversion too? I think it should also be "cascade nullable".

Btw, surveying what else is out there:

Today, our ConcatFunc is implemented to return null if any argument is null, and we use that ConcatFunc for both CONCAT and ||. I think in light of what other DBs do, this is fine, so I'd just suggest updating the ConcatOperatorConversion and the docs appropriately.

@clintropolis
Copy link
Copy Markdown
Member Author

I think in light of what other DBs do, this is fine, so I'd just suggest updating the ConcatOperatorConversion and the docs appropriately.

I've updated ConcatOperatorConversion, missed that one because it wasn't using the builder.

As far as the docs go, I think it would be worth trying to fill out the null handling of all functions, but I would maybe rather do this as a follow-up since there are a lot of them if that is ok. I imagine something similar to what #11188 added for aggregator functions, but I'm not yet sure if that format makes the most sense for the other functions, or if there is a more concise way to describe general behavior unless otherwise specified, need to think about it a bit.

@gianm
Copy link
Copy Markdown
Contributor

gianm commented Jun 15, 2021

As far as the docs go, I think it would be worth trying to fill out the null handling of all functions, but I would maybe rather do this as a follow-up since there are a lot of them if that is ok. I imagine something similar to what #11188 added for aggregator functions, but I'm not yet sure if that format makes the most sense for the other functions, or if there is a more concise way to describe general behavior unless otherwise specified, need to think about it a bit.

I think the way PostgreSQL does it is reasonable for a single-page doc style: https://www.postgresql.org/docs/13/functions-string.html. It describes null handling behavior only for functions where it's non-obvious (like concat) or where there's something special going on (like quote_literal and quote_nullable). It keeps the noise down.

One day, we might want to consider a multi-page doc style, though, with a whole page for each function. Then we can go into all the details, give usage examples, etc. Microsoft docs are like that: https://docs.microsoft.com/en-us/sql/t-sql/functions/concat-transact-sql?view=sql-server-ver15. But I wouldn't worry about that as part of the null handling documentation stuff.

@gianm gianm merged commit bfbd7ec into apache:master Jun 15, 2021
@clintropolis clintropolis deleted the fix-sql-null-handling-return-types branch June 15, 2021 19:44
@clintropolis clintropolis added this to the 0.22.0 milestone Aug 12, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants