Part 2 of changes for SQL Compatible Null Handling#5958
Part 2 of changes for SQL Compatible Null Handling#5958b-slim merged 23 commits intoapache:masterfrom
Conversation
|
@leventov : Created a new PR, please review. |
|
@nishantmonu51 as far as I can see you didn't address all comments that I left in the last batch in the previous PR, please answer to them there (if you are able to open that PR) |
93f41e8 to
6a6d35a
Compare
|
@leventov: last batch of comments was re lines more than 120 cols, add line breaks and reformatted code. Also replied to your comments on previous PR, If you have any more comments can you please add them here ? |
|
Since this PR is tagged as |
|
@nishantmonu51 is this the last PR for #4349? Would you let me know which are not done yet in #5278 and this PR? |
|
Some comments and questions for the PR description.
This should be moved to #5278.
I haven't checked through the whole patch, so am wondering this is true only for the SQL layer or the entire Druid query processing system. |
jihoonson
left a comment
There was a problem hiding this comment.
This is a great work. Thanks @nishantmonu51!
Reviewed up to LoadingLookup and still reviewing.
| return value; | ||
| } | ||
|
|
||
| public boolean isNull() |
There was a problem hiding this comment.
nit: I think it's better to call this method rather than doing value() == null in call sites.
There was a problem hiding this comment.
this method is replaced with isNumericNull as that is what we need to check for Expressions.
isNumericNull will check whether the primitive long/float/double equivalent is null or not.
It is possible for an expression to have a non-null String value but it can return null when parsed as a primitive long/float/double.
| if (leftVal.type() == ExprType.STRING && rightVal.type() == ExprType.STRING) { | ||
| return evalString(leftVal.asString(), rightVal.asString()); | ||
| } else if (leftVal.type() == ExprType.LONG && rightVal.type() == ExprType.LONG) { | ||
| if (NullHandling.sqlCompatible() && (leftVal.isNumericNull() || rightVal.isNumericNull())) { |
There was a problem hiding this comment.
Wonder why we check this here instead of changing evalLong() or evalDouble to return a nullable Long or Double, respectively. Is this to avoid boxing/unboxing? If so, I wonder how much it affects to the performance.
There was a problem hiding this comment.
Not sure about how much performance effect boxing/unboxing will have, in this form atleast the change is on safer side and does not affect performance for non-sql case.
We can run more benchmarks in a follow up PR and change this api to return boxed values if the performance diff is not much.
| return theLong == null ? 0L : theLong; | ||
| Number number = asNumber(); | ||
| if (number == null) { | ||
| assert NullHandling.replaceWithDefault(); |
There was a problem hiding this comment.
How about checking and throwing an Exception rather than a assert statement? The assert statement wouldn't be super helpful for users if it happens. Same for all other assert statements in this class.
There was a problem hiding this comment.
IIRC, I was actually doing that before, but changed this to assert based on discussions on previous PRs.
There was a problem hiding this comment.
So, would you point me out where the discussion is? I guess it's in #5278?
There was a problem hiding this comment.
unable to find the discussion among lots of review comments.
I believe the intent here was to catch any coding errors/places where we may have missed handling nulls properly.
@leventov: can probably add more context here.
There was a problem hiding this comment.
| public void aggregate() | ||
| { | ||
| histogram.offer(selector.getFloat()); | ||
| if (NullHandling.replaceWithDefault() || !selector.isNull()) { |
There was a problem hiding this comment.
What does NullHandling.replaceWithDefault() mean here? Does that mean nulls have replaced with some default value at ingestion time? If so, I guess select.isNull() always return false and thus this check can be simplified into if (!selector.isNull()).
There was a problem hiding this comment.
this means whether nulls should be replaced with default value.
In case of ExpressionColumnValueSelector isNull will compute the expression and then give the result, the check is there to not have any performance impact of calling isNull for default case.
There was a problem hiding this comment.
Got it. Thanks. Would you add a comment here?
| public String apply(@Nullable final String key) | ||
| { | ||
| if (key == null) { | ||
| String keyEquivalent = NullHandling.nullToEmptyIfNeeded(key); |
There was a problem hiding this comment.
I thought this kind of null value conversion is needed only when retrieving values from the storage layer (like in ColumnValueSelector), so that we expect key is already converted to some default value if needed. But, apparently it seems not. Would you explain why this is needed after retrieving values?
There was a problem hiding this comment.
this is to preserve backwards compatibility,
previously, nulls and empty strings were considered same.
and if a looking has mapping "" -> "someval" then all nulls/empty strings will be resolved to "someval".
|
Thanks @jihoonson for helping with the review and taking this PR forward. |
| import java.util.Objects; | ||
|
|
||
| public class FloatFirstAggregatorFactory extends AggregatorFactory | ||
| public class FloatFirstAggregatorFactory extends NullableAggregatorFactory<ColumnValueSelector> |
There was a problem hiding this comment.
not a blocker for this PR, would like to do it a follow up PR - #6039
| Chronology chronology = timeZone == null ? ISOChronology.getInstanceUTC() : ISOChronology.getInstance(timeZone); | ||
| final Object value = originArg.eval(bindings).value(); | ||
| origin = value != null ? new DateTime(value, chronology) : null; | ||
| if (value instanceof String && StringUtils.isEmpty((String) value)) { |
There was a problem hiding this comment.
#6040
Will fix this line in this PR and prohibit the api in future PR.
| public DimFilter optimize() | ||
| { | ||
| return new InDimFilter(dimension, ImmutableList.of(value), extractionFn).optimize(); | ||
| return new InDimFilter(dimension, Arrays.asList(value), extractionFn).optimize(); |
| // converted to strings. We should also add functions | ||
| // for these and modify ColumnComparisonFilter to handle | ||
| // comparing Long and Float columns to eachother. | ||
| // Returns null when the underlying Long/Float value is null. |
| private Ordering<Row> metricOrdering(final String column, final Comparator comparator) | ||
| { | ||
| return Ordering.from(Comparator.comparing((Row row) -> row.getRaw(column), Comparator.nullsLast(comparator))); | ||
| if (NullHandling.sqlCompatible()) { |
There was a problem hiding this comment.
Need a comment explaining this.
| @@ -93,7 +93,7 @@ public String getAnnouncementPath(String listenerName) | |||
| { | |||
| return ZKPaths.makePath( | |||
| getListenersPath(), Preconditions.checkNotNull( | |||
There was a problem hiding this comment.
Each argument should be on a separate line
| public static String escapeStringLiteral(final String s) | ||
| { | ||
| Preconditions.checkNotNull(s); | ||
| if (s == null) { |
There was a problem hiding this comment.
This check immediately after Preconditions.checkNotNull?
There was a problem hiding this comment.
removed the == null check, it was not needed.
| final Object value = row[i]; | ||
| final String stringValue = value != null ? String.valueOf(value) : ""; | ||
| if (value == null) { | ||
| // NULLS are not supposed to match NULLs in a join. So ignore them. |
| literal = rexBuilder.makeLiteral(exprResult.asBoolean(), constExp.getType(), true); | ||
| } else if (sqlTypeName == SqlTypeName.DATE) { | ||
| if (!constExp.getType().isNullable() && exprResult.isNull()) { | ||
| if (!constExp.getType().isNullable() && exprResult.isNumericNull()) { |
There was a problem hiding this comment.
It is possible for an expression to have a non-null String value but it can return null when parsed as a primitive long/float/double.
ExprEval.isNumericNull checks whether the parsed primitive value is null or not.
Added comment to code.
| getCluster().getRexBuilder().makeLiteral(value) | ||
| ) | ||
| ); | ||
| // NULLS are not supposed to match NULLs in a join. So ignore them. |
|
@leventov @jihoonson : Thanks for the review comments, addressed them. |
jihoonson
left a comment
There was a problem hiding this comment.
@nishantmonu51 I finished my first review. You did a really nice job. I would like to thank you.
Two more comments:
- There are some checkstyle errors including wrong license. Please fix them.
- Would you please check my previous comments?
| if (NullHandling.sqlCompatible()) { | ||
| // empty string should also have same behavior | ||
| Assert.assertEquals(null, loadingLookup.apply("")); | ||
| } |
There was a problem hiding this comment.
Can be Assert.assertNull().
Also, do we need to test for non-sql-compatible mode? Like:
Assert.assertNull(loadingLookup.apply(null));
if (NullHandling.sqlCompatible()) {
// empty string should also have same behavior
Assert.assertNull(loadingLookup.apply(""));
} else {
Assert.assertEquals("", loadingLookup.apply(""));
}| if (NullHandling.sqlCompatible()) { | ||
| Assert.assertEquals(Collections.emptyList(), loadingLookup.unapply(null)); | ||
| } else { | ||
| Assert.assertEquals(null, loadingLookup.unapply(null)); |
| Assert.assertEquals(ImmutableMap.of("value", Lists.newArrayList("key")), loadingLookup.unapplyAll(ImmutableSet.<String>of("value"))); | ||
| Assert.assertEquals( | ||
| ImmutableMap.of("value", Lists.newArrayList("key")), | ||
| loadingLookup.unapplyAll(ImmutableSet.<String>of("value")) |
There was a problem hiding this comment.
Would you please remove the unnecessary type argument?
| } | ||
|
|
||
| public long sumMetric(final Task task, final DimFilter filter, final String metric) | ||
| public Long sumMetric(final Task task, final DimFilter filter, final String metric) |
There was a problem hiding this comment.
Please annotate @Nullable.
| } | ||
|
|
||
| public long sumMetric(final Task task, final DimFilter filter, final String metric) | ||
| public Long sumMetric(final Task task, final DimFilter filter, final String metric) |
There was a problem hiding this comment.
Please annotate @Nullable.
| byte[] extractionFnBytes = extractionFn == null ? new byte[0] : extractionFn.getCacheKey(); | ||
|
|
||
| ByteBuffer filterCacheKey = ByteBuffer.allocate(3 | ||
| ByteBuffer filterCacheKey = ByteBuffer.allocate(5 |
There was a problem hiding this comment.
nit: can we use CacheKeyBuilder here?
| byte[] extractionFnBytes = extractionFn == null ? new byte[0] : extractionFn.getCacheKey(); | ||
|
|
||
| return ByteBuffer.allocate(3 + dimensionBytes.length + valueBytes.length + extractionFnBytes.length) | ||
| return ByteBuffer.allocate(5 + dimensionBytes.length + valueBytes.length + extractionFnBytes.length) |
There was a problem hiding this comment.
nit: can we use CacheKeyBuilder here?
| } | ||
|
|
||
| // This class is only used when SQL compatible null handling is enabled. | ||
| private class NullableRowBasedKeySerdeHelper implements RowBasedKeySerdeHelper |
There was a problem hiding this comment.
Please add a comment about the memory layout.
| if (isSimpleEquals()) { | ||
| // Verify that dimension equals prefix. | ||
| return ImmutableList.of(selector.getBitmapIndex(dimension, likeMatcher.getPrefix())); | ||
| return ImmutableList.of(selector.getBitmapIndex( |
There was a problem hiding this comment.
format
return ImmutableList.of(
selector.getBitmapIndex(
dimension,
NullHandling.emptyToNullIfNeeded(likeMatcher.getPrefix())
)
);| return DruidExpression.fromFunctionCall( | ||
| "timestamp_ceil", | ||
| ImmutableList.of( | ||
| Arrays.asList( |
There was a problem hiding this comment.
Maybe Stream.of() is better.
There was a problem hiding this comment.
Changed because ImmutableList doesn't support nulls, I suppose. Anyway, a parallel PR #5980 handles this in the whole codebase.
leventov
left a comment
There was a problem hiding this comment.
LGTM besides @jihoonson's comments, and if you also assign #6039 to yourself and do that after this PR.
Removed it from this PR.
it applies to whole query processing. updated PR description to add a note. |
|
@jihoonson : For the pending work, I have labelled the issues with Area : Null Handling Have updated the PR based on your comments, please check if i missed any comments. |
|
Thanks @nishantmonu51. Would you please check this comment: #5958 (comment)? Also, I'm not sure what have done in #5278 + this PR and not. I think at least |
this actually works now for Expressions, See BinaryEvalOpExprBase.eval, if any of the values being compared is null it returns null. |
|
@jihoonson In terms of functionality, After this PR, all things mentioned in the proposal except following should be working -
|
|
@jihoonson any more comments here ? |
|
@nishantmonu51 awesome! Thanks for clarifying. Everything looks good except the travis failure. Would you check it please? |
| .andReturn(Collections.singletonList("key")) | ||
| .once(); | ||
| EasyMock.replay(reverseLookupCache); | ||
| <<<<<<< HEAD |
|
@jihoonson: travis is passing now. |
|
@leventov @jihoonson @b-slim Thanks for the review. |
Part 2 of changes for SQL Compatible Null Handling -
This PR contains changes to improve handling of null values in druid by treating nulls as missing values which don’t actually exist. This will make it more compatible to SQL standard and help with integration with other existing BI systems which support ODBC/JDBC. (Detailed description in proposal - #4349)
NOTE - NULL and empty strings are treated differently in entire druid query processing system.
Nullability for Aggregators/Metrics
As per the current implementation, most aggregators are coded to replace null values with default e.g. sum treats them as zeroes, min treats them as positive infinity, etc.
To match the new semantics and make aggregators nullable, where if an aggregator encounters only the null values the result will be null value following changes are made -
Math Expressions Null Handling
Filtering on Null values
Changes to Druid build-in SQL layer
Backwards Compatibility
Testing
Pending work -
https://github.com/apache/incubator-druid/labels/Area%20-%20Null%20Handling