SQL compatible Null Handling Part 2 - Processing Layer and Druid-SQL changes #5452
nishantmonu51 wants to merge 50 commits into apache:master
Conversation
Force-pushed (fix more tests, more test fixes, fix format, fix tests, fix more test failures, fix formatting) from 1198058 to a5b6f9b.
Reviewed 45 of 167 files at r1.

processing/src/main/java/io/druid/guice/GuiceInjectors.java, line 45 at r1 (raw file):
I thought this was done as part of the previous patch?

Comments from Reviewable
// Result of any Binary expressions is null if any of the argument is null.
// e.g "select null * 2 as c;" or "select null + 1 as c;" will return null as per Standard SQL spec.
- if (NullHandling.sqlCompatible() && (leftVal.isNull() || rightVal.isNull())) {
+ if (NullHandling.sqlCompatible() && (leftVal.value() == null || rightVal.value() == null)) {
The reason is that for String ExprEval, the isNull method now checks whether the expression can be parsed as a valid long/double/float, instead of checking nullability.
I have added a comment on the ExprEval isNull method that explains this change.
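A minimal sketch of the distinction described above. The names here are illustrative, not Druid's actual ExprEval API: one check asks "is this numerically parseable?", the other asks "is the value itself null?", which is what the binary-expression change switched to.

```java
// Illustrative sketch only; not Druid's ExprEval implementation.
public class StringExprEvalSketch {
  // "isNull" in the new string-ExprEval sense: true when the string
  // cannot be interpreted as a long/double, regardless of nullability.
  static boolean isNumericallyNull(String value) {
    if (value == null) {
      return true;
    }
    try {
      Double.parseDouble(value);
      return false;
    } catch (NumberFormatException e) {
      return true;
    }
  }

  // Plain nullability check, which SQL null propagation for binary
  // expressions (null * 2, null + 1) actually needs.
  static boolean isValueNull(String value) {
    return value == null;
  }

  public static void main(String[] args) {
    assert !isNumericallyNull("1.5");
    assert isNumericallyNull("abc"); // not parseable, but not a null value
    assert !isValueNull("abc");
    assert isValueNull(null);
  }
}
```

This is why the diff switched from `isNull()` to `value() == null`: the former would treat a non-numeric, non-null string as null.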
public static boolean asBoolean(String x)
{
-  return !Strings.isNullOrEmpty(x) && Boolean.valueOf(x);
+  return !NullHandling.isNullOrEquivalent(x) && Boolean.valueOf(x);
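A hedged sketch of the `isNullOrEquivalent` behavior implied by this diff, assuming (not verified here) that in default-value (replaceWithDefault) mode empty strings are treated as equivalent to null, while in SQL-compatible mode only a true null counts:

```java
// Illustrative sketch only; not Druid's NullHandling class.
public class NullEquivalenceSketch {
  static boolean isNullOrEquivalent(String value, boolean replaceWithDefault) {
    // In default-value mode, "" and null are interchangeable.
    return value == null || (replaceWithDefault && value.isEmpty());
  }

  static boolean asBoolean(String x, boolean replaceWithDefault) {
    return !isNullOrEquivalent(x, replaceWithDefault) && Boolean.valueOf(x);
  }

  public static void main(String[] args) {
    assert asBoolean("true", true);
    assert !asBoolean("", true);   // empty treated as null in default mode
    assert !asBoolean(null, false);
  }
}
```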
Could you implement this thing: #5278 (comment) in this PR?
Sure, will add a checkstyle rule.
@@ -127,7 +128,7 @@ private abstract static class NumericExprEval extends ExprEval<Number>

  private NumericExprEval(Number value)

@@ -99,6 +97,9 @@ public Object value()

    return value;
This field should be annotated @Nullable.
@@ -153,7 +154,7 @@ private static class DoubleExprEval extends NumericExprEval
  {
    private DoubleExprEval(Number value)
@Nullable. @nishantmonu51 could you please self-review this PR and fix all missing @Nullable annotations in this PR?
IntelliJ usually highlights such places in yellow (e.g. when you return null from a method not annotated @Nullable, or check for non-null an argument not annotated @Nullable).
  return NullHandling.replaceWithDefault() ? aggregator : new NullableBufferAggregator(aggregator, selector);
}

protected abstract BufferAggregator factorizeBuffered(
Move this below the "ABSTRACT METHODS BELOW" marker.
 * Implementations of {@link AggregatorFactory} which needs to Support Nullable Aggregations are encouraged
 * to extend this class.
 */
public abstract class NullableAggregatorFactory extends AggregatorFactory
  return getMaxIntermediateSize2() + (NullHandling.replaceWithDefault() ? 0 : Byte.BYTES);
}

// ---- ABSTRACT METHODS BELOW ------
Please add docs to those methods, for extension writers.
 */
public class NullableBufferAggregator implements BufferAggregator
{
  private static final byte IS_NULL_BYTE = (byte) 1;
I would switch 1 and 0 values. NULL = 0, NOT_NULL = 1 is more natural, plus comparisons with 0 (done below) are cheaper.
/**
 * The result of a NullableBufferAggregator will be null if all the values to be aggregated are null values or no values are aggregated at all.
 * If any of the value is non-null, the result would be the aggregated value of the delegate aggregator.
 * Note that the delegate aggregator is not required to perform check for isNull on the columnValueSelector as only non-null values
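A self-contained sketch of the buffer layout this class implies: one leading flag byte (hence the extra `Byte.BYTES` in `getMaxIntermediateSize`) records whether any non-null value was seen, and the delegate's state follows at `position + 1`. This is not Druid's NullableBufferAggregator; the delegate is hardcoded as a double sum for illustration, and it adopts the reviewer's suggested encoding (NULL = 0, NOT_NULL = 1) so the "still null" check is a cheap comparison with 0.

```java
import java.nio.ByteBuffer;

// Illustrative sketch only; not Druid's NullableBufferAggregator.
public class NullableBufferSketch {
  private static final byte NULL = 0;      // reviewer-suggested encoding
  private static final byte NOT_NULL = 1;

  static void init(ByteBuffer buf, int position) {
    buf.put(position, NULL);              // no values aggregated yet
    buf.putDouble(position + 1, 0.0);     // delegate state: running double sum
  }

  static void aggregate(ByteBuffer buf, int position, Double value) {
    if (value == null) {
      return;                             // nulls never reach the delegate
    }
    buf.put(position, NOT_NULL);
    buf.putDouble(position + 1, buf.getDouble(position + 1) + value);
  }

  static Double get(ByteBuffer buf, int position) {
    // Result is null when only nulls (or nothing) were aggregated.
    return buf.get(position) == NULL ? null : buf.getDouble(position + 1);
  }

  public static void main(String[] args) {
    ByteBuffer buf = ByteBuffer.allocate(16);
    init(buf, 0);
    aggregate(buf, 0, null);
    assert get(buf, 0) == null;           // only nulls seen, result is null
    aggregate(buf, 0, 2.5);
    assert get(buf, 0) == 2.5;
  }
}
```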
|
Reviewed 123 of 167 files at r1, 1 of 1 files at r2.

processing/src/main/java/io/druid/query/aggregation/last/DoubleLastAggregatorFactory.java, line 154 at r2 (raw file):
Why are we changing this? I know it is the same size, but it is not really needed.

processing/src/main/java/io/druid/query/filter/SelectorDimFilter.java, line 79 at r2 (raw file):
5 ?

sql/src/main/java/io/druid/sql/calcite/planner/Calcites.java, line 108 at r2 (raw file):
This will throw an exception; is that what should be done? The old code seems to be fine with nulls.

sql/src/main/java/io/druid/sql/calcite/planner/DruidOperatorTable.java, line 121 at r2 (raw file):
Why not == null or != null?
Review status: 128 of 194 files reviewed at latest revision, 29 unresolved discussions.

processing/src/main/java/io/druid/query/aggregation/last/DoubleLastAggregatorFactory.java, line 154 at r2 (raw file): Previously, b-slim (Slim) wrote…
Fixed.

processing/src/main/java/io/druid/query/filter/SelectorDimFilter.java, line 79 at r2 (raw file): Previously, b-slim (Slim) wrote…
Fixed.

sql/src/main/java/io/druid/sql/calcite/planner/Calcites.java, line 108 at r2 (raw file): Previously, b-slim (Slim) wrote…
Yes, we never expect null here now.

sql/src/main/java/io/druid/sql/calcite/planner/DruidOperatorTable.java, line 121 at r2 (raw file): Previously, b-slim (Slim) wrote…
== and != are binary operators in SQL which, when comparing nulls, return false as per the SQL standard.
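The point about SQL comparison semantics can be illustrated with a small sketch (illustrative names, not the actual Druid code): any binary comparison involving null never matches, which is why an IS NULL-style check must special-case nulls explicitly rather than rely on == / !=.

```java
// Illustrative sketch only.
public class SqlNullComparisonSketch {
  // SQL-style equality: a comparison involving null is never a match.
  static boolean sqlEquals(Object left, Object right) {
    if (left == null || right == null) {
      return false;               // null = anything does not match in SQL
    }
    return left.equals(right);
  }

  public static void main(String[] args) {
    assert !sqlEquals(null, null); // unlike Java ==, null = null is no match
    assert !sqlEquals(1, null);
    assert sqlEquals(1, 1);
  }
}
```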
Will add unit tests in a subsequent PR.
@leventov: Updated PR based on your comments, please check.
… UTs
* Extract NullableGroupBycolumnSelectorStrategy - only used when null handling is enabled, fix failing UTs
* fix test after making extracting groupby changes
* Set AWS region in travis
@leventov: Any more comments here?

@nishantmonu51 I haven't finished my review. I will continue the review soon.
final DelimitedParser delegate = new DelimitedParser(
    Strings.emptyToNull(delimiter),
    Strings.emptyToNull(listDelimiter),
    StringUtils.emptyToNullNonDruidDataString(delimiter),
Wrong indentation here, and on the next line.
@Override
public String call()
{
  // When SQL based null handling is disabled,
// Do some queries.
Assert.assertEquals(2000, sumMetric(task, null, "rows"));
- Assert.assertEquals(2000, sumMetric(task, null, "met1"));
+ Assert.assertEquals(2000, sumMetric(task, null, "met1").longValue());
// Do some queries.
Assert.assertEquals(3, sumMetric(task, null, "rows"));
- Assert.assertEquals(3, sumMetric(task, null, "met1"));
+ Assert.assertEquals(3, sumMetric(task, null, "met1").longValue());
 * See io.druid.guice.NullHandlingModule for details.
 * It does not take effect in all unit tests since we don't use Guice Injection.
 * For tests default system property is supposed to be used only in tests
 * default system property is supposed to be used only in tests
This doc comment has proper sentences (starting with a capital letter and ending with a dot); please follow this style throughout the comment.
@Override
public long getLong(ByteBuffer buf, int position)
{
  if (isNull(buf, position)) {

@Override
public double getDouble(ByteBuffer buf, int position)
{
  if (isNull(buf, position)) {
import java.util.Objects;

- public abstract class SimpleDoubleAggregatorFactory extends AggregatorFactory
+ public abstract class SimpleDoubleAggregatorFactory extends NullableAggregatorFactory
If NullableAggregatorFactory is generified, it's going to be extends NullableAggregatorFactory<BaseDoubleColumnValueSelector>
{
  Assert.assertEquals(null, loadingLookup.apply(null));
  Assert.assertEquals(null, loadingLookup.apply(""));
  if (!NullHandling.replaceWithDefault()) {
This is NullHandling.sqlCompatible()
// SQL standard spec does not count null values,
// Skip counting null values when we are not replacing null with default value.
// A special value for null in case null handling is configured to use empty string for null.
if (!NullHandling.replaceWithDefault() && !hasNonNullValue && value != null) {
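The counting behavior referenced in this diff can be sketched as follows (illustrative code, not the actual Druid aggregator): in SQL-compatible mode nulls are skipped, matching SQL's `COUNT(column)` semantics, while in default-value mode every row is counted.

```java
import java.util.Arrays;
import java.util.List;

// Illustrative sketch only.
public class NullSkippingCountSketch {
  static long count(List<String> values, boolean replaceWithDefault) {
    long count = 0;
    for (String value : values) {
      if (!replaceWithDefault && value == null) {
        continue;                 // SQL COUNT(column) ignores nulls
      }
      count++;
    }
    return count;
  }

  public static void main(String[] args) {
    List<String> values = Arrays.asList("a", null, "b");
    assert count(values, false) == 2;  // SQL-compatible: nulls not counted
    assert count(values, true) == 3;   // default-value mode: nulls counted
  }
}
```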
@leventov: Please review. Made NullableAggregatorFactory and NullableAggregatorCombiner generic and handled other comments.
import java.util.Objects;

- public class FloatFirstAggregatorFactory extends NullableAggregatorFactory
+ public class FloatFirstAggregatorFactory extends NullableAggregatorFactory<ColumnValueSelector>
Why ColumnValueSelector rather than BaseFloatColumnValueSelector?
It returns a SerializablePair<Long, Float> instead of simple Float
The parameter of NullableAggregatorFactory is not the output selector type; it's the input selector type. So it should be BaseFloatColumnValueSelector for FloatFirstAggregatorFactory. The reason why this refactoring couldn't be done right now is that getCombiningFactory() returns an anonymous subclass of FloatFirstAggregatorFactory, which is IMO wrong. It seems to me that it should be a separate class FloatFirstCombiningAggregatorFactory extends NullableAggregatorFactory<ColumnValueSelector<SerializablePair<Long, Float>>>.
import java.util.Objects;

- public class DoubleFirstAggregatorFactory extends NullableAggregatorFactory
+ public class DoubleFirstAggregatorFactory extends NullableAggregatorFactory<ColumnValueSelector>
Why ColumnValueSelector rather than BaseDoubleColumnValueSelector?
It returns a SerializablePair<Long, Double> instead of simple Double
import java.util.Objects;

- public class DoubleLastAggregatorFactory extends NullableAggregatorFactory
+ public class DoubleLastAggregatorFactory extends NullableAggregatorFactory<ColumnValueSelector>
It returns a Pair instead of primitive object.
import java.util.Objects;

- public class LongLastAggregatorFactory extends NullableAggregatorFactory
+ public class LongLastAggregatorFactory extends NullableAggregatorFactory<ColumnValueSelector>
It returns a Pair instead of primitive object
import java.util.Objects;

- public class FloatLastAggregatorFactory extends NullableAggregatorFactory
+ public class FloatLastAggregatorFactory extends NullableAggregatorFactory<ColumnValueSelector>
It returns a Pair instead of primitive object
@Override
public void processValueFromGroupingKey(
    GroupByColumnSelectorPlus selectorPlus, ByteBuffer key, Map<String, Object> resultMap, int keyBufferPosition
Put these on separate lines, here and below in this class.
if (metricVal == null || value == null) {
  return metricVal == null && value == null;
}
return HavingSpecMetricComparator.compare(row, aggregationName, value, aggregators) == 0;
Could this be optimized to not perform row.getRaw() twice?
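One hypothetical way to address that comment, sketched with illustrative names rather than the actual HavingSpec classes: fetch the raw metric value once and let both the null check and the comparison reuse it.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch only; not Druid's EqualToHavingSpec.
public class CachedRawSketch {
  // Equality check matching the snippet above: null == null is a match,
  // null vs non-null is not, otherwise delegate to equals().
  static boolean eval(Map<String, Object> row, String metric, Object value) {
    Object metricVal = row.get(metric);  // fetched exactly once, then reused
    if (metricVal == null || value == null) {
      return metricVal == null && value == null;
    }
    return metricVal.equals(value);
  }

  public static void main(String[] args) {
    Map<String, Object> row = new HashMap<>();
    row.put("m", 3);
    assert eval(row, "m", 3);
    assert eval(row, "missing", null);   // null metric matches null value
    assert !eval(row, "m", null);
  }
}
```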
if (metricVal == null || value == null) {
  return false;
}
return HavingSpecMetricComparator.compare(row, aggregationName, value, aggregators) > 0;

if (metricVal == null || value == null) {
  return false;
}
return HavingSpecMetricComparator.compare(row, aggregationName, value, aggregators) < 0;
private Ordering<Row> metricOrdering(final String column, final Comparator comparator)
{
-  return Ordering.from(Comparator.comparing((Row row) -> row.getRaw(column), Comparator.nullsLast(comparator)));
+  return Ordering.from(Comparator.comparing((Row row) -> row.getRaw(column), Comparator.nullsFirst(comparator)));
In the SQL-compatible case we need nullsFirst; I have modified it so that the non-SQL-compatible case keeps the older behavior.
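A small, self-contained demonstration of the ordering change being discussed, using only the standard `Comparator.nullsFirst` combinator: nulls sort before all non-null metric values.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Illustrative demo of nullsFirst ordering.
public class NullsFirstOrderingSketch {
  static List<Double> sortNullsFirst(List<Double> values) {
    List<Double> sorted = new ArrayList<>(values);
    sorted.sort(Comparator.nullsFirst(Comparator.naturalOrder()));
    return sorted;
  }

  public static void main(String[] args) {
    List<Double> sorted = sortNullsFirst(Arrays.asList(2.0, null, 1.0));
    assert sorted.get(0) == null;  // nulls ordered before non-null values
    assert sorted.get(1) == 1.0;
    assert sorted.get(2) == 2.0;
  }
}
```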
@leventov: Handled comments, do you have any more comments?

@leventov: Have you finished your review? Can this be merged?
-     return ((Double) o).compareTo((Double) o1);
-   }
- };
+ private static final Comparator<Number> COMPARATOR = Comparator.nullsFirst(Comparator.comparingDouble(Number::doubleValue));
Line longer than 120 cols. Might also be in other similar classes, I didn't check.
import javax.annotation.Nullable;

/**
 * The result of a NullableAggregator will be null if all the values to be aggregated are null values or no values are aggregated at all.
 * If any of the value is non-null, the result would be the aggregated value of the delegate aggregator.
 * Note that the delegate aggregator is not required to perform check for {@link BaseNullableColumnValueSelector#isNull()} on the selector as only non-null values
);

/**
 * Creates an {@link AggregateCombiner} to fold rollup aggregation results from several "rows" of different indexes during
@nishantmonu51 after addressing the latest comments, please squash all commits and create a new PR; it's impossible to work with this one anymore.
Closing in favor of #5958
Part 2 of changes for SQL Compatible Null Handling -
This PR contains changes to improve handling of null values in Druid by treating nulls as missing values which don't actually exist. This will make Druid more compatible with the SQL standard and help with integration with other existing BI systems which support ODBC/JDBC. (Detailed description in proposal - #4349)
Nullability for Aggregators/Metrics
As per the current implementation, most aggregators are coded to replace null values with defaults, e.g. sum treats them as zeroes, min treats them as positive infinity, etc.
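The difference between the two behaviors can be sketched for a "min" aggregation (illustrative code, not the actual Druid aggregator implementations): the legacy mode folds nulls into a default (positive infinity for min, so they never win), while the nullable mode treats nulls as missing and returns null when only nulls are seen.

```java
// Illustrative sketch only.
public class NullableMinSketch {
  // Legacy default-value behavior: nulls become the identity element.
  static double legacyMin(Double[] values) {
    double min = Double.POSITIVE_INFINITY;
    for (Double v : values) {
      min = Math.min(min, v == null ? Double.POSITIVE_INFINITY : v);
    }
    return min;
  }

  // SQL-compatible behavior: nulls are missing values.
  static Double nullableMin(Double[] values) {
    Double min = null;
    for (Double v : values) {
      if (v == null) {
        continue;                  // skip missing values entirely
      }
      min = (min == null) ? v : Math.min(min, v);
    }
    return min;                    // null when nothing was aggregated
  }

  public static void main(String[] args) {
    Double[] allNull = {null, null};
    assert legacyMin(allNull) == Double.POSITIVE_INFINITY;
    assert nullableMin(allNull) == null;
    assert nullableMin(new Double[]{null, 3.0, 1.0}) == 1.0;
  }
}
```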
To match the new semantics and make aggregators nullable (if an aggregator encounters only null values, the result will be null), the following changes are made -
Math Expressions Null Handling
Filtering on Null values
Changes to Druid built-in SQL layer
Backwards Compatibility
Testing
This change is