-
Notifications
You must be signed in to change notification settings - Fork 3.8k
first/last aggregators and nulls #9161
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
fd12b66
1f0cd18
9209421
defc3a7
ab2c73b
7960b3c
94c92b5
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -136,7 +136,7 @@ Note that queries with first/last aggregators on a segment created with rollup e | |
|
|
||
| #### `doubleFirst` aggregator | ||
|
|
||
| `doubleFirst` computes the metric value with the minimum timestamp or 0 if no row exist | ||
| `doubleFirst` computes the metric value with the minimum timestamp or 0 in default mode or `null` in SQL compatible mode if no row exist | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Are you referring to this property - https://github.com/apache/druid/blob/master/docs/configuration/index.md#sql-compatible-null-handling ? Would be nice to link to the configuration here. nit: I'd re-phrase slightly |
||
|
|
||
| ```json | ||
| { | ||
|
|
@@ -148,7 +148,7 @@ Note that queries with first/last aggregators on a segment created with rollup e | |
|
|
||
| #### `doubleLast` aggregator | ||
|
|
||
| `doubleLast` computes the metric value with the maximum timestamp or 0 if no row exist | ||
| `doubleLast` computes the metric value with the maximum timestamp or 0 in default mode or `null` in SQL compatible mode if no row exist | ||
|
|
||
| ```json | ||
| { | ||
|
|
@@ -160,7 +160,7 @@ Note that queries with first/last aggregators on a segment created with rollup e | |
|
|
||
| #### `floatFirst` aggregator | ||
|
|
||
| `floatFirst` computes the metric value with the minimum timestamp or 0 if no row exist | ||
| `floatFirst` computes the metric value with the minimum timestamp or 0 in default mode or `null` in SQL compatible mode if no row exist | ||
|
|
||
| ```json | ||
| { | ||
|
|
@@ -172,7 +172,7 @@ Note that queries with first/last aggregators on a segment created with rollup e | |
|
|
||
| #### `floatLast` aggregator | ||
|
|
||
| `floatLast` computes the metric value with the maximum timestamp or 0 if no row exist | ||
| `floatLast` computes the metric value with the maximum timestamp or 0 in default mode or `null` in SQL compatible mode if no row exist | ||
|
|
||
| ```json | ||
| { | ||
|
|
@@ -184,7 +184,7 @@ Note that queries with first/last aggregators on a segment created with rollup e | |
|
|
||
| #### `longFirst` aggregator | ||
|
|
||
| `longFirst` computes the metric value with the minimum timestamp or 0 if no row exist | ||
| `longFirst` computes the metric value with the minimum timestamp or 0 in default mode or `null` in SQL compatible mode if no row exist | ||
|
|
||
| ```json | ||
| { | ||
|
|
@@ -196,7 +196,7 @@ Note that queries with first/last aggregators on a segment created with rollup e | |
|
|
||
| #### `longLast` aggregator | ||
|
|
||
| `longLast` computes the metric value with the maximum timestamp or 0 if no row exist | ||
| `longLast` computes the metric value with the maximum timestamp or 0 in default mode or `null` in SQL compatible mode if no row exist | ||
|
|
||
| ```json | ||
| { | ||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -20,42 +20,29 @@ | |
| package org.apache.druid.query.aggregation.first; | ||
|
|
||
| import org.apache.druid.collections.SerializablePair; | ||
| import org.apache.druid.query.aggregation.Aggregator; | ||
| import org.apache.druid.segment.BaseDoubleColumnValueSelector; | ||
| import org.apache.druid.segment.BaseLongColumnValueSelector; | ||
|
|
||
| public class DoubleFirstAggregator implements Aggregator | ||
| public class DoubleFirstAggregator extends NumericFirstAggregator<BaseDoubleColumnValueSelector> | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Nice abstraction! 🎉 note to self: can the get call be abstracted into the base class?
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Not without boxing the primitive
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Well, I guess that is going to happen anyway in making the pair.. so I guess maybe the on heap version of get could be shared, but not really possible for the buffer aggregator. |
||
| { | ||
|
|
||
| private final BaseDoubleColumnValueSelector valueSelector; | ||
| private final BaseLongColumnValueSelector timeSelector; | ||
|
|
||
| protected long firstTime; | ||
| protected double firstValue; | ||
| double firstValue; | ||
|
|
||
| public DoubleFirstAggregator(BaseLongColumnValueSelector timeSelector, BaseDoubleColumnValueSelector valueSelector) | ||
| { | ||
| this.valueSelector = valueSelector; | ||
| this.timeSelector = timeSelector; | ||
|
|
||
| firstTime = Long.MAX_VALUE; | ||
| super(timeSelector, valueSelector); | ||
| firstValue = 0; | ||
| } | ||
|
|
||
| @Override | ||
| public void aggregate() | ||
| void setCurrentValue() | ||
| { | ||
| long time = timeSelector.getLong(); | ||
| if (time < firstTime) { | ||
| firstTime = time; | ||
| firstValue = valueSelector.getDouble(); | ||
| } | ||
| firstValue = valueSelector.getDouble(); | ||
| } | ||
|
|
||
| @Override | ||
| public Object get() | ||
| { | ||
| return new SerializablePair<>(firstTime, firstValue); | ||
| return new SerializablePair<>(firstTime, rhsNull ? null : firstValue); | ||
| } | ||
|
|
||
| @Override | ||
|
|
@@ -75,11 +62,5 @@ public long getLong() | |
| { | ||
| return (long) firstValue; | ||
| } | ||
|
|
||
| @Override | ||
| public void close() | ||
| { | ||
|
|
||
| } | ||
| } | ||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -30,10 +30,11 @@ | |
| import org.apache.druid.query.aggregation.AggregatorFactory; | ||
| import org.apache.druid.query.aggregation.AggregatorUtil; | ||
| import org.apache.druid.query.aggregation.BufferAggregator; | ||
| import org.apache.druid.query.aggregation.NullableNumericAggregatorFactory; | ||
| import org.apache.druid.query.monomorphicprocessing.RuntimeShapeInspector; | ||
| import org.apache.druid.segment.BaseDoubleColumnValueSelector; | ||
| import org.apache.druid.segment.ColumnSelectorFactory; | ||
| import org.apache.druid.segment.ColumnValueSelector; | ||
| import org.apache.druid.segment.NilColumnValueSelector; | ||
| import org.apache.druid.segment.column.ColumnHolder; | ||
|
|
||
| import javax.annotation.Nullable; | ||
|
|
@@ -45,10 +46,34 @@ | |
| import java.util.Map; | ||
| import java.util.Objects; | ||
|
|
||
| public class DoubleFirstAggregatorFactory extends NullableNumericAggregatorFactory<ColumnValueSelector> | ||
| public class DoubleFirstAggregatorFactory extends AggregatorFactory | ||
| { | ||
| private static final Aggregator NIL_AGGREGATOR = new DoubleFirstAggregator( | ||
| NilColumnValueSelector.instance(), | ||
| NilColumnValueSelector.instance() | ||
| ) | ||
| { | ||
| @Override | ||
| public void aggregate() | ||
| { | ||
| // no-op | ||
| } | ||
| }; | ||
|
|
||
| private static final BufferAggregator NIL_BUFFER_AGGREGATOR = new DoubleFirstBufferAggregator( | ||
| NilColumnValueSelector.instance(), | ||
| NilColumnValueSelector.instance() | ||
| ) | ||
| { | ||
| @Override | ||
| public void aggregate(ByteBuffer buf, int position) | ||
| { | ||
| // no-op | ||
| } | ||
| }; | ||
|
|
||
| public static final Comparator<SerializablePair<Long, Double>> VALUE_COMPARATOR = | ||
| Comparator.comparingDouble(o -> o.rhs); | ||
| SerializablePair.createNullHandlingComparator(Double::compare, true); | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. It took me a long time to try and figure out what the comparator was used for. I got wrapped up in the fact that the aggregator was meant compare timestamps, that I didn't realize this was for ordering. I think a javadoc on # |
||
|
|
||
| private final String fieldName; | ||
| private final String name; | ||
|
|
@@ -69,24 +94,31 @@ public DoubleFirstAggregatorFactory( | |
| } | ||
|
|
||
| @Override | ||
| protected ColumnValueSelector selector(ColumnSelectorFactory metricFactory) | ||
| public Aggregator factorize(ColumnSelectorFactory metricFactory) | ||
| { | ||
| return metricFactory.makeColumnValueSelector(fieldName); | ||
| } | ||
|
|
||
| @Override | ||
| protected Aggregator factorize(ColumnSelectorFactory metricFactory, ColumnValueSelector selector) | ||
| { | ||
| return new DoubleFirstAggregator(metricFactory.makeColumnValueSelector(ColumnHolder.TIME_COLUMN_NAME), selector); | ||
| final BaseDoubleColumnValueSelector valueSelector = metricFactory.makeColumnValueSelector(fieldName); | ||
| if (valueSelector instanceof NilColumnValueSelector) { | ||
| return NIL_AGGREGATOR; | ||
| } else { | ||
| return new DoubleFirstAggregator( | ||
| metricFactory.makeColumnValueSelector(ColumnHolder.TIME_COLUMN_NAME), | ||
| valueSelector | ||
| ); | ||
| } | ||
| } | ||
|
|
||
| @Override | ||
| protected BufferAggregator factorizeBuffered(ColumnSelectorFactory metricFactory, ColumnValueSelector selector) | ||
| public BufferAggregator factorizeBuffered(ColumnSelectorFactory metricFactory) | ||
| { | ||
| return new DoubleFirstBufferAggregator( | ||
| metricFactory.makeColumnValueSelector(ColumnHolder.TIME_COLUMN_NAME), | ||
| selector | ||
| ); | ||
| final BaseDoubleColumnValueSelector valueSelector = metricFactory.makeColumnValueSelector(fieldName); | ||
| if (valueSelector instanceof NilColumnValueSelector) { | ||
| return NIL_BUFFER_AGGREGATOR; | ||
| } else { | ||
| return new DoubleFirstBufferAggregator( | ||
| metricFactory.makeColumnValueSelector(ColumnHolder.TIME_COLUMN_NAME), | ||
| valueSelector | ||
| ); | ||
| } | ||
| } | ||
|
|
||
| @Override | ||
|
|
@@ -126,35 +158,54 @@ public AggregatorFactory getCombiningFactory() | |
| return new DoubleFirstAggregatorFactory(name, name) | ||
| { | ||
| @Override | ||
| public Aggregator factorize(ColumnSelectorFactory metricFactory, ColumnValueSelector selector) | ||
| public Aggregator factorize(ColumnSelectorFactory metricFactory) | ||
| { | ||
| final ColumnValueSelector<SerializablePair<Long, Double>> selector = | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Based on javadocs in in which case
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Since this is for combining, the selector will be of the serialized pair complex objects, which will not be null. |
||
| metricFactory.makeColumnValueSelector(name); | ||
| return new DoubleFirstAggregator(null, null) | ||
| { | ||
| @Override | ||
| public void aggregate() | ||
| { | ||
| SerializablePair<Long, Double> pair = (SerializablePair<Long, Double>) selector.getObject(); | ||
| SerializablePair<Long, Double> pair = selector.getObject(); | ||
| if (pair.lhs < firstTime) { | ||
| firstTime = pair.lhs; | ||
| firstValue = pair.rhs; | ||
| if (pair.rhs != null) { | ||
| firstValue = pair.rhs; | ||
| rhsNull = false; | ||
| } else { | ||
| rhsNull = true; | ||
| } | ||
| } | ||
| } | ||
| }; | ||
| } | ||
|
|
||
| @Override | ||
| public BufferAggregator factorizeBuffered(ColumnSelectorFactory metricFactory, ColumnValueSelector selector) | ||
| public BufferAggregator factorizeBuffered(ColumnSelectorFactory metricFactory) | ||
| { | ||
| final ColumnValueSelector<SerializablePair<Long, Double>> selector = | ||
| metricFactory.makeColumnValueSelector(name); | ||
| return new DoubleFirstBufferAggregator(null, null) | ||
| { | ||
| @Override | ||
| public void putValue(ByteBuffer buf, int position) | ||
| { | ||
| SerializablePair<Long, Double> pair = selector.getObject(); | ||
| buf.putDouble(position, pair.rhs); | ||
| } | ||
|
|
||
| @Override | ||
| public void aggregate(ByteBuffer buf, int position) | ||
| { | ||
| SerializablePair<Long, Double> pair = (SerializablePair<Long, Double>) selector.getObject(); | ||
| long firstTime = buf.getLong(position); | ||
| if (pair.lhs < firstTime) { | ||
| buf.putLong(position, pair.lhs); | ||
| buf.putDouble(position + Long.BYTES, pair.rhs); | ||
| if (pair.rhs != null) { | ||
| updateTimeWithValue(buf, position, pair.lhs); | ||
| } else { | ||
| updateTimeWithNull(buf, position, pair.lhs); | ||
| } | ||
| } | ||
| } | ||
|
|
||
|
|
@@ -178,6 +229,9 @@ public List<AggregatorFactory> getRequiredColumns() | |
| public Object deserialize(Object object) | ||
| { | ||
| Map map = (Map) object; | ||
| if (map.get("rhs") == null) { | ||
| return new SerializablePair<>(((Number) map.get("lhs")).longValue(), null); | ||
| } | ||
| return new SerializablePair<>(((Number) map.get("lhs")).longValue(), ((Number) map.get("rhs")).doubleValue()); | ||
| } | ||
|
|
||
|
|
@@ -221,16 +275,15 @@ public byte[] getCacheKey() | |
| @Override | ||
| public String getTypeName() | ||
| { | ||
| if (storeDoubleAsFloat) { | ||
| return "float"; | ||
| } | ||
| return "double"; | ||
| // if we don't pretend to be a primitive, group by v1 gets sad and doesn't work because no complex type serde | ||
| return storeDoubleAsFloat ? "float" : "double"; | ||
| } | ||
|
|
||
| @Override | ||
| public int getMaxIntermediateSize() | ||
| { | ||
| return Long.BYTES + Double.BYTES; | ||
| // timestamp, is null, value | ||
| return Long.BYTES + Byte.BYTES + Double.BYTES; | ||
| } | ||
|
|
||
| @Override | ||
|
|
||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks like this is missing unit tests? Also javadocs since this is a utility that would be used by many other classes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is covered by calcite query tests that order by the first/last aggregators; afaik SerializedPair is only used by first/last despite it's generic name, to store a timestamp and value.