Add pair Latest/Earliest aggregators#14195
Add pair Latest/Earliest aggregators#14195imply-cheddar wants to merge 1 commit intoapache:masterfrom
Conversation
The current Latest/Earliest aggregators all do finalization to pretend like they are dealing with their payload type instead of exposing that they are actually dealing with pairs. This adds a STRING_LATEST and STRING_EARLIEST that explicitly deal with pairs and also adds PAIR_LEFT and PAIR_RIGHT functions that can access the two sides of the pairs. This does not add the numerical versions yet, though that can be an easy iteration on top of this.
| return ExpressionType.LONG; | ||
| } | ||
| } | ||
| throw new UOE("Invalid input type [%s] to pair_left, expected a complex pair type"); |
Check failure
Code scanning / CodeQL
Missing format argument
| return ExpressionType.STRING; | ||
| } | ||
| } | ||
| throw new UOE("Invalid input type [%s] to pair_right, expected a complex pair type"); |
Check failure
Code scanning / CodeQL
Missing format argument
There was a problem hiding this comment.
Both the missing format arguments need to be introduced
|
|
||
| import javax.annotation.Nullable; | ||
|
|
||
| public class PairOperatorConversion implements SqlOperatorConversion |
There was a problem hiding this comment.
Can this implement DirectOperatorConversion instead ?
| validationHelperCheckArgumentCount(args, 1); | ||
| Expr arg = args.get(0); | ||
|
|
||
| class PairLeft extends ExprMacroTable.BaseScalarUnivariateMacroFunctionExpr |
There was a problem hiding this comment.
nit. Might call it PairLeftExpression
Also, we can move these to individual classes
| @JsonProperty("fieldName") @Nullable final String fieldName, | ||
| @JsonProperty("timeColumn") @Nullable final String timeColumn, | ||
| @JsonProperty("maxStringBytes") Integer maxStringBytes | ||
| @JsonProperty("foldColumn") @Nullable final String foldColumn, |
There was a problem hiding this comment.
why do we need a separate foldColumn? It seems like instead we could just not auto initialize the timeColumn, and in factorize methods examine the ColumnCapabilities of ColumnSelectorFactory of the fieldName column and factorize into either the folding aggregator or the building aggregator as appropriate
| // Less efficient code path when folding is a possibility (we must read the value selector first just in case | ||
| // it's a foldable object). |
There was a problem hiding this comment.
nit: stale comment copied from old combined agg?
| @JsonProperty("fieldName") @Nullable final String fieldName, | ||
| @JsonProperty("timeColumn") @Nullable final String timeColumn, | ||
| @JsonProperty("maxStringBytes") Integer maxStringBytes | ||
| @JsonProperty("foldColumn") @Nullable final String foldColumn, |
There was a problem hiding this comment.
same comment about foldColumn vs just using ColumnCapabilities
| @Override | ||
| public ExpressionType getOutputType(InputBindingInspector inspector) | ||
| { | ||
| final ExpressionType inputType = inspector.getType(arg.getBindingIfIdentifier()); |
There was a problem hiding this comment.
inputType could be null here. You could alternatively use arg.getOutputType(inspector) though it can also still return null.
If the argument must be an identifier, i would recommend pulling arg.getBindingIfIdentifier() and validating that it is not null separately.
| return ExpressionType.LONG; | ||
| } | ||
| } | ||
| throw new UOE("Invalid input type [%s] to pair_left, expected a complex pair type"); |
There was a problem hiding this comment.
macro expression implement an interface called NamedFunction that provide a bunch of validation helper methods to help provide consistent error messaging for expressions, so could do something like
validationFailed("Invalid input type [%s], expected [%s]", inputType, EXPRESSION_TYPE)
| @Override | ||
| public ExpressionType getOutputType(InputBindingInspector inspector) | ||
| { | ||
| final ExpressionType inputType = inspector.getType(arg.getBindingIfIdentifier()); |
There was a problem hiding this comment.
same comment about using arg.getOutputType(inspector)
| public static final byte KLL_FLOATS_SKETCH_TO_QUANTILE_CACHE_TYPE_ID = 42; | ||
| public static final byte KLL_FLOATS_SKETCH_TO_QUANTILES_CACHE_TYPE_ID = 43; | ||
| public static final byte KLL_FLOATS_SKETCH_TO_STRING_CACHE_TYPE_ID = 44; | ||
| public static final byte PAIRS = 45; |
| case STRING: | ||
| case COMPLEX: | ||
| return new StringFirstAggregatorFactory(name, fieldName, timeColumn, maxStringBytes); | ||
| return new StringFirstAggregatorFactory(name, fieldName, timeColumn, null, maxStringBytes, true); |
There was a problem hiding this comment.
is this right if we go forward with separate foldColumn? Shouldn't the complex case put fieldName in the foldColumn?
If you drop foldColumn and use the selector factory + capabilities (my preference at least at the time of writing this comment) instead then this doesn't need to change for now I think. Though, this part will also need to change if the PR that adds ingest time support for numeric first/last, since we'll need to get the complex type name and pick the correct folding aggregator...
| case STRING: | ||
| case COMPLEX: | ||
| return new StringLastAggregatorFactory(name, fieldName, timeColumn, maxStringBytes); | ||
| return new StringLastAggregatorFactory(name, fieldName, timeColumn, null, maxStringBytes, true); |
There was a problem hiding this comment.
same comment about COMPLEX case and foldColumn.
| public class EarliestLatestPairSqlAggregator implements SqlAggregator | ||
| { | ||
| public static final SqlAggregator[] DEFINED_AGGS = new SqlAggregator[]{ | ||
| new EarliestLatestPairSqlAggregator(AggregatorType.EARLIEST, PayloadType.STRING), | ||
| new EarliestLatestPairSqlAggregator(AggregatorType.LATEST, PayloadType.STRING) | ||
| }; | ||
|
|
||
| enum AggregatorType | ||
| { | ||
| EARLIEST, LATEST | ||
| } | ||
|
|
||
| enum PayloadType | ||
| { | ||
| STRING, LONG, DOUBLE | ||
| } |
There was a problem hiding this comment.
do we really need new functions to do aggregations on complex types? can't EARLIEST and LATEST detect when their inputs are a pair and do the right thing?
|
This pull request has been marked as stale due to 60 days of inactivity. |
|
This pull request/issue has been closed due to lack of activity. If you think that |
|
This pull request has been marked as stale due to 60 days of inactivity. |
|
This pull request/issue has been closed due to lack of activity. If you think that |
The current Latest/Earliest aggregators all do
finalization to pretend like they are dealing with their payload type instead of exposing that they are actually dealing with pairs. This adds a STRING_LATEST and STRING_EARLIEST that explicitly deal with pairs and also adds PAIR_LEFT and PAIR_RIGHT functions that can access the two sides of the pairs.
This does not add the numerical versions yet, though that can be an easy iteration on top of this.