|
@acslk we need docs in the aggregations doc under docs/content/querying. You can probably just copy and paste the description from this PR. |
do you need a @JsonCreator annotation?
I don't need it for this PR, since I create the SerializablePair from an object map in AggregatorFactory rather than through the object mapper, but it would be useful if this class is used in the future.
It's good code cleanliness to have deserializers when we have serializers, so let's add both (and a test for both).
|
These should be documented in aggregations.md. |
valueType is probably clearer
|
I think this won't work at indexing time as-is; we would need a serde for writing out columns that have the (timestamp, last value) pairs in them. |
|
@gianm Using the first/last aggregators at ingestion time is tricky because of the value we want to store. Ideally, we would persist the first and last metric as a long/double column so that other aggregators such as sum can aggregate them. However, doing so would make merging persisted data incorrect, since no time value is stored with the metric. If the values are instead stored as (time, value) pairs, the column cannot be aggregated by standard aggregators. Basically, the problem is that we want one intermediate storage format for merging and a different final storage format for querying, and this cannot be done in the ingestion process. For now I'll just leave a comment in aggregations.md that first/last aggregators can't be used at ingestion time. |
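To illustrate the merging problem described above, here is a minimal sketch, not the code in this PR. The class and method names (`TimedValueMergeSketch`, `TimedValue`, `combineFirst`, `combineLast`) are hypothetical. It shows that two partial results can only be merged correctly when each one still carries its timestamp.

```java
// Hypothetical sketch: merging partial first/last results from two persisted segments.
final class TimedValueMergeSketch {

    // A (time, value) pair; this is the intermediate format that merging needs.
    static final class TimedValue {
        final long time;
        final double value;

        TimedValue(long time, double value) {
            this.time = time;
            this.value = value;
        }
    }

    // Keep the pair with the larger timestamp; on a tie the first argument is kept.
    static TimedValue combineLast(TimedValue a, TimedValue b) {
        return b.time > a.time ? b : a;
    }

    // Keep the pair with the smaller timestamp; on a tie the first argument is kept.
    static TimedValue combineFirst(TimedValue a, TimedValue b) {
        return b.time < a.time ? b : a;
    }

    public static void main(String[] args) {
        TimedValue fromSegment1 = new TimedValue(100L, 5.0);
        TimedValue fromSegment2 = new TimedValue(50L, 9.0);

        // With timestamps, the merge is well defined.
        System.out.println(combineLast(fromSegment1, fromSegment2).value);  // prints 5.0
        System.out.println(combineFirst(fromSegment1, fromSegment2).value); // prints 9.0

        // With only the bare values 5.0 and 9.0 there would be no way to tell
        // which one is the later value, which is why a plain long/double column
        // cannot be merged correctly.
    }
}
```

The pair format is what makes merging possible, but a column of pairs is no longer something a sum or max aggregator can read directly.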
|
It seems that there is very little shared code between the long and double aggregators, so in the newer commit I changed the syntax to match max, min, and sum, with doubleFirst/doubleLast and longFirst/longLast. |
we should really define these constants in another file somewhere. It is getting more and more difficult to track available values
Or some other better way to track cache unique key guarantees. Right now it is pretty much impossible for extensions to guarantee uniqueness of id across them.
we can store the cache keys in a helper class similar to how DimFilter cache keys are stored, but I think that can go in another PR
fixing in a larger way is outside the scope of this PR
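For reference, the helper-class idea mentioned above could look roughly like the sketch below. Everything in it is an assumption for illustration: the class name `CacheIdRegistrySketch`, the `register` method, and the id values are hypothetical and are not the constants used in this PR or anywhere in the codebase.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: one place that hands out cache type ids and rejects collisions.
final class CacheIdRegistrySketch {

    // Maps each claimed id to the name of the aggregator that owns it.
    private final Map<Byte, String> owners = new HashMap<>();

    // Claims an id for an owner. Re-registering the same owner is a no-op;
    // a second owner asking for an id that is already taken is an error.
    byte register(String owner, byte id) {
        String existing = owners.putIfAbsent(id, owner);
        if (existing != null && !existing.equals(owner)) {
            throw new IllegalStateException(
                String.format("cache id 0x%02x already used by %s", id, existing));
        }
        return id;
    }

    public static void main(String[] args) {
        CacheIdRegistrySketch registry = new CacheIdRegistrySketch();
        // Illustrative id values only.
        registry.register("longFirst", (byte) 0x10);
        registry.register("longLast", (byte) 0x11);
        try {
            registry.register("someExtensionAggregator", (byte) 0x10);
        } catch (IllegalStateException e) {
            System.out.println("collision detected: " + e.getMessage());
        }
    }
}
```

A runtime check like this would also give extensions a way to detect clashes, which a file of constants alone would not.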
|
👍 |
I understand it's not usable at ingestion time, but shouldn't we at least return float or long here?
Sorry if I misunderstood, but are you suggesting to have getTypeName return some type other than float or long?
Ah, forget that. Now I understand the intention.
|
What should the default value for the inner query finalize parameter be? Keeping it the same as before would make the default false, but having it be true feels more consistent with the outer query. |
|
hmm, I agree finalize = true makes the most sense, but I think in this case compatibility concerns win. So let's make it false by default. |
should be "DoubleLastAggregatorFactory{"
changed it to Double, strange that I got both Long and Double wrong
|
some minor comments, looks good so far, will review again after query finalization comments from @gianm are resolved |
f15f28e to 11167f9
|
Added an option to finalize the inner query, and also slightly changed how v1 builds the inner IncrementalIndex. The v1 strategy processes the inner query result by building an IncrementalIndex on the query result with aggregators from AggregatorFactory.getRequiredColumn(). When building the IncrementalIndex, getCombiningFactory is called for the passed-in aggregators. This makes sense for the merging runners that use IncrementalIndex, but not so much for copying values from results. I parameterized whether or not to use the combining factory, so that indexing the inner query does not use it. |
|
@acslk Thanks for putting in the effort to make this possible. I tried this pull request on the latest master code branch. Though it works for aggregators, I have 3 issues:
|
|
@gauravkumar37 Thanks for the feedback, here are my thoughts on the issues:
|
public AggregatorFactory apply(String input)
{
-  return new JavaScriptAggregatorFactory(input, fieldNames, fnAggregate, fnReset, fnCombine, config);
+  return new JavaScriptAggregatorFactory(input, Lists.newArrayList(input), fnCombine, fnReset, fnCombine, config);
Why is this change to JavaScriptAggregatorFactory needed?
Previously, getCombiningFactory was always called on top of getRequiredColumn to get the identity AggregatorFactory needed for copying the JavaScript aggregator values. Since getCombiningFactory is no longer called here, the original getRequiredColumn doesn't really make sense for copying values.
Apart from JavaScript, the getRequiredColumn implementations for the other AggregatorFactories seem to work fine without converting to a combining factory.
|
LGTM, 👍 |
|
@acslk if you can fix the conflicts I can help finish up review. |
|
since this is one of the last few PRs for 0.9.2, as discussed on the call last week, I'll bump it to 0.9.3. |
5bfcb5b to 7d24658
|
rebased and resolved conflict |
|
Superseded by #3566. |
This PR implements the 'first' and 'last' aggregators discussed in #2845.
The first and last aggregators can be used in the following format
The first aggregator outputs the value of fieldName with the smallest timestamp (using the __time column), while the last aggregator outputs the value of fieldName with the largest timestamp. If multiple rows share the first or last time, one of them is selected arbitrarily.
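The semantics described above can be sketched as follows. This is a minimal illustration over an in-memory list, not the aggregator implementation in this PR. The names `FirstLastSketch`, `Row`, `first`, and `last` are hypothetical, and `Row.time` stands in for the __time column.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of first/last semantics over rows of (timestamp, metric value).
final class FirstLastSketch {

    // One input row: the timestamp plus the value of the aggregated field.
    static final class Row {
        final long time;
        final double value;

        Row(long time, double value) {
            this.time = time;
            this.value = value;
        }
    }

    // Value of the row with the smallest timestamp.
    static double first(List<Row> rows) {
        if (rows.isEmpty()) {
            throw new IllegalArgumentException("no rows");
        }
        Row best = rows.get(0);
        for (Row r : rows) {
            if (r.time < best.time) {
                best = r;
            }
        }
        return best.value;
    }

    // Value of the row with the largest timestamp.
    static double last(List<Row> rows) {
        if (rows.isEmpty()) {
            throw new IllegalArgumentException("no rows");
        }
        Row best = rows.get(0);
        for (Row r : rows) {
            if (r.time > best.time) {
                best = r;
            }
        }
        return best.value;
    }

    public static void main(String[] args) {
        List<Row> rows = Arrays.asList(new Row(20L, 2.0), new Row(10L, 1.0), new Row(30L, 3.0));
        System.out.println(first(rows)); // prints 1.0
        System.out.println(last(rows));  // prints 3.0
    }
}
```

On equal timestamps this sketch happens to keep whichever row it saw first; that is just one possible outcome of the arbitrary selection described above and should not be relied on.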