SQL: EARLIEST, LATEST aggregators. #8815
I chose these names instead of FIRST, LAST because those are already reserved functions in Calcite that mean something different. I think these are also better names anyway.
public void testEarliestAggregators() throws Exception
{
  // Cannot vectorize EARLIEST aggregator.
  skipVectorize();
Does this need to be reset at the end of the test? Otherwise, all the tests that run after this one will run with vectorization disabled. Or is that handled by some rule that I'm not seeing in this delta?
The test framework creates a new instance for each test method, so that takes care of resetting it.
enum EarliestOrLatest
{
  EARLIEST {
nit: leave a comment here reminding people not to rename the enum since the name() is used in the AggFunction below
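To illustrate the concern behind this nit, here is a minimal sketch of how an enum constant's identifier can end up in a user-facing function name through name(). The names `EarliestOrLatestSketch` and `sqlFunctionName` are hypothetical and are not taken from the patch.

```java
// Hypothetical sketch: not the enum from this patch.
enum EarliestOrLatestSketch
{
  // Do not rename these constants: name() is used as the SQL function name below.
  EARLIEST,
  LATEST;

  // Derives the user-facing function name from the constant's identifier,
  // so renaming a constant would silently rename the function.
  String sqlFunctionName()
  {
    return name();
  }
}
```

With this coupling, renaming EARLIEST to something else would change what `sqlFunctionName()` returns, which is why a reminder comment on the constants is useful.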
case DOUBLE:
  return new DoubleFirstAggregatorFactory(name, fieldName);
case STRING:
  return new StringFirstAggregatorFactory(name, fieldName, maxStringBytes);
Do we want to validate that maxStringBytes >= 0 in both aggregator factories? I traced through the code, and I think a negative value would lead to an out-of-bounds exception in String*BufferAggregator#aggregate.
Also, it's not clear to me what the expected result should be if maxStringBytes is 0.
The validation sounds like a nice addition. I can add it after #8834 is merged. Right now, this patch conflicts with that one, and is blocked on it.
I think if maxStringBytes is 0 you'd expect to get an empty string. A little goofy but technically correct. The best kind of correct :)
:) I asked because I was wondering if we could short circuit that special case. You don't need to compare the timestamps in the aggregator - as long as a string exists for any row, we know the result will be an empty string.
This edge case probably never happens - so again, feel free to ignore
Just pushed the change with the validation.
I think short circuiting the special case isn't super needed, because it's crazy, and who would do it? (Famous last words.)
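For readers following this thread, here is a rough sketch of the two behaviors discussed: rejecting a negative maxStringBytes up front, and getting an empty string when the limit is 0. The class `MaxStringBytesSketch` and its methods are hypothetical; this is not the validation that was pushed, only an illustration under the assumption that the limit applies to the UTF-8 encoding of the value.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: not the factories or aggregators from this patch.
final class MaxStringBytesSketch
{
  private MaxStringBytesSketch() {}

  // Rejects negative limits early instead of failing later during aggregation.
  static int validate(int maxStringBytes)
  {
    if (maxStringBytes < 0) {
      throw new IllegalArgumentException("maxStringBytes must be >= 0, got " + maxStringBytes);
    }
    return maxStringBytes;
  }

  // Keeps at most maxStringBytes bytes of the UTF-8 encoding of value.
  // A cut in the middle of a multi-byte character is not handled specially here.
  static String truncate(String value, int maxStringBytes)
  {
    validate(maxStringBytes);
    byte[] bytes = value.getBytes(StandardCharsets.UTF_8);
    int length = Math.min(bytes.length, maxStringBytes);
    return new String(bytes, 0, length, StandardCharsets.UTF_8);
  }
}
```

In this sketch, a limit of 0 keeps zero bytes, so any non-null input yields an empty string, which matches the "technically correct" result described above.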
This is blocked on #8834. The null handling fixes there are required for the tests to pass.
This is unblocked now.
@suneet-amp any more comments?
Taking a look now - should have a review out in an hour. Doubt I'll find anything else. Feel free to merge and I'll comment retroactively
    .map(rexNode -> toDruidExpressionForSimpleAggregator(plannerContext, rowSignature, rexNode))
    .collect(Collectors.toList());

if (args.stream().noneMatch(Objects::isNull)) {
Could you add a comment here explaining why we do this part? It isn't obvious to me.
I added method-level javadocs that explain it:
* @return list of expressions corresponding to aggregator arguments, or null if any cannot be translated
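The contract in that javadoc can be sketched in isolation as follows. This is a simplified, hypothetical helper (`TranslateAllOrNothingSketch.translateAll`), not the planner code from the patch; it assumes the translator signals "cannot translate" by returning null, as the snippet under review appears to do.

```java
import java.util.List;
import java.util.Objects;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical sketch: not the planner code from this patch.
final class TranslateAllOrNothingSketch
{
  private TranslateAllOrNothingSketch() {}

  // Returns the translated arguments, or null if any single argument
  // could not be translated (the translator returned null for it).
  static <T, R> List<R> translateAll(List<T> inputs, Function<T, R> translator)
  {
    List<R> args = inputs.stream().map(translator).collect(Collectors.toList());
    if (args.stream().noneMatch(Objects::isNull)) {
      return args;
    }
    return null;
  }
}
```

The point of the noneMatch check is that a partially translated argument list is never returned: the caller gets either every argument or null.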
Fixes #8536.