Skip to content

SQL: EARLIEST, LATEST aggregators.#8815

Merged
clintropolis merged 8 commits intoapache:masterfrom
gianm:sql-earliest-latest
Nov 9, 2019
Merged

SQL: EARLIEST, LATEST aggregators.#8815
clintropolis merged 8 commits intoapache:masterfrom
gianm:sql-earliest-latest

Conversation

@gianm
Copy link
Copy Markdown
Contributor

@gianm gianm commented Nov 3, 2019

I chose these names instead of FIRST, LAST because those are already
reserved functions in Calcite that mean something different. I think
these are also better names anyway.

Fixes #8536.

I chose these names instead of FIRST, LAST because those are already
reserved functions in Calcite that mean something different. I think
these are also better names anyway.
public void testEarliestAggregators() throws Exception
{
// Cannot vectorize EARLIEST aggregator.
skipVectorize();
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this need to be reset at the end of the test? Otherwise all the other tests that run after this will run with vectorize disabled? Or is that being handled by some rule that I'm not seeing in this delta

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test framework creates a new instance for each test method, so that takes care of resetting it.


enum EarliestOrLatest
{
EARLIEST {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: leave a comment here reminding people not to rename the enum since the name() is used in the AggFunction below

case DOUBLE:
return new DoubleFirstAggregatorFactory(name, fieldName);
case STRING:
return new StringFirstAggregatorFactory(name, fieldName, maxStringBytes);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want to validate that maxStringBytes >= 0 in both the aggregator factories? I traced through the code and I think an exception will be thrown in the String*BufferAggregator#aggregate because there will be an out of bounds exception.

Also it's not clear to me what the expected result should be if maxStringBytes is 0

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The validation sounds like a nice addition. I can add it after #8834 is merged. Right now, this patch conflicts with that one, and is blocked on it.

I think if maxStringBytes is 0 you'd expect to get an empty string. A little goofy but technically correct. The best kind of correct :)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:) I asked because I was wondering if we could short circuit that special case. You don't need to compare the timestamps in the aggregator - as long as a string exists for any row, we know the result will be an empty string.

This edge case probably never happens - so again, feel free to ignore

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just pushed the change with the validation.

I think short circuiting the special case isn't super needed, because it's crazy, and who would do it? (Famous last words.)

@gianm
Copy link
Copy Markdown
Contributor Author

gianm commented Nov 6, 2019

This is blocked on #8834. The null handling fixes there are required for the tests to pass.

@gianm
Copy link
Copy Markdown
Contributor Author

gianm commented Nov 8, 2019

This is unblocked now.

@fjy
Copy link
Copy Markdown
Contributor

fjy commented Nov 8, 2019

@suneet-amp any more comments?

@suneet-s
Copy link
Copy Markdown
Contributor

suneet-s commented Nov 8, 2019

Taking a look now - should have a review out in an hour. Doubt I'll find anything else. Feel free to merge and I'll comment retroactively

Copy link
Copy Markdown
Contributor

@suneet-s suneet-s left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

Copy link
Copy Markdown
Member

@clintropolis clintropolis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

overall lgtm

.map(rexNode -> toDruidExpressionForSimpleAggregator(plannerContext, rowSignature, rexNode))
.collect(Collectors.toList());

if (args.stream().noneMatch(Objects::isNull)) {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add a comment here explaining why we do this part? It seems not obvious to me

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added method-level javadocs that explain it:

   * @return list of expressions corresponding to aggregator arguments, or null if any cannot be translated

@clintropolis clintropolis merged commit 0e8c3f7 into apache:master Nov 9, 2019
@gianm gianm deleted the sql-earliest-latest branch November 9, 2019 01:29
@jon-wei jon-wei added this to the 0.17.0 milestone Dec 17, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add last and first aggregators to sql

5 participants