Skip to content

use virtual columns for sql simple aggregators instead of inline expressions#12251

Merged
clintropolis merged 5 commits intoapache:masterfrom
clintropolis:sql-aggs-use-virtual-column-instead-of-inline-expressions
Mar 3, 2022
Merged

use virtual columns for sql simple aggregators instead of inline expressions#12251
clintropolis merged 5 commits intoapache:masterfrom
clintropolis:sql-aggs-use-virtual-column-instead-of-inline-expressions

Conversation

@clintropolis
Copy link
Copy Markdown
Member

@clintropolis clintropolis commented Feb 10, 2022

Description

Follow-up to #12241, this PR updates the "simple" SQL aggregators, such as SUM, MIN, and MAX to use VirtualColumn instead of the inline expression to maximize expression re-use for anything more complicated than a literal value.

CalciteQueryTest.testExpressionAggregations has been update to have aggregators which share a common expression, which will now be shared between these two aggregators instead of each specifying it inline. Additionally, this allows these aggregators to participate in the 'specialization' added in #12241.

Finally, did some follow-up javadoc and naming adjustments per comments in the previous PR.


Key changed/added classes in this PR
  • SimpleSqlAggregator
  • DruidExpression
  • VirtualColumnRegistry

This PR has:

  • been self-reviewed.
  • added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • been tested in a test Druid cluster.

String vc = virtualColumnRegistry.getVirtualColumnByExpression(arg, resolutionArg.getType());
fieldName = vc != null ? vc : null;
expression = vc != null ? null : arg.getExpression();
if (arg.getType() == DruidExpression.NodeType.LITERAL) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't really feel like somethign that each implementation should have to do to me... Could we not like pass in the virtualColumnRegistry to the DruidExpression and ask it to tell us the expression to use? Where, like, in the literal case, it would give us an expression that's just a literal and in the non-literal case it would register the thing and then return the name to use for it?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i did consolidate this into a static method in SimpleSqlAggregator while you were typing this comment, I haven't really decided on baking it into VirtualColumnRegistry in some way yet though, will think about it some more.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

after further thought i've decided to just always use a virtual column, even for literals, so this code is simplified a bit now

Copy link
Copy Markdown
Contributor

@kfaraz kfaraz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

@clintropolis clintropolis removed the WIP label Mar 3, 2022
@clintropolis
Copy link
Copy Markdown
Member Author

I have confirmed this change is safe for min/max aggregators, which have special logic when using the expression field that only impacts STRING and ARRAY typed expressions when druid.generic.useDefaultValueForNull=false, but the SQL planner would have never planned such a thing, as it would have inferred the min/max aggregator to be string typed - which doesn't exist, and so would instead fail with Possible error: Max aggregation is not supported for 'STRING' type.

added a couple of additional tests from this investigation, just to document inconsistencies in behavior. It also appears that the "string wrappers" that the numeric aggregators use, StringColumnDoubleAggregatorWrapper, etc, do not behave correctly in SQL compatible null handling mode (druid.generic.useDefaultValueForNull=false), but i didn't modify them in this PR.

@clintropolis clintropolis merged commit 1c004ea into apache:master Mar 3, 2022
@clintropolis clintropolis deleted the sql-aggs-use-virtual-column-instead-of-inline-expressions branch March 3, 2022 23:05
@abhishekagarwal87 abhishekagarwal87 added this to the 0.23.0 milestone May 11, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants