Fix post-aggregator computation when used with subtotals by abhishekagarwal87 · Pull Request #10653 · apache/druid

abhishekagarwal87 · 2020-12-08T12:08:03Z

Description

Post-aggregators used with subtotals fail if the post-aggregator depends on a dimension that is absent from one of the subtotals. An example query is as follows:

SELECT dim2, SUM(cnt), GROUPING(dim2), 
CASE WHEN GROUPING(dim2) = 1 THEN 'ALL' ELSE dim2 END
FROM druid.foo
GROUP BY GROUPING SETS ( (dim2), () )

The exception is:

java.lang.IllegalArgumentException: Missing fields [[d0]] for postAggregator [p0]

	at com.google.common.base.Preconditions.checkArgument(Preconditions.java:148)
	at org.apache.druid.query.Queries.prepareAggregations(Queries.java:118)
	at org.apache.druid.query.groupby.GroupByQuery.<init>(GroupByQuery.java:210)
	at org.apache.druid.query.groupby.GroupByQuery.<init>(GroupByQuery.java:88)
	at org.apache.druid.query.groupby.GroupByQuery$Builder.build(GroupByQuery.java:1175)
	at org.apache.druid.query.groupby.GroupByQuery.withDimensionSpecs(GroupByQuery.java:804)
	at org.apache.druid.query.groupby.strategy.GroupByStrategyV2.processSubtotalsSpec(GroupByStrategyV2.java:446)
	at org.apache.druid.query.groupby.GroupByQueryQueryToolChest.mergeGroupByResultsWithoutPushDown(GroupByQueryQueryToolChest.java:250)
	at org.apache.druid.query.groupby.GroupByQueryQueryToolChest.mergeGroupByResults(GroupByQueryQueryToolChest.java:177)
	at org.apache.druid.query.groupby.GroupByQueryQueryToolChest.initAndMergeGroupByResults(GroupByQueryQueryToolChest.java:149)
	at org.apache.druid.query.groupby.GroupByQueryQueryToolChest.lambda$mergeResults$0(GroupByQueryQueryToolChest.java:122)
	at org.apache.druid.query.FinalizeResultsQueryRunner.run(FinalizeResultsQueryRunner.java:110)
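For context, the check that fails is in Queries.prepareAggregations. The sketch below shows that kind of validation in a simplified model where a post-aggregator is just a name plus the set of fields it reads. MissingFieldsCheck, PostAgg and validate are hypothetical names, not Druid's API. It shows why a query rebuilt without an output named d0 trips the check:

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical, simplified model of the post-aggregator field check; not Druid's actual code.
class MissingFieldsCheck {
  // A post-aggregator reduced to its output name and the fields it reads.
  static final class PostAgg {
    final String name;
    final Set<String> dependentFields;

    PostAgg(String name, Set<String> dependentFields) {
      this.name = name;
      this.dependentFields = dependentFields;
    }
  }

  // Every field a post-aggregator reads must be a dimension output name, an
  // aggregator name, or the name of an earlier post-aggregator.
  static void validate(List<String> dimensionNames, List<String> aggregatorNames, List<PostAgg> postAggs) {
    Set<String> available = new HashSet<>(dimensionNames);
    available.addAll(aggregatorNames);
    for (PostAgg postAgg : postAggs) {
      Set<String> missing = new HashSet<>(postAgg.dependentFields);
      missing.removeAll(available);
      if (!missing.isEmpty()) {
        throw new IllegalArgumentException(
            "Missing fields [" + missing + "] for postAggregator [" + postAgg.name + "]");
      }
      available.add(postAgg.name);
    }
  }

  public static void main(String[] args) {
    // p0 reads the grouping aggregator a1 and the dimension d0.
    List<PostAgg> postAggs = List.of(new PostAgg("p0", Set.of("a1", "d0")));
    // Original query: d0 is present, so validation passes.
    validate(List.of("d0"), List.of("a0", "a1"), postAggs);
    // Query rebuilt without any output named d0: validation fails.
    try {
      validate(List.of(), List.of("a0", "a1"), postAggs);
    } catch (IllegalArgumentException e) {
      System.out.println(e.getMessage()); // Missing fields [[d0]] for postAggregator [p0]
    }
  }
}
```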

I have removed the dimension renaming. While generating results for any subtotal, we now carry over only the dimensions included in that subtotal spec.
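Here is a sketch of the idea. It assumes result rows are plain Object[] arrays of dimension values. SubtotalRows, projectDimensions and caseWhenGrouping are hypothetical names, not code from GroupByStrategyV2. Every subtotal keeps all original dimension slots, so rows stay the same length and a post-aggregator can still reference d0. Dimensions outside the subtotal are simply left null:

```java
import java.util.List;
import java.util.Set;

// Hypothetical illustration of subtotal row projection; not Druid's GroupByStrategyV2 code.
class SubtotalRows {
  // Keeps one slot per original dimension so every subtotal row has the same length.
  // Values are copied only for dimensions in the subtotal; the others stay null.
  static Object[] projectDimensions(List<String> allDims, Set<String> subtotalDims, Object[] dimValues) {
    Object[] out = new Object[allDims.size()];
    for (int i = 0; i < allDims.size(); i++) {
      if (subtotalDims.contains(allDims.get(i))) {
        out[i] = dimValues[i];
      }
    }
    return out;
  }

  // Stand-in for: CASE WHEN GROUPING(dim2) = 1 THEN 'ALL' ELSE dim2 END
  static Object caseWhenGrouping(long grouping, Object dim2) {
    return grouping == 1 ? "ALL" : dim2;
  }

  public static void main(String[] args) {
    List<String> allDims = List.of("d0"); // d0 is the output name for dim2
    Object[] values = {"a"};

    // Subtotal (dim2): d0 is carried over and GROUPING(dim2) = 0.
    Object[] row1 = projectDimensions(allDims, Set.of("d0"), values);
    System.out.println(caseWhenGrouping(0, row1[0])); // a

    // Subtotal (): d0 is still a column, but its value is null and GROUPING(dim2) = 1.
    Object[] row2 = projectDimensions(allDims, Set.of(), values);
    System.out.println(row2.length + " " + caseWhenGrouping(1, row2[0])); // 1 ALL
  }
}
```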

This PR has:

been self-reviewed.
added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
been tested in a test Druid cluster.

Key changed/added classes in this PR

GroupByStrategyV2

gianm · 2020-12-08T16:46:09Z

    testQuery(
        "SELECT dim2, SUM(cnt), GROUPING(dim2), \n"
-        + "CASE WHEN GROUPING(dim2) = 1 THEN 'ALL' ELSE 'INDIVIDUAL' END\n"
+        + "CASE WHEN GROUPING(dim2) = 1 THEN 'ALL' ELSE dim2 END\n"


Why did you change this test case? (As opposed to introducing a new test case.)

I wrote this test when I submitted the patch for the GROUPING function. I had wanted to write it this way (as it is in this PR) but couldn't because of the post-aggregation bug, so I am changing it now that I am fixing the bug. BTW, there are two more tests for the GROUPING function.

gianm · 2020-12-08T16:46:22Z

-            .withLimitSpec(subtotalQueryLimitSpec)
-            .withDimensionSpecs(newDimensions);
+            .withLimitSpec(subtotalQueryLimitSpec);
+            //.withDimensionSpecs(newDimensions);


Please don't include commented-out code.

Yup. I had removed it on my local but forgot to push.

gianm · 2020-12-08T16:49:32Z

-            );
            subTotalDimensionSpec.add(dimensionSpec);
-          } else {
-            // Insert dummy dimension so all subtotals queries have ResultRows with the same shape.


Is this concern no longer valid?

IIRC, it was necessary because otherwise the ResultRows would be different lengths and so the final results wouldn't be correct.

https://github.com/apache/druid/blob/master/processing/src/main/java/org/apache/druid/query/groupby/epinephelinae/RowBasedGrouperHelper.java#L581

We are still keeping all the original dimensions in the query, so the result row size should be the same. I think you were concerned that the result should be null for dimensions not part of the subtotal. We are not carrying over the values for those dimensions, so it should work out.
https://github.com/apache/druid/blob/master/processing/src/main/java/org/apache/druid/query/groupby/epinephelinae/RowBasedGrouperHelper.java#L593

gianm · 2020-12-11T22:51:00Z

@abhishekagarwal87 Could you check the errors in https://travis-ci.com/github/apache/druid/jobs/456420636, including the one in GroupByQueryRunnerTest.testGroupByWithSubtotalsSpecWithLongDimensionColumn?

abhishekagarwal87 · 2020-12-14T19:27:58Z


It seems to be an existing bug. It can be reproduced in master by writing a subquery that generates null values for numeric dimensions. RowBasedGrouperHelper#getValueSuppliersForDimensions should handle the case where the input is a numeric null.

The bug occurs only when a subtotal or subquery is used. It surfaced here because, when I removed the renaming of dimensions, I also removed the type change that was happening earlier (the dummy dimensions were strings).
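Here is a sketch of the null-handling difference. It assumes a hypothetical LongSelector interface that only mirrors the isNull()/getLong() pair of a numeric selector; it is not Druid's BaseLongColumnValueSelector. Reading getLong() alone turns a null into 0, while checking isNull() first preserves the null:

```java
import java.util.function.Supplier;

// Hypothetical sketch; LongSelector only mimics the isNull()/getLong() pair of a numeric selector.
class NullAwareLongSupplier {
  interface LongSelector {
    boolean isNull();
    long getLong(); // in this sketch, returns 0 when the value is null
  }

  // Before: reading getLong() alone turns a null into 0.
  static Supplier<Object> nullUnaware(LongSelector selector) {
    return selector::getLong;
  }

  // After: check isNull() first so a null input stays null.
  static Supplier<Object> nullAware(LongSelector selector) {
    return () -> selector.isNull() ? null : selector.getLong();
  }

  public static void main(String[] args) {
    // A selector positioned on a null long, e.g. a subquery row with a null numeric dimension.
    LongSelector nullValue = new LongSelector() {
      @Override
      public boolean isNull() {
        return true;
      }

      @Override
      public long getLong() {
        return 0L;
      }
    };
    System.out.println(nullUnaware(nullValue).get()); // 0
    System.out.println(nullAware(nullValue).get());   // null
  }
}
```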

abhishekagarwal87 · 2020-12-15T08:20:10Z

Added the test for a nested groupBy query which fails in the current master.

abhishekagarwal87 · 2020-12-17T09:41:56Z

        case LONG:
          return (InputRawSupplierColumnSelectorStrategy<BaseLongColumnValueSelector>)
-              columnSelector -> columnSelector::getLong;
+              columnSelector -> () -> columnSelector.isNull() ? null : columnSelector.getLong();


FYI, this change could cause a dip in performance when columns are actually strings being read as numbers, since the parsing first happens in isNull() and then again in getLong().

IMO, the selectors themselves should ideally cache this computation, similar to the changes being made in #10614. Therefore, I think this change is OK, and if there are any issues it should be fixed at the selector level.
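Here is a sketch of selector-level caching. It assumes a hypothetical selector that reads one string value as a long. CachingParsingSelector and parseCount are illustrative names, not Druid classes, and this makes no claim about what #10614 actually changes. The value is parsed once, and both isNull() and getLong() reuse the cached result. A real column selector sees a new value on every row, so any such cache would need to be reset per row:

```java
// Hypothetical sketch of a selector that caches its parse result; not a Druid class.
class CachingParsingSelector {
  private final String raw;
  private boolean parsed = false;
  private Long cached;
  int parseCount = 0; // illustration only: counts how often parsing actually runs

  CachingParsingSelector(String raw) {
    this.raw = raw;
  }

  // Parses the string at most once; missing or unparseable values become null.
  private void parseOnce() {
    if (parsed) {
      return;
    }
    parseCount++;
    Long value = null;
    if (raw != null) {
      try {
        value = Long.parseLong(raw.trim());
      } catch (NumberFormatException e) {
        value = null; // not a number: treat as null
      }
    }
    cached = value;
    parsed = true;
  }

  boolean isNull() {
    parseOnce();
    return cached == null;
  }

  long getLong() {
    parseOnce();
    return cached == null ? 0L : cached;
  }

  public static void main(String[] args) {
    CachingParsingSelector selector = new CachingParsingSelector("42");
    // Same access pattern as the patched supplier: isNull() first, then getLong().
    Long value = selector.isNull() ? null : selector.getLong();
    System.out.println(value + " parsed " + selector.parseCount + " time(s)"); // 42 parsed 1 time(s)
  }
}
```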

* Fix post-aggregator computation
* remove commented code
* Fix numeric null handling
* Add test when subquery returns null long

abhishekagarwal87 added 2 commits December 8, 2020 12:05

Fix post-aggregator computation

c69361b

remove commented code

a7847a1

gianm reviewed Dec 8, 2020

clintropolis added the Area - Querying and Bug labels Dec 10, 2020

gianm approved these changes Dec 11, 2020

abhishekagarwal87 added 2 commits December 15, 2020 01:23

Fix numeric null handling

a21e287

Add test when subquery returns null long

dbd6526

abhishekagarwal87 commented Dec 17, 2020

jon-wei approved these changes Dec 18, 2020

jon-wei merged commit 796c255 into apache:master Dec 18, 2020

jihoonson added this to the 0.21.0 milestone Jan 4, 2021

jihoonson mentioned this pull request Jan 13, 2021

[Draft] 0.21.0 Release Notes #10752

Closed

jon-wei merged 4 commits into apache:master from abhishekagarwal87:post_aggregator

5 participants