Fix post-aggregator computation when used with subtotals #10653
jon-wei merged 4 commits into apache:master from
```diff
 testQuery(
   "SELECT dim2, SUM(cnt), GROUPING(dim2), \n"
-  + "CASE WHEN GROUPING(dim2) = 1 THEN 'ALL' ELSE 'INDIVIDUAL' END\n"
+  + "CASE WHEN GROUPING(dim2) = 1 THEN 'ALL' ELSE dim2 END\n"
```
Why did you change this test case? (As opposed to introducing a new test case.)
I wrote this test when I submitted the patch for the GROUPING function. I had wanted to write it this way (as in this PR) but couldn't because of the post-aggregation bug. I'm changing it now that I'm fixing the bug. BTW, there are two more tests for the GROUPING function.
```diff
-    .withLimitSpec(subtotalQueryLimitSpec)
-    .withDimensionSpecs(newDimensions);
+    .withLimitSpec(subtotalQueryLimitSpec);
+    //.withDimensionSpecs(newDimensions);
```
Please don't include commented-out code.
Yup. I had removed it locally but forgot to push.
```java
  );
  subTotalDimensionSpec.add(dimensionSpec);
} else {
  // Insert dummy dimension so all subtotals queries have ResultRows with the same shape.
```
Is this concern no longer valid?
IIRC, it was necessary because otherwise the ResultRows would have different lengths, and so the final results wouldn't be correct.
We still keep all the original dimensions in the query, so the result row size should be the same. I think you were also concerned that the result should be null for dimensions not part of the subtotal. We don't carry over values for those dimensions, so it should work out.
https://github.com/apache/druid/blob/master/processing/src/main/java/org/apache/druid/query/groupby/epinephelinae/RowBasedGrouperHelper.java#L593
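A minimal sketch of the row-shape argument above, assuming a result row is a positional array with one slot per original dimension followed by a single aggregate. `SubtotalRowShape` and `subtotalRow` are hypothetical names, not Druid's `ResultRow` or `GroupByStrategyV2` code. Every subtotal row keeps the full width, and only the dimensions in that subtotal are carried over; the rest stay null.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: not Druid's ResultRow or GroupByStrategyV2 code.
public class SubtotalRowShape {
  // Builds one result row for a subtotal. Every original dimension keeps its slot,
  // so all subtotal rows have the same length; dimensions outside the subtotal stay null.
  static Object[] subtotalRow(List<String> allDims, Set<String> subtotalDims,
                              Object[] groupedRow, long aggregate) {
    Object[] row = new Object[allDims.size() + 1];
    for (int i = 0; i < allDims.size(); i++) {
      // Carry over only the dimensions that are part of this subtotal.
      row[i] = subtotalDims.contains(allDims.get(i)) ? groupedRow[i] : null;
    }
    // The aggregate always sits in the last slot, after all dimensions.
    row[allDims.size()] = aggregate;
    return row;
  }

  public static void main(String[] args) {
    List<String> dims = Arrays.asList("dim1", "dim2");
    Object[] source = {"a", "x"};
    System.out.println(Arrays.toString(subtotalRow(dims, Set.of("dim1", "dim2"), source, 3L))); // [a, x, 3]
    System.out.println(Arrays.toString(subtotalRow(dims, Set.of("dim1"), source, 5L)));         // [a, null, 5]
    System.out.println(Arrays.toString(subtotalRow(dims, Set.of(), source, 9L)));               // [null, null, 9]
  }
}
```

Because no slot is renamed or dropped, rows produced for different subtotals can still be merged positionally.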
@abhishekagarwal87 Could you check the errors in https://travis-ci.com/github/apache/druid/jobs/456420636, including the one in GroupByQueryRunnerTest.testGroupByWithSubtotalsSpecWithLongDimensionColumn?
It seems to be an existing bug. It can be reproduced in master by writing a subquery that generates null values for numeric dimensions; the bug occurs only when a subtotal or subquery is used. It surfaced here because, when I removed the renaming of dimensions, I also removed the type change that was happening earlier (the dummy dimensions were strings).
Added a test for a nested groupBy query that fails on current master.
```diff
 case LONG:
   return (InputRawSupplierColumnSelectorStrategy<BaseLongColumnValueSelector>)
-      columnSelector -> columnSelector::getLong;
+      columnSelector -> () -> columnSelector.isNull() ? null : columnSelector.getLong();
```
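To illustrate the difference between the two suppliers in the change above, here is a hypothetical sketch. `LongSelector` is a stand-in interface, not Druid's `BaseLongColumnValueSelector`, and it assumes the primitive getter returns 0 for a null row. A method reference to a primitive `getLong()` cannot express a missing value, while the null-aware lambda checks `isNull()` first and returns `null`.

```java
import java.util.function.Supplier;

// Hypothetical sketch: LongSelector is a stand-in for a numeric column selector.
public class NullAwareSupplier {
  interface LongSelector {
    boolean isNull();

    long getLong(); // assumed to return 0 for a null row, as a primitive getter must return something
  }

  // Old shape: the method reference boxes whatever getLong() returns, so a null row reads as 0.
  static Supplier<Object> primitiveSupplier(LongSelector s) {
    return s::getLong;
  }

  // New shape: consult isNull() first and box the value only when it is present.
  static Supplier<Object> nullAwareSupplier(LongSelector s) {
    return () -> s.isNull() ? null : s.getLong();
  }

  public static void main(String[] args) {
    LongSelector nullRow = new LongSelector() {
      public boolean isNull() { return true; }

      public long getLong() { return 0L; }
    };
    System.out.println(primitiveSupplier(nullRow).get()); // 0
    System.out.println(nullAwareSupplier(nullRow).get()); // null
  }
}
```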
FYI, this change could cause a dip in performance when columns are actually strings being read as numbers, since the parsing happens first in isNull() and then again in getLong().
IMO, the selectors themselves should ideally cache this computation, similar to the changes being made in #10614. Therefore, I think this change is OK, and if there are any issues, they should be fixed at the selector level.
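A minimal sketch of the selector-level caching suggested above, assuming a selector that reads a string column as longs and treats unparseable values as null. `CachingLongSelector`, `advance()`, and `parseCount` are hypothetical and this is not the change in #10614. Each row is parsed once, so the `isNull()`-then-`getLong()` pattern does not parse twice.

```java
// Hypothetical sketch: a selector over string values that parses each row at most once.
public class CachingLongSelector {
  private final String[] rawValues; // stand-in for a string column being read as numbers
  private int row = 0;
  private int cachedRow = -1;
  private Long cachedValue;         // null when the row is null or unparseable
  int parseCount = 0;               // only here to show how often parsing runs

  CachingLongSelector(String[] rawValues) {
    this.rawValues = rawValues;
  }

  void advance() {
    row++;
  }

  // Parse the current row once and remember the result until the row changes.
  private Long parsed() {
    if (cachedRow != row) {
      cachedRow = row;
      parseCount++;
      String raw = rawValues[row];
      Long value;
      try {
        value = raw == null ? null : Long.valueOf(raw.trim());
      } catch (NumberFormatException e) {
        value = null; // assumption: unparseable strings read as null
      }
      cachedValue = value;
    }
    return cachedValue;
  }

  boolean isNull() {
    return parsed() == null;
  }

  long getLong() {
    Long v = parsed(); // reuses the cached parse from isNull()
    return v == null ? 0L : v;
  }

  public static void main(String[] args) {
    CachingLongSelector s = new CachingLongSelector(new String[] {"42", null, "abc"});
    for (int i = 0; i < 3; i++) {
      // Same access pattern as the null-aware supplier: isNull() first, then getLong().
      Object v = s.isNull() ? null : s.getLong();
      System.out.println(v);
      if (i < 2) {
        s.advance();
      }
    }
    System.out.println("parses: " + s.parseCount); // 3: one parse per row
  }
}
```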
* Fix post-aggregator computation
* remove commented code
* Fix numeric null handling
* Add test when subquery returns null long
Description
When used with subtotals, a post-aggregator fails if it depends on a dimension that is absent from one of the subtotals. An example query is as follows:
The exception is
I have removed the dimension renaming. When generating results for a subtotal, we now carry over only the dimensions included in that subtotal spec.
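A hypothetical sketch of why keeping the original dimension names matters, assuming the failure came from the post-aggregator not finding its input dimension under its original name once that dimension was renamed for a subtotal. `SubtotalPostAgg`, `label`, and `subtotalRow` are illustrative names, not Druid's `PostAggregator` API; `label` mirrors `CASE WHEN GROUPING(dim2) = 1 THEN 'ALL' ELSE dim2 END` from the updated test.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch, not Druid's PostAggregator API.
public class SubtotalPostAgg {
  // Models CASE WHEN GROUPING(dim2) = 1 THEN 'ALL' ELSE dim2 END,
  // where GROUPING(dim2) is 1 when dim2 is not part of the current subtotal.
  static String label(Map<String, Object> row, Set<String> subtotalDims) {
    if (!row.containsKey("dim2")) {
      // The failure mode when the input column is no longer present under its original name.
      throw new IllegalStateException("post-aggregator input dim2 is missing");
    }
    return subtotalDims.contains("dim2") ? String.valueOf(row.get("dim2")) : "ALL";
  }

  // Rows stay keyed by the ORIGINAL dimension names; dimensions outside the
  // subtotal are present but null, so the post-aggregator can still resolve them.
  static Map<String, Object> subtotalRow(List<String> allDims, Set<String> subtotalDims, Object[] values) {
    Map<String, Object> row = new HashMap<>();
    for (int i = 0; i < allDims.size(); i++) {
      String dim = allDims.get(i);
      row.put(dim, subtotalDims.contains(dim) ? values[i] : null);
    }
    return row;
  }

  public static void main(String[] args) {
    List<String> dims = Arrays.asList("dim2");
    Object[] values = {"abc"};
    for (Set<String> subtotal : List.of(Set.of("dim2"), Set.<String>of())) {
      Map<String, Object> row = subtotalRow(dims, subtotal, values);
      System.out.println(subtotal + " -> " + label(row, subtotal));
    }
  }
}
```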
Key changed/added classes in this PR
GroupByStrategyV2