Fix post-aggregator computation when used with subtotals #10653
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -408,27 +408,10 @@ public Sequence<ResultRow> processSubtotalsSpec( | |
| // Dimension spec including dimension name and output name | ||
| final List<DimensionSpec> subTotalDimensionSpec = new ArrayList<>(dimsInSubtotalSpec.size()); | ||
| final List<DimensionSpec> dimensions = query.getDimensions(); | ||
| final List<DimensionSpec> newDimensions = new ArrayList<>(); | ||
|
|
||
| for (int i = 0; i < dimensions.size(); i++) { | ||
| DimensionSpec dimensionSpec = dimensions.get(i); | ||
| for (DimensionSpec dimensionSpec : dimensions) { | ||
| if (dimsInSubtotalSpec.contains(dimensionSpec.getOutputName())) { | ||
| newDimensions.add( | ||
| new DefaultDimensionSpec( | ||
| dimensionSpec.getOutputName(), | ||
| dimensionSpec.getOutputName(), | ||
| dimensionSpec.getOutputType() | ||
| ) | ||
| ); | ||
| subTotalDimensionSpec.add(dimensionSpec); | ||
| } else { | ||
| // Insert dummy dimension so all subtotals queries have ResultRows with the same shape. | ||
|
Contributor
Is this concern no longer valid? IIRC, it was necessary because otherwise the ResultRows would be different lengths and so the final results wouldn't be correct.
Contributor
Author
We are still keeping all the original dimensions in the query, so the result row size should be the same. I think you were concerned that the result should be null for dimensions that are not part of the subtotal. We are not carrying over the result for those dimensions, so it should work out.
||
| // Use a field name that does not appear in the main query result, to assure the result will be null. | ||
| String dimName = "_" + i; | ||
| while (query.getResultRowSignature().indexOf(dimName) >= 0) { | ||
| dimName = "_" + dimName; | ||
| } | ||
| newDimensions.add(DefaultDimensionSpec.of(dimName)); | ||
| } | ||
| } | ||
|
|
||
|
|
@@ -442,8 +425,7 @@ public Sequence<ResultRow> processSubtotalsSpec( | |
| } | ||
|
|
||
| GroupByQuery subtotalQuery = baseSubtotalQuery | ||
| .withLimitSpec(subtotalQueryLimitSpec) | ||
| .withDimensionSpecs(newDimensions); | ||
| .withLimitSpec(subtotalQueryLimitSpec); | ||
|
|
||
| final GroupByRowProcessor.ResultSupplier resultSupplierOneFinal = resultSupplierOne; | ||
| if (Utils.isPrefix(subtotalSpec, queryDimNames)) { | ||
|
|
||
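The exchange above turns on one invariant: every subtotal row keeps the full dimension list of the original query, and dimensions outside the current subtotal are left null. A minimal sketch of that invariant in plain Java follows. It does not use Druid's classes; `SubtotalRowShapeSketch`, `projectToSubtotal`, and the row layout (dimensions first, then aggregators) are assumptions made for illustration only.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;

// Hypothetical illustration, not Druid code: a row is an Object[] whose first
// dimNames.size() slots hold dimension values, followed by aggregator values.
final class SubtotalRowShapeSketch {

  // Returns a copy of the row with the same length, where every dimension
  // that is not part of the subtotal spec is nulled out.
  static Object[] projectToSubtotal(List<String> dimNames, Set<String> subtotalDims, Object[] row) {
    Object[] out = Arrays.copyOf(row, row.length);
    for (int i = 0; i < dimNames.size(); i++) {
      if (!subtotalDims.contains(dimNames.get(i))) {
        out[i] = null; // dimension not in this subtotal: do not carry its value over
      }
    }
    return out;
  }
}
```

Because the row length never changes, no dummy dimensions are needed in this sketch to keep rows from different subtotals aligned.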
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -12159,23 +12159,23 @@ public void testGroupingAggregatorWithPostAggregator() throws Exception | |
| List<Object[]> resultList; | ||
| if (NullHandling.sqlCompatible()) { | ||
| resultList = ImmutableList.of( | ||
| new Object[]{NULL_STRING, 2L, 0L, "INDIVIDUAL"}, | ||
| new Object[]{"", 1L, 0L, "INDIVIDUAL"}, | ||
| new Object[]{"a", 2L, 0L, "INDIVIDUAL"}, | ||
| new Object[]{"abc", 1L, 0L, "INDIVIDUAL"}, | ||
| new Object[]{NULL_STRING, 2L, 0L, NULL_STRING}, | ||
| new Object[]{"", 1L, 0L, ""}, | ||
| new Object[]{"a", 2L, 0L, "a"}, | ||
| new Object[]{"abc", 1L, 0L, "abc"}, | ||
| new Object[]{NULL_STRING, 6L, 1L, "ALL"} | ||
| ); | ||
| } else { | ||
| resultList = ImmutableList.of( | ||
| new Object[]{"", 3L, 0L, "INDIVIDUAL"}, | ||
| new Object[]{"a", 2L, 0L, "INDIVIDUAL"}, | ||
| new Object[]{"abc", 1L, 0L, "INDIVIDUAL"}, | ||
| new Object[]{"", 3L, 0L, ""}, | ||
| new Object[]{"a", 2L, 0L, "a"}, | ||
| new Object[]{"abc", 1L, 0L, "abc"}, | ||
| new Object[]{NULL_STRING, 6L, 1L, "ALL"} | ||
| ); | ||
| } | ||
| testQuery( | ||
| "SELECT dim2, SUM(cnt), GROUPING(dim2), \n" | ||
| + "CASE WHEN GROUPING(dim2) = 1 THEN 'ALL' ELSE 'INDIVIDUAL' END\n" | ||
| + "CASE WHEN GROUPING(dim2) = 1 THEN 'ALL' ELSE dim2 END\n" | ||
|
Contributor
Why did you change this test case? (As opposed to introducing a new test case.)
Contributor
Author
I wrote this test when I submitted the patch for the grouping function. I had wanted to write it this way (as it is in this PR) but couldn't because of the post-aggregation bug. I am changing it now that I am fixing the bug. BTW, there are two more tests for the grouping function.
||
| + "FROM druid.foo\n" | ||
| + "GROUP BY GROUPING SETS ( (dim2), () )", | ||
| ImmutableList.of( | ||
|
|
@@ -12200,7 +12200,7 @@ public void testGroupingAggregatorWithPostAggregator() throws Exception | |
| ) | ||
| .setPostAggregatorSpecs(Collections.singletonList(new ExpressionPostAggregator( | ||
| "p0", | ||
| "case_searched((\"a1\" == 1),'ALL','INDIVIDUAL')", | ||
| "case_searched((\"a1\" == 1),'ALL',\"d0\")", | ||
| null, | ||
| ExprMacroTable.nil() | ||
| ))) | ||
|
|
||
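To make the updated expectations easier to check by hand, here is a small sketch of what the post-aggregator expression `case_searched(("a1" == 1),'ALL',"d0")` computes per row. It is plain Java rather than Druid's expression engine; `GroupingPostAggSketch` and `evalP0` are hypothetical names, with `d0` standing for the `dim2` value and `a1` for the `GROUPING(dim2)` result.

```java
// Hypothetical illustration of the post-aggregator logic, not Druid code.
final class GroupingPostAggSketch {

  // Mirrors CASE WHEN GROUPING(dim2) = 1 THEN 'ALL' ELSE dim2 END:
  // the grand-total row is labeled "ALL", every other row echoes its dimension value.
  static String evalP0(String d0, long a1) {
    return a1 == 1L ? "ALL" : d0;
  }
}
```

Applied to the expected rows in the test, `{"a", 2L, 0L}` yields `"a"`, and the grand-total row `{NULL_STRING, 6L, 1L}` yields `"ALL"`. The post-aggregator can only echo `dim2` if the dimension value is still present in the subtotal row when it runs.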
FYI, this change could cause a dip in performance when columns are actually strings and are being read as numbers, since the parsing first happens in the isNull function and then again in getLong.
IMO, the selectors themselves should ideally cache this computation, similar to the changes being made in #10614. Therefore, I think this change is OK, and if there are any issues it should be fixed at the selector level.
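As a rough illustration of the selector-level caching suggested here, the sketch below parses a string value once and serves both the null check and the long read from the cached result. It is not Druid's selector API and makes no claim about what #10614 does; `CachingStringAsLongSelector`, its `Supplier<String>` source, and the `parseCount` counter are all assumptions made for this example.

```java
import java.util.Objects;
import java.util.function.Supplier;

// Hypothetical illustration, not Druid code: caches the string-to-long parse so
// that isNull() followed by getLong() on the same value parses only once.
final class CachingStringAsLongSelector {
  private final Supplier<String> source;
  private String cachedInput;
  private Long cachedValue;   // null means "missing or not parseable as a long"
  private boolean cacheValid;
  int parseCount;             // exposed only to make the saving observable

  CachingStringAsLongSelector(Supplier<String> source) {
    this.source = source;
  }

  private Long parsed() {
    String current = source.get();
    if (!cacheValid || !Objects.equals(current, cachedInput)) {
      parseCount++;
      cachedValue = tryParse(current);
      cachedInput = current;
      cacheValid = true;
    }
    return cachedValue;
  }

  private static Long tryParse(String s) {
    if (s == null) {
      return null;
    }
    try {
      return Long.parseLong(s.trim());
    } catch (NumberFormatException e) {
      return null;
    }
  }

  boolean isNull() {
    return parsed() == null;
  }

  long getLong() {
    Long value = parsed();
    return value == null ? 0L : value;
  }
}
```

With this shape, calling `isNull()` and then `getLong()` on the same underlying value triggers a single parse. A production selector would more likely invalidate its cache on row changes than compare strings, so treat the invalidation rule here as a placeholder.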