These two queries (if posed in GroupByQueryRunnerTest) return different results, but I think they should return the same results. The only difference between query 1 and query 2 is that query 2 has an additional dimension that is not referenced in the subtotalsSpec.
I believe the issue is that GroupByStrategyV2.processSubtotalsSpec calls mergeResults on rows returned by GroupByRowProcessor.getRowsFromGrouper for each subtotal dimension list, but this isn't enough to fully group them. The result rows from the grouper are sorted based on the original dimension set, and mergeResults only merges adjacent rows.
i.e. Imagine you have two dimensions and they each have values A, B, and C. The original grouper might have rows like this:
AA
AB
BA
BB
Passing "dimsToInclude" with just the second dim would yield these rows from the Grouper:
A
B
A
B
Which cannot be merged by mergeResults.
I can think of a couple of fixes:
- Re-sorting the Grouper rows each time they are pulled out. However, not all Groupers support re-sorting (e.g. SpillingGrouper cannot change its sort order once it has spilled to disk) so it might not always be possible.
- Using two Groupers (with two merge buffers), one to store the initially grouped rows and one to store subtotal groupings. The act of adding results to the subtotal Grouper will properly and fully group them. Takes more memory but should work in all cases.
Query 1:
GroupByQuery query = makeQueryBuilder()
.setDataSource(QueryRunnerTestHelper.dataSource)
.setQuerySegmentSpec(QueryRunnerTestHelper.firstToThird)
.setDimensions(
ImmutableList.of(
new DefaultDimensionSpec("market", "market")
)
)
.setAggregatorSpecs(Collections.singletonList(QueryRunnerTestHelper.rowsCount))
.setGranularity(QueryRunnerTestHelper.allGran)
.setSubtotalsSpec(ImmutableList.of(ImmutableList.of("market")))
.build();
Query 2:
GroupByQuery query = makeQueryBuilder()
.setDataSource(QueryRunnerTestHelper.dataSource)
.setQuerySegmentSpec(QueryRunnerTestHelper.firstToThird)
.setDimensions(
ImmutableList.of(
new DefaultDimensionSpec("quality", "quality"),
new DefaultDimensionSpec("market", "market")
)
)
.setAggregatorSpecs(Collections.singletonList(QueryRunnerTestHelper.rowsCount))
.setGranularity(QueryRunnerTestHelper.allGran)
.setSubtotalsSpec(ImmutableList.of(ImmutableList.of("market")))
.build();
/cc @himanshug
These two queries (if posed in GroupByQueryRunnerTest) return different results, but I think they should return the same results. The only difference between query 1 and query 2 is that query 2 has an additional dimension that is not referenced in the subtotalsSpec.
I believe the issue is that
GroupByStrategyV2.processSubtotalsSpeccallsmergeResultson rows returned byGroupByRowProcessor.getRowsFromGrouperfor each subtotal dimension list, but this isn't enough to fully group them. The result rows from the grouper are sorted based on the original dimension set, andmergeResultsonly merges adjacent rows.i.e. Imagine you have two dimensions and they each have values A, B, and C. The original grouper might have rows like this:
AA
AB
BA
BB
Passing "dimsToInclude" with just the second dim would yield these rows from the Grouper:
A
B
A
B
Which cannot be merged by
mergeResults.I can think of a couple of fixes:
Query 1:
Query 2:
/cc @himanshug