Skip to content

groupBy with subtotalsSpec doesn't fully group each set #8091

@gianm

Description

@gianm

These two queries (if posed in GroupByQueryRunnerTest) return different results, but I think they should return the same results. The only difference between query 1 and query 2 is that query 2 has an additional dimension that is not referenced in the subtotalsSpec.

I believe the issue is that GroupByStrategyV2.processSubtotalsSpec calls mergeResults on rows returned by GroupByRowProcessor.getRowsFromGrouper for each subtotal dimension list, but this isn't enough to fully group them. The result rows from the grouper are sorted based on the original dimension set, and mergeResults only merges adjacent rows.

i.e. Imagine you have two dimensions and they each have values A, B, and C. The original grouper might have rows like this:

AA
AB
BA
BB

Passing "dimsToInclude" with just the second dim would yield these rows from the Grouper:

A
B
A
B

Which cannot be merged by mergeResults.

I can think of a couple of fixes:

  1. Re-sorting the Grouper rows each time they are pulled out. However, not all Groupers support re-sorting (e.g. SpillingGrouper cannot change its sort order once it has spilled to disk) so it might not always be possible.
  2. Using two Groupers (with two merge buffers), one to store the initially grouped rows and one to store subtotal groupings. The act of adding results to the subtotal Grouper will properly and fully group them. Takes more memory but should work in all cases.

Query 1:

    GroupByQuery query = makeQueryBuilder()
        .setDataSource(QueryRunnerTestHelper.dataSource)
        .setQuerySegmentSpec(QueryRunnerTestHelper.firstToThird)
        .setDimensions(
            ImmutableList.of(
                new DefaultDimensionSpec("market", "market")
            )
        )
        .setAggregatorSpecs(Collections.singletonList(QueryRunnerTestHelper.rowsCount))
        .setGranularity(QueryRunnerTestHelper.allGran)
        .setSubtotalsSpec(ImmutableList.of(ImmutableList.of("market")))
        .build();

Query 2:

    GroupByQuery query = makeQueryBuilder()
        .setDataSource(QueryRunnerTestHelper.dataSource)
        .setQuerySegmentSpec(QueryRunnerTestHelper.firstToThird)
        .setDimensions(
            ImmutableList.of(
                new DefaultDimensionSpec("quality", "quality"),
                new DefaultDimensionSpec("market", "market")
            )
        )
        .setAggregatorSpecs(Collections.singletonList(QueryRunnerTestHelper.rowsCount))
        .setGranularity(QueryRunnerTestHelper.allGran)
        .setSubtotalsSpec(ImmutableList.of(ImmutableList.of("market")))
        .build();

/cc @himanshug

Metadata

Metadata

Assignees

No one assigned

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions