[Proposal] Add support for multiple grouping specs in groupBy query #5179

@himanshug

Recently I came across the following use case.

A user sends three groupBy queries similar to the following:

  1. Report unique-users (user-id-sketch) grouped by "quarter", "product".
  2. Report unique-users (user-id-sketch) grouped by "quarter".
  3. Report unique-users overall.

Note that the Druid Broker could run only the first query and construct the results for the 2nd and 3rd queries by further aggregating that result set. This would avoid touching the data nodes again and would save the extra network round trips.

So, the proposal is to add another field, "rollupSpecs", to the groupBy query. It would look something like this:

{
  "queryType": "groupBy",
  "dataSource": "sample_datasource",
  "granularity": "day",
  "dimensions": ["quarter", "product"],
  "aggregations": [ .. ],
  "postAggregations": [ .. ],
  "rollupSpecs": [         # Note the extra rollup specs here: one for just "quarter", one with no dimensions
    ["quarter"],
    []
  ],
  "intervals": [ .. ]
}

rollupSpecs could carry additional configuration, for example to say whether the original result set for <quarter, product> needs to be reported back or not.

I haven't thought this through completely, so some parts of the proposal above will likely change; this description is mainly meant to convey the idea.

As for the implementation, it would be something like this:
The Broker runs the groupBy query as if no rollup spec were present.
It then stores that result set in an IncrementalIndex (or maybe uses an off-heap implementation, or stores the result set in a segment) and runs further groupBy queries on this index to compute the result sets for the rollup specs provided.
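
To make the Broker-side step concrete, here is a minimal sketch of re-aggregating an already-grouped result set down to each rollup spec. It is only an illustration under assumptions, not Druid code: plain Python sets stand in for user-id sketches, set union stands in for sketch merge, the "day" granularity is ignored, and the names rollup, base and rollup_specs are hypothetical.

```python
def rollup(rows, dimensions, keep_dims, merge):
    """Re-aggregate already-grouped rows down to the dimensions in keep_dims.

    rows maps a tuple of dimension values (ordered like `dimensions`)
    to a mergeable aggregate; `merge` combines two aggregates.
    """
    positions = [dimensions.index(d) for d in keep_dims]
    merged = {}
    for key, value in rows.items():
        sub_key = tuple(key[i] for i in positions)
        if sub_key in merged:
            merged[sub_key] = merge(merged[sub_key], value)
        else:
            merged[sub_key] = value
    return merged


# Hypothetical result of the base query grouped by (quarter, product).
# Sets of user ids stand in for user-id sketches.
dimensions = ["quarter", "product"]
base = {
    ("Q1", "phone"): {"u1", "u2"},
    ("Q1", "tablet"): {"u2", "u3"},
    ("Q2", "phone"): {"u1", "u4"},
}

# Same shape as the proposed "rollupSpecs" field.
rollup_specs = [["quarter"], []]

by_quarter, overall = [
    rollup(base, dimensions, spec, lambda a, b: a | b) for spec in rollup_specs
]

for spec, result in zip(rollup_specs, (by_quarter, overall)):
    for key, users in sorted(result.items()):
        print(spec, key, len(users))
```

Note that the rolled-up values come from merging the sketches, not from adding the per-group unique counts: user u2 appears under both products in Q1 but is counted once for the quarter.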

The result reported to the user would still be a sequence of rows, with the result sets of the different rollups following one after another.

Oracle provides similar features via the "ROLLUP" and "CUBE" functions, as described in https://docs.oracle.com/cd/B28359_01/server.111/b28314/tdpdw_sql.htm#TDPDW00712.
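
For comparison, this is a small sketch of how SQL-style ROLLUP and CUBE expand a dimension list into grouping sets, written in the same list-of-lists shape as the proposed "rollupSpecs" field. The function names are hypothetical; nothing here is part of Druid or of the proposal itself.

```python
from itertools import combinations


def rollup_grouping_sets(dimensions):
    """ROLLUP(a, b, ...): every prefix of the list, longest first, down to []."""
    return [list(dimensions[:n]) for n in range(len(dimensions), -1, -1)]


def cube_grouping_sets(dimensions):
    """CUBE(a, b, ...): every subset of the list, largest first, down to []."""
    return [
        list(subset)
        for n in range(len(dimensions), -1, -1)
        for subset in combinations(dimensions, n)
    ]


print(rollup_grouping_sets(["quarter", "product"]))
print(cube_grouping_sets(["quarter", "product"]))
```

With dimensions "quarter" and "product", the ROLLUP expansion yields exactly the three groupings from the use case above: the base grouping, just "quarter", and the overall total. CUBE additionally yields the grouping on just "product".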
