Recently I came across the following use case.
A user sends three groupBy queries similar to the following:
- Report unique-users(user-id-sketch) for groupBy "quarter", "product".
- Report unique-users(user-id-sketch) for groupBy "quarter".
- Report unique-users overall.
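Note that the Druid Broker could run only the first query and construct the results for the 2nd and 3rd queries by aggregating that result set further, without touching the data nodes at all and without any extra network round trips. This relies on the aggregators being mergeable, which is the case for sketches as long as the Broker still has the unfinalized sketch objects.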
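To make the re-aggregation idea concrete, here is a minimal sketch in Python. It is not Druid code: a plain set of user ids stands in for a mergeable user-id sketch, and the row layout and the `reaggregate` helper are made up for illustration. It shows that the coarser groupings can be derived from the <quarter, product> rows by merging sketches, while simply summing the per-row unique counts would over-count.

```python
from collections import defaultdict

# Hypothetical result rows of the first query (groupBy "quarter", "product").
# A Python set of user ids stands in for a mergeable user-id sketch.
base_rows = [
    {"quarter": "Q1", "product": "A", "users": {"u1", "u2"}},
    {"quarter": "Q1", "product": "B", "users": {"u2", "u3"}},
    {"quarter": "Q2", "product": "A", "users": {"u1"}},
]


def reaggregate(rows, dims):
    """Merge the sketches of all rows that agree on `dims`, then count."""
    merged = defaultdict(set)
    for row in rows:
        key = tuple(row[d] for d in dims)
        merged[key] |= row["users"]  # sketch merge, here a set union
    return {key: len(users) for key, users in merged.items()}


by_quarter = reaggregate(base_rows, ["quarter"])  # 2nd query
overall = reaggregate(base_rows, [])              # 3rd query

# Summing finalized counts instead of merging sketches double-counts "u2".
naive_q1 = sum(len(r["users"]) for r in base_rows if r["quarter"] == "Q1")
```

Here `by_quarter` reports 3 unique users for Q1 while `naive_q1` is 4, which is why the second pass has to work on the sketches rather than on the finalized numbers.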
So the proposal is to add another field, "rollupSpecs", to the groupBy query, which would look something like the following:
{
  "queryType": "groupBy",
  "dataSource": "sample_datasource",
  "granularity": "day",
  "dimensions": ["quarter", "product"],
  "aggregations": [ .. ],
  "postAggregations": [ .. ],
  "rollupSpecs": [ # Note: extra rollup specs here, one for just "quarter" and one with no dimensions
    ["quarter"],
    []
  ],
  "intervals": [ .. ]
}
There could be additional options in rollupSpecs, for example to say whether the original result set for <quarter, product> should be reported back or not.
I haven't thought it through, so some parts of the above proposal will likely change; this description is mainly meant to convey the idea.
As for the implementation, it would be something like this:
- The Broker runs the groupBy query (as if no rollup specs were present).
- It then stores that result set in an IncrementalIndex (or maybe uses an off-heap implementation, or stores the result set in a segment) and runs further groupBy queries on this index to compute the result sets for the provided rollup specs.
The result reported to the user would still be a sequence of rows, with the result sets of the different rollups one after another.
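The flow above could look roughly like the following Python sketch. It is only an illustration under assumptions, not the Druid implementation: an in-memory list replaces the IncrementalIndex, `rollup`, `run_with_rollup_specs`, `mergers`, and `include_base` are hypothetical names, and the "day" granularity from the example query is ignored. Each aggregate carries its own merge function (addition for a row count, union for the set that stands in for a sketch).

```python
from functools import reduce


def rollup(rows, dims, mergers):
    """Group `rows` by `dims` and combine each aggregate with its merge function."""
    groups = {}
    for row in rows:
        key = tuple(row[d] for d in dims)
        groups.setdefault(key, []).append(row)
    out = []
    for key, members in groups.items():
        result = dict(zip(dims, key))  # only the grouping dimensions are kept
        for name, merge in mergers.items():
            result[name] = reduce(merge, (m[name] for m in members))
        out.append(result)
    return out


def run_with_rollup_specs(base_rows, rollup_specs, mergers, include_base=True):
    """Return the base result set followed by one result set per rollup spec."""
    results = list(base_rows) if include_base else []
    for dims in rollup_specs:
        results.extend(rollup(base_rows, dims, mergers))
    return results


# Hypothetical unfinalized result of the base <quarter, product> query.
base_rows = [
    {"quarter": "Q1", "product": "A", "rows": 10, "users": frozenset({"u1", "u2"})},
    {"quarter": "Q1", "product": "B", "rows": 5, "users": frozenset({"u2", "u3"})},
    {"quarter": "Q2", "product": "A", "rows": 7, "users": frozenset({"u1"})},
]
mergers = {"rows": lambda a, b: a + b, "users": lambda a, b: a | b}

results = run_with_rollup_specs(base_rows, [["quarter"], []], mergers)
```

With `include_base=True` the output is the three base rows, then the two per-quarter rows, then the single overall row, i.e. the result sets one after another. The `include_base` flag corresponds to the idea of an option controlling whether the original <quarter, product> result set is reported back.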
Oracle provides similar features via the "ROLLUP" and "CUBE" extensions to GROUP BY, as described in https://docs.oracle.com/cd/B28359_01/server.111/b28314/tdpdw_sql.htm#TDPDW00712 .