-
Notifications
You must be signed in to change notification settings - Fork 3.8k
Add SQL GROUPING SETS support. #9122
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
81ad380
3c7cef8
d68dd92
a3b0a41
a9ede1a
9222634
3074be8
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change | ||||
|---|---|---|---|---|---|---|
|
|
@@ -65,7 +65,7 @@ Druid SQL supports SELECT queries with the following structure: | |||||
| SELECT [ ALL | DISTINCT ] { * | exprs } | ||||||
| FROM table | ||||||
| [ WHERE expr ] | ||||||
| [ GROUP BY exprs ] | ||||||
| [ GROUP BY [ exprs | GROUPING SETS ( (exprs), ... ) | ROLLUP (exprs) | CUBE (exprs) ] ] | ||||||
| [ HAVING expr ] | ||||||
| [ ORDER BY expr [ ASC | DESC ], expr [ ASC | DESC ], ... ] | ||||||
| [ LIMIT limit ] | ||||||
|
|
@@ -86,6 +86,22 @@ trigger an aggregation query using one of Druid's [three native aggregation quer | |||||
| can refer to an expression or a select clause ordinal position (like `GROUP BY 2` to group by the second selected | ||||||
| column). | ||||||
|
|
||||||
| The GROUP BY clause can also refer to multiple grouping sets in three ways. The most flexible is GROUP BY GROUPING SETS, | ||||||
| for example `GROUP BY GROUPING SETS ( (country, city), () )`. This example is equivalent to a `GROUP BY country, city` | ||||||
| followed by `GROUP BY ()` (a grand total). With GROUPING SETS, the underlying data is only scanned one time, leading to | ||||||
| better efficiency. Second, GROUP BY ROLLUP computes a grouping set for each level of the grouping expressions. For | ||||||
| example `GROUP BY ROLLUP (country, city)` is equivalent to `GROUP BY GROUPING SETS ( (country, city), (country), () )` | ||||||
| and will produce grouped rows for each country / city pair, along with subtotals for each country, along with a grand | ||||||
| total. Finally, GROUP BY CUBE computes a grouping set for each combination of grouping expressions. For example, | ||||||
| `GROUP BY CUBE (country, city)` is equivalent to `GROUP BY GROUPING SETS ( (country, city), (country), (city), () )`. | ||||||
| Grouping columns that do not apply to a particular row will contain `NULL`. For example, when computing | ||||||
| `GROUP BY GROUPING SETS ( (country, city), () )`, the grand total row corresponding to `()` will have `NULL` for the | ||||||
| "country" and "city" columns. | ||||||
|
|
||||||
| When using GROUP BY GROUPING SETS, GROUP BY ROLLUP, or GROUP BY CUBE, be aware that results may not be generated in the | ||||||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Other paragraphs in this section use plain text for simple keywords and preformatted text for examples; see https://druid.apache.org/docs/latest/querying/sql.html. This can be changed but I wanted to keep it consistent in this patch. What do you think? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think we can keep to the prevailing convention of plain text for inline keywords, with all caps providing the formatting distinction. The underlying point is taken though—a series of keywords like this makes the text a little hard to scan. One possible workaround, if the problem warrants it, would be to restructure the sentence to put the keywords in bullet format. For example --
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. No need to change if this is an existing pattern |
||||||
| order that you specify your grouping sets in the query. If you need results to be generated in a particular order, use | ||||||
| the ORDER BY clause. | ||||||
|
|
||||||
| The HAVING clause refers to columns that are present after execution of GROUP BY. It can be used to filter on either | ||||||
| grouping expressions or aggregated values. It can only be used together with GROUP BY. | ||||||
|
|
||||||
|
|
||||||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -794,6 +794,11 @@ public GroupByQuery withQuerySegmentSpec(QuerySegmentSpec spec) | |
| return new Builder(this).setQuerySegmentSpec(spec).build(); | ||
| } | ||
|
|
||
| public GroupByQuery withVirtualColumns(final VirtualColumns virtualColumns) | ||
| { | ||
| return new Builder(this).setVirtualColumns(virtualColumns).build(); | ||
| } | ||
|
|
||
| public GroupByQuery withDimFilter(@Nullable final DimFilter dimFilter) | ||
| { | ||
| return new Builder(this).setDimFilter(dimFilter).build(); | ||
|
|
@@ -1198,6 +1203,7 @@ public String toString() | |
| ", dimensions=" + dimensions + | ||
| ", aggregatorSpecs=" + aggregatorSpecs + | ||
| ", postAggregatorSpecs=" + postAggregatorSpecs + | ||
| (subtotalsSpec != null ? (", subtotalsSpec=" + subtotalsSpec) : "") + | ||
| ", havingSpec=" + havingSpec + | ||
| ", context=" + getContext() + | ||
| '}'; | ||
|
|
@@ -1222,7 +1228,8 @@ public boolean equals(final Object o) | |
| Objects.equals(dimFilter, that.dimFilter) && | ||
| Objects.equals(dimensions, that.dimensions) && | ||
| Objects.equals(aggregatorSpecs, that.aggregatorSpecs) && | ||
| Objects.equals(postAggregatorSpecs, that.postAggregatorSpecs); | ||
| Objects.equals(postAggregatorSpecs, that.postAggregatorSpecs) && | ||
| Objects.equals(subtotalsSpec, that.subtotalsSpec); | ||
| } | ||
|
|
||
| @Override | ||
|
|
@@ -1236,7 +1243,8 @@ public int hashCode() | |
| dimFilter, | ||
| dimensions, | ||
| aggregatorSpecs, | ||
| postAggregatorSpecs | ||
| postAggregatorSpecs, | ||
| subtotalsSpec | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. EqualsVerifier Test for this please
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. OK, I added it. |
||
| ); | ||
| } | ||
| } | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -67,6 +67,7 @@ | |
| import org.apache.druid.query.groupby.resource.GroupByQueryResource; | ||
| import org.apache.druid.query.spec.MultipleIntervalSegmentSpec; | ||
| import org.apache.druid.segment.StorageAdapter; | ||
| import org.apache.druid.segment.VirtualColumns; | ||
|
|
||
| import java.nio.ByteBuffer; | ||
| import java.util.ArrayList; | ||
|
|
@@ -355,7 +356,11 @@ public Sequence<ResultRow> processSubtotalsSpec( | |
| GroupByRowProcessor.ResultSupplier resultSupplierOne = null; | ||
|
|
||
| try { | ||
| GroupByQuery queryWithoutSubtotalsSpec = query | ||
| // baseSubtotalQuery is the original query with dimensions and aggregators rewritten to apply to the *results* | ||
| // rather than *inputs* of that query. It has its virtual columns and dim filter removed, because those only | ||
| // make sense when applied to inputs. Finally, it has subtotalsSpec removed, since we'll be computing them | ||
| // one-by-one soon enough. | ||
| GroupByQuery baseSubtotalQuery = query | ||
| .withDimensionSpecs(query.getDimensions().stream().map( | ||
| dimSpec -> new DefaultDimensionSpec( | ||
| dimSpec.getOutputName(), | ||
|
|
@@ -369,13 +374,13 @@ public Sequence<ResultRow> processSubtotalsSpec( | |
| .map(AggregatorFactory::getCombiningFactory) | ||
| .collect(Collectors.toList()) | ||
| ) | ||
| .withSubtotalsSpec(null) | ||
| .withDimFilter(null); | ||
|
|
||
| .withVirtualColumns(VirtualColumns.EMPTY) | ||
| .withDimFilter(null) | ||
| .withSubtotalsSpec(null); | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. not really part of this change - maybe worth filing an issue for. The
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. What do you mean by 'make sense'? It makes sense to me in that you have an immutable object (the GroupByQuery) and you are using chaining to create a derived copy.
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Sorry, I should have phrased that better. I meant that there seems to be a lot of overlap between the Nothing to change here - just my stream of thought while reading the code. |
||
|
|
||
| resultSupplierOne = GroupByRowProcessor.process( | ||
| queryWithoutSubtotalsSpec, | ||
| queryWithoutSubtotalsSpec, | ||
| baseSubtotalQuery, | ||
| baseSubtotalQuery, | ||
| queryResult, | ||
| configSupplier.get(), | ||
| resource, | ||
|
|
@@ -384,13 +389,13 @@ public Sequence<ResultRow> processSubtotalsSpec( | |
| processingConfig.intermediateComputeSizeBytes() | ||
| ); | ||
|
|
||
| List<String> queryDimNames = queryWithoutSubtotalsSpec.getDimensions().stream().map(DimensionSpec::getOutputName) | ||
| .collect(Collectors.toList()); | ||
| List<String> queryDimNames = baseSubtotalQuery.getDimensions().stream().map(DimensionSpec::getOutputName) | ||
| .collect(Collectors.toList()); | ||
|
|
||
| // Only needed to make LimitSpec.filterColumns(..) call later in case base query has a non default LimitSpec. | ||
| Set<String> aggsAndPostAggs = null; | ||
| if (queryWithoutSubtotalsSpec.getLimitSpec() != null && !(queryWithoutSubtotalsSpec.getLimitSpec() instanceof NoopLimitSpec)) { | ||
| aggsAndPostAggs = getAggregatorAndPostAggregatorNames(queryWithoutSubtotalsSpec); | ||
| if (!(baseSubtotalQuery.getLimitSpec() instanceof NoopLimitSpec)) { | ||
| aggsAndPostAggs = getAggregatorAndPostAggregatorNames(baseSubtotalQuery); | ||
| } | ||
|
|
||
| List<List<String>> subtotals = query.getSubtotalsSpec(); | ||
|
|
@@ -425,14 +430,14 @@ public Sequence<ResultRow> processSubtotalsSpec( | |
|
|
||
| // Create appropriate LimitSpec for subtotal query | ||
| LimitSpec subtotalQueryLimitSpec = NoopLimitSpec.instance(); | ||
| if (queryWithoutSubtotalsSpec.getLimitSpec() != null && !(queryWithoutSubtotalsSpec.getLimitSpec() instanceof NoopLimitSpec)) { | ||
| Set<String> columns = new HashSet(aggsAndPostAggs); | ||
| if (!(baseSubtotalQuery.getLimitSpec() instanceof NoopLimitSpec)) { | ||
| Set<String> columns = new HashSet<>(aggsAndPostAggs); | ||
| columns.addAll(subtotalSpec); | ||
|
|
||
| subtotalQueryLimitSpec = queryWithoutSubtotalsSpec.getLimitSpec().filterColumns(columns); | ||
| subtotalQueryLimitSpec = baseSubtotalQuery.getLimitSpec().filterColumns(columns); | ||
| } | ||
|
|
||
| GroupByQuery subtotalQuery = queryWithoutSubtotalsSpec | ||
| GroupByQuery subtotalQuery = baseSubtotalQuery | ||
| .withLimitSpec(subtotalQueryLimitSpec) | ||
| .withDimensionSpecs(newDimensions); | ||
|
|
||
|
|
@@ -451,7 +456,7 @@ public Sequence<ResultRow> processSubtotalsSpec( | |
| // Also note, we can't create the ResultSupplier eagerly here or as we don't want to eagerly allocate | ||
| // merge buffers for processing subtotal. | ||
| Supplier<GroupByRowProcessor.ResultSupplier> resultSupplierTwo = () -> GroupByRowProcessor.process( | ||
| queryWithoutSubtotalsSpec, | ||
| baseSubtotalQuery, | ||
| subtotalQuery, | ||
| resultSupplierOneFinal.results(subtotalSpec), | ||
| configSupplier.get(), | ||
|
|
@@ -468,7 +473,7 @@ public Sequence<ResultRow> processSubtotalsSpec( | |
| } | ||
|
|
||
| return Sequences.withBaggage( | ||
| Sequences.concat(subtotalsResults), | ||
| query.postProcess(Sequences.concat(subtotalsResults)), | ||
| resultSupplierOne //this will close resources allocated by resultSupplierOne after sequence read | ||
| ); | ||
| } | ||
|
|
@@ -489,21 +494,17 @@ private Sequence<ResultRow> processSubtotalsResultAndOptionallyClose( | |
| // on sequence read if closeOnSequenceRead is true. | ||
| try { | ||
| Supplier<GroupByRowProcessor.ResultSupplier> memoizedSupplier = Suppliers.memoize(baseResultsSupplier); | ||
| return applyPostProcessing( | ||
| mergeResults( | ||
| (queryPlus, responseContext) -> | ||
| new LazySequence<>( | ||
| () -> Sequences.withBaggage( | ||
| memoizedSupplier.get().results(dimsToInclude), | ||
| closeOnSequenceRead ? () -> CloseQuietly.close(memoizedSupplier.get()) : () -> {} | ||
| ) | ||
| ), | ||
| subtotalQuery, | ||
| null | ||
| ), | ||
| subtotalQuery | ||
| return mergeResults( | ||
| (queryPlus, responseContext) -> | ||
| new LazySequence<>( | ||
| () -> Sequences.withBaggage( | ||
| memoizedSupplier.get().results(dimsToInclude), | ||
| closeOnSequenceRead ? () -> CloseQuietly.close(memoizedSupplier.get()) : () -> {} | ||
| ) | ||
| ), | ||
| subtotalQuery, | ||
| null | ||
| ); | ||
|
|
||
| } | ||
| catch (Exception ex) { | ||
| CloseQuietly.close(baseResultsSupplier.get()); | ||
|
|
||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is cool... I learnt some new SQL today 👍
Based on the examples below, the parenthesis are required correct?
I think in other places in the docs parenthesis don't mean literal characters, but you want them to be literal characters here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The parentheses mean literal characters here and everywhere else they appear in this example. The brackets, however, don't.
Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was thrown off by this in the docs
[ EXPLAIN PLAN FOR ] [ WITH tableName [ ( column1, column2, ... ) ] AS ( query ) ]I didn't realize the parenthesis were required in this statement . I couldn't get an EXPLAIN query to work locally for me in the UI.