Merged
18 changes: 17 additions & 1 deletion docs/querying/sql.md
@@ -65,7 +65,7 @@ Druid SQL supports SELECT queries with the following structure:
SELECT [ ALL | DISTINCT ] { * | exprs }
FROM table
[ WHERE expr ]
[ GROUP BY exprs ]
[ GROUP BY [ exprs | GROUPING SETS ( (exprs), ... ) | ROLLUP (exprs) | CUBE (exprs) ] ]
Contributor:

This is cool... I learnt some new SQL today 👍

Suggested change
[ GROUP BY [ exprs | GROUPING SETS ( (exprs), ... ) | ROLLUP (exprs) | CUBE (exprs) ] ]
[ GROUP BY [ exprs | GROUPING SETS ( \(exprs\), ... ) | ROLLUP \(exprs\) | CUBE \(exprs\) ] ]

Based on the examples below, the parentheses are required, correct?

I think in other places in the docs parentheses don't mean literal characters, but you want them to be literal characters here?

Contributor Author:

The parentheses mean literal characters here and everywhere else they appear in this example. The brackets, however, don't.

Contributor (@suneet-s, Feb 25, 2020):

I was thrown off by this in the docs

[ EXPLAIN PLAN FOR ] [ WITH tableName [ ( column1, column2, ... ) ] AS ( query ) ]

I didn't realize the parentheses were required in this statement. I couldn't get an EXPLAIN query to work locally in the UI.

[ HAVING expr ]
[ ORDER BY expr [ ASC | DESC ], expr [ ASC | DESC ], ... ]
[ LIMIT limit ]
@@ -86,6 +86,22 @@ trigger an aggregation query using one of Druid's [three native aggregation quer
can refer to an expression or a select clause ordinal position (like `GROUP BY 2` to group by the second selected
column).

The GROUP BY clause can also refer to multiple grouping sets in three ways. The most flexible is GROUP BY GROUPING SETS,
for example `GROUP BY GROUPING SETS ( (country, city), () )`. This example is equivalent to a `GROUP BY country, city`
followed by `GROUP BY ()` (a grand total). With GROUPING SETS, the underlying data is only scanned one time, leading to
better efficiency. Second, GROUP BY ROLLUP computes a grouping set for each level of the grouping expressions. For
example `GROUP BY ROLLUP (country, city)` is equivalent to `GROUP BY GROUPING SETS ( (country, city), (country), () )`
and will produce grouped rows for each country / city pair, along with subtotals for each country, along with a grand
total. Finally, GROUP BY CUBE computes a grouping set for each combination of grouping expressions. For example,
`GROUP BY CUBE (country, city)` is equivalent to `GROUP BY GROUPING SETS ( (country, city), (country), (city), () )`.
Grouping columns that do not apply to a particular row will contain `NULL`. For example, when computing
`GROUP BY GROUPING SETS ( (country, city), () )`, the grand total row corresponding to `()` will have `NULL` for the
"country" and "city" columns.

When using GROUP BY GROUPING SETS, GROUP BY ROLLUP, or GROUP BY CUBE, be aware that results may not be generated in the
Contributor:

Suggested change
When using GROUP BY GROUPING SETS, GROUP BY ROLLUP, or GROUP BY CUBE, be aware that results may not be generated in the
When using `GROUP BY GROUPING SETS`, `GROUP BY ROLLUP`, or `GROUP BY CUBE`, be aware that results may not be generated in the

Contributor Author:

Other paragraphs in this section use plain text for simple keywords and preformatted text for examples; see https://druid.apache.org/docs/latest/querying/sql.html.

This can be changed but I wanted to keep it consistent in this patch.

What do you think?


I think we can keep to the prevailing convention of plain text for inline keywords, with all caps providing the formatting distinction.

The underlying point is taken, though: a series of keywords like this makes the text a little hard to scan. One possible workaround, if the problem warrants it, would be to restructure the sentence to put the keywords in bullet format. For example:

When using the following, be aware that results may not be generated in the order that you specify your grouping sets in the query:

  • GROUP BY GROUPING SETS
  • GROUP BY ROLLUP
  • GROUP BY CUBE

If you need results...

Contributor:

No need to change if this is an existing pattern

order that you specify your grouping sets in the query. If you need results to be generated in a particular order, use
the ORDER BY clause.

The HAVING clause refers to columns that are present after execution of GROUP BY. It can be used to filter on either
grouping expressions or aggregated values. It can only be used together with GROUP BY.

@@ -794,6 +794,11 @@ public GroupByQuery withQuerySegmentSpec(QuerySegmentSpec spec)
return new Builder(this).setQuerySegmentSpec(spec).build();
}

public GroupByQuery withVirtualColumns(final VirtualColumns virtualColumns)
{
return new Builder(this).setVirtualColumns(virtualColumns).build();
}

public GroupByQuery withDimFilter(@Nullable final DimFilter dimFilter)
{
return new Builder(this).setDimFilter(dimFilter).build();
Expand Down Expand Up @@ -1198,6 +1203,7 @@ public String toString()
", dimensions=" + dimensions +
", aggregatorSpecs=" + aggregatorSpecs +
", postAggregatorSpecs=" + postAggregatorSpecs +
(subtotalsSpec != null ? (", subtotalsSpec=" + subtotalsSpec) : "") +
", havingSpec=" + havingSpec +
", context=" + getContext() +
'}';
@@ -1222,7 +1228,8 @@ public boolean equals(final Object o)
Objects.equals(dimFilter, that.dimFilter) &&
Objects.equals(dimensions, that.dimensions) &&
Objects.equals(aggregatorSpecs, that.aggregatorSpecs) &&
Objects.equals(postAggregatorSpecs, that.postAggregatorSpecs);
Objects.equals(postAggregatorSpecs, that.postAggregatorSpecs) &&
Objects.equals(subtotalsSpec, that.subtotalsSpec);
}

@Override
@@ -1236,7 +1243,8 @@ public int hashCode()
dimFilter,
dimensions,
aggregatorSpecs,
postAggregatorSpecs
postAggregatorSpecs,
subtotalsSpec
Contributor:

EqualsVerifier test for this, please.

Contributor Author:

OK, I added it.

);
}
}
@@ -67,6 +67,7 @@
import org.apache.druid.query.groupby.resource.GroupByQueryResource;
import org.apache.druid.query.spec.MultipleIntervalSegmentSpec;
import org.apache.druid.segment.StorageAdapter;
import org.apache.druid.segment.VirtualColumns;

import java.nio.ByteBuffer;
import java.util.ArrayList;
@@ -355,7 +356,11 @@ public Sequence<ResultRow> processSubtotalsSpec(
GroupByRowProcessor.ResultSupplier resultSupplierOne = null;

try {
GroupByQuery queryWithoutSubtotalsSpec = query
// baseSubtotalQuery is the original query with dimensions and aggregators rewritten to apply to the *results*
// rather than *inputs* of that query. It has its virtual columns and dim filter removed, because those only
// make sense when applied to inputs. Finally, it has subtotalsSpec removed, since we'll be computing them
// one-by-one soon enough.
GroupByQuery baseSubtotalQuery = query
.withDimensionSpecs(query.getDimensions().stream().map(
dimSpec -> new DefaultDimensionSpec(
dimSpec.getOutputName(),
@@ -369,13 +374,13 @@ public Sequence<ResultRow> processSubtotalsSpec(
.map(AggregatorFactory::getCombiningFactory)
.collect(Collectors.toList())
)
.withSubtotalsSpec(null)
.withDimFilter(null);

.withVirtualColumns(VirtualColumns.EMPTY)
.withDimFilter(null)
.withSubtotalsSpec(null);
Contributor:

Not really part of this change, but maybe worth filing an issue for: the with... interface returns a query object, so chaining doesn't really make sense here. Maybe we should consider making it return the Builder so callers can call .build() when they're done setting all the fields. In this example it looks like it creates 4 intermediate GroupByQuery objects and a Builder for each object.

Contributor Author:

What do you mean by 'make sense'? It makes sense to me in that you have an immutable object (the GroupByQuery) and you are using chaining to create a derived copy.

Contributor:

Sorry, I should have phrased that better.

I meant that there seems to be a lot of overlap between the .with... functions and the GroupByQuery.Builder functions. When I saw that pattern, I was wondering how to choose between the two. And since both exist, I may not pick the correct one.

Nothing to change here - just my stream of thought while reading the code.
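The tradeoff being discussed can be sketched with a toy immutable type. This is illustrative only — the names below are made up and are not the real GroupByQuery API. Each with... call round-trips through a full copy, so chaining constructs one derived object per call, whereas a Builder-returning API would defer construction until .build():

```java
// Illustrative only: a toy immutable "query" whose with... methods each
// rebuild the whole object, as discussed for GroupByQuery above.
public class WitherSketch
{
  static int constructions = 0;  // counts how many objects get built

  static final class Query
  {
    final String filter;
    final String limit;

    Query(String filter, String limit)
    {
      this.filter = filter;
      this.limit = limit;
      constructions++;
    }

    // Each with... call produces a complete derived copy of the object.
    Query withFilter(String f) { return new Query(f, limit); }
    Query withLimit(String l)  { return new Query(filter, l); }
  }

  public static void main(String[] args)
  {
    Query base = new Query("x > 0", "10");
    Query derived = base.withFilter(null).withLimit("20");
    // One base object plus two derived copies were constructed; a
    // Builder-returning API would construct only the final object.
    System.out.println(constructions);
    System.out.println(derived.limit);
  }
}
```

For a short chain on an immutable object the intermediate copies are cheap and the chaining style stays readable, which is the author's point; the reviewer's point is that the cost grows with the number of chained calls.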


resultSupplierOne = GroupByRowProcessor.process(
queryWithoutSubtotalsSpec,
queryWithoutSubtotalsSpec,
baseSubtotalQuery,
baseSubtotalQuery,
queryResult,
configSupplier.get(),
resource,
@@ -384,13 +389,13 @@ public Sequence<ResultRow> processSubtotalsSpec(
processingConfig.intermediateComputeSizeBytes()
);

List<String> queryDimNames = queryWithoutSubtotalsSpec.getDimensions().stream().map(DimensionSpec::getOutputName)
.collect(Collectors.toList());
List<String> queryDimNames = baseSubtotalQuery.getDimensions().stream().map(DimensionSpec::getOutputName)
.collect(Collectors.toList());

// Only needed to make LimitSpec.filterColumns(..) call later in case base query has a non default LimitSpec.
Set<String> aggsAndPostAggs = null;
if (queryWithoutSubtotalsSpec.getLimitSpec() != null && !(queryWithoutSubtotalsSpec.getLimitSpec() instanceof NoopLimitSpec)) {
aggsAndPostAggs = getAggregatorAndPostAggregatorNames(queryWithoutSubtotalsSpec);
if (!(baseSubtotalQuery.getLimitSpec() instanceof NoopLimitSpec)) {
aggsAndPostAggs = getAggregatorAndPostAggregatorNames(baseSubtotalQuery);
}

List<List<String>> subtotals = query.getSubtotalsSpec();
@@ -425,14 +430,14 @@ public Sequence<ResultRow> processSubtotalsSpec(

// Create appropriate LimitSpec for subtotal query
LimitSpec subtotalQueryLimitSpec = NoopLimitSpec.instance();
if (queryWithoutSubtotalsSpec.getLimitSpec() != null && !(queryWithoutSubtotalsSpec.getLimitSpec() instanceof NoopLimitSpec)) {
Set<String> columns = new HashSet(aggsAndPostAggs);
if (!(baseSubtotalQuery.getLimitSpec() instanceof NoopLimitSpec)) {
Set<String> columns = new HashSet<>(aggsAndPostAggs);
columns.addAll(subtotalSpec);

subtotalQueryLimitSpec = queryWithoutSubtotalsSpec.getLimitSpec().filterColumns(columns);
subtotalQueryLimitSpec = baseSubtotalQuery.getLimitSpec().filterColumns(columns);
}

GroupByQuery subtotalQuery = queryWithoutSubtotalsSpec
GroupByQuery subtotalQuery = baseSubtotalQuery
.withLimitSpec(subtotalQueryLimitSpec)
.withDimensionSpecs(newDimensions);

Expand All @@ -451,7 +456,7 @@ public Sequence<ResultRow> processSubtotalsSpec(
// Also note, we can't create the ResultSupplier eagerly here or as we don't want to eagerly allocate
// merge buffers for processing subtotal.
Supplier<GroupByRowProcessor.ResultSupplier> resultSupplierTwo = () -> GroupByRowProcessor.process(
queryWithoutSubtotalsSpec,
baseSubtotalQuery,
subtotalQuery,
resultSupplierOneFinal.results(subtotalSpec),
configSupplier.get(),
Expand All @@ -468,7 +473,7 @@ public Sequence<ResultRow> processSubtotalsSpec(
}

return Sequences.withBaggage(
Sequences.concat(subtotalsResults),
query.postProcess(Sequences.concat(subtotalsResults)),
resultSupplierOne //this will close resources allocated by resultSupplierOne after sequence read
);
}
@@ -489,21 +494,17 @@ private Sequence<ResultRow> processSubtotalsResultAndOptionallyClose(
// on sequence read if closeOnSequenceRead is true.
try {
Supplier<GroupByRowProcessor.ResultSupplier> memoizedSupplier = Suppliers.memoize(baseResultsSupplier);
return applyPostProcessing(
mergeResults(
(queryPlus, responseContext) ->
new LazySequence<>(
() -> Sequences.withBaggage(
memoizedSupplier.get().results(dimsToInclude),
closeOnSequenceRead ? () -> CloseQuietly.close(memoizedSupplier.get()) : () -> {}
)
),
subtotalQuery,
null
),
subtotalQuery
return mergeResults(
(queryPlus, responseContext) ->
new LazySequence<>(
() -> Sequences.withBaggage(
memoizedSupplier.get().results(dimsToInclude),
closeOnSequenceRead ? () -> CloseQuietly.close(memoizedSupplier.get()) : () -> {}
)
),
subtotalQuery,
null
);

}
catch (Exception ex) {
CloseQuietly.close(baseResultsSupplier.get());
@@ -412,6 +412,8 @@ public ByteBuffer get()
@Parameterized.Parameters(name = "{0}")
public static Collection<Object[]> constructorFeeder()
{
NullHandling.initializeForTests();

final List<Object[]> constructors = new ArrayList<>();
for (GroupByQueryConfig config : testConfigs()) {
final Pair<GroupByQueryRunnerFactory, Closer> factoryAndCloser = makeQueryRunnerFactory(config);
@@ -7196,38 +7198,13 @@ public void testGroupByWithSubtotalsSpecWithOrderLimit()
.addOrderByColumn("idx")
.addOrderByColumn("alias")
.addOrderByColumn("market")
.setLimit(1)
.setLimit(3)
.build();

List<ResultRow> expectedResults = Arrays.asList(
makeRow(
query,
"2011-04-01",
"alias",
"technology",
"rows",
1L,
"idx",
78L
),
makeRow(
query,
"2011-04-01T00:00:00.000Z",
"market",
"spot",
"rows",
9L,
"idx",
1102L
),
makeRow(
query,
"2011-04-01T00:00:00.000Z",
"rows",
13L,
"idx",
6619L
)
makeRow(query, "2011-04-01", "alias", "technology", "rows", 1L, "idx", 78L),
makeRow(query, "2011-04-01", "alias", "business", "rows", 1L, "idx", 118L),
makeRow(query, "2011-04-01", "alias", "travel", "rows", 1L, "idx", 119L)
);

Iterable<ResultRow> results = GroupByQueryRunnerTestHelper.runQuery(factory, runner, query);
@@ -22,6 +22,8 @@
import com.fasterxml.jackson.databind.ObjectMapper;
import com.google.common.collect.ImmutableList;
import com.google.common.collect.Ordering;
import nl.jqno.equalsverifier.EqualsVerifier;
import nl.jqno.equalsverifier.Warning;
import org.apache.druid.java.util.common.Intervals;
import org.apache.druid.java.util.common.granularity.Granularities;
import org.apache.druid.query.BaseQuery;
@@ -121,4 +123,23 @@ public void testSegmentLookUpForNestedQueries()
.build();
Assert.assertEquals(innerQuerySegmentSpec, BaseQuery.getQuerySegmentSpecForLookUp(query));
}

@Test
public void testEquals()
{
EqualsVerifier.forClass(GroupByQuery.class)
.usingGetClass()
// The 'duration' field is used by equals via getDuration(), which computes it lazily in a way
// that confuses EqualsVerifier.
.suppress(Warning.NULL_FIELDS, Warning.NONFINAL_FIELDS)
// Fields derived from other fields are not included in equals/hashCode
.withIgnoredFields(
"applyLimitPushDown",
"postProcessingFn",
"resultRowOrder",
"resultRowPositionLookup",
"universalTimestamp"
)
.verify();
}
}
@@ -231,8 +231,8 @@ private static List<RelOptRule> baseRuleSet(final PlannerContext plannerContext)
rules.addAll(SUB_QUERY_REMOVE_RULES);

if (!plannerConfig.isUseApproximateCountDistinct()) {
// We'll need this to expand COUNT DISTINCTs.
// Avoid AggregateExpandDistinctAggregatesRule.INSTANCE; it uses grouping sets and we don't support those.
// For some reason, even though we support grouping sets, using AggregateExpandDistinctAggregatesRule.INSTANCE
// here causes CalciteQueryTest#testExactCountDistinctWithGroupingAndOtherAggregators to fail.
rules.add(AggregateExpandDistinctAggregatesRule.JOIN);
}
