Return empty result when a group by gets optimized to a timeseries query by LakshSingla · Pull Request #12065 · apache/druid

LakshSingla · 2021-12-14T12:39:58Z

Description

Related to #11188

The above mentioned PR allowed timeseries queries to return a default result, when queries of type: select count(*) from table where dim1="_not_present_dim_" were executed. Before the PR, it returned no row, after the PR, it would return a row with value of count(*) as 0 (as expected by SQL standards of different dbs).

In Grouping#applyProject, we can sometimes perform optimization of a groupBy query to a timeseries query if possible (when the keys of the groupBy are constants, as generated by automated tools). For example, in select count(*) from table where dim1="_present_dim_" group by "dummy_key", the groupBy clause can be removed. However, in the case when the filter doesn't return anything, i.e. select count(*) from table where dim1="_not_present_dim_" group by "dummy_key", the behavior of general databases would be to return nothing, while druid (due to above change) returns an empty row. This PR aims to fix this divergence of behavior.

Example cases:

select count(*) from table where dim1="_not_present_dim_" group by "dummy_key".
CURRENT: Returns a row with count(*) = 0
EXPECTED: Return no row
select 'A', dim1 from foo where m1 = 123123 and dim1 = '_not_present_again_' group by dim1
CURRENT: Returns a row with ('A', 'wat')
EXPECTED: Return no row

To do this, a boolean droppedDimensionsWhileApplyingProject has been added to Grouping which is true whenever we make changes to the original shape with optimization. Hence if a timeseries query has a grouping with this set to true, we set skipEmptyBuckets=true in the query context (i.e. donot return any row).

Key changed/added classes in this PR

Grouping.java
DruidQuery#toTimeseriesQuery

This PR has:

been self-reviewed.
- using the concurrency checklist (Remove this item if the PR doesn't have any relation to concurrency.)
added documentation for new or modified features or behaviors.
added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
added or updated version, license, or notice information in licenses.yaml
added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
added integration tests.
been tested in a test Druid cluster.

abhishekagarwal87

Minor comments. LGTM otherwise.

abhishekagarwal87 · 2022-01-03T09:43:28Z

+  // When optimization in Grouping#applyProject is applied, and it reduces a Group By query to a timeseries, we
+  // want it to return empty bucket if no row matches
+  @Test
+  public void testReturnEmptyRowWhenGroupByIsConvertedToTimeseries() throws Exception


can you add few more queries such as one that groups on a dummy dimension

Added a few more test cases, which show the consequences of optimization in different cases.

abhishekagarwal87 · 2022-01-03T09:49:27Z

+  {
+    skipVectorize();
+    testQuery(
+        "SELECT 'A', dim1 from foo WHERE m1 = 50 AND dim1 = 'wat' GROUP BY dim1",


btw I noticed that it runs as a group-by query if we remove the m1 = 50 clause. do you know why?

On removing the m1=50 clause, the dim1 is not getting reduced to wat literal in the Calcite planner phase, so the optimization in Grouping.java to eliminate the literals is not getting applied.
I checked the place where this reduction is happening, and it's in the ProjectMergeRule of Calcite. When there's a single project, then ProjectMergeRule is not getting invoked.

abhishekagarwal87 · 2022-01-07T16:24:01Z

Merged since test failures are unrelated.

jihoonson · 2022-01-07T17:46:22Z

@abhishekagarwal87 I feel our test is too flaky recently. Can you file an issue for flaky tests in this PR, so that we can promote and fix it? If we have one already, please link it to here.

abhishekagarwal87 · 2022-01-07T18:17:25Z

Sure thing @jihoonson

jihoonson · 2022-01-07T18:22:24Z

Thank you!

skipEmptyBuckets=true when GroupBy is optimized to Timeseries

c0b28ae

asdf2014 added the Area - SQL label Dec 14, 2021

LakshSingla added 3 commits December 15, 2021 00:30

Trigger Build

4b15f14

Trigger Build

42efe63

Trigger Build

6c23bd6

LakshSingla marked this pull request as draft December 15, 2021 07:25

LakshSingla marked this pull request as ready for review December 15, 2021 07:25

abhishekagarwal87 reviewed Jan 3, 2022

View reviewed changes

LakshSingla added 3 commits January 4, 2022 11:45

Rename the flag

3b94cbf

Rename the flag, DruidQuery

c1417d8

Add test cases, improve flagname

8a4d8e6

abhishekagarwal87 approved these changes Jan 7, 2022

View reviewed changes

abhishekagarwal87 merged commit 7c17341 into apache:master Jan 7, 2022

abhishekagarwal87 added this to the 0.23.0 milestone May 11, 2022

abhishekagarwal87 mentioned this pull request May 25, 2022

[Draft] 0.23.0 Release notes #12510

Closed

LakshSingla deleted the grouping-skip-empty-bucket branch May 14, 2024 02:44

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Return empty result when a group by gets optimized to a timeseries query#12065

Return empty result when a group by gets optimized to a timeseries query#12065
abhishekagarwal87 merged 7 commits intoapache:masterfrom
LakshSingla:grouping-skip-empty-bucket

LakshSingla commented Dec 14, 2021

Uh oh!

abhishekagarwal87 left a comment

Uh oh!

Uh oh!

abhishekagarwal87 Jan 3, 2022

Uh oh!

LakshSingla Jan 7, 2022

Uh oh!

abhishekagarwal87 Jan 3, 2022

Uh oh!

LakshSingla Jan 3, 2022 •

edited

Loading

Uh oh!

abhishekagarwal87 commented Jan 7, 2022

Uh oh!

jihoonson commented Jan 7, 2022

Uh oh!

abhishekagarwal87 commented Jan 7, 2022

Uh oh!

jihoonson commented Jan 7, 2022

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Conversation

LakshSingla commented Dec 14, 2021

Description

Key changed/added classes in this PR

Uh oh!

abhishekagarwal87 left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

abhishekagarwal87 Jan 3, 2022

Choose a reason for hiding this comment

Uh oh!

LakshSingla Jan 7, 2022

Choose a reason for hiding this comment

Uh oh!

abhishekagarwal87 Jan 3, 2022

Choose a reason for hiding this comment

Uh oh!

LakshSingla Jan 3, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

abhishekagarwal87 commented Jan 7, 2022

Uh oh!

jihoonson commented Jan 7, 2022

Uh oh!

abhishekagarwal87 commented Jan 7, 2022

Uh oh!

jihoonson commented Jan 7, 2022

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

LakshSingla Jan 3, 2022 •

edited

Loading