remove calcite AggregateRemoveRule to fix nested group by query with order by in outer query by somu-imply · Pull Request #15237 · apache/druid

somu-imply · 2023-10-23T23:41:57Z

Consider this query

with t AS (SELECT distinct(m2) as mo, APPROX_COUNT_DISTINCT(m1) / ( TIMESTAMPDIFF(DAY,min( __time ), CURRENT_TIMESTAMP  ) + 0.0000000001) as trend_score
FROM "foo"
GROUP BY 1
ORDER BY trend_score DESC
LIMIT 2)
select mo, (MAX(trend_score)) from t
where mo > 2
GROUP BY 1 
ORDER BY 2 DESC

This currently fails to plan with the stack trace

Caused by: org.apache.calcite.plan.RelOptPlanner$CannotPlanException: There are not enough rules to produce a node with desired properties: convention=DRUID, sort=[1 DESC].
Missing conversion is LogicalSort[convention: NONE -> DRUID]
There is 1 empty subset: rel#126:RelSubset#7.DRUID.[1 DESC], the relevant part of the original plan is as follows
124:LogicalSort(sort0=[$1], dir0=[DESC], fetch=[1001:BIGINT])
  120:LogicalFilter(subset=[rel#128:RelSubset#5.NONE.[]], condition=[>($0, 2)])
    118:LogicalSort(subset=[rel#119:RelSubset#4.NONE.[1 DESC]], sort0=[$1], dir0=[DESC], fetch=[2])
      116:LogicalProject(subset=[rel#117:RelSubset#3.NONE.[]], mo=[$0], trend_score=[/($1, +(CAST(/INT(Reinterpret(-(2023-10-17 23:46:28.328, $2)), 86400000)):INTEGER NOT NULL, 1E-10:DECIMAL(11, 10)))])
        114:LogicalAggregate(subset=[rel#115:RelSubset#2.NONE.[]], group=[{0}], agg#0=[APPROX_COUNT_DISTINCT($1)], agg#1=[MIN($2)])
          112:LogicalProject(subset=[rel#113:RelSubset#1.NONE.[]], mo=[$5], m1=[$4], __time=[$0])
            63:LogicalTableScan(subset=[rel#111:RelSubset#0.NONE.[]], table=[[druid, foo]])

Root: rel#126:RelSubset#7.DRUID.[1 DESC]
Original rel:
LogicalSort(subset=[rel#126:RelSubset#7.DRUID.[1 DESC]], sort0=[$1], dir0=[DESC], fetch=[1001:BIGINT]): rowcount = 1.0, cumulative cost = {1.0 rows, 20.0 cpu, 0.0 io}, id = 124
  LogicalFilter(subset=[rel#128:RelSubset#5.NONE.[]], condition=[>($0, 2)]): rowcount = 1.0, cumulative cost = {1.0 rows, 2.0 cpu, 0.0 io}, id = 120
    LogicalSort(subset=[rel#119:RelSubset#4.NONE.[1 DESC]], sort0=[$1], dir0=[DESC], fetch=[2]): rowcount = 2.0, cumulative cost = {2.0 rows, 200.0 cpu, 0.0 io}, id = 118
      LogicalProject(subset=[rel#117:RelSubset#3.NONE.[]], mo=[$0], trend_score=[/($1, +(CAST(/INT(Reinterpret(-(2023-10-17 23:46:28.328, $2)), 86400000)):INTEGER NOT NULL, 1E-10:DECIMAL(11, 10)))]): rowcount = 10.0, cumulative cost = {10.0 rows, 20.0 cpu, 0.0 io}, id = 116
        LogicalAggregate(subset=[rel#115:RelSubset#2.NONE.[]], group=[{0}], agg#0=[APPROX_COUNT_DISTINCT($1)], agg#1=[MIN($2)]): rowcount = 10.0, cumulative cost = {12.5 rows, 0.0 cpu, 0.0 io}, id = 114
          LogicalProject(subset=[rel#113:RelSubset#1.NONE.[]], mo=[$5], m1=[$4], __time=[$0]): rowcount = 100.0, cumulative cost = {100.0 rows, 300.0 cpu, 0.0 io}, id = 112
            LogicalTableScan(subset=[rel#111:RelSubset#0.NONE.[]], table=[[druid, foo]]): rowcount = 100.0, cumulative cost = {100.0 rows, 101.0 cpu, 0.0 io}, id = 63

        at org.apache.calcite.plan.volcano.RelSubset$CheapestPlanReplacer.visit(RelSubset.java:718) ~[calcite-core-1.35.0.jar:1.35.0]
        at org.apache.calcite.plan.volcano.RelSubset.buildCheapestPlan(RelSubset.java:391) ~[calcite-core-1.35.0.jar:1.35.0]
        at org.apache.calcite.plan.volcano.VolcanoPlanner.findBestExp(VolcanoPlanner.java:533) ~[calcite-core-1.35.0.jar:1.35.0]
        at org.apache.calcite.tools.Programs$RuleSetProgram.run(Programs.java:317) ~[calcite-core-1.35.0.jar:1.35.0]
        at org.apache.calcite.tools.Programs$SequenceProgram.run(Programs.java:337) ~[calcite-core-1.35.0.jar:1.35.0]
        at org.apache.druid.sql.calcite.planner.CalcitePlanner.transform(CalcitePlanner.java:452) ~[druid-sql-29.0.0-SNAPSHOT.jar:29.0.0-SNAPSHOT]
        at org.apache.druid.sql.calcite.planner.QueryHandler.planWithDruidConvention(QueryHandler.java:590) ~[druid-sql-29.0.0-SNAPSHOT.jar:29.0.0-SNAPSHOT]
        at org.apache.druid.sql.calcite.planner.QueryHandler$SelectHandler.planForDruid(QueryHandler.java:738) ~[druid-sql-29.0.0-SNAPSHOT.jar:29.0.0-SNAPSHOT]
        at org.apache.druid.sql.calcite.planner.QueryHandler.plan(QueryHandler.java:220) ~[druid-sql-29.0.0-SNAPSHOT.jar:29.0.0-SNAPSHOT]
        ... 84 more

This is due to the fact that AggregateRemoveRule removes the aggregates and plans the query as a scan over a group by subquery with ordering on a non-time column which Druid does not support atm. In this PR we fix planning by removing the Aggregate remove rule which plans these queries as scan over group by with order by on a non-time column which Druid does not support.

Note: Removal of this rule is the worst case fix for this case. The best case would have been to allow druid to order by on non-time columns. With removal of this rule which would have removed the aggregate from the outer query if the same is present in the inner query we'll see some performance regression for those queries. However with this we ensure that everything in Druid works as same before the Calcite upgrade from 1.21 to 1.35.

This PR has:

somu-imply · 2023-10-23T23:49:35Z

 false	302662813.16785714333333333333
 false	304608131.82107142900000000000
-false	303537640.51502361272727272727
+false	303537640.515023651272727272727


These queries do not compare the query plan. The underlying query plan changed a bit due to removal of the rule hence the change in the terminating decimal places

clintropolis

sad since we lose some vectorization and it forces some queries into using subqueries, but this will at least let the queries work correctly for now. We really need to revisit this though since this isn't the ideal solution

…order by in outer query (apache#15237) * Fixing nested group by query with order by in outer query * Adding examples

somu-imply · 2023-10-26T17:09:51Z

This will be reverted back in #15241 as it introduces a better solution to solve the issue without needing to change plans for existing queries

…order by in outer query (apache#15237) * Fixing nested group by query with order by in outer query * Adding examples

Fixing nested group by query with order by in outer query

3572771

github-actions Bot added the Area - Querying label Oct 23, 2023

Adding examples

9dca217

somu-imply commented Oct 23, 2023

View reviewed changes

clintropolis changed the title ~~Fixing nested group by query with order by in outer query~~ remove calcite AggregateRemoveRule to fix nested group by query with order by in outer query Oct 24, 2023

clintropolis approved these changes Oct 24, 2023

View reviewed changes

soumyava added this to the 28.0 milestone Oct 24, 2023

soumyava merged commit 06f40a0 into apache:master Oct 24, 2023

LakshSingla mentioned this pull request Oct 25, 2023

[Backport] (DONOT MERGE) remove calcite AggregateRemoveRule to fix nested group by query with order by in outer query #15250

Closed

CaseyPan pushed a commit to CaseyPan/druid that referenced this pull request Nov 17, 2023

remove calcite AggregateRemoveRule to fix nested group by query with …

eaffc84

…order by in outer query (apache#15237) * Fixing nested group by query with order by in outer query * Adding examples

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

remove calcite AggregateRemoveRule to fix nested group by query with order by in outer query#15237

remove calcite AggregateRemoveRule to fix nested group by query with order by in outer query#15237
soumyava merged 2 commits intoapache:masterfrom
somu-imply:removeAggRule

somu-imply commented Oct 23, 2023 •

edited by soumyava

Loading

Uh oh!

somu-imply Oct 23, 2023

Uh oh!

clintropolis left a comment

Uh oh!

somu-imply commented Oct 26, 2023

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

somu-imply commented Oct 23, 2023 • edited by soumyava Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

somu-imply Oct 23, 2023

Choose a reason for hiding this comment

Uh oh!

clintropolis left a comment

Choose a reason for hiding this comment

Uh oh!

somu-imply commented Oct 26, 2023

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

somu-imply commented Oct 23, 2023 •

edited by soumyava

Loading