
Decoupled planning: improve join support#17039

Merged
abhishekagarwal87 merged 23 commits into apache:master from kgyrtkirk:decouple-joins-finalize-and-right
Sep 18, 2024

Conversation

@kgyrtkirk
Member

There were some problematic cases:

  • join branches were run with finalize=false instead of finalize=true, unlike normal subqueries
    • this inconsistency is not good, but fixing it is a bigger undertaking
  • the right-hand side of a join must always be a subquery, or a globally accessible datasource

To achieve the above:

  • operand indexes were needed for the upstream reltree nodes in the generator
  • source unwrapping now takes the join situation into account as well

this.vertexFactory = new PDQVertexFactory(plannerContext, rexBuilder);
}

static class DruidNodeStack extends Stack<DruidLogicalNode>
Contributor


Another option instead of maintaining another parallel Stack would be to add a mutable operandIndex field to DruidLogicalNode and walk the logical nodes assigning ids once before doing the translation.

I tend to think that mutating the object to have an id and then referencing that id is a bit less error-prone than the parallel Stack approach; is there a reason it cannot work?

Member Author


I've tried different approaches; DruidLogicalNodes are immutable and are supposed to describe a subtree. The index of a node within its parent is an external detail, so it doesn't feel right to add it to the node itself.

I've covered the 2 stacks with DruidNodeStack, which was better as it made it possible to introduce methods that made the logic more readable.

As an alternate approach, to attach the data more naturally, I've added an Entry class which contains the index and the DruidLogicalNode. I've retained the custom methods... but it did disturb some existing places a bit.
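A minimal sketch of the Entry-based stack described here, assuming the shape discussed above; all class and method names are illustrative stand-ins, not the actual Druid code:

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Illustrative sketch only: each stack slot pairs the (immutable) node with
// the operand index it occupies in its parent, so the node itself never has
// to be mutated to carry positional information.
class EntryStackSketch {
  static class Entry<T> {
    final T node;
    final int operandIndex; // position of `node` among its parent's inputs

    Entry(T node, int operandIndex) {
      this.node = node;
      this.operandIndex = operandIndex;
    }
  }

  static class NodeStack<T> {
    private final Deque<Entry<T>> stack = new ArrayDeque<>();

    void push(T node, int operandIndex) {
      stack.push(new Entry<>(node, operandIndex));
    }

    void pop() {
      stack.pop();
    }

    T peekNode() {
      return stack.peek().node;
    }

    int peekOperandIndex() {
      return stack.peek().operandIndex;
    }
  }
}
```

Because the index travels with the stack entry rather than the node, the same immutable subtree can legitimately appear under several parents (or under both sides of the same join) without ambiguity.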

Comment on lines +177 to +191
static JoinSupportTweaks analyze(DruidNodeStack stack)
{
  if (stack.size() < 2) {
    return NONE;
  }
  DruidLogicalNode possibleJoin = stack.get(stack.size() - 2);
  if (!(possibleJoin instanceof DruidJoin)) {
    return NONE;
  }
  if (stack.operandIndexStack.get(stack.size() - 1) == 1) {
    return RIGHT;
  } else {
    return LEFT;
  }
}
Contributor


I find this method to be a bit obtuse. I think it's trying to figure out what to put on the right hand side (prioritizing queries on the right?). But, == 1 in the final if doesn't really carry semantic meaning for me. It would be nice if you could either adjust the code to be a bit more descriptive in what it's doing or add comments that explain the magic numbers.

Member Author


Part of the problem was that a Join may have the same subtree on both its left and right sides, which mandated the need for the operandIndex.

By adding some useful methods to DruidNodeStack, these magic numbers and subtractions have more or less disappeared :)

I don't think JoinSupportTweaks will be a long-lived enum; I just haven't figured out yet how many more details I'll need later. I have a feeling that this will eventually become something like VertexOptions with a set of booleans/etc.
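A self-contained sketch of what that refactor could look like; every name here is a stand-in for the real Druid classes, and the helper names (parentNode, peekOperandIndex) are assumptions. The point is that the helpers hide the size()-2 arithmetic and the join's right input is referred to by a named constant instead of the magic number 1:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not the actual Druid implementation.
class JoinSideSketch {
  enum NodeKind { JOIN, SCAN, AGGREGATE }
  enum JoinSide { NONE, LEFT, RIGHT }

  // by convention, a binary join's operand 0 is its left input, operand 1 its right
  static final int RIGHT_OPERAND_INDEX = 1;

  static class Entry {
    final NodeKind node;
    final int operandIndex;

    Entry(NodeKind node, int operandIndex) {
      this.node = node;
      this.operandIndex = operandIndex;
    }
  }

  static class NodeStack {
    private final List<Entry> entries = new ArrayList<>();

    void push(NodeKind node, int operandIndex) {
      entries.add(new Entry(node, operandIndex));
    }

    int size() {
      return entries.size();
    }

    // the node one below the top of the stack, i.e. the current node's parent
    NodeKind parentNode() {
      return entries.get(entries.size() - 2).node;
    }

    // index of the top node within its parent's operand list
    int peekOperandIndex() {
      return entries.get(entries.size() - 1).operandIndex;
    }
  }

  static JoinSide analyze(NodeStack stack) {
    if (stack.size() < 2) {
      return JoinSide.NONE;
    }
    if (stack.parentNode() != NodeKind.JOIN) {
      return JoinSide.NONE;
    }
    return stack.peekOperandIndex() == RIGHT_OPERAND_INDEX
        ? JoinSide.RIGHT
        : JoinSide.LEFT;
  }
}
```

With the helpers in place, analyze() reads as a direct statement of its intent: "if my parent is a join, am I its right input or its left one?"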

Comment on lines +87 to +88
EQUIV_PLAN_EXTRA_COLUMNS,
EQUIV_PLAN_CAST_MATERIALIZED_EARLIER;
Contributor


I think I know what EXTRA_COLUMNS means, but CAST_MATERIALIZED_EARLIER I'm unclear on. Can you add the comments for these like you have on the others?

+-------+----+
| dummy | |
| dummy | b |
| dummy | |
Contributor


I'm surprised by this. null tends to sort first rather than last. This makes me think that SQL compat mode has inverted null comparisons? More than that, I really don't understand why your changes here would've caused this?

Member Author


I wasn't taking a closer look, as it's a grouping on t1,t2, so it looked okay-ish :)
It's very strange, as even the native query is the same for the 2 variants; no idea why it's different.

diff
dev@decouple-joins:~/druid$ diff -Naur -U30 `f testGroupByWithLiteralInSubqueryGrouping*iq`
--- ./sql/src/test/quidem/org.apache.druid.sql.calcite.DecoupledPlanningCalciteQueryTest/testGroupByWithLiteralInSubqueryGrouping@NullHandling=sql.iq	2024-09-13 19:29:16.364090523 +0000
+++ ./sql/src/test/quidem/org.apache.druid.sql.calcite.DecoupledPlanningCalciteQueryTest/testGroupByWithLiteralInSubqueryGrouping@NullHandling=default.iq	2024-09-13 09:08:49.372218528 +0000
@@ -1,64 +1,64 @@
-# testGroupByWithLiteralInSubqueryGrouping@NullHandling=sql case-crc:a63c5a2f
+# testGroupByWithLiteralInSubqueryGrouping@NullHandling=default case-crc:a63c5a2f
 # quidem testcase reason: IMPROVED_PLAN
 !set debug true
 !set defaultTimeout 300000
 !set maxScatterGatherBytes 9223372036854775807
 !set plannerStrategy DECOUPLED
 !set sqlCurrentTimestamp 2000-01-01T00:00:00Z
 !set sqlQueryId dummy
 !set outputformat mysql
 !use druidtest:///
 SELECT 
    t1, t2
   FROM
    ( SELECT
      'dummy' as t1,
      CASE
        WHEN 
          dim4 = 'b'
        THEN dim4
        ELSE NULL
      END AS t2
      FROM
        numfoo
      GROUP BY
        dim4
    )
  GROUP BY
    t1,t2
 ;
 +-------+----+
 | t1    | t2 |
 +-------+----+
-| dummy | b  |
 | dummy |    |
+| dummy | b  |
 +-------+----+
 (2 rows)
 
 !ok
 LogicalProject(t1=['dummy'], t2=[$0])
   LogicalAggregate(group=[{0}])
     LogicalProject(t2=[CASE(=($0, 'b'), $0, null:VARCHAR)])
       LogicalAggregate(group=[{4}])
         LogicalTableScan(table=[[druid, numfoo]])
 
 !logicalPlan
 DruidProject(t1=['dummy'], t2=[$0], druid=[logical])
   DruidAggregate(group=[{0}], druid=[logical])
     DruidProject(t2=[CASE(=($0, 'b'), $0, null:VARCHAR)], druid=[logical])
       DruidAggregate(group=[{4}], druid=[logical])
         DruidTableScan(table=[[druid, numfoo]], druid=[logical])
 
 !druidPlan
 {
   "queryType" : "groupBy",
   "dataSource" : {
     "type" : "query",
     "query" : {
       "queryType" : "groupBy",
       "dataSource" : {
         "type" : "table",
         "name" : "numfoo"
       },
       "intervals" : {
         "type" : "intervals",

@abhishekagarwal87 abhishekagarwal87 merged commit d84d53c into apache:master Sep 18, 2024
@kfaraz kfaraz added this to the 31.0.0 milestone Oct 4, 2024
@kfaraz
Contributor

kfaraz commented Oct 4, 2024

Added to milestone 31 as this is required to fix compilation failures in the backport #17251

[ERROR] /Users/kfaraz/Workspace/github/druid/sql/src/test/java/org/apache/druid/sql/calcite/CalciteQueryTest.java:[7730,59] cannot find symbol
[ERROR]   symbol:   variable EQUIV_PLAN_EXTRA_COLUMNS
[ERROR]   location: class org.apache.druid.sql.calcite.DecoupledTestConfig.QuidemTestCaseReason
[ERROR] 

kfaraz pushed a commit to kfaraz/druid that referenced this pull request Oct 4, 2024
There were some problematic cases

  • join branches are run with finalize=false instead of finalize=true like normal subqueries
    • this inconsistency is not good - but fixing it is a bigger thing
  • ensure that right hand sides of joins are always subqueries - or accessible globally

To achieve the above:

  • operand indexes were needed for the upstream reltree nodes in the generator
  • source unwrapping now takes the join situation into account as well
kfaraz added a commit that referenced this pull request Oct 5, 2024
…) (#17251)

* SQL: Use regular filters for time filtering in subqueries. (#17173)
* RunWorkOrder: Account for two simultaneous statistics collectors. (#17216)
* DartTableInputSpecSlicer: Fix for TLS workers. (#17224)
* Upgrade avro - minor version (#17230)
* SuperSorter: Don't set allDone if it's already set. (#17238)
* Decoupled planning: improve join support (#17039)
---------
Co-authored-by: Gian Merlino <gianmerlino@gmail.com>
Co-authored-by: Abhishek Agarwal <1477457+abhishekagarwal87@users.noreply.github.com>
Co-authored-by: Zoltan Haindrich <kirk@rxd.hu>
