SQL: More straightforward handling of join planning. by gianm · Pull Request #9648 · apache/druid

gianm · 2020-04-08T11:40:39Z

Two changes that simplify how joins are planned:

Stop using JoinProjectTransposeRule as a way of guiding subquery
removal. Instead, add logic to DruidJoinRule that identifies removable
subqueries and removes them at the point of creating a DruidJoinQueryRel.
This approach reduces the size of the planning space and allows the
planner to complete quickly.
Remove rules that reorder joins. Not because of an impact on the
planning time (it seems minimal), but because the decisions that the
planner was making in the new tests were sometimes worse than the
user-provided order. I think we'll need to go with the user-provided
order for now, and revisit reordering when we can add more smarts to
the cost estimator.

A third change updates numeric ExprEval classes to store their
value as a boxed type that corresponds to what it is supposed to be.
This is useful because it affects the behavior of "asString", and
is included in this patch because it is needed for the new test
"testInnerJoinTwoLookupsToTableUsingNumericColumnInReverse". This
test relies on CAST('6', 'DOUBLE') stringifying to "6.0" like an
actual double would.

Fixes #9646.

Two changes that simplify how joins are planned: 1) Stop using JoinProjectTransposeRule as a way of guiding subquery removal. Instead, add logic to DruidJoinRule that identifies removable subqueries and removes them at the point of creating a DruidJoinQueryRel. This approach reduces the size of the planning space and allows the planner to complete quickly. 2) Remove rules that reorder joins. Not because of an impact on the planning time (it seems minimal), but because the decisions that the planner was making in the new tests were sometimes worse than the user-provided order. I think we'll need to go with the user-provided order for now, and revisit reordering when we can add more smarts to the cost estimator. A third change updates numeric ExprEval classes to store their value as a boxed type that corresponds to what it is supposed to be. This is useful because it affects the behavior of "asString", and is included in this patch because it is needed for the new test "testInnerJoinTwoLookupsToTableUsingNumericColumnInReverse". This test relies on CAST('6', 'DOUBLE') stringifying to "6.0" like an actual double would. Fixes apache#9646.

clintropolis

🤘

clintropolis · 2020-04-09T10:16:36Z

+      this.literalSubConditions = literalSubConditions;
+    }
+
+    public ConditionAnalysis pushThroughLeftProject(final Project leftProject)


nit: javadocs might make these a bit friendlier (comments and javadoc are very nice up to this point)

Yeah, good call. If you don't mind I'd like to add them in a follow up patch, because this patch is release-blocking.

clintropolis · 2020-04-09T10:23:51Z

+      return new ConditionAnalysis(
+          leftProject.getInput().getRowType().getFieldCount(),
+          equalitySubConditions
+              .stream()


I can't think of any specific examples off the top of my head, but I know we've seen a handful of performance issues using .stream(); I don't think anything needs to change right now, just commenting to increase the chances I'll remember this area as much as anything else as a potential area to look at to make small gains in planning time in the future.

I don't imagine these collections will be so large, so maybe isn't a very big deal, but then again could maybe add up depending how planning goes and how many times this stuff gets evaluated.

I have some benchmark results for stream (StreamBenchmark and HighlyNestedStreamBenchmark).

StreamBenchmark.flatMapIterator avgt 10 9.584 ± 0.055 ms/op StreamBenchmark.flatMapSequenceAccumulate avgt 10 5.908 ± 0.041 ms/op StreamBenchmark.flatMapSequenceYielder avgt 10 22.976 ± 0.064 ms/op StreamBenchmark.flatMapStream avgt 10 6.249 ± 0.050 ms/op StreamBenchmark.fluentIteratorToList avgt 10 18.455 ± 0.254 ms/op StreamBenchmark.sequenceToList avgt 10 14.326 ± 0.662 ms/op StreamBenchmark.streamToList avgt 10 17.720 ± 0.173 ms/op StreamBenchmark.sumIterator avgt 10 2.648 ± 0.023 ms/op <-- native for loop StreamBenchmark.sumIteratorFlatMap avgt 10 7.270 ± 0.095 ms/op StreamBenchmark.sumSequence avgt 10 6.654 ± 0.137 ms/op StreamBenchmark.sumStream avgt 10 2.162 ± 0.048 ms/op

Benchmark Mode Cnt Score Error Units HighlyNestedStreamBenchmark.flatMapIterator avgt 10 2413.571 ± 15.151 ms/op HighlyNestedStreamBenchmark.flatMapSequenceAccumulate avgt 10 1157.817 ± 13.575 ms/op HighlyNestedStreamBenchmark.flatMapSequenceYielder avgt 10 6308.657 ± 28.802 ms/op HighlyNestedStreamBenchmark.flatMapStream avgt 10 953.539 ± 6.755 ms/op HighlyNestedStreamBenchmark.sumIteratorFlatMap avgt 10 2129.499 ± 20.541 ms/op HighlyNestedStreamBenchmark.sumNestedFor avgt 10 297.307 ± 0.626 ms/op <-- native for loop HighlyNestedStreamBenchmark.sumSequence avgt 10 1503.816 ± 5.720 ms/op HighlyNestedStreamBenchmark.sumStream avgt 10 1136.636 ± 12.353 ms/op

I did notice that stream is sometimes bad and these benchmarks were to see what makes it bad. Unfortunately, these benchmarks show that stream is pretty good for computing a sum, it can compete with even native for loop when the stream is not highly nested. For highly nested stream, it's worse than native for loop, but still best among others. For toList, stream seems worse than others which I'm not sure how much it matters.

I'm suspecting that stream could be very efficient in these benchmarks because the benchmark code could be easily inlined with stream. I was planning to do another benchmark with a map function which is complex enough to hinder inlining, but haven't done yet.

Personally my strategy has been to use whatever seems most readable for code that isn't performance critical (small collections, not called tons of times). And for situations where performance is critical then to benchmark.

clintropolis · 2020-04-09T10:27:18Z

+   * inlined and then the join must run on the Broker.
   */
-  static final double MULTIPLIER_JOIN_SUBQUERY = 1000000000;
+  static final double COST_JOIN_SUBQUERY = 1e5;


Out of curiosity where did these numbers come from? Experiments I guess?

Mostly I made them up, but then verified through experiments that they achieve the plans that we want to achieve.

suneet-s · 2020-04-09T05:38:21Z

+
+    if (joinRel.getCondition().isA(SqlKind.LITERAL) && !joinRel.getCondition().isAlwaysFalse()) {
+      cost += CostEstimates.COST_JOIN_CROSS;
+    }


should we add a branch to set the cost to 0 if the joinCondition is a literal and is always false?

since a false join condition means nothing will match therefore you don't need to do work for either the left or the right hand side?

Currently, the native query join handling isn't smart enough to totally eliminate a join that has a false condition. It will still evaluate the left and right hand sides if they are subqueries. And it will still walk through every row on the left hand side. So I think it is fair to keep considering these costs in the cost estimator.

ah makes sense

suneet-s · 2020-04-09T05:50:36Z

+    if (child instanceof DruidRel) {
+      return (DruidRel<?>) child;
+    } else {
+      final RelSubset subset = (RelSubset) child;


how do we know this will be a RelSubset? I couldn't trace that path down here

My understanding of Calcite's planner tells me that the children will either be single rels or will be a subset of equivalent rels. So DruidRel and RelSubset are the two cases that can happen.

I hope I understand the planner correctly — I can't point to any specific code that guarantees what I am saying is true.

when I wrote this, I was wondering whether bindable parameters would change the node here somehow. The calcite query tests are pretty comprehensive around different types of JOINs and nested queries, that I feel pretty confident to agree with your understanding. Any chance you can test this with parameterized sql.

I will look at adding some tests for this in the same followup as #9648 (comment).

"subset of equivalent rels" -> Does this imply that the cost of each RelNode will be equal? I saw that computeLeftRequiresSubquery can returns different depending on which child we pick from the list of RelList. This then result in a very very different cost since a child that results in requiring subquery will have very high cost and a child that doesnt will have a much lower cost.

More specifically, I saw two RelNode in the list. One RelNode has a filter = null and the other has "filter":{"type":"selector","dimension":"v","value":"xa","extractionFn":null}

suneet-s

Thanks for the explanations. LGTM.

Don't consider any of my comments blockers - they're more just ramblings of a mad man :)

suneet-s · 2020-04-09T21:26:05Z

+
+    if (joinRel.getCondition().isA(SqlKind.LITERAL) && !joinRel.getCondition().isAlwaysFalse()) {
+      cost += CostEstimates.COST_JOIN_CROSS;
+    }


ah makes sense

suneet-s · 2020-04-09T21:27:52Z

+    if (child instanceof DruidRel) {
+      return (DruidRel<?>) child;
+    } else {
+      final RelSubset subset = (RelSubset) child;


when I wrote this, I was wondering whether bindable parameters would change the node here somehow. The calcite query tests are pretty comprehensive around different types of JOINs and nested queries, that I feel pretty confident to agree with your understanding. Any chance you can test this with parameterized sql.

jihoonson

LGTM

* SQL: More straightforward handling of join planning. Two changes that simplify how joins are planned: 1) Stop using JoinProjectTransposeRule as a way of guiding subquery removal. Instead, add logic to DruidJoinRule that identifies removable subqueries and removes them at the point of creating a DruidJoinQueryRel. This approach reduces the size of the planning space and allows the planner to complete quickly. 2) Remove rules that reorder joins. Not because of an impact on the planning time (it seems minimal), but because the decisions that the planner was making in the new tests were sometimes worse than the user-provided order. I think we'll need to go with the user-provided order for now, and revisit reordering when we can add more smarts to the cost estimator. A third change updates numeric ExprEval classes to store their value as a boxed type that corresponds to what it is supposed to be. This is useful because it affects the behavior of "asString", and is included in this patch because it is needed for the new test "testInnerJoinTwoLookupsToTableUsingNumericColumnInReverse". This test relies on CAST('6', 'DOUBLE') stringifying to "6.0" like an actual double would. Fixes apache#9646. * Fix comments. * Fix tests.

* SQL: More straightforward handling of join planning. Two changes that simplify how joins are planned: 1) Stop using JoinProjectTransposeRule as a way of guiding subquery removal. Instead, add logic to DruidJoinRule that identifies removable subqueries and removes them at the point of creating a DruidJoinQueryRel. This approach reduces the size of the planning space and allows the planner to complete quickly. 2) Remove rules that reorder joins. Not because of an impact on the planning time (it seems minimal), but because the decisions that the planner was making in the new tests were sometimes worse than the user-provided order. I think we'll need to go with the user-provided order for now, and revisit reordering when we can add more smarts to the cost estimator. A third change updates numeric ExprEval classes to store their value as a boxed type that corresponds to what it is supposed to be. This is useful because it affects the behavior of "asString", and is included in this patch because it is needed for the new test "testInnerJoinTwoLookupsToTableUsingNumericColumnInReverse". This test relies on CAST('6', 'DOUBLE') stringifying to "6.0" like an actual double would. Fixes #9646. * Fix comments. * Fix tests.

* SQL: More straightforward handling of join planning. Two changes that simplify how joins are planned: 1) Stop using JoinProjectTransposeRule as a way of guiding subquery removal. Instead, add logic to DruidJoinRule that identifies removable subqueries and removes them at the point of creating a DruidJoinQueryRel. This approach reduces the size of the planning space and allows the planner to complete quickly. 2) Remove rules that reorder joins. Not because of an impact on the planning time (it seems minimal), but because the decisions that the planner was making in the new tests were sometimes worse than the user-provided order. I think we'll need to go with the user-provided order for now, and revisit reordering when we can add more smarts to the cost estimator. A third change updates numeric ExprEval classes to store their value as a boxed type that corresponds to what it is supposed to be. This is useful because it affects the behavior of "asString", and is included in this patch because it is needed for the new test "testInnerJoinTwoLookupsToTableUsingNumericColumnInReverse". This test relies on CAST('6', 'DOUBLE') stringifying to "6.0" like an actual double would. Fixes apache#9646. * Fix comments. * Fix tests.

gianm added the Area - SQL label Apr 8, 2020

gianm added this to the 0.18.0 milestone Apr 8, 2020

gianm force-pushed the joins-planner-fixes branch from 5cf1e12 to b35fc89 Compare April 8, 2020 11:43

Fix comments.

37bdd60

jihoonson added the Bug label Apr 8, 2020

Fix tests.

915c324

clintropolis approved these changes Apr 9, 2020

View reviewed changes

suneet-s reviewed Apr 9, 2020

View reviewed changes

suneet-s approved these changes Apr 9, 2020

View reviewed changes

jihoonson approved these changes Apr 9, 2020

View reviewed changes

gianm merged commit 75c543b into apache:master Apr 9, 2020

gianm deleted the joins-planner-fixes branch April 9, 2020 23:21

gianm mentioned this pull request Apr 9, 2020

[Backport] SQL: More straightforward handling of join planning. #9659

Merged

gianm mentioned this pull request May 7, 2020

Bad plan for table-lookup-lookup join with filter on first lookup and outer limit #9773

Merged

8 tasks

Conversation

gianm commented Apr 8, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

clintropolis left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

suneet-s Apr 9, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

maytasm Apr 25, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

suneet-s left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

jihoonson left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

gianm commented Apr 8, 2020 •

edited

Loading

suneet-s Apr 9, 2020 •

edited

Loading

maytasm Apr 25, 2020 •

edited

Loading