Skip to content

SQL planning: Consider subqueries in fewer scenarios.#14123

Merged
gianm merged 3 commits intoapache:masterfrom
gianm:sql-consider-fewer-subqueries
Apr 21, 2023
Merged

SQL planning: Consider subqueries in fewer scenarios.#14123
gianm merged 3 commits intoapache:masterfrom
gianm:sql-consider-fewer-subqueries

Conversation

@gianm
Copy link
Copy Markdown
Contributor

@gianm gianm commented Apr 20, 2023

Further adjusts logic in DruidRules that was previously adjusted in #13902. The reason for the original change was that the comment "Subquery must be a groupBy, so stage must be >= AGGREGATE" was no longer accurate. Subqueries do not need to be groupBy anymore; they can really be any type of query. If I recall correctly, the change was needed for certain window queries to be able to plan on top of Scan queries.

However, this impacts performance negatively, because it causes many additional outer-query scenarios to be considered, which is expensive.

So, this patch updates the matching logic to consider fewer scenarios. The skipped scenarios are ones where we expect that, for one reason or another, it isn't necessary to consider a subquery.

Further adjusts logic in DruidRules that was previously adjusted in apache#13902.
The reason for the original change was that the comment "Subquery must be
a groupBy, so stage must be >= AGGREGATE" was no longer accurate. Subqueries
do not need to be groupBy anymore; they can really be any type of query.
If I recall correctly, the change was needed for certain window queries
to be able to plan on top of Scan queries.

However, this impacts performance negatively, because it causes many
additional outer-query scenarios to be considered, which is expensive.

So, this patch updates the matching logic to consider fewer scenarios. The
skipped scenarios are ones where we expect that, for one reason or another,
it isn't necessary to consider a subquery.
Copy link
Copy Markdown
Contributor

@imply-cheddar imply-cheddar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the reason for the various changes in the tests? They all seem to actually be better plans anyway (well, there's the addition of a filter on an effectively constant virtual column, which I hope primarily ends up as an always-true bitmap and effectively becomes a noop), so I don't think they are problematic. Just wondering if we know why they changed or if it's more of a "well, they changed, but not necessarily in a bad way, so consider it good"?

@gianm
Copy link
Copy Markdown
Contributor Author

gianm commented Apr 20, 2023

What's the reason for the various changes in the tests? They all seem to actually be better plans anyway (well, there's the addition of a filter on an effectively constant virtual column, which I hope primarily ends up as an always-true bitmap and effectively becomes a noop), so I don't think they are problematic. Just wondering if we know why they changed or if it's more of a "well, they changed, but not necessarily in a bad way, so consider it good"?

Analysis:

  • CalciteQueryTest.testGroupByTimeFloorAndDimOnGroupByTimeFloorAndDim: We're no longer considering plans that involve a filter in an outer query when havingSpec in the inner query would do. So, in this one, an outer filter got pushed into an inner havingSpec.
  • CalciteJoinQueryTest.testLeftJoinOnTwoInlineDataSourcesWithTimeFilter, CalciteJoinQueryTest.testLeftJoinOnTwoInlineDataSourcesWithOuterWhere, CalciteJoinQueryTest.testInnerJoinOnTwoInlineDataSourcesWithOuterWhere: These don't use DruidOuterQueryRel (the rel corresponding to the rule that was changed). Likely, the new plan and old plan were considered equally good by the planner, and there's some path-dependency to which one gets picked. So, that's a "well, they changed, but not necessarily in a bad way, so consider it good".

@gianm gianm merged commit f643abd into apache:master Apr 21, 2023
gianm added a commit to gianm/druid that referenced this pull request Apr 21, 2023
* SQL planning: Consider subqueries in fewer scenarios.

Further adjusts logic in DruidRules that was previously adjusted in apache#13902.
The reason for the original change was that the comment "Subquery must be
a groupBy, so stage must be >= AGGREGATE" was no longer accurate. Subqueries
do not need to be groupBy anymore; they can really be any type of query.
If I recall correctly, the change was needed for certain window queries
to be able to plan on top of Scan queries.

However, this impacts performance negatively, because it causes many
additional outer-query scenarios to be considered, which is expensive.

So, this patch updates the matching logic to consider fewer scenarios. The
skipped scenarios are ones where we expect that, for one reason or another,
it isn't necessary to consider a subquery.

* Remove unnecessary escaping.

* Fix test.
@gianm gianm deleted the sql-consider-fewer-subqueries branch April 21, 2023 15:36
@abhishekagarwal87 abhishekagarwal87 added this to the 26.0 milestone Apr 21, 2023
vogievetsky pushed a commit that referenced this pull request Apr 21, 2023
* SQL planning: Consider subqueries in fewer scenarios.

Further adjusts logic in DruidRules that was previously adjusted in #13902.
The reason for the original change was that the comment "Subquery must be
a groupBy, so stage must be >= AGGREGATE" was no longer accurate. Subqueries
do not need to be groupBy anymore; they can really be any type of query.
If I recall correctly, the change was needed for certain window queries
to be able to plan on top of Scan queries.

However, this impacts performance negatively, because it causes many
additional outer-query scenarios to be considered, which is expensive.

So, this patch updates the matching logic to consider fewer scenarios. The
skipped scenarios are ones where we expect that, for one reason or another,
it isn't necessary to consider a subquery.

* Remove unnecessary escaping.

* Fix test.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants