Fix row schema of nested queries with ORDER BY clause to get proper row type during planning validation #13173
```diff
@@ -32,8 +32,8 @@
 import org.apache.calcite.rel.metadata.RelMetadataQuery;
 import org.apache.calcite.rel.type.RelDataType;
 import org.apache.druid.java.util.common.StringUtils;
-import org.apache.druid.query.Druids;
 import org.apache.druid.query.QueryDataSource;
+import org.apache.druid.query.TableDataSource;
 import org.apache.druid.segment.column.RowSignature;
 import org.apache.druid.sql.calcite.planner.PlannerContext;
 import org.apache.druid.sql.calcite.table.RowSignatures;
```
```diff
@@ -46,9 +46,7 @@
  */
 public class DruidOuterQueryRel extends DruidRel<DruidOuterQueryRel>
 {
-  private static final QueryDataSource DUMMY_DATA_SOURCE = new QueryDataSource(
-      Druids.newScanQueryBuilder().dataSource("__subquery__").eternityInterval().build()
-  );
+  private static final TableDataSource DUMMY_DATA_SOURCE = new TableDataSource("__subquery__");

   private final PartialDruidQuery partialQuery;
   private RelNode sourceRel;
```
```diff
@@ -116,12 +114,20 @@ public DruidQuery toDruidQuery(final boolean finalizeAggregations)
   @Override
   public DruidQuery toDruidQueryForExplaining()
   {
+    final RowSignature sourceRowSignature;
+    if (sourceRel instanceof DruidRel) {
+      final DruidQuery subQuery = ((DruidRel) sourceRel).toDruidQueryForExplaining();
+      sourceRowSignature = subQuery.getOutputRowSignature();
+    } else {
+      // fallback for cases other than group by inner query with order by with alias on __time column
+      sourceRowSignature = RowSignatures.fromRelDataType(
+          sourceRel.getRowType().getFieldNames(),
+          sourceRel.getRowType()
+      );
+    }
     return partialQuery.build(
         DUMMY_DATA_SOURCE,
-        RowSignatures.fromRelDataType(
-            sourceRel.getRowType().getFieldNames(),
-            sourceRel.getRowType()
-        ),
+        sourceRowSignature,
         getPlannerContext(),
         getCluster().getRexBuilder(),
         false
```

**Contributor:** Why does the …

**Author:** This was a case encountered during aliasing. If a query was specified with an alias, say for example …

**Contributor:** If aliasing is used, Calcite should ensure that the signature of the inner … If, on the other hand, we want the column name … Suppose we have to know that …

**Contributor:** Another concern: if we return the signature of the source as the row signature, we may miss computed columns:

```sql
SELECT * FROM (
  SELECT __time AS t, myCol AS c, a + b AS d FROM foo
)
```

Here, the signature of the inner query is `(t, c, d)`. Let's be sure to test this case to validate the workaround.

**Author:** Added the test …
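The control flow of the patched `toDruidQueryForExplaining` can be illustrated with a minimal, self-contained sketch. The types below (`RelNode`, `DruidRelLike`) are stand-ins invented for this example, not the real Calcite or Druid classes:

```java
import java.util.List;

public class SignatureFallbackSketch {
    // Stand-in for Calcite's RelNode: only exposes its row-type field names.
    interface RelNode { List<String> fieldNames(); }

    // Stand-in for DruidRel: can also report the output signature of the
    // query it would plan, which reflects aliases such as __time AS t.
    interface DruidRelLike extends RelNode { List<String> outputSignature(); }

    // Mirrors the patched logic: prefer the planned sub-query's output
    // signature; fall back to the raw row type for non-Druid source rels.
    static List<String> sourceRowSignature(RelNode sourceRel) {
        if (sourceRel instanceof DruidRelLike) {
            return ((DruidRelLike) sourceRel).outputSignature();
        }
        return sourceRel.fieldNames();
    }

    public static void main(String[] args) {
        // Inner GROUP BY ... ORDER BY with "__time" aliased to "t": the
        // planned sub-query exposes the alias, the raw row type may not.
        DruidRelLike druidSource = new DruidRelLike() {
            public List<String> fieldNames() { return List.of("__time", "cnt"); }
            public List<String> outputSignature() { return List.of("t", "cnt"); }
        };
        RelNode plainSource = () -> List.of("a", "b");

        System.out.println(sourceRowSignature(druidSource)); // [t, cnt]
        System.out.println(sourceRowSignature(plainSource)); // [a, b]
    }
}
```

The point of the branch is only which side supplies the column names; everything Druid-specific (signature construction, planner context) is elided here.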
```diff
@@ -1298,15 +1298,6 @@ private ScanQuery toScanQuery()
       );
       return null;
     }
-    if (!dataSource.isConcrete()) {
-      // Cannot handle this ordering.
-      // Scan cannot ORDER BY non-time columns.
-      plannerContext.setPlanningError(
-          "SQL query is a scan and requires order by on a datasource[%s], which is not supported.",
-          dataSource
-      );
-      return null;
-    }
   }

   // Compute the list of columns to select, sorted and deduped.
```

**Contributor:** This is an unfortunate limitation, and there is another PR to fix it. I wonder, why was the check removed in this PR?

**Author:** This was introduced during #12418 as we moved to using a QueryDataSource. A TableDataSource always returns true for `isConcrete()` …
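Why the deleted guard became dead code can be sketched with stand-in types (the real `isConcrete()` lives on Druid's `DataSource` interface; the classes below are illustrative only):

```java
public class ConcretenessSketch {
    interface DataSource { boolean isConcrete(); }

    // A table datasource is directly scannable, hence always concrete.
    record Table(String name) implements DataSource {
        public boolean isConcrete() { return true; }
    }

    // A query datasource is not concrete until its subquery is inlined.
    record SubQuery(String sql) implements DataSource {
        public boolean isConcrete() { return false; }
    }

    // Mirrors the deleted guard in toScanQuery(): a scan with ORDER BY was
    // rejected whenever the datasource was not concrete.
    static boolean scanOrderByAllowed(DataSource ds) {
        return ds.isConcrete();
    }

    public static void main(String[] args) {
        // With the dummy switched from a QueryDataSource to a TableDataSource
        // ("__subquery__"), the guard can no longer fire during explain
        // planning, which is the rationale given for dropping it.
        System.out.println(scanOrderByAllowed(new Table("__subquery__"))); // true
        System.out.println(scanOrderByAllowed(new SubQuery("SELECT ..."))); // false
    }
}
```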
**Contributor:** A large `foo` table here does not affect planning time. But if it were instead a filter with inline data, like `dimSequential IN ('1', '2', '3', ..., '100')`, that is something we know does increase planning time.

**Author:** Thanks @abhishekagarwal87, I have updated the benchmark tests by adding 2 queries with ~1000 values in the IN filters.

Before (using 0.22.0 as baseline): …

After (this PR build): …

There has been an increase of ~40 ms for the query with 1000 IN filter values, while for the other nested query the increase was ~10 ms. Please advise whether this much change is acceptable.
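The shape of the benchmark filter being discussed can be generated programmatically. This is a sketch under assumptions: the column name `dimSequential` and table `foo` come from the thread, but the exact benchmark query text is not shown in this PR excerpt:

```java
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class InFilterBench {
    // Builds a WHERE clause of the kind discussed above:
    // dimSequential IN ('1', '2', ..., '1000'). Large inline IN lists are
    // what stress the planner, independent of the table's row count.
    static String inFilterSql(int values) {
        String in = IntStream.rangeClosed(1, values)
                             .mapToObj(i -> "'" + i + "'")
                             .collect(Collectors.joining(", "));
        return "SELECT COUNT(*) FROM foo WHERE dimSequential IN (" + in + ")";
    }

    public static void main(String[] args) {
        // Print only a prefix; the full 1000-value list is ~6 KB of SQL.
        System.out.println(inFilterSql(1000).substring(0, 60) + "...");
    }
}
```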