Process pure ordering changes with windowing operators #15241
abhishekagarwal87 merged 87 commits into apache:master from
This reverts commit 3bcfb197c6156d41036ee4edb60d4ad63a60d85a.
This reverts commit c3a3535.
  {
    private final Operator child;
-   private final ArrayList<ColumnWithDirection> sortColumns;
+   private final List<ColumnWithDirection> sortColumns;
What's this change for?

- I think ArrayList has spread to a lot of places since these classes were created
- it's harder to test with ArrayList types on the interfaces... I didn't want to go down further; I think wrapping stuff into an ArrayList when the sorter is created should be rare enough
- IntelliJ checks also encourage using Collections.singletonList and emptyList, which are not ArrayLists either...
Would you like to take an alternate approach?

No. Makes sense, though that makes me wonder why not just do it for NaiveSorter and its implementations too, so you don't have to create an ArrayList at line 61.
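The trade-off discussed in this thread can be sketched in a few lines. This is only an illustration: `SortSpecHolder` and its `String` column names are hypothetical stand-ins, not the sorter classes touched by this PR. The idea is that a constructor typed as `List` accepts `Collections.singletonList` or `Collections.emptyList` directly, while an `ArrayList` is created only where a mutable copy is really needed.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for a sorter-like class; not Druid's actual code.
final class SortSpecHolder
{
  private final List<String> sortColumns;

  // Typing the parameter as List lets callers pass singletonList/emptyList without copying.
  SortSpecHolder(List<String> sortColumns)
  {
    this.sortColumns = sortColumns;
  }

  List<String> getSortColumns()
  {
    return sortColumns;
  }

  // Only callers that need a mutable copy pay for creating an ArrayList.
  ArrayList<String> mutableCopy()
  {
    return new ArrayList<>(sortColumns);
  }
}
```

In a test, `new SortSpecHolder(Collections.singletonList("a"))` then works without any wrapping, which is the convenience mentioned above.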
throw DruidException.defensive(
    "Cannot compute toIndex due to overflow [%s]",
    this);
Shouldn't this be checked in the constructor itself? Pretty wild if we hit this one. 😄

We have some places where Long.MAX_VALUE is substituted - I now feel like it's even safer not to add this throw... or at least not now - I'll get back to it in a later refactor.
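For readers following the overflow discussion, here is a minimal sketch of one alternative to throwing: saturating the end index instead. Everything in it is an assumption made for illustration (the class name `OffsetLimitSketch`, the convention that a negative limit means "no limit", and the `getToIndex(maxIndex)` signature); it is not the `OffsetLimit` class added in this PR.

```java
// Hypothetical sketch; names and the "negative limit means unlimited" rule are assumptions.
final class OffsetLimitSketch
{
  private final long offset;
  private final long limit;

  OffsetLimitSketch(long offset, long limit)
  {
    if (offset < 0) {
      throw new IllegalArgumentException("offset must be >= 0");
    }
    this.offset = offset;
    this.limit = limit;
  }

  boolean hasLimit()
  {
    return limit >= 0;
  }

  // Exclusive end index, clamped to maxIndex. Saturates instead of overflowing.
  long getToIndex(long maxIndex)
  {
    if (!hasLimit()) {
      return maxIndex;
    }
    long to = offset + limit;
    if (to < 0) {
      // Both operands are non-negative, so a negative sum means the addition wrapped around.
      to = Long.MAX_VALUE;
    }
    return Math.min(maxIndex, to);
  }
}
```

With this shape, a caller that substitutes `Long.MAX_VALUE` for the limit still gets a usable end index, which is the situation the reply above is worried about.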
  }

  /**
   * Return this query as a Scan query, or null if this query is not compatible with Scan.

Could you add a few more details here on how the considerSorting parameter is used?
    }
  }

  private ColumnType getDataSoruceColumnType(DataSource dataSource, String columnName)

Typo: this should be getDataSourceColumnType. Also mark it @Nullable.

Jeez; I make so many typos lately that I should
there are probably some plugins to warn about issues like this :D
I see this; but one which could be integrated into the build would probably be even better...
Fixed it.
@Test
public void testEquals()
{
  new EqualsTester()

Nit: we've been using EqualsVerifier for many equals/hashCode tests; is it easier to use this one for these classes for some reason? (I don't see anything using this anywhere else in the codebase.)

Didn't see that, sorry - it's not easier to use at all... but it's nice to declare equalityGroup-s :D
Would you like me to remove them?
  public class DruidOuterQueryRel extends DruidRel<DruidOuterQueryRel>
  {
-   private static final TableDataSource DUMMY_DATA_SOURCE = new TableDataSource("__subquery__");
+   public static final TableDataSource DUMMY_DATA_SOURCE = new TableDataSource("__subquery__");

I wonder if we should just override isConcrete to return false instead of making this public and checking specifically for it (or maybe make a special dummy datasource class for the planner that also implements isConcrete as false), because we could then simplify the scan query conversion to check for !isConcrete instead of checking for this instance or concrete.

Yes - I totally agree that it's not the best like this...
I think it's rather odd to pass this constant here, as it's being used to show the internal parts that this outer query has a "TableDataSource"...
I think we should probably:
- remove that ?:
- change the DUMMY_QUERY_DATA_SOURCE to not look like a ScanQuery - instead some opaque query, so that it can't interfere with things
- possibly reconsider the instanceof at DruidQuery#computeQuery, as when there is no window, because of that ?: it essentially never becomes true (during early stages of planning) because it passes a non-query datasource...
...but I can try the isConcrete() impl approach for now - but if that breaks stuff, I would probably go back to the current reference-checking approach and submit the above changes in a follow-up PR.
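The isConcrete() approach suggested in this thread could look roughly like the sketch below. The types here (`DataSourceSketch`, `TableSketch`, `PlannerSketch`, and the `isPlaceholder` helper) are hypothetical stand-ins written for this illustration, not Druid's `DataSource` or `TableDataSource`; only the `"__subquery__"` name and the idea of overriding `isConcrete` come from the discussion.

```java
// Hypothetical stand-ins; these are not Druid's DataSource/TableDataSource classes.
interface DataSourceSketch
{
  boolean isConcrete();
}

class TableSketch implements DataSourceSketch
{
  private final String name;

  TableSketch(String name)
  {
    this.name = name;
  }

  String getName()
  {
    return name;
  }

  @Override
  public boolean isConcrete()
  {
    return true; // an ordinary table reports itself as concrete
  }
}

final class PlannerSketch
{
  // Placeholder used only during planning; it reports itself as non-concrete.
  static final TableSketch DUMMY_DATA_SOURCE = new TableSketch("__subquery__")
  {
    @Override
    public boolean isConcrete()
    {
      return false;
    }
  };

  // Callers ask the datasource itself, so no reference comparison with the constant is needed.
  static boolean isPlaceholder(DataSourceSketch dataSource)
  {
    return !dataSource.isConcrete();
  }
}
```

The design point is that the constant can stay private: code that needs to tell the placeholder apart asks the datasource instead of comparing references.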
plannerContext.setPlanningError(
    "SQL query requires ordering a table by non-time column [%s], which is not supported.",
    orderByColumnNames);

Super nitpick, but much of the codebase puts the trailing ) on a new line so stuff is symmetrical ⚖️ (noticed this in lots of other places too)
- plannerContext.setPlanningError(
-     "SQL query requires ordering a table by non-time column [%s], which is not supported.",
-     orderByColumnNames);
+ plannerContext.setPlanningError(
+     "SQL query requires ordering a table by non-time column [%s], which is not supported.",
+     orderByColumnNames
+ );
not a blocker by any means, just throwing this out there since i find it more aesthetically pleasing 😅

I noticed it as well, but hadn't dug into it before; I think the project's Eclipse formatter is outdated - I've already had to fix a few issues (because checkstyle followed different rules).
Regarding this one: it was off as well; I don't yet see a way to set the formatting to (1), but I was able to set (2):
format_1(123,
23123,
3213,
2321321
);
format_2(
123,
23123,
3213,
2321321
);
As a matter of fact, I don't know whether format_1 would be the preferred one or not.
I've set format_2, as putting the closing ) on the same line has caused quite a few annoying mistakes for me in the last couple of days...
More-or-less related: we could enforce these formatting rules (see this PR); but I didn't want to do it for the whole project before we agree on the rules we should follow.
@somu-imply, @abhishekagarwal87, @clintropolis could you please take another look?
soumyava left a comment
Took another pass, thanks for addressing my comments. LGTM!
abhishekagarwal87 left a comment
Nit comments that need not block the PR.
private static final TableDataSource DUMMY_DATA_SOURCE = new TableDataSource("__subquery__")
{
  @Override
  public boolean isConcrete()

It would be helpful to add a comment explaining this override.
@kgyrtkirk - You will also need to change the docs here https://druid.apache.org/docs/latest/querying/scan-query and https://druid.apache.org/docs/latest/querying/sql#order-by
- adds a new query build path: DruidQuery#toScanAndSortQuery which:
  - builds a ScanQuery without considering the current ordering
  - builds an operator to execute the sort
- fixes a null string to "null" literal string conversion in the frame serializer code
- fixes some DrillWindowQueryTest cases
- fix NPE in NaiveSortOperator in case there was no input
- enables back CoreRules.AGGREGATE_REMOVE
- adds a processing level OffsetLimit class and uses that instead of just the limit in the rac parts
- earlier window expressions on top of a subquery with an offset may have ignored the offset

(cherry picked from commit f4a7471)
@abhishekagarwal87 I took a look, and although this does lift the limitation in some cases, it will only be fully usable once window operators are also allowed to run in the root query; right now this will work for
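The scan-then-sort idea from the change summary (read rows without regard to ordering, then sort them in a separate step and apply offset and limit) can be sketched as below. This is a simplified illustration under stated assumptions: rows are plain `long[]` arrays, a negative limit means "no limit", and `ScanThenSortSketch.sortAndSlice` is a hypothetical helper, not `DruidQuery#toScanAndSortQuery` or `NaiveSortOperator`. It also returns an empty result for empty input, in the spirit of the NPE fix mentioned in the summary.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical helper; rows are long[] and a negative limit means "no limit" (both assumptions).
final class ScanThenSortSketch
{
  static List<long[]> sortAndSlice(List<long[]> scanned, int column, boolean descending, long offset, long limit)
  {
    // Empty or missing input yields an empty result instead of failing.
    if (scanned == null || scanned.isEmpty()) {
      return new ArrayList<>();
    }

    // Step 1: the scan produced rows in arbitrary order; sort a copy of them.
    Comparator<long[]> comparator = Comparator.comparingLong(row -> row[column]);
    if (descending) {
      comparator = comparator.reversed();
    }
    List<long[]> sorted = new ArrayList<>(scanned);
    sorted.sort(comparator);

    // Step 2: apply offset and limit together, so the offset is never dropped.
    int from = (int) Math.min(Math.max(offset, 0L), (long) sorted.size());
    long remaining = sorted.size() - from;
    long take = limit < 0 ? remaining : Math.min(remaining, limit);
    return new ArrayList<>(sorted.subList(from, from + (int) take));
  }
}
```

Computing `take` from the remaining row count, rather than adding offset and limit, keeps the arithmetic safe even when the limit is `Long.MAX_VALUE`.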