Process pure ordering changes with windowing operators by kgyrtkirk · Pull Request #15241 · apache/druid

kgyrtkirk · 2023-10-24T12:36:43Z

- adds a new query build path, DruidQuery#toScanAndSortQuery, which:
  - builds a ScanQuery without considering the current ordering
  - builds an operator to execute the sort
- fixes the conversion of a null string to the literal string "null" in the frame serializer code
- fixes some DrillWindowQueryTest cases
- fixes an NPE in NaiveSortOperator when there was no input
- re-enables CoreRules.AGGREGATE_REMOVE
- adds a processing-level OffsetLimit class and uses it instead of just the limit in the rac parts
- earlier, window expressions on top of a subquery with an offset may have ignored the offset, for example:

  SELECT
    FLOOR(__time TO DAY) t,
    COUNT(1) OVER ()
  FROM ( select * from foo offset 3 ) f
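
To make the offset problem concrete, here is a minimal sketch of a value object that carries both offset and limit and applies them to a list of rows. The class name `OffsetLimitSketch`, its methods, and the "negative limit means unlimited" convention are assumptions made for illustration; they are not taken from Druid's actual `OffsetLimit` class.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of an offset-plus-limit holder; not Druid's actual class. */
final class OffsetLimitSketch
{
  private final long offset;
  private final long limit; // assumption of this sketch: negative means "no limit"

  OffsetLimitSketch(long offset, long limit)
  {
    if (offset < 0) {
      throw new IllegalArgumentException("offset must be >= 0");
    }
    this.offset = offset;
    this.limit = limit;
  }

  /** First row index to keep (inclusive), capped at the number of rows. */
  long fromIndex(long numRows)
  {
    return Math.min(offset, numRows);
  }

  /** Last row index to keep (exclusive), capped at the number of rows. */
  long toIndex(long numRows)
  {
    long from = fromIndex(numRows);
    if (limit < 0) {
      return numRows;
    }
    // numRows - from is never negative, so this sum cannot exceed numRows.
    return from + Math.min(limit, numRows - from);
  }

  /** Returns a copy of the rows that survive the offset and the limit. */
  <T> List<T> apply(List<T> rows)
  {
    int from = (int) fromIndex(rows.size());
    int to = (int) toIndex(rows.size());
    return new ArrayList<>(rows.subList(from, to));
  }
}
```

With six input rows and an offset of 3, `apply` keeps three rows, so a window aggregate such as `COUNT(1) OVER ()` evaluated on top of it would see three rows rather than six. Carrying only the limit would lose that information.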

abhishekagarwal87 · 2023-10-27T09:01:55Z

 {
  private final Operator child;
-  private final ArrayList<ColumnWithDirection> sortColumns;
+  private final List<ColumnWithDirection> sortColumns;


What's this change for?

I think ArrayList has spread to a lot of places since these classes were created.

It's harder to test with ArrayList types on the interfaces... I didn't want to go down further; I think wrapping stuff into an ArrayList when the sorter is created should be rare enough.

IntelliJ checks also encourage using Collections.singletonList and emptyList, which are not ArrayLists either...

Would you like to take an alternate approach?

No, makes sense. Though that makes me wonder why not just do it for NaiveSorter and its implementations too, so you don't have to create an ArrayList at line 61.
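
The point about interface types can be shown with a small sketch. The class `SortColumnsSketch` and its use of `String` column names are hypothetical stand-ins, not the PR's actual sorter classes. The sketch shows that a field declared as `List` accepts `Collections.singletonList` directly, and that a copy into an `ArrayList` is only needed at the rare call site that requires one.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical holder of sort columns; column names stand in for ColumnWithDirection. */
final class SortColumnsSketch
{
  private final List<String> sortColumns;

  // Declaring the parameter as List lets callers pass singletonList or emptyList as-is.
  SortColumnsSketch(List<String> sortColumns)
  {
    this.sortColumns = sortColumns;
  }

  List<String> getSortColumns()
  {
    return sortColumns;
  }

  // Only a caller that really needs an ArrayList pays for the copy.
  ArrayList<String> toArrayList()
  {
    return new ArrayList<>(sortColumns);
  }

  public static void main(String[] args)
  {
    List<String> single = Collections.singletonList("dim1");
    SortColumnsSketch sketch = new SortColumnsSketch(single);
    System.out.println(sketch.getSortColumns());
    System.out.println(single instanceof ArrayList);
  }
}
```

Had the constructor required `ArrayList<String>`, the `singletonList` call in `main` would not compile without an explicit wrapping copy.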

abhishekagarwal87 · 2023-10-27T09:06:00Z

+        throw DruidException.defensive(
+            "Cannot compute toIndex due to overflow [%s]",
+            this);


Shouldn't this be checked in the constructor itself? Pretty wild if we hit this one. 😄

We have some places where Long.MAX_VALUE is substituted. I now feel it's even safer not to add this throw, or at least not now; I'll get back to it in a later refactor.
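
For context, here is a minimal sketch of the two options being weighed when an offset is added to a limit that may have been set to Long.MAX_VALUE. The class and method names are hypothetical and the error text is only modeled on the snippet above; neither method is the PR's actual code.

```java
/** Hypothetical helpers contrasting a checked sum with a saturating one. */
final class ToIndexOverflowSketch
{
  /** Returns offset + limit, failing loudly if the sum does not fit in a long. */
  static long checkedToIndex(long offset, long limit)
  {
    try {
      return Math.addExact(offset, limit);
    }
    catch (ArithmeticException e) {
      throw new IllegalStateException(
          String.format("Cannot compute toIndex due to overflow [offset=%d, limit=%d]", offset, limit),
          e
      );
    }
  }

  /** Returns offset + limit, clamped to Long.MAX_VALUE; both inputs must be non-negative. */
  static long saturatingToIndex(long offset, long limit)
  {
    long sum = offset + limit;
    // For non-negative inputs, a negative sum means the addition wrapped around.
    return sum < 0 ? Long.MAX_VALUE : sum;
  }
}
```

The checked variant surfaces the problem at the point of use, while the saturating variant tolerates callers that pass Long.MAX_VALUE to mean "unlimited". Which one fits depends on how widely that sentinel is used.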

abhishekagarwal87 · 2023-10-27T11:35:39Z

+  }
+
  /**
   * Return this query as a Scan query, or null if this query is not compatible with Scan.


could you add a few more details here as to how considerSorting parameter is used?

clintropolis · 2023-10-27T22:46:46Z

    }
  }

+  private ColumnType getDataSoruceColumnType(DataSource dataSource, String columnName)


Typo: this should be getDataSourceColumnType. Also mark it @Nullable.

jeez; I make so many typos lately that I should
probably there are some plugins to warn about issues like this :D
I see this; but one which could be integrated into the build would probably be even better...

fixed it

clintropolis · 2023-10-27T22:48:25Z

+  @Test
+  public void testEquals()
+  {
+    new EqualsTester()


nit: we've been using EqualsVerifier for many equals/hashcode tests, is it easier to use this one for these classes for some reason? (I don't see anything using this anywhere else in codebase)

Didn't see that, sorry. It's not easier to use at all... but it's nice to declare equalityGroup-s :D

would you like me to remove them?

clintropolis · 2023-10-27T22:53:51Z

 public class DruidOuterQueryRel extends DruidRel<DruidOuterQueryRel>
 {
-  private static final TableDataSource DUMMY_DATA_SOURCE = new TableDataSource("__subquery__");
+  public static final TableDataSource DUMMY_DATA_SOURCE = new TableDataSource("__subquery__");


i wonder if we should just override isConcrete to return false instead of making this public and checking specifically for this (or maybe make a special dummy datasource class to use in planner that also implements isConcrete as false), because we could simplify the scan query conversion to just check for !isConcrete instead of checking for this instance or concrete

Yes, I totally agree that it's not ideal like this...

I think it's rather odd to pass this constant here, as it's being used to show the internal parts that this outer query has a "TableDataSource"...

I think we should probably:

- remove that ?:
- change the DUMMY_QUERY_DATA_SOURCE to not look like a ScanQuery, but instead like some opaque query, so that it can't interfere with things
- possibly reconsider the instanceof at DruidQuery#computeQuery, since when there is no window, because of that ?: it essentially never becomes true (during early stages of planning) as it passes a non-query datasource....

...but I can try the isConcrete() impl approach for now; if that breaks stuff, I would probably go back to the current reference-checking approach and submit the above changes in a follow-up PR.

clintropolis · 2023-10-27T23:06:05Z

+      plannerContext.setPlanningError(
+          "SQL query requires ordering a table by non-time column [%s], which is not supported.",
+          orderByColumnNames);


super nitpick, but much of codebase puts trailing ) on newlines so stuff is symmetrical ⚖️ (noticed this in lots of other places too)

Suggested change

-      plannerContext.setPlanningError(
-          "SQL query requires ordering a table by non-time column [%s], which is not supported.",
-          orderByColumnNames);
+      plannerContext.setPlanningError(
+          "SQL query requires ordering a table by non-time column [%s], which is not supported.",
+          orderByColumnNames
+      );

not a blocker by any means, just throwing this out there since i find it more aesthetically pleasing 😅

I noticed it as well, but hadn't dug into it before. I think the project's Eclipse formatter is outdated; I had to fix a few issues already (because checkstyle followed different rules).

Regarding this one: it was off as well, but I don't yet see a way to set the formatting to (1); I was only able to set (2).

format_1(123, 23123, 3213, 2321321 );
format_2( 123, 23123, 3213, 2321321 );

As a matter of fact, I don't know whether format_1 would be the preferred one or not.
I've set format_2, as putting the closing ) on the same line has caused quite a few annoying mistakes for me in the last couple of days...

More or less related: we could enforce these formatting rules (see this PR); but I didn't want to do it for the whole project before we agree on the rules we should follow.

kgyrtkirk · 2023-10-28T17:58:38Z

@somu-imply, @abhishekagarwal87, @clintropolis could you please take another look?

soumyava

Took another pass, thanks for addressing my comments. LGTM !

abhishekagarwal87

Nit comments that need not block the PR.

abhishekagarwal87 · 2023-10-29T11:00:29Z

 {
  private final Operator child;
-  private final ArrayList<ColumnWithDirection> sortColumns;
+  private final List<ColumnWithDirection> sortColumns;


No, makes sense. Though that makes me wonder why not just do it for NaiveSorter and its implementations too, so you don't have to create an ArrayList at line 61.

abhishekagarwal87 · 2023-10-29T11:05:00Z

+  private static final TableDataSource DUMMY_DATA_SOURCE = new TableDataSource("__subquery__")
+  {
+    @Override
+    public boolean isConcrete()


It would be helpful to have some comments associated with this override.
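
As one possible shape for such a comment, here is a minimal sketch of a placeholder datasource that reports itself as non-concrete. The types `DataSourceSketch` and `TableDataSourceSketch` and the helper `canUseScanAndSort` are hypothetical stand-ins; the sketch assumes, as the thread indicates, that a regular table datasource returns true from isConcrete() and that the planner only needs a `!isConcrete()` check.

```java
/** Hypothetical stand-in for a datasource interface; only isConcrete() matters here. */
interface DataSourceSketch
{
  boolean isConcrete();
}

/** Hypothetical stand-in for a table datasource, which is concrete in this sketch. */
class TableDataSourceSketch implements DataSourceSketch
{
  private final String name;

  TableDataSourceSketch(String name)
  {
    this.name = name;
  }

  String getName()
  {
    return name;
  }

  @Override
  public boolean isConcrete()
  {
    return true;
  }
}

final class OuterQueryPlaceholderSketch
{
  static final TableDataSourceSketch DUMMY_DATA_SOURCE = new TableDataSourceSketch("__subquery__")
  {
    /**
     * This placeholder only stands in for a subquery while planning. It does not name a
     * real table, so it must not be treated as concrete by code that inspects the datasource.
     */
    @Override
    public boolean isConcrete()
    {
      return false;
    }
  };

  /** Sketch of the simplified planner check: no reference comparison against the constant. */
  static boolean canUseScanAndSort(DataSourceSketch dataSource)
  {
    return !dataSource.isConcrete();
  }
}
```

Because the check relies on behavior rather than identity, the constant can stay private to the class that owns it.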

abhishekagarwal87 · 2023-10-29T11:12:13Z

@kgyrtkirk - You will also need to change the docs here https://druid.apache.org/docs/latest/querying/scan-query and https://druid.apache.org/docs/latest/querying/sql#order-by

- adds a new query build path: DruidQuery#toScanAndSortQuery which: - builds a ScanQuery without considering the current ordering - builds an operator to execute the sort - fixes a null string to "null" literal string conversion in the frame serializer code - fixes some DrillWindowQueryTest cases - fix NPE in NaiveSortOperator in case there was no input - enables back CoreRules.AGGREGATE_REMOVE - adds a processing level OffsetLimit class and uses that instead of just the limit in the rac parts - earlier window expressions on top of a subquery with an offset may have ignored the offset (cherry picked from commit f4a7471)

kgyrtkirk · 2023-10-30T13:02:53Z

@abhishekagarwal87 I was taking a look, and although this does lift the limitation in some cases, it will only be fully usable once window operators are also allowed to run in the root query. Right now this works for DruidOuterQueryRel-s; however, for joins, for example, it doesn't work, because during early phases of planning a TableDataSource is passed as input, which returns true for isConcrete, so this path is disabled.

kgyrtkirk added 9 commits October 24, 2023 08:16

some tests/etc

c59d1ab

add 0

2065847

implement stuff

3623bfc

fix resultset

bed6b3d

add/fix/etc

d791268

rejects too many; better fix it

4792873

drill test stuff

ad54163

rename/etc

b3216df

cleanup

fd89a9a

github-actions Bot added the Area - Querying label Oct 24, 2023

github-advanced-security AI found potential problems Oct 24, 2023

Comment thread sql/src/main/java/org/apache/druid/sql/calcite/rel/DruidQuery.java Fixed

kgyrtkirk added 19 commits October 24, 2023 13:41

remove unrelated test

870a851

undo extract

0d0854b

remove orderby translator stuff

d170c0b

Merge remote-tracking branch 'apache/master' into accept-order-as-win…

d671d3d

…dowing

put back aggregate_remove

c492d4c

undo some more

a23b62a

remove expected query from test

7bedac9

use support for limit in scanOpFactory; partially add offset

8f79534

add offset to test

fd66ef8

remove offset for now

c03bd11

Add int offset approach

c3a3535

This reverts commit 3bcfb197c6156d41036ee4edb60d4ad63a60d85a.

Revert "Add int offset approach"

0569672

This reverts commit c3a3535.

add myOffsetLimit stuff

67c5b66

add back NYS

61eb85e

rename class; add convinience method

4cc671b

less convoluted

4f77cdd

cleanup

41e4442

fix checkstyle/test/etc

07bfa7f

cleanup

4443f3b

mark decoupled testcase

ad4a425

abhishekagarwal87 reviewed Oct 27, 2023

kgyrtkirk added 5 commits October 27, 2023 11:51

fix intellig

65a8cae

apidoc/etc

31091c4

fixes

735bc06

Merge remote-tracking branch 'apache/master' into accept-order-as-win…

ed10131

…dowing

fix test

2906bb1

clintropolis reviewed Oct 27, 2023

kgyrtkirk added 4 commits October 28, 2023 07:00

fix typo

e66977c

implement isConcrete() instead of reference check

fdc1fc0

format

1181200

format some sources

bb5ebda

github-advanced-security AI found potential problems Oct 28, 2023

Comment thread sql/src/main/java/org/apache/druid/sql/calcite/rel/DruidOuterQueryRel.java Fixed

kgyrtkirk added 3 commits October 28, 2023 08:35

remove equalstester

08a9286

add missing override

fde6e3f

remove unused method

2692a19

kgyrtkirk requested review from abhishekagarwal87, clintropolis and somu-imply October 28, 2023 17:44

soumyava approved these changes Oct 28, 2023

pranavbhole approved these changes Oct 28, 2023

abhishekagarwal87 approved these changes Oct 29, 2023

abhishekagarwal87 merged commit f4a7471 into apache:master Oct 29, 2023

LakshSingla mentioned this pull request Nov 2, 2023

Fix an issue with passing order by and limit to realtime tasks #15301

Merged

10 tasks
