Skip to content

Process pure ordering changes with windowing operators#15241

Merged
abhishekagarwal87 merged 87 commits intoapache:masterfrom
kgyrtkirk:accept-order-as-windowing
Oct 29, 2023
Merged

Process pure ordering changes with windowing operators#15241
abhishekagarwal87 merged 87 commits intoapache:masterfrom
kgyrtkirk:accept-order-as-windowing

Conversation

@kgyrtkirk
Copy link
Copy Markdown
Member

@kgyrtkirk kgyrtkirk commented Oct 24, 2023

  • adds a new query build path: DruidQuery#toScanAndSortQuery which:
    • builds a ScanQuery without considering the current ordering
    • builds an operator to execute the sort
  • fixes a null string to "null" literal string conversion in the frame serializer code
  • fixes some DrillWindowQueryTest cases
  • fix NPE in NaiveSortOperator in case there was no input
  • enables back CoreRules.AGGREGATE_REMOVE
  • adds a processing level OffsetLimit class and uses that instead of just the limit in the rac parts
  • earlier window expressions on top of a subquery with an offset may have ignored the offset
  SELECT
    FLOOR(__time TO DAY) t,
    COUNT(1) OVER ()
  FROM ( select * from foo offset 3 ) f

Comment thread sql/src/main/java/org/apache/druid/sql/calcite/rel/DruidQuery.java Fixed
{
private final Operator child;
private final ArrayList<ColumnWithDirection> sortColumns;
private final List<ColumnWithDirection> sortColumns;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

whats this change for?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • I think ArrayList have spread to a lot of places when these classes were created
  • its harder to test with ArrayList types on the interfaces...I didn't wanted to go down further; I think wrapping stuff into the ArrayList when the sorter is created should be rare enough
  • intellij checks also encourages to use Collections.singletonList and emptyList which are not ArrayLists either....

would you like to take an alternate approach?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No. makes sense though that makes me wonder why not just do it for NaiveSorter and its implementations too. So you don't have to create an ArrayList at line 61

Comment thread processing/src/main/java/org/apache/druid/query/operator/OffsetLimit.java Outdated
Comment on lines +130 to +132
throw DruidException.defensive(
"Cannot compute toIndex due to overflow [%s]",
this);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this be checked in the constructor itself? Pretty wild if we hit this one. 😄

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we have some places where Long.MAX_VALUE is substituted - I'm now feel like its even safer to not add this throw... or at least not now - I'll get back to it in a later refactor

}

/**
* Return this query as a Scan query, or null if this query is not compatible with Scan.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could you add a few more details here as to how considerSorting parameter is used?

Comment thread sql/src/main/java/org/apache/druid/sql/calcite/rel/DruidQuery.java Outdated
Comment thread sql/src/main/java/org/apache/druid/sql/calcite/rel/DruidQuery.java Outdated
}
}

private ColumnType getDataSoruceColumnType(DataSource dataSource, String columnName)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo, should be getDataSourceColumnType, also mark @Nullable

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

jeez; I make so many typos lately that I should
probably there are some plugins to warn about issues like this :D
I see this ; but one which could be integrated into the build would be probably even better...

fixed it

@Test
public void testEquals()
{
new EqualsTester()
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: we've been using EqualsVerifier for many equals/hashcode tests, is it easier to use this one for these classes for some reason? (I don't see anything using this anywhere else in codebase)

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

didn't seen that sorry - its not easier at all to use it... but its nice to declare equalityGroup-s :D

would you like me to remove them?

public class DruidOuterQueryRel extends DruidRel<DruidOuterQueryRel>
{
private static final TableDataSource DUMMY_DATA_SOURCE = new TableDataSource("__subquery__");
public static final TableDataSource DUMMY_DATA_SOURCE = new TableDataSource("__subquery__");
Copy link
Copy Markdown
Member

@clintropolis clintropolis Oct 27, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i wonder if we should just override isConcrete to return false instead of making this public and checking specifically for this (or maybe make a special dummy datasource class to use in planner that also implements isConcrete as false), because we could simplify the scan query conversion to just check for !isConcrete instead of checking for this instance or concrete

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes - totally agree that is not the best like this...

I think its rather odd to pass this constant here as its being used to show the internal parts that this outer query has a "TableDataSource"...

I think we should probably

  • remove that ?::
  • change the DUMMY_QUERY_DATA_SOURCE to not look like a ScanQuery - instead some opaque query; so that it can't interfere with things
  • possibly reconsider the instanceof at DruidQuery#computeQuery as when there is no window because of that ?: it essentially never becomes true (during early stages of planning) because it passes a non-query datasource ....

...but I can try the isConcreate() impl approach for now - but if that breaks stuff; I would probably go back to the current reference checking approach - and submit the above changes in a followup PR

Comment on lines +1511 to +1513
plannerContext.setPlanningError(
"SQL query requires ordering a table by non-time column [%s], which is not supported.",
orderByColumnNames);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

super nitpick, but much of codebase puts trailing ) on newlines so stuff is symmetrical ⚖️ (noticed this in lots of other places too)

Suggested change
plannerContext.setPlanningError(
"SQL query requires ordering a table by non-time column [%s], which is not supported.",
orderByColumnNames);
plannerContext.setPlanningError(
"SQL query requires ordering a table by non-time column [%s], which is not supported.",
orderByColumnNames
);

not a blocker by any means, just throwing this out there since i find it more aesthetically pleasing 😅

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I noticed it as well - however didn't digged into it before; I think the eclipse formatter of the project is outdated - I had to fix a few issues already (because checkstyle followed different rules).

regarding this one: it was off as well; but I don't yet see a way to enable formatting to (1) but was able to set only (2)

format_1(123,
    23123,
    3213,
    2321321
);

format_2(
    123,
    23123,
    3213,
    2321321
);

as a matter of fact I don't know if format_1 would be the preffered or not.
I've set format_2; as putting the closing ) on the same line have caused quite a few annoying mistakes for me in the last couple days...

more-or-less related: we could enforce these formatting things (see this PR ); but I didn't wanted to do it for the whole project before we agree on the rules we should follow

@kgyrtkirk
Copy link
Copy Markdown
Member Author

@somu-imply, @abhishekagarwal87, @clintropolis could you please take another look?

Copy link
Copy Markdown
Contributor

@soumyava soumyava left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Took another pass, thanks for addressing my comments. LGTM !

Copy link
Copy Markdown
Contributor

@abhishekagarwal87 abhishekagarwal87 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit comments that need not block the PR.

{
private final Operator child;
private final ArrayList<ColumnWithDirection> sortColumns;
private final List<ColumnWithDirection> sortColumns;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No. makes sense though that makes me wonder why not just do it for NaiveSorter and its implementations too. So you don't have to create an ArrayList at line 61

private static final TableDataSource DUMMY_DATA_SOURCE = new TableDataSource("__subquery__")
{
@Override
public boolean isConcrete()
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should be helpful to have some comments associated with this override.

@abhishekagarwal87 abhishekagarwal87 merged commit f4a7471 into apache:master Oct 29, 2023
@abhishekagarwal87
Copy link
Copy Markdown
Contributor

abhishekagarwal87 commented Oct 29, 2023

@kgyrtkirk - You will also need to change the docs here https://druid.apache.org/docs/latest/querying/scan-query and https://druid.apache.org/docs/latest/querying/sql#order-by

kgyrtkirk added a commit to kgyrtkirk/druid that referenced this pull request Oct 30, 2023
- adds a new query build path: DruidQuery#toScanAndSortQuery which:
- builds a ScanQuery without considering the current ordering
- builds an operator to execute the sort
- fixes a null string to "null" literal string conversion in the frame serializer code
- fixes some DrillWindowQueryTest cases
- fix NPE in NaiveSortOperator in case there was no input
- enables back CoreRules.AGGREGATE_REMOVE
- adds a processing level OffsetLimit class and uses that instead of just the limit in the rac parts
- earlier window expressions on top of a subquery with an offset may have ignored the offset

(cherry picked from commit f4a7471)
LakshSingla pushed a commit that referenced this pull request Oct 30, 2023
- adds a new query build path: DruidQuery#toScanAndSortQuery which:
- builds a ScanQuery without considering the current ordering
- builds an operator to execute the sort
- fixes a null string to "null" literal string conversion in the frame serializer code
- fixes some DrillWindowQueryTest cases
- fix NPE in NaiveSortOperator in case there was no input
- enables back CoreRules.AGGREGATE_REMOVE
- adds a processing level OffsetLimit class and uses that instead of just the limit in the rac parts
- earlier window expressions on top of a subquery with an offset may have ignored the offset

(cherry picked from commit f4a7471)
@kgyrtkirk
Copy link
Copy Markdown
Member Author

@abhishekagarwal87 I was taking a look and although this does raise the limitation in some cases - it will be fully usable when it will be allowed to also run window operators in the root query as well; right now this will work for DruidOuterQueryRel-s ; however; for example for joins it doesn't work because during early phases of planning a TableDataSource is passed as input; which returns true for isConcreate so this path will be disabled.

CaseyPan pushed a commit to CaseyPan/druid that referenced this pull request Nov 17, 2023
- adds a new query build path: DruidQuery#toScanAndSortQuery which:
- builds a ScanQuery without considering the current ordering
- builds an operator to execute the sort
- fixes a null string to "null" literal string conversion in the frame serializer code
- fixes some DrillWindowQueryTest cases
- fix NPE in NaiveSortOperator in case there was no input
- enables back CoreRules.AGGREGATE_REMOVE
- adds a processing level OffsetLimit class and uses that instead of just the limit in the rac parts
- earlier window expressions on top of a subquery with an offset may have ignored the offset
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants