Fix issues with partitioning boundaries for MSQ window functions#16729
asdf2014 merged 11 commits into apache:master
sreemanamala left a comment
Alongside, I think it's better not to use the .q and .e files to run the normal test cases. They check only the output, which is what is expected for drill tests. In general we would lose the ability to assert other aspects of planning and execution (for example, test cases under shuffle columns). For native engine window tests, we have parameterised tests which are still able to do things like operator validation.
@sreemanamala It's intentional. I wanted to add a bunch of tests where we validate that the outputs of MSQ and the sql-native engine are the same. We have a very nice wiring already for it in the Drill tests, so I re-used that. But to your point, I will add more specific tests to the MSQ specific test file as well for more detailed assertions. I'm kinda hoping to revamp the rest of the logic in the next 2-3 PRs, detailed tests would make more sense then. Hope that sounds good! :)
ad1b6ad to fe1c300
kgyrtkirk left a comment
the WindowQueryTestBase has a lot of things relating to drill - why the need to move those?
      outputRow = currentRow;
      objectsOfASingleRac.add(currentRow);
-   } else if (comparePartitionKeys(outputRow, currentRow, partitionColsIndex)) {
+   } else if (comparePartitionKeys(outputRow, currentRow, partitionColumnNames)) {
There was a problem hiding this comment.
this is a little bit confusing with that runAllOpsOnSingleRac method; I believe the operators should only be run once...and not construct all of them for every RAC
what happens here seems to be quite similar to what the NaivePartitioningOperator does - but in a streaming fashion...
I think it would be better to implement this as an operator - that way the partitionColumnNames could also live inside the operators - and not need a different path to get passed.
but since this is a bug fix pr - this might be out of scope...
There was a problem hiding this comment.
and not construct all of them for every RAC
We aren't constructing the sort, partitioning and window operator for every RAC, if that's what you meant. They are coming from operatorFactoryList declared at class level.
runAllOpsOnSingleRac does have new Operator() though, do you mean that this need not be constructed for every RAC?
There was a problem hiding this comment.
I meant that in the else branch there is a call to runAllOpsOnSingleRac which starts processing an operator list - but that gets torn down after the frame is processed and a new one is built for the next rac...
as a rac in this case could mean even a single row - that makes it a bit inefficient; as setup/shutdown cost is added to every processed rac
      outputRow = currentRow;
      objectsOfASingleRac.add(currentRow);
-   } else if (comparePartitionKeys(outputRow, currentRow, partitionColsIndex)) {
+   } else if (comparePartitionKeys(outputRow, currentRow, partitionColumnNames)) {
There was a problem hiding this comment.
looking at what comparePartitionKeys is doing (produces garbage) - and that it gets called for-each-row...I'm not sure if this is the right approach...
it would probably be better to:
- push all rows into the rac until it hits the roof
- use ArrayListRowsAndColumns's partitioning to identify the smaller sections
- submit all partitions except the last
- move those rows into a new starter rac; restart from the beginning
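The steps in the list above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the Druid implementation: `PartitionBatcher`, its `Object[]` row representation and the `Consumer` sink are hypothetical names, rows are assumed to arrive already sorted by the partition columns, and a real version would rely on `ArrayListRowsAndColumns` rather than the hand-written `split` below.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;
import java.util.function.Consumer;

/**
 * Hypothetical sketch: buffers rows and, once the buffer reaches maxRows, emits every
 * complete partition while carrying the last (possibly incomplete) one over.
 * Assumes rows arrive sorted by the partition key columns.
 */
class PartitionBatcher {
    private final int maxRows;
    private final int[] keyIndexes;               // positions of the partition columns in a row
    private final Consumer<List<Object[]>> sink;  // receives one complete partition at a time
    private List<Object[]> buffer = new ArrayList<>();

    PartitionBatcher(int maxRows, int[] keyIndexes, Consumer<List<Object[]>> sink) {
        this.maxRows = maxRows;
        this.keyIndexes = keyIndexes;
        this.sink = sink;
    }

    void add(Object[] row) {
        buffer.add(row);
        if (buffer.size() >= maxRows) {
            flush(false);
        }
    }

    /** Call once after the last row so the carried-over partition is emitted too. */
    void finish() {
        flush(true);
    }

    private void flush(boolean endOfInput) {
        List<List<Object[]>> partitions = split(buffer);
        // The last partition may continue in later rows, so hold it back unless input ended.
        int complete = endOfInput ? partitions.size() : partitions.size() - 1;
        for (int i = 0; i < complete; i++) {
            sink.accept(partitions.get(i));
        }
        if (endOfInput || partitions.isEmpty()) {
            buffer = new ArrayList<>();
            return;
        }
        buffer = new ArrayList<>(partitions.get(partitions.size() - 1));
        if (buffer.size() >= maxRows) {
            // Sketch choice: a single partition that does not fit is treated as an error.
            throw new IllegalStateException("partition exceeds maxRows=" + maxRows);
        }
    }

    /** Splits consecutive rows into runs that share the same partition key. */
    private List<List<Object[]>> split(List<Object[]> rows) {
        List<List<Object[]>> out = new ArrayList<>();
        Object[] prev = null;
        for (Object[] row : rows) {
            if (prev == null || !sameKey(prev, row)) {
                out.add(new ArrayList<>());
            }
            out.get(out.size() - 1).add(row);
            prev = row;
        }
        return out;
    }

    private boolean sameKey(Object[] a, Object[] b) {
        for (int i : keyIndexes) {
            if (!Objects.equals(a[i], b[i])) {
                return false;
            }
        }
        return true;
    }
}
```

Compared with calling a key comparison for each incoming row, this shape compares keys only when the buffer is flushed, and the downstream operators see whole partitions rather than one small rac at a time.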
There was a problem hiding this comment.
This seems like a much bigger refactoring task, hence beyond the scope of this PR? 😅
I do like the idea though.
There was a problem hiding this comment.
totally agree - I've either missed the review; or more likely I haven't realized that the above is a possible alternate approach which could work better
log.info("Using row signature [%s] for window stage.", stageRowSignature);

boolean partitionOperatorExists = false;
List<String> currentPartitionColumns = new ArrayList<>();
for (OperatorFactory of : operatorList.get(i)) {
  if (of instanceof NaivePartitioningOperatorFactory) {
    for (String s : ((NaivePartitioningOperatorFactory) of).getPartitionColumns()) {
      currentPartitionColumns.add(s);
      partitionOperatorExists = true;
    }
  }
}

if (partitionOperatorExists) {
  partitionColumnNames = currentPartitionColumns;
}

log.info(
    "Columns which would be used to define partitioning boundaries for this window stage are [%s]",
    partitionColumnNames
);
There was a problem hiding this comment.
wouldn't it make it a bit more readable to have this inside a method?
I don't agree with going through all the operators and adding all of their partition columns to a list...
all the code and stuff here naturally wants to have an object like:
class WndStage {
  PartitionOperator partitionOperator;
  SortOperator sortOperator;
  List<Operator> workOperators;
}
even the existence of such a class will ensure that there is no more than 1 partition operator in a stage and also gives a home for methods like this
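A minimal sketch of what such a stage object could look like is given below. Everything in it is hypothetical: `OpFactory`, `PartitionFactory`, `SortFactory` and `WindowFactory` are simplified stand-ins for Druid's operator factories, and rejecting a second sort operator is an assumption of the sketch rather than something stated in the review.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical stand-ins for the operator factories; only what the sketch needs. */
interface OpFactory {}

final class PartitionFactory implements OpFactory {
    final List<String> partitionColumns;

    PartitionFactory(List<String> partitionColumns) {
        this.partitionColumns = List.copyOf(partitionColumns);
    }
}

final class SortFactory implements OpFactory {}

final class WindowFactory implements OpFactory {}

/** One window stage: at most one partition operator, at most one sort, plus the work operators. */
final class WindowStage {
    private final PartitionFactory partition; // null when the stage has no partition operator
    private final SortFactory sort;           // null when the stage has no sort operator
    private final List<OpFactory> workOperators;

    private WindowStage(PartitionFactory partition, SortFactory sort, List<OpFactory> workOperators) {
        this.partition = partition;
        this.sort = sort;
        this.workOperators = Collections.unmodifiableList(workOperators);
    }

    /** Builds a stage and fails fast if the "single partition operator" invariant is violated. */
    static WindowStage from(List<? extends OpFactory> factories) {
        PartitionFactory partition = null;
        SortFactory sort = null;
        List<OpFactory> work = new ArrayList<>();
        for (OpFactory f : factories) {
            if (f instanceof PartitionFactory) {
                if (partition != null) {
                    throw new IllegalArgumentException("more than one partition operator in a stage");
                }
                partition = (PartitionFactory) f;
            } else if (f instanceof SortFactory) {
                if (sort != null) {
                    throw new IllegalArgumentException("more than one sort operator in a stage");
                }
                sort = (SortFactory) f;
            } else {
                work.add(f);
            }
        }
        return new WindowStage(partition, sort, work);
    }

    /** Partition columns of this stage; empty when there is no partition operator. */
    List<String> partitionColumns() {
        return partition == null ? Collections.emptyList() : partition.partitionColumns;
    }

    boolean hasSort() {
        return sort != null;
    }

    List<OpFactory> workOperators() {
        return workOperators;
    }
}
```

With this shape, the loop that collects partition columns from every operator collapses into a single `partitionColumns()` call on the stage.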
There was a problem hiding this comment.
I love this idea!
Can I take it up in a separate future PR though?
For my next PR, I'm working on revamping the logic of getOperatorListFromQuery() method to fix scenarios with empty over() clauses. I can either make this refactoring change in that PR, or the one after that.
Thoughts?
There was a problem hiding this comment.
of course!
I would recommend separating these as much as possible - many small PRs tend to get reviewed faster, and because of their size the feedback is usually also much better!
I created this base class since I added DruidWindowQueryTest, which had a lot of common methods with DrillWindowQueryTest. So I moved the common methods and logic to WindowQueryTestBase.
… to DrillWindowQueryTest
…id/msq/querykit/WindowOperatorQueryKit.java
    @JsonProperty("emptyOver") boolean emptyOver,
-   @JsonProperty("maxRowsMaterializedInWindow") int maxRowsMaterializedInWindow
+   @JsonProperty("maxRowsMaterializedInWindow") int maxRowsMaterializedInWindow,
+   @JsonProperty("partitionColumnNames") List<String> partitionColumnNames
There was a problem hiding this comment.
This should be marked null-able to maintain backward compatibility.
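One way to read the suggestion above is sketched below in plain Java. `WindowStageConfig` is a hypothetical, trimmed-down stand-in for the factory's constructor. In the real class the parameters are Jackson `@JsonProperty` creator parameters, and by default a property missing from the serialized JSON reaches the constructor as `null` for reference types. Falling back to an empty list is an assumption of this sketch; whether that is the right default depends on how the processor interprets an empty list.

```java
import java.util.Collections;
import java.util.List;

/** Hypothetical stand-in for the factory's constructor arguments. */
final class WindowStageConfig {
    private final int maxRowsMaterializedInWindow;
    private final List<String> partitionColumnNames;

    /**
     * @param partitionColumnNames may be null, e.g. when the payload was written by an
     *                             older version that did not know this property
     */
    WindowStageConfig(int maxRowsMaterializedInWindow, List<String> partitionColumnNames) {
        this.maxRowsMaterializedInWindow = maxRowsMaterializedInWindow;
        // Assumption of this sketch: a missing list is treated as "no partition columns".
        this.partitionColumnNames =
            partitionColumnNames == null ? Collections.emptyList() : List.copyOf(partitionColumnNames);
    }

    int getMaxRowsMaterializedInWindow() {
        return maxRowsMaterializedInWindow;
    }

    List<String> getPartitionColumnNames() {
        return partitionColumnNames;
    }
}
```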
There was a problem hiding this comment.
I am removing emptyOver in #16754, as it's redundant with partitionColumnNames. My thinking was that it's okay to not worry about backward compatibility, in favor of keeping a cleaner codebase - considering this feature isn't GA yet.
Thoughts?
// Later we should also check if these can be parallelized.
// Check if there is an empty OVER() clause or not.
RowSignature rowSignature = originalQuery.getRowSignature();
log.info("Row signature received for query is [%s].", rowSignature);
There was a problem hiding this comment.
This log statement does not add any value for the end user.
There was a problem hiding this comment.
I wanted to add logs for better debuggability. We can certainly tone down the logging when this has had some soak time and we have more confidence on the stability of it. Thoughts?
…che#16729)
* Fix issues with partitioning boundaries for MSQ window functions
* Address review comments
* Address review comments
* Add test for coverage check failure
* Address review comment
* Remove DruidWindowQueryTest and WindowQueryTestBase, move those tests to DrillWindowQueryTest
* Update extensions-core/multi-stage-query/src/main/java/org/apache/druid/msq/querykit/WindowOperatorQueryKit.java
* Address review comments
* Add test for equals and hashcode for WindowOperatorQueryFrameProcessorFactory
* Address review comment
* Fix checkstyle
---------
Co-authored-by: Benedict Jin <asdf2014@apache.org>
Description
This PR fixes some issues with MSQ window functions.
Issue 1: NPE issues when multiple windows are used
Currently, queries like the following run into an NPE when using MSQ:
because the List<OperatorFactory> we get in the WindowOperatorQueryKit layer is [Sort, Partition, Window1, Window2]. WindowOperatorQueryKit was trying to group the window factories into different groups, and was asserting that a partition operator factory is present in each group, which wasn't valid and ended up giving an NPE.
Issue 2: Query correctness issues because of incorrect trimming of row signature
Currently, queries like the following give incorrect results when using MSQ:
because we are incorrectly trimming the row signature in WindowOperatorQueryKit in the following part of code:
Solution
Issue 1 is fixed by removing the assertion, and returning null when no partition operator factory is present for the current window stage evaluation. This indicates that we already have the data partitioned correctly, and hence we don't need to do any shuffling.
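The Issue 1 fix described above can be pictured with the following sketch. It is not the code from the PR: `Factory`, `PartitioningFactory`, `WindowingFactory` and `ShuffleColumns.forStage` are hypothetical names, and the real logic lives in `WindowOperatorQueryKit`. It only shows the idea of returning `null` rather than asserting when a stage has no partition operator factory.

```java
import java.util.List;

/** Hypothetical stand-ins for the operator factories of one window stage. */
interface Factory {}

final class PartitioningFactory implements Factory {
    final List<String> columns;

    PartitioningFactory(List<String> columns) {
        this.columns = List.copyOf(columns);
    }
}

final class WindowingFactory implements Factory {}

final class ShuffleColumns {
    private ShuffleColumns() {}

    /**
     * Returns the partition columns to shuffle on for one window stage, or null when the
     * stage has no partition factory. In that case the data is assumed to be partitioned
     * correctly already, so no shuffle is needed.
     */
    static List<String> forStage(List<? extends Factory> stage) {
        for (Factory f : stage) {
            if (f instanceof PartitioningFactory) {
                return ((PartitioningFactory) f).columns;
            }
        }
        return null; // previously an assertion here led to the NPE
    }
}
```

For a factory list like [Sort, Partition, Window1, Window2] split into two stages, the stage holding Partition yields its columns, while a stage containing only Window2 yields `null` and reuses the existing partitioning.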
Issue 2 is fixed by revamping the logic of computing the row signature for every window stage. Changes done to achieve this:
- getOutputColumnNames() method in Processor interface
- WindowOperatorQueryFrameProcessor#comparePartitionKeys.
Test Plan
WindowQueryTestBase. We decided to create a new layer as we didn't want to add non-drill tests into the existing drill test layer.
Key changed/added classes in this PR
- WindowOperatorQueryKit
- WindowOperatorQueryFrameProcessor
- DruidWindowQueryTest, MSQDruidWindowQueryTest, DrillWindowQueryTest, MSQDrillWindowQueryTest, WindowQueryTestBase
Release Note
This change is backwards incompatible, and can cause issues for MSQ queries with window functions during the upgrade.