-
Notifications
You must be signed in to change notification settings - Fork 3.8k
MSQ window functions: Fix query correctness issues when using multiple workers #16804
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
6b4b7eb
28cad3b
cd84b50
1709259
0dc01b4
0992d7b
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -20,7 +20,6 @@ | |
| package org.apache.druid.msq.querykit; | ||
|
|
||
| import com.fasterxml.jackson.databind.ObjectMapper; | ||
| import com.google.common.collect.ImmutableMap; | ||
| import org.apache.druid.frame.key.ClusterBy; | ||
| import org.apache.druid.frame.key.KeyColumn; | ||
| import org.apache.druid.frame.key.KeyOrder; | ||
|
|
@@ -88,17 +87,6 @@ public QueryDefinition makeQueryDefinition( | |
| List<List<OperatorFactory>> operatorList = getOperatorListFromQuery(originalQuery); | ||
| log.info("Created operatorList with operator factories: [%s]", operatorList); | ||
|
|
||
| ShuffleSpec nextShuffleSpec = findShuffleSpecForNextWindow(operatorList.get(0), maxWorkerCount); | ||
| // add this shuffle spec to the last stage of the inner query | ||
|
|
||
| final QueryDefinitionBuilder queryDefBuilder = QueryDefinition.builder(queryId); | ||
| if (nextShuffleSpec != null) { | ||
| final ClusterBy windowClusterBy = nextShuffleSpec.clusterBy(); | ||
| originalQuery = (WindowOperatorQuery) originalQuery.withOverriddenContext(ImmutableMap.of( | ||
| MultiStageQueryContext.NEXT_WINDOW_SHUFFLE_COL, | ||
| windowClusterBy | ||
| )); | ||
| } | ||
| final DataSourcePlan dataSourcePlan = DataSourcePlan.forDataSource( | ||
| queryKit, | ||
| queryId, | ||
|
|
@@ -112,7 +100,8 @@ public QueryDefinition makeQueryDefinition( | |
| false | ||
| ); | ||
|
|
||
| dataSourcePlan.getSubQueryDefBuilder().ifPresent(queryDefBuilder::addAll); | ||
| ShuffleSpec nextShuffleSpec = findShuffleSpecForNextWindow(operatorList.get(0), maxWorkerCount); | ||
| final QueryDefinitionBuilder queryDefBuilder = makeQueryDefinitionBuilder(queryId, dataSourcePlan, nextShuffleSpec); | ||
|
|
||
| final int firstStageNumber = Math.max(minStageNumber, queryDefBuilder.getNextStageNumber()); | ||
| final WindowOperatorQuery queryToRun = (WindowOperatorQuery) originalQuery.withDataSource(dataSourcePlan.getNewDataSource()); | ||
|
|
@@ -309,12 +298,16 @@ private ShuffleSpec findShuffleSpecForNextWindow(List<OperatorFactory> operatorF | |
| } | ||
| } | ||
|
|
||
| if (partition == null || partition.getPartitionColumns().isEmpty()) { | ||
| if (partition == null) { | ||
| // If operatorFactories doesn't have any partitioning factory, then we should keep the shuffle spec from previous stage. | ||
| // This indicates that we already have the data partitioned correctly, and hence we don't need to do any shuffling. | ||
| return null; | ||
| } | ||
|
|
||
| if (partition.getPartitionColumns().isEmpty()) { | ||
| return MixShuffleSpec.instance(); | ||
| } | ||
|
|
||
| List<KeyColumn> keyColsOfWindow = new ArrayList<>(); | ||
| for (String partitionColumn : partition.getPartitionColumns()) { | ||
| KeyColumn kc; | ||
|
|
@@ -328,4 +321,29 @@ private ShuffleSpec findShuffleSpecForNextWindow(List<OperatorFactory> operatorF | |
|
|
||
| return new HashShuffleSpec(new ClusterBy(keyColsOfWindow, 0), maxWorkerCount); | ||
| } | ||
|
|
||
| /** | ||
| * Override the shuffle spec of the last stage based on the shuffling required by the first window stage. | ||
| * @param queryId | ||
| * @param dataSourcePlan | ||
| * @param shuffleSpec | ||
|
Akshat-Jain marked this conversation as resolved.
|
||
| * @return | ||
| */ | ||
| private QueryDefinitionBuilder makeQueryDefinitionBuilder(String queryId, DataSourcePlan dataSourcePlan, ShuffleSpec shuffleSpec) | ||
| { | ||
| final QueryDefinitionBuilder queryDefBuilder = QueryDefinition.builder(queryId); | ||
| int previousStageNumber = dataSourcePlan.getSubQueryDefBuilder().get().build().getFinalStageDefinition().getStageNumber(); | ||
| for (final StageDefinition stageDef : dataSourcePlan.getSubQueryDefBuilder().get().build().getStageDefinitions()) { | ||
| if (stageDef.getStageNumber() == previousStageNumber) { | ||
| RowSignature rowSignature = QueryKitUtils.sortableSignature( | ||
| stageDef.getSignature(), | ||
| shuffleSpec.clusterBy().getColumns() | ||
| ); | ||
| queryDefBuilder.add(StageDefinition.builder(stageDef).shuffleSpec(shuffleSpec).signature(rowSignature)); | ||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. to be 100% sure that we are not causing correctness issues; can we validate if we are ok to override the old I guess the cases when its not safe to do so may need further investigation(s) - as those shuffles could possibly be moved "after" the window query.... note: I think if we don't handle
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. If we throw error when the shuffleSpec is non-null for the stage that's getting overridden, a lot of tests fail in MSQDrillWindowQueryTest: 754 tests failed, 114 tests passed. We can't throw an error when it's non-null 😅
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Currently, it is entirely possible that the final old shuffle spec contains non null shuffling. When creating the previous stages, it is the windowQueryKit which gives hints on what final it wants the shuffling to be based on I think it might be better to change the resultShuffleSpec we pass from windowQueryKit, and then do an assert on the shuffle spec. @kgyrtkirk Do you know of any example query which might cause issues if we change the shuffle spec? I have thought about it a bit, but I can't think of one. Since it is window function which is reading the results of this shuffle, and it does not care about the ordering (I'm assuming this is the case since we want to change it to mixShuffleSpec), there should not be an issue with this change, from what I can tell, but I might have to think more about this. |
||
| } else { | ||
| queryDefBuilder.add(StageDefinition.builder(stageDef)); | ||
| } | ||
| } | ||
| return queryDefBuilder; | ||
| } | ||
| } | ||
Uh oh!
There was an error while loading. Please reload this page.