Fix RuntimeException during ordered scan MSQ query#16661
Fix RuntimeException during ordered scan MSQ query#16661nozjkoitop wants to merge 1 commit intoapache:masterfrom
Conversation
|
Thanks a lot for the contribution 🎉 Funny enough, there's also #16643 which solves a very closely related bug, but not quite the same. I am curious if that inadvertently resolves this one as well. |
It's fixes a bug, although I'm not sure why input to the limit stage should be a single partition if it merges it anyway, thanks for pointing out another PR I'm not sure how it'll gonna work, need to test that, as we'll be writing results to a multiple partitions but at the end of the day everything will be written to one file in the durable storage. |
That shouldn't happen - output of a query like the following in durable storage should produce multiple partitions (which it isn't doing so at the moment) SELECT * FROM foo LIMIT 1000000 |
I was thinking about that, OffsetLimitFrameProcessorFactory should be updated in this case, now the output channel is hardcoded |
|
@nozjkoitop There was a more comprehensive change for similar issues with all the queries containing a LIMIT clause. It was merged recently #16643 and it should potentially fix the problem that you are seeing. Can you please verify if that patch fixes the issue? |
Hi @LakshSingla, thanks for the suggestion, it actually does fix the problem, although WDYT about the scan stage which always results to 1 partition if it hasLimitOrOffset? It could have pretty big output partition |
|
This pull request has been marked as stale due to 60 days of inactivity. |
|
This pull request/issue has been closed due to lack of activity. If you think that |
Resolved a runtime exception triggered by executing limited and ordered scan queries via the MSQ engine.
The main problem was that the system expected multiple partitions to be made, but in reality, only one partition was created by the OffsetLimitFrameProcessorFactory.
During handling process caused by ShuffleSpec == null, expected partitions were counted based on workerInputs, which were created based on slicer from the stageTracker of previous stage.
When using MixShuffleSpec at the limit stage, it results in creating just one partition for the stage's outcomes, which seems to be the behavior we expect.
This PR has: