Fixing an issue in sequential merge #14574
… key statistics would get stuck because controller did not change the worker state.
LakshSingla left a comment
Instead of having the controller infer that a missing boundary means an empty partition, I think it would be more appropriate to build this logic into the worker. If the worker has no output rows, it would report ClusterByStatisticsSnapshot.empty(), and in doing so it would report the empty partition.
That seems cleaner to me because we would then be verifying on the worker that the output row count is 0. WDYT?
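To make the suggestion above concrete, here is a minimal sketch of worker-side reporting. `StatsSnapshot` and `WorkerStatsReporter` are hypothetical stand-ins written for illustration, not Druid classes; only the idea of the worker explicitly reporting an empty snapshot when it has no output rows comes from the comment.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for a key statistics snapshot.
// This is NOT Druid's ClusterByStatisticsSnapshot.
final class StatsSnapshot {
    private final List<String> sampledKeys;

    private StatsSnapshot(List<String> sampledKeys) {
        this.sampledKeys = sampledKeys;
    }

    // Explicit "no rows were produced" marker.
    static StatsSnapshot empty() {
        return new StatsSnapshot(Collections.<String>emptyList());
    }

    static StatsSnapshot of(List<String> keys) {
        return new StatsSnapshot(Collections.unmodifiableList(new ArrayList<>(keys)));
    }

    boolean isEmpty() {
        return sampledKeys.isEmpty();
    }
}

// Hypothetical worker-side helper. The worker knows its own output row count,
// so it decides whether to report an empty snapshot; the controller does not
// have to infer emptiness from a missing boundary.
final class WorkerStatsReporter {
    static StatsSnapshot snapshotFor(List<String> outputRowKeys) {
        if (outputRowKeys.isEmpty()) {
            return StatsSnapshot.empty();
        }
        return StatsSnapshot.of(outputRowKeys);
    }
}
```

With this shape, the controller would always receive a report from every worker and could treat an empty snapshot as a verified empty partition.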
Can we add a test in MSQSelectTests that verifies that this is working as expected?
@LakshSingla Thanks for the review.
I thought about this. The trickiness lies in how we calculate
Since we do not gather stats in SELECT queries, I cannot do this via MSQSelectTests.
I think @adarshsanjeev did make some progress in this department where we could simulate multiple workers easily. However, since this might be a blocker, we can punt it for later (though we should revisit this for sure).
Thanks for the fix! Could there be a test for this?
Never mind, I just noticed the discussion above about testing. This patch LGTM but let's try to get multi-worker unit testing going as soon as we can.
@gianm @LakshSingla I have added an IT for the same.
    if (sqlTaskStatus.getState().isFailure()) {
      Assert.fail(StringUtils.format(
          "Unable to start the task successfully.\nPossible exception: %s",
          sqlTaskStatus.getError()
Check notice (Code scanning / CodeQL): Use of default toString()
Going ahead and merging this since the IT has passed.
* Fixing an issue in sequential merge where workers without any partial key statistics would get stuck because controller did not change the worker state.
* Removing empty check
* Adding IT for MSQ sequential bug fix.
(cherry picked from commit 89aee6c)
Thanks for the PR! 🚀
Fixing an issue in sequential merge where workers without any partial key statistics would get stuck because the controller did not change the worker state.
The symptom is the controller task getting stuck after some of the worker boundaries are sent.
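As an illustration of the failure mode described above, the following is a simplified sketch and not the actual WorkerSketchFetcher code. `SequentialMergeSketch`, `WorkerState`, `advance`, and `allDone` are hypothetical names, and the data model (a list of boundaries per worker) is an assumption. It shows one way a controller could move a worker with nothing to fetch into a finished state instead of waiting on it.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical, simplified model of controller-side bookkeeping.
final class SequentialMergeSketch {
    enum WorkerState { FETCHING, DONE }

    // boundariesByWorker: worker number -> boundaries that worker reported (assumed shape).
    // fetchedWorkers: workers whose statistics have already been fetched.
    static Map<Integer, WorkerState> advance(
            Map<Integer, List<Long>> boundariesByWorker,
            Set<Integer> fetchedWorkers) {
        Map<Integer, WorkerState> states = new HashMap<>();
        for (Map.Entry<Integer, List<Long>> entry : boundariesByWorker.entrySet()) {
            // A worker that reported no boundaries has nothing to fetch. If its
            // state is never changed, the stage can never complete.
            boolean nothingToFetch = entry.getValue().isEmpty();
            boolean done = nothingToFetch || fetchedWorkers.contains(entry.getKey());
            states.put(entry.getKey(), done ? WorkerState.DONE : WorkerState.FETCHING);
        }
        return states;
    }

    // The stage can proceed only when every worker has reached DONE.
    static boolean allDone(Map<Integer, WorkerState> states) {
        return states.values().stream().allMatch(s -> s == WorkerState.DONE);
    }
}
```

In this model, dropping the `nothingToFetch` branch reproduces the symptom: a worker with an empty boundary list stays in `FETCHING` forever, so `allDone` never returns true.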
Key changed/added classes in this PR
WorkerSketchFetcher
This PR has: