Multiple fixes for the MSQ stats merging piece #13463
      .build();
    }
    catch (ISE e) {
      log.error(e, "Invalid request for key statistics");
Should we return the error message to the client?
We can do that, I wasn't sure if this needed to be returned to the caller or just logged. I would also assume that we should call ISE.sanitize() before we do so, correct?
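One way to do both (log the full error and return a message to the caller) is sketched below in plain Java. This is only an illustration under stated assumptions: `ResponseSketch`, `KeyStatisticsHandlerSketch`, and the `sanitizer` function are hypothetical stand-ins rather than Druid classes, `IllegalStateException` stands in for `ISE`, and the sketch does not assume that `ISE` itself offers a `sanitize()` method.

```java
import java.util.function.Function;
import java.util.function.Supplier;
import java.util.logging.Level;
import java.util.logging.Logger;

/** Hypothetical response holder; a real handler would build an HTTP response instead. */
final class ResponseSketch
{
  final int status;
  final String body;

  ResponseSketch(int status, String body)
  {
    this.status = status;
    this.body = body;
  }
}

/** Hypothetical handler: logs the full exception, returns only a sanitized message. */
final class KeyStatisticsHandlerSketch
{
  private static final Logger LOG = Logger.getLogger(KeyStatisticsHandlerSketch.class.getName());

  static ResponseSketch handle(Supplier<String> fetchSnapshot, Function<String, String> sanitizer)
  {
    try {
      return new ResponseSketch(200, fetchSnapshot.get());
    }
    catch (IllegalStateException e) {
      // Keep the full detail (message and stack trace) in the server log only.
      LOG.log(Level.SEVERE, "Invalid request for key statistics", e);
      // The caller sees the message only after it has passed through the sanitizer.
      return new ResponseSketch(400, sanitizer.apply(e.getMessage()));
    }
  }
}
```

With this shape, the decision about what the client may see lives entirely in the `sanitizer` argument, so the handler itself does not change if the sanitization policy does.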
@cryptoe, it seems that the coverage is low.
Fixing race. Changing default mode to Parallel. Adding logging. Fixing exceptions not propagated properly.
kfaraz left a comment
Minor queries and suggestions, otherwise LGTM.
      .indexIO(indexIO)
      .indexMergerV9(indexMerger)
      .taskReportFileWriter(
          new TaskReportFileWriter()
Nit: You could use the existing NoopTaskReportFileWriter here instead.
I was not able to use that because it was in the indexing module.
It did not make sense to add a dependency for just NoopTaskReportFileWriter.
        OffHeapMemorySegmentWriteOutMediumFactory.instance()
    );

    mocks = MockitoAnnotations.openMocks(this);
Style: Since there is a single mocked instance, I suppose it might be simpler to just use Mockito.mock() for the HttpServletRequest req rather than using MockitoAnnotations.
If you do want to use @Mock, however, the preferred method would be to annotate the class with @RunWith(MockitoJUnitRunner.class).
    if (stageKernelMap.get(stageId) == null) {
      throw new ISE("Requested statistics snapshot for non-existent stageId %s.", stageId);
    }
    if (stageKernelMap.get(stageId).getResultKeyStatisticsSnapshot() == null) {
      throw new ISE(
          "Requested statistics snapshot is not generated yet for stageId[%s]",
          stageId
      );
    }
    return stageKernelMap.get(stageId).getResultKeyStatisticsSnapshot();
Style: might be easier to read as an if-else-if chain.
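For illustration, the if-else-if form might look roughly like the sketch below. It is not the code from the PR: `StageKernelSketch` and `SnapshotLookupSketch` are hypothetical stand-ins, the snapshot is modelled as a plain `String`, and `IllegalStateException` is used in place of `ISE`. The sketch also looks the kernel up once instead of calling `stageKernelMap.get(stageId)` three times.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for the per-stage kernel; not the real Druid class. */
final class StageKernelSketch
{
  private final String snapshot; // null until the statistics have been generated

  StageKernelSketch(String snapshot)
  {
    this.snapshot = snapshot;
  }

  String getResultKeyStatisticsSnapshot()
  {
    return snapshot;
  }
}

/** Hypothetical holder of the stage kernel map, showing the if-else-if chain. */
final class SnapshotLookupSketch
{
  private final Map<Integer, StageKernelSketch> stageKernelMap = new HashMap<>();

  void addStage(int stageId, StageKernelSketch kernel)
  {
    stageKernelMap.put(stageId, kernel);
  }

  String fetchStatisticsSnapshot(int stageId)
  {
    // Single lookup; every branch below works on the same kernel reference.
    final StageKernelSketch kernel = stageKernelMap.get(stageId);
    if (kernel == null) {
      throw new IllegalStateException(
          String.format("Requested statistics snapshot for non-existent stageId[%s].", stageId)
      );
    } else if (kernel.getResultKeyStatisticsSnapshot() == null) {
      throw new IllegalStateException(
          String.format("Requested statistics snapshot is not generated yet for stageId[%s]", stageId)
      );
    } else {
      return kernel.getResultKeyStatisticsSnapshot();
    }
  }
}
```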
    if (queryKernel.getStagePhase(stageId).equals(ControllerStagePhase.MERGING_STATISTICS)) {
      List<String> workerTaskIds = workerTaskLauncher.getTaskList();
      // we only need tasks which are active for this stage.
      List<String> workerTaskIds = workerTaskLauncher.getTaskList()
Doesn't getTaskList() already return the list of active tasks only?
Or can it include active tasks from other stages too?
If yes, can we be sure of the order of the items in the returned list such that subList always works as expected?
Yes, we can be sure that getTaskList() returns active tasks only.
It's kind of a coincidence that the moment you commented, I changed this logic to a less brittle one.
Thanks for the fix, @cryptoe, the new method does seem better. I would also suggest filtering workerTaskIds in the controller itself rather than passing an extra argument to the WorkerSketchFetcher just to filter the task ids later. Then the logic in the WorkerSketchFetcher wouldn't have to change, and it wouldn't have to be aware of the worker indexes (which seem like an impl detail of the controller).
Hope this makes sense.
Actually, this logic will change in #13353, hence I added a filtering step.
Since we cannot use an IntStream, we still have to change the WorkerSketchFetcher.
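For reference, the controller-side filtering suggested above could be sketched as follows. This is a hypothetical illustration, not the code merged in the PR: `WorkerTaskFilterSketch` and `tasksForStage` are made-up names, and the sketch assumes that the position of a task id in the task list equals its worker number, which the thread does not confirm.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Hypothetical helper showing how a controller could pick the task ids for one stage. */
final class WorkerTaskFilterSketch
{
  /**
   * Returns the task ids whose worker number is assigned to the stage.
   * Assumption: the index of a task id in allTaskIds is its worker number.
   */
  static List<String> tasksForStage(List<String> allTaskIds, Set<Integer> stageWorkerNumbers)
  {
    final List<String> result = new ArrayList<>();
    for (int workerNumber = 0; workerNumber < allTaskIds.size(); workerNumber++) {
      if (stageWorkerNumbers.contains(workerNumber)) {
        result.add(allTaskIds.get(workerNumber));
      }
    }
    return result;
  }
}
```

Filtering by membership in a set of worker numbers, rather than taking a subList, does not depend on the stage's workers forming a contiguous prefix of the task list.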
Changes LGTM, thanks for taking up this PR!
Failures look unrelated.
* Add validation checks to worker chat handler apis
* Merge things and polishing the error messages.
* Minor error message change
* Fixing race and adding some tests
* Fixing controller fetching stats from wrong workers. Fixing race. Changing default mode to Parallel. Adding logging. Fixing exceptions not propagated properly.
* Changing to kernel worker count
* Added a better logic to figure out assigned worker for a stage.
* Nits
* Moving to existing kernel methods
* Adding more coverage

Co-authored-by: cryptoe <karankumar1100@gmail.com>
(cherry picked from commit 2b605aa)
Co-authored-by: Adarsh Sanjeev <adarshsanjeev@gmail.com>
Multiple fixes for the MSQ stats merging piece, which was revamped as part of #13205.
This PR has: