Task reports for parallel task: single phase and sequential mode #11688
jon-wei merged 3 commits into apache:master
jihoonson left a comment:
The approach of polling subtasks only when necessary LGTM. I left some trivial comments.
    {
      if (buildSegmentsRowStats instanceof RowIngestionMetersTotals) {
        return (RowIngestionMetersTotals) buildSegmentsRowStats;
      } else if (buildSegmentsRowStats instanceof Map) {
Can you add some comment explaining when buildSegmentsRowStats can be a Map or RowIngestionMetersTotals?
Added a comment, the first case was just for unit tests
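For readers following this thread, here is a minimal sketch of the kind of coercion being discussed. It is not the code from this PR: `RowTotals` is a hypothetical stand-in for `RowIngestionMetersTotals`, the map keys are assumed names, and the idea that the `Map` form comes from a deserialized report is an assumption rather than something stated in the thread.

```java
import java.util.Map;

// Hypothetical stand-in for RowIngestionMetersTotals; the field names are assumptions.
final class RowTotals
{
  final long processed;
  final long processedWithError;
  final long thrownAway;
  final long unparseable;

  RowTotals(long processed, long processedWithError, long thrownAway, long unparseable)
  {
    this.processed = processed;
    this.processedWithError = processedWithError;
    this.thrownAway = thrownAway;
    this.unparseable = unparseable;
  }
}

final class RowStatsCoercion
{
  // Accepts either an already-typed totals object or an untyped map of counters.
  static RowTotals toTotals(Object buildSegmentsRowStats)
  {
    if (buildSegmentsRowStats instanceof RowTotals) {
      // Typed case: per the reply above, this only occurs in unit tests.
      return (RowTotals) buildSegmentsRowStats;
    } else if (buildSegmentsRowStats instanceof Map) {
      // Untyped case (assumed: a report that went through serialization).
      Map<?, ?> stats = (Map<?, ?>) buildSegmentsRowStats;
      return new RowTotals(
          asLong(stats.get("processed")),
          asLong(stats.get("processedWithError")),
          asLong(stats.get("thrownAway")),
          asLong(stats.get("unparseable"))
      );
    }
    // Unknown or null shape: report zero rows rather than failing the whole report.
    return new RowTotals(0, 0, 0, 0);
  }

  // Missing or non-numeric values are treated as zero.
  private static long asLong(Object value)
  {
    return value instanceof Number ? ((Number) value).longValue() : 0L;
  }
}
```

Going through `Number` rather than casting to `Long` matters in the untyped case, because small counts may arrive as `Integer`.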
    private IndexTask sequentialIndexTask;
    private ParallelIndexTaskRunner<SinglePhaseSubTask, PushedSegmentsReport> parallelSinglePhaseRunner;
I suppose you can get the current runner or task from currentSubTaskHolder and cast it properly depending on isParallelMode() instead of adding these?
I got rid of the references and adjusted things to use currentSubTaskHolder
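A rough sketch of the suggestion, for illustration only. `Holder`, `SequentialTask`, `ParallelRunner` and their methods are hypothetical names and do not mirror the real `currentSubTaskHolder` API. The point is that a single holder plus a mode flag can replace two separate fields.

```java
final class CurrentHolderSketch
{
  // Hypothetical view of the task that runs in sequential mode.
  interface SequentialTask
  {
    long rowsProcessed();
  }

  // Hypothetical view of the runner used in parallel single phase mode.
  interface ParallelRunner
  {
    long rowsProcessedAcrossSubtasks();
  }

  // Hypothetical holder: keeps whichever task or runner is currently active.
  static final class Holder
  {
    private volatile Object current;

    void setCurrent(Object current)
    {
      this.current = current;
    }

    Object getCurrent()
    {
      return current;
    }
  }

  // Casts the held object according to the mode instead of keeping one field per mode.
  static long rowsProcessed(Holder holder, boolean isParallelMode)
  {
    Object current = holder.getCurrent();
    if (current == null) {
      return 0L; // nothing has started yet
    }
    if (isParallelMode) {
      return ((ParallelRunner) current).rowsProcessedAcrossSubtasks();
    }
    return ((SequentialTask) current).rowsProcessed();
  }
}
```

The cast is only safe if the holder's content and the mode flag are always updated together, which is the invariant this approach relies on.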
clintropolis left a comment:
LGTM. I agree that a structure somewhat more opinionated than Map would be nice and would make the code a bit easier to follow. I'm not sure whether it makes sense to first consider any differences between these reports and the shuffle task reports before doing such a refactor.
      @QueryParam("full") String full
    )
    {
      IndexTaskUtils.datasourceAuthorizationCheck(req, Action.READ, getDataSource(), authorizerMapper);
I don't think this needs to change in this PR, but be aware that there has been some discussion/activity on moving ingestion APIs to use WRITE instead of READ, see #11680.
This PR allows the parallel native batch task to provide task reports for the sequential and single phase mode (e.g., used with dynamic partitioning), as well as single phase mode subtasks. The multiphase mode is not supported in this patch.
For the single phase mode, completed subtasks provide their task report to the supervisor task as part of the existing PushedSegmentsReport.
The supervisor task calculates total row counts and gathers saved parse exceptions from these submitted reports for completed tasks, and then makes a call to the task report API on the overlord to get the same information for any currently running subtasks.
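The aggregation described above can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the code in this patch: `RowCounts` is a hypothetical reduced form of the row stats, and `liveReportFetcher` stands in for the call to the task report API on the overlord.

```java
import java.util.Collection;
import java.util.Optional;
import java.util.function.Function;

final class ReportAggregationSketch
{
  // Hypothetical, reduced view of per-subtask row statistics.
  static final class RowCounts
  {
    final long processed;
    final long unparseable;

    RowCounts(long processed, long unparseable)
    {
      this.processed = processed;
      this.unparseable = unparseable;
    }

    RowCounts plus(RowCounts other)
    {
      return new RowCounts(processed + other.processed, unparseable + other.unparseable);
    }
  }

  // Sums counts from reports already submitted by completed subtasks, then adds
  // live counts for subtasks that are still running.
  static RowCounts totalRowCounts(
      Collection<RowCounts> completedReports,
      Collection<String> runningSubTaskIds,
      Function<String, Optional<RowCounts>> liveReportFetcher // assumed wrapper around the overlord call
  )
  {
    RowCounts total = new RowCounts(0, 0);
    for (RowCounts completed : completedReports) {
      total = total.plus(completed);
    }
    for (String subTaskId : runningSubTaskIds) {
      // A running subtask may not have a report yet; skip it in that case.
      Optional<RowCounts> live = liveReportFetcher.apply(subTaskId);
      if (live.isPresent()) {
        total = total.plus(live.get());
      }
    }
    return total;
  }
}
```

Only the running subtasks incur a remote call here; completed subtasks are served from the reports they already pushed to the supervisor.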
I think a useful follow-on would be to introduce more structure to the task report objects, instead of the plentiful untyped Map<String, Object> currently used in the reports, but I wanted to keep the scope of this PR smaller.

This PR has: