Skip to content

Task reports for parallel task: single phase and sequential mode#11688

Merged
jon-wei merged 3 commits intoapache:masterfrom
jon-wei:parallel_task_report
Sep 16, 2021
Merged

Task reports for parallel task: single phase and sequential mode#11688
jon-wei merged 3 commits intoapache:masterfrom
jon-wei:parallel_task_report

Conversation

@jon-wei
Copy link
Copy Markdown
Contributor

@jon-wei jon-wei commented Sep 10, 2021

This PR allows the parallel native batch task to provide task reports for the sequential and single phase mode (e.g., used with dynamic partitioning), as well as single phase mode subtasks. The multiphase mode is not supported in this patch.

For the single phase mode, completed subtasks provide their task report to the supervisor task as part of the existing PushedSegmentsReport.

The supervisor task calculates total row counts and gathers saved parse exceptions from these submitted reports for completed tasks, and then makes a call to the task report API on the overlord to get the same information for any currently running subtasks.


I think a useful follow on would be to introduce more structure to the task report objects, instead of the plentiful untyped Map<String, Object> currently used in the reports, but I wanted to keep the scope of this PR smaller.

This PR has:

  • been self-reviewed.
  • added documentation for new or modified features or behaviors.
  • added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • added or updated version, license, or notice information in licenses.yaml
  • added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • added integration tests.
  • been tested in a test Druid cluster.

@jon-wei jon-wei force-pushed the parallel_task_report branch from 3f2efbe to 386e4bb Compare September 11, 2021 00:04
Copy link
Copy Markdown
Contributor

@jihoonson jihoonson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The approach polling subtasks only when it's necessary LGTM. I left some trivial comments.

{
if (buildSegmentsRowStats instanceof RowIngestionMetersTotals) {
return (RowIngestionMetersTotals) buildSegmentsRowStats;
} else if (buildSegmentsRowStats instanceof Map) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add some comment explaining when buildSegmentsRowStats can be a Map or RowIngestionMetersTotals?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a comment, the first case was just for unit tests

Comment on lines +176 to +177
private IndexTask sequentialIndexTask;
private ParallelIndexTaskRunner<SinglePhaseSubTask, PushedSegmentsReport> parallelSinglePhaseRunner;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose you can get the current runner or task from currentSubTaskHolder and cast it properly depending on isParallelMode() instead of adding these?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I got rid of the references and adjusted things to use currentSubTaskHolder

Copy link
Copy Markdown
Member

@clintropolis clintropolis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm, agree that some structure a bit more opinionated than Map would probably be nice and make the code a bit easier to follow, not sure if it makes sense to consider if there are any differences between these reports and shuffle task reports before doing such a refactor

@QueryParam("full") String full
)
{
IndexTaskUtils.datasourceAuthorizationCheck(req, Action.READ, getDataSource(), authorizerMapper);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this needs to change in this PR, but be aware that there has been some discussion/activity on moving ingestion APIs to use WRITE instead of READ, see #11680.

Copy link
Copy Markdown
Contributor

@imply-jhan imply-jhan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants