Skip to content

Enable partition stats on streaming task completion report#15930

Merged
kfaraz merged 9 commits intoapache:masterfrom
ac9817:adithyachakilam/enable-partition-stats
Feb 23, 2024
Merged

Enable partition stats on streaming task completion report#15930
kfaraz merged 9 commits intoapache:masterfrom
ac9817:adithyachakilam/enable-partition-stats

Conversation

@ac9817
Copy link
Copy Markdown
Contributor

@ac9817 ac9817 commented Feb 20, 2024

Description

This adds visibility into number of partitions processed by each streaming task and the count of records processed from each of those partitions.

This PR updates IngestionStatsAndErrorsTaskReportData to take new map which represent count by partition and SeekableStreamIndexTaskRunner populates that map while processing each record. We are trying to analyze the work done by each task so as to efficiently assign appropriate resources in future to the worker nodes.


Release note

Streaming Task completion report now has an extra field recordsProcessed which lists all the partitions processed by that task and count of records for each partition. One can look at this field to see the actual throughput of tasks and make decision as to vertically/horizontally scale there workers.


Key changed/added classes in this PR
  • SeekableStreamIndexTaskRunner.java
  • IngestionStatsAndErrorsTaskReportData.java

This PR has:

  • been self-reviewed.
  • added documentation for new or modified features or behaviors.
  • a release note entry in the PR description.
  • added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • added or updated version, license, or notice information in licenses.yaml
  • added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • added integration tests.
  • been tested in a test Druid cluster.

This adds visibility into number of partitions processed by each task
and the count of records processes from each of those partitions.
@ac9817 ac9817 requested a review from YongGang February 21, 2024 00:01
Copy link
Copy Markdown
Contributor

@kfaraz kfaraz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR, @adithyachakilam!

Overall, the change makes sense to me. Could you add some info of why you needed this stat and if it helped you debug any failure scenario?

Copy link
Copy Markdown
Contributor

@suneet-s suneet-s left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your first contribution @adithyachakilam ! I have one suggestion on naming the object in the task report. Kashif also has some good suggestions to improve code readability.

LGTM after the rename for the object in the task report and associated docs.

In the PR description, can you update the Release note section to answer why someone may want to read the recordsProcessed field from the task report.

Comment thread docs/ingestion/tasks.md Outdated
Adithya Chakilam added 2 commits February 22, 2024 15:06
@ac9817
Copy link
Copy Markdown
Contributor Author

ac9817 commented Feb 22, 2024

Thanks for the comments @kfaraz / @suneet-s. Tried to address them, let me know if you need anything more.

@ac9817 ac9817 requested review from kfaraz and suneet-s February 22, 2024 23:24
Copy link
Copy Markdown
Contributor

@kfaraz kfaraz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, +1 after CI passes.

@ac9817
Copy link
Copy Markdown
Contributor Author

ac9817 commented Feb 23, 2024

@kfaraz seems like an unrelated flaky test that has failed.

@kfaraz
Copy link
Copy Markdown
Contributor

kfaraz commented Feb 23, 2024

Thanks for your first contribution, @adithyachakilam !
Merging the PR as all comments have been addressed and the CI failure is due to unrelated failing integration test.

@kfaraz kfaraz merged commit 1f443d2 into apache:master Feb 23, 2024
@kfaraz
Copy link
Copy Markdown
Contributor

kfaraz commented Feb 23, 2024

I have already merged the PR but taking another look at the final payload, I feel recordsProcessedPerPartition or partitionRecordsProcessed might be a more appropriate name. Right now the payload would look something like:

"recordsProcessed": {
   "1": 1234,
   "2": 2345
}

Looking at this payload, it is difficult to tell exactly what each entry in the map represents.

@suneet-s , @adithyachakilam , what do you think?

ac9817 pushed a commit to ac9817/druid that referenced this pull request Feb 23, 2024
@ac9817 ac9817 deleted the adithyachakilam/enable-partition-stats branch February 23, 2024 17:58
@ac9817
Copy link
Copy Markdown
Contributor Author

ac9817 commented Feb 23, 2024

@kfaraz makes sense, made the change here: #15958

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants