Skip to content

Add indexer task success and failure metrics #16829

Merged
abhishekagarwal87 merged 5 commits intoapache:masterfrom
confluentinc:indexer-task-failure-metric
Aug 5, 2024
Merged

Add indexer task success and failure metrics #16829
abhishekagarwal87 merged 5 commits intoapache:masterfrom
confluentinc:indexer-task-failure-metric

Conversation

@rbankar7
Copy link
Copy Markdown
Contributor

@rbankar7 rbankar7 commented Aug 1, 2024

Adds indexer level task success and failure metrics

Description

This PR adds indexer-level task metrics-

"indexer/task/failed/count"
"indexer/task/success/count"

the current "worker/task/completed/count" metric shows all the tasks completed irrespective of success or failure status so these metrics would help us get more visibility into the status of the completed tasks

Release note

Adds indexer/task/failed/count and indexer/task/failed/count metrics


This PR has:

  • been self-reviewed.
  • added documentation for new or modified features or behaviors.
  • a release note entry in the PR description.
  • added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • added or updated version, license, or notice information in licenses.yaml
  • added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • added integration tests.
  • been tested in a test Druid cluster.

Copy link
Copy Markdown
Contributor

@kfaraz kfaraz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This might have already been discussed, but if the purpose is only to track failed and succeeded tasks, we already have task/running/count, task/success/count, task/failed/count metrics emitted by the Overlord. Are those not sufficient?

https://druid.apache.org/docs/latest/operations/metrics#indexing-service

@rbankar7
Copy link
Copy Markdown
Contributor Author

rbankar7 commented Aug 3, 2024

This might have already been discussed, but if the purpose is only to track failed and succeeded tasks, we already have task/running/count, task/success/count, task/failed/count metrics emitted by the Overlord. Are those not sufficient?

https://druid.apache.org/docs/latest/operations/metrics#indexing-service

We wanted to have this metric for each indexer node. The current metric doesn't provide per-host information
Also we are already emitting completed task metric from each indexer so this is just adding more refinement to that information

Copy link
Copy Markdown
Contributor

@kfaraz kfaraz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor comments, otherwise looks good to me.

Comment thread docs/operations/metrics.md Outdated
Comment thread docs/operations/metrics.md Outdated
Comment thread docs/operations/metrics.md Outdated
@rbankar7 rbankar7 requested a review from kfaraz August 5, 2024 06:40
Copy link
Copy Markdown
Contributor

@kfaraz kfaraz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 🚀

@rbankar7
Copy link
Copy Markdown
Contributor Author

rbankar7 commented Aug 5, 2024

@kfaraz looks like the checks passed
could you please help me merge the PR?
Thanks for the review!

@kfaraz
Copy link
Copy Markdown
Contributor

kfaraz commented Aug 5, 2024

Sure, @rbankar7 !
@abhishekagarwal87 , do you have any further feedback on this PR?

@abhishekagarwal87 abhishekagarwal87 merged commit c8323d1 into apache:master Aug 5, 2024
@abhishekagarwal87
Copy link
Copy Markdown
Contributor

Just merged. Thank you for the contribution @rbankar7

@kfaraz kfaraz added this to the 31.0.0 milestone Oct 4, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants