Skip to content

Add new coordinator metrics for coordinator duty runtimes#10603

Merged
himanshug merged 6 commits intoapache:masterfrom
capistrant:add-coordinator-runtime-metrics
Nov 29, 2020
Merged

Add new coordinator metrics for coordinator duty runtimes#10603
himanshug merged 6 commits intoapache:masterfrom
capistrant:add-coordinator-runtime-metrics

Conversation

@capistrant
Copy link
Copy Markdown
Contributor

@capistrant capistrant commented Nov 24, 2020

Description

Adds two new metrics to the coordinator. These metrics help operators evaluate coordinator runtimes and allow for analyzing runtimes for individual duties if the metrics are emitted (as of now, only HistoricalManagementDuties Runnable will emit the individual duty runtime metrics).

coordinator/time = the time for an individual duty to execute
coordinator/global/time = the time for the whole duties runnable to execute

Example metric json:

coordinator/global/time

{"feed":"metrics","timestamp":"2020-11-24T22:22:54.288Z","service":"druid/coordinator","host":"coordinator:8081","version":"0.21.0-SNAPSHOT","metric":"coordinator/global/time","value":250,"dutyGroup":"HistoricalManagementDuties"}

coordinator/time

{"feed":"metrics","timestamp":"2020-11-24T22:22:54.287Z","service":"druid/coordinator","host":"coordinator:8081","version":"0.21.0-SNAPSHOT","metric":"coordinator/time","value":1,"duty":"LogUsedSegments"}

This PR has:

  • been self-reviewed.
  • added documentation for new or modified features or behaviors.
  • added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • been tested in a test Druid cluster.

Key changed/added classes in this PR
  • CoordinatorStats
  • DruidCoordinator
  • CoordinatorDuty Interface
  • EmitClusterStatsAndMetrics

// This duty wanted to cancel the run. No log message, since the duty should have logged a reason.
return;
} else {
params.getCoordinatorStats().addToDutyStat("runtime", duty.getDutyAlias(), TimeUnit.NANOSECONDS.toMillis(end - start));
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will it be ok to just use "duty.getClass().getName()" for this use case instead of extending the interface?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ya, I realized that would probably have been better after I looked at what I ended up with for alias values

@himanshug himanshug merged commit 2560bf0 into apache:master Nov 29, 2020
Copy link
Copy Markdown
Contributor

@jihoonson jihoonson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@himanshug @capistrant I reviewed this PR as it is labeled as Design Review which requires 3 approvals including one from author himself. The PR LGTM.

@jihoonson jihoonson added this to the 0.21.0 milestone Jan 4, 2021
JulianJaffePinterest pushed a commit to JulianJaffePinterest/druid that referenced this pull request Jan 22, 2021
* Add new coordinator metrics for duty runtimes

* fix spelling for a constant variable value

* add comment clarifying why the global runtime metric is emitted where it is

* Remove duty alias in lieu of using the class name for metrics

* fix docs

* CoordinatorStats tests + add duty stats to accumulate() logic
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants