Skip to content

KAFKA-15129;[3/N] Remove metrics in AbstractFetcherManager when fetcher manager instance shutdown#13929

Closed
hudeqi wants to merge 9 commits intoapache:trunkfrom
hudeqi:clean_metric_replicaManager
Closed

KAFKA-15129;[3/N] Remove metrics in AbstractFetcherManager when fetcher manager instance shutdown#13929
hudeqi wants to merge 9 commits intoapache:trunkfrom
hudeqi:clean_metric_replicaManager

Conversation

@hudeqi
Copy link
Copy Markdown
Contributor

@hudeqi hudeqi commented Jun 29, 2023

This pr is used to remove the metrics in AbstractFetcherManager when ReplicaFetcherManager/ReplicaAlterLogDirsManager instance shutdown.
This pr has passed the corresponding unit test, and it is part of KAFKA-15129.

@hudeqi hudeqi marked this pull request as draft June 29, 2023 11:04
@clolov clolov self-requested a review June 29, 2023 12:37
@clolov clolov added the core Kafka Broker label Jun 29, 2023
@hudeqi hudeqi marked this pull request as ready for review June 29, 2023 16:46
@hudeqi hudeqi marked this pull request as draft June 30, 2023 04:28
@hudeqi hudeqi marked this pull request as ready for review June 30, 2023 08:16
Copy link
Copy Markdown
Contributor

@clolov clolov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As with the other two PRs I have a preference for variable names to start with a lowercase unless there is a reason (i.e. a convention) for them to be with uppercase, but beyond that, this seems reasonable to me.

@hudeqi
Copy link
Copy Markdown
Contributor Author

hudeqi commented Jun 30, 2023

And please help to review this @divijvaidya

Copy link
Copy Markdown
Member

@divijvaidya divijvaidya left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for these code changes @hudeqi. I have seen cases of OOM caused by metric leaks in production and the changes you are making go a long way towards ensure safe code.

Comment thread core/src/main/scala/kafka/server/ReplicaAlterLogDirsManager.scala Outdated
Comment thread core/src/main/scala/kafka/server/AbstractFetcherManager.scala Outdated
Comment thread core/src/main/scala/kafka/server/ReplicaAlterLogDirsManager.scala Outdated
Comment thread core/src/main/scala/kafka/server/ReplicaFetcherManager.scala Outdated
@hudeqi hudeqi closed this Jul 5, 2023
@hudeqi hudeqi deleted the clean_metric_replicaManager branch July 5, 2023 03:28
@hudeqi hudeqi reopened this Jul 5, 2023
@hudeqi hudeqi requested a review from divijvaidya July 5, 2023 03:33
Comment thread core/src/main/scala/kafka/server/AbstractFetcherManager.scala Outdated
@hudeqi hudeqi requested a review from divijvaidya July 6, 2023 15:12
@hudeqi
Copy link
Copy Markdown
Contributor Author

hudeqi commented Jul 10, 2023

Are there any hidden problems in this PR? @divijvaidya

@divijvaidya
Copy link
Copy Markdown
Member

I will get to all your PRs on metrics close by end of this week @hudeqi. Right now I am focusing on releasing 3.5.1.

@hudeqi
Copy link
Copy Markdown
Contributor Author

hudeqi commented Jul 17, 2023

I will get to all your PRs on metrics close by end of this week @hudeqi. Right now I am focusing on releasing 3.5.1.

@divijvaidya hi, reminding~

@divijvaidya
Copy link
Copy Markdown
Member

I still need to get 3.5.1 out and focus on KIP-405 PRs (targeting 3.6 release) before I get some more time @hudeqi!

@hudeqi
Copy link
Copy Markdown
Contributor Author

hudeqi commented Aug 4, 2023

I still need to get 3.5.1 out and focus on KIP-405 PRs (targeting 3.6 release) before I get some more time @hudeqi!

Hi, do you have time to review this PR now? I want to combine all the currently opened PRs about metric leaks into one PR. I don’t know if this will help or burden your review? @divijvaidya

@hudeqi
Copy link
Copy Markdown
Contributor Author

hudeqi commented Sep 19, 2023

@nizhikov Hi, can you help review the PR of this series? Looks like @divijvaidya doesn't have time.

@nizhikov
Copy link
Copy Markdown
Contributor

Hello. I will review this PR in the nearest time.

private val metricsGroup = new KafkaMetricsGroup(this.getClass)

// Visible for test
private[server] var metricNamesToTags: Map[String, java.util.Map[String, String]] = Map.empty
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we use mutable map like it's done for fetcherThreadMap?
Or even Set<MetricName>, see comment below

}, tags)
private[server] def newMetrics(): Unit = {
val tags = Map("clientId" -> clientId).asJava
metricsGroup.newGauge(MaxLagMetricName, () => {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can use

Set<MetricName> metrics = ...;

MetricName name = metricGroup.metricName(MaxLagMetricName, tags);
KafkaYammerMetrics.defaultRegistry().newGauge(name, () => { ... });
metrics.add(name);

And store explicit MetricName instead of implicit name and tags.

metrics.forEach(KafkaYammerMetrics.defaultRegistry()::removeMetric);
`` `

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have metrigGroup#meter and other methods that takes MetricName as parameter.
It's look convinient to create one for Gauge.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder, do we really need metricNamesToTags set. It's just 4 metric so may be it will be more clear to create explicit MetricName for each and use them while closing fetcher manager.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, @nizhikov , thank you for taking the time to review. The significance of metricNamesToTags is that it can be used in the unit test to check whether all the metrics of the corresponding module are removed when closed, to prevent someone from adding new metrics but forgetting remove, with this collection, you only need to add the new metric to this collection.
Removing the leaking metrics is a long and potentially fixed process, as can be seen from the original reviews reached with @divijvaidya.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems you can use MetricName to check metrics registered and removed during test.

@nizhikov
Copy link
Copy Markdown
Contributor

Hello. @hudeqi , thanks for the PR.
Left couple of comments.

@hudeqi hudeqi closed this Nov 2, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

core Kafka Broker

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants