KAFKA-2668; Add a metric that records the total number of metrics #328
lindong28 wants to merge 16 commits into apache:trunk from
Conversation
I think this may cause problems when there are multiple clients. I ran a sample application that makes two KafkaConsumers. Here's what jconsole shows me: Basically, metrics-stats isn't scoped to a client-id while the others were. I think what ends up happening is that the two clients write their own metrics count to the same metrics-total mbean. So if one consumer had x metrics while another had y metrics, metrics-total could, for instance, jump back and forth between x and y. Below is the sample application:
/**
* Licensed to the Apache Software Foundation (ASF) under one or more contributor license agreements. See the NOTICE
* file distributed with this work for additional information regarding copyright ownership. The ASF licenses this file
* to You under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the
* License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
* an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
* specific language governing permissions and limitations under the License.
*/
package org.apache.kafka.clients.consumer;
import java.util.Arrays;
import java.util.Properties;
public class MetricOfMetricsConsumer {
    public static void main(String[] args) {
        KafkaConsumer<String, String> consumer1 = makeConsumer();
        KafkaConsumer<String, String> consumer2 = makeConsumer();
        consumer1.subscribe(Arrays.asList("t"));
        consumer2.subscribe(Arrays.asList("t"));
        while (true) {
            consumer1.poll(1000);
            consumer2.poll(1000);
        }
    }

    public static KafkaConsumer<String, String> makeConsumer() {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", System.currentTimeMillis() + "");
        props.put("partition.assignment.strategy", "roundrobin");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        return new KafkaConsumer<>(props);
    }
}
The one mbean ObjectName is:
So yeah I think the original intent was to just have this metric for a broker, which made me think it was odd putting this in the Metrics constructor. But it kind of makes sense to have this for consumers and producers too. There are some ways to address this bug:
@onurkaraman Thanks much for taking the time to test it! I have updated the patch to address your comments. The updated patch uses the client-id in metricsTags in the metricName. And I have tested the patch with the broker, consumer, and producer. Please let me know if the updated patch looks good.
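The scoping fix can be illustrated with plain JMX. This is only a sketch, not the actual patch; the domain and key names below are hypothetical. Two ObjectNames that differ only in a client-id key are distinct MBeans, so each client publishes its own count instead of the two overwriting each other:

```java
import java.util.Hashtable;
import javax.management.MalformedObjectNameException;
import javax.management.ObjectName;

public class ScopedNames {
    // Build an ObjectName that includes a client-id key, so each client
    // gets its own MBean instead of sharing one.
    public static ObjectName scopedName(String clientId) {
        try {
            Hashtable<String, String> keys = new Hashtable<>();
            keys.put("type", "kafka-metrics-count"); // hypothetical type
            keys.put("client-id", clientId);
            return new ObjectName("kafka.consumer", keys); // hypothetical domain
        } catch (MalformedObjectNameException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        // The two names differ only in client-id, yet name distinct MBeans.
        System.out.println(scopedName("consumer-1"));
        System.out.println(scopedName("consumer-2"));
    }
}
```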
LGTM. I think your revised patch did the trick. jconsole now shows: The two mbean ObjectNames now are: If we want this also for MirrorMaker (once it switches to the new consumer and producer), I'm not really sure how that'd work, but I think this is probably good for now.

Great. Thanks for confirming that the patch works.
Actually, while things look good from the perspective of clients that fully depend on Kafka's metrics system, it doesn't quite seem right from the broker's perspective. It only reports the number of org.apache.kafka.common.metrics metrics and doesn't factor in the yammer metrics, which I think is misleading. This is okay in the long run since we will migrate all of the sensors away from yammer, but it's unclear what the right thing to do now is.

@onurkaraman I have updated the patch such that the new metric shows the total number of attributes of all MBeans in the MBean server. This may include some MBeans that are not directly registered via Kafka, e.g. the java.lang:type=Memory MBean. But the semantics of this metric are clear and it should solve the problem that motivated this ticket.
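The "count every attribute of every MBean in the MBean server" approach described above can be sketched with the standard javax.management API (an illustration, not the actual patch):

```java
import java.lang.management.ManagementFactory;
import javax.management.MBeanServer;
import javax.management.ObjectName;

public class MBeanAttributeCounter {
    // Sum the attribute counts of every MBean currently registered in the
    // platform MBeanServer (this includes JVM-provided MBeans, not just Kafka's).
    public static int countAttributes() {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        int total = 0;
        for (ObjectName name : server.queryNames(null, null)) {
            try {
                total += server.getMBeanInfo(name).getAttributes().length;
            } catch (Exception e) {
                // The MBean may have been unregistered between query and lookup.
            }
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println("total attributes: " + countAttributes());
    }
}
```

As the thread notes, this counts JVM MBeans such as java.lang:type=Memory alongside Kafka's own.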
Let me think about this for a bit more - originally I thought we should only count the metrics under the kafka-metrics package. I think we should eventually move away from yammer metrics, but the latest approach of just reporting the total number of mbeans may subsume that and may be useful by itself.

I think recording the total number of mbeans is probably not right. In terms of debugging, we care about how many metrics are coming from Kafka. Recording the total number of mbeans doesn't really help us if there's a ton of mbeans being made outside of Kafka. For instance, let's say an application has two clients: a KafkaConsumer and a client for some other KV store. If the KV store client's mbeans start going through the roof, it can incorrectly get classified as a Kafka problem.
@onurkaraman I think the total number of mbeans is generally useful even in your use case -- it means something is going wrong if the number goes through the roof. Importantly, it addresses the use case that motivated this patch, i.e. the number of mbeans registered on the server side. If you think we need to specifically record the number of mbeans registered by Kafka threads, we can add a filter in the query to only record those mbeans named kafka. But I think it is actually useful to report the total number of mbeans from the user's perspective. P.S. the semantics of this metric are the total number of mbeans registered by the program, not the total number of mbeans registered by Kafka threads. Thus it should not simply get classified as a Kafka problem in your example.
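The filter mentioned here -- restricting the count to mbeans named kafka -- could be done with a JMX query pattern on the domain. A sketch, not the actual patch:

```java
import java.lang.management.ManagementFactory;
import java.util.Set;
import javax.management.ObjectName;

public class KafkaMBeanFilter {
    // Query only MBeans whose domain starts with "kafka", leaving out
    // JVM-provided MBeans and MBeans registered by other libraries.
    public static Set<ObjectName> kafkaMBeans() {
        try {
            return ManagementFactory.getPlatformMBeanServer()
                    .queryNames(new ObjectName("kafka*:*"), null);
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("kafka mbeans: " + kafkaMBeans().size());
    }
}
```

In a plain JVM with no Kafka client running, this query returns an empty set.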
I agree that it may be useful to record and report the total number of attributes. The main concern I have with the current approach is that each lookup of this metric (from some metric-pulling system) will involve a traversal of the mbean registry, which is a bit much if the lookup happens frequently. There are a couple of ways to avoid/improve this:
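The specific alternatives were not captured in this thread, but one generic way to avoid a registry traversal on every read is to serve the value from a cache refreshed on a fixed schedule. A minimal sketch, not necessarily what was proposed here:

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongSupplier;

public class CachedGauge {
    private final AtomicLong cached = new AtomicLong();

    // Prime the cache once, then refresh it on a fixed schedule so reads
    // never pay for the expensive computation (e.g. a registry traversal).
    public CachedGauge(LongSupplier expensiveCount, long periodSeconds) {
        cached.set(expensiveCount.getAsLong());
        ScheduledExecutorService scheduler =
                Executors.newSingleThreadScheduledExecutor(runnable -> {
                    Thread t = new Thread(runnable, "metrics-count-refresher");
                    t.setDaemon(true); // do not keep the JVM alive
                    return t;
                });
        scheduler.scheduleAtFixedRate(() -> cached.set(expensiveCount.getAsLong()),
                periodSeconds, periodSeconds, TimeUnit.SECONDS);
    }

    public long value() {
        return cached.get();
    }
}
```

The trade-off is staleness of up to one refresh period, which is usually acceptable for a metric that is itself polled periodically.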
@jjkoshy I think the updated patch should address the concerns raised by you and Onur. Thanks!
My bad. I was again going to say LGTM, but I noticed that your change makes kafka-clients depend on yammer metrics, which I think we wanted to avoid.
Given the dependency constraint, I think the best approach would be to have a kafka-metrics-total and a yammer-metrics-total. kafka-metrics-total can get created within Metrics. kafka-metrics-total could use
@onurkaraman @jjkoshy @becketqin Thanks much for taking the time to review the patch! I have updated the patch so that we don't introduce a yammer dependency to the new producer and new consumer. With the latest patch, the server will have two additional JMX metrics named kafka-metrics-total and yammer-metrics-total. The new producer and new consumer will have one additional JMX metric named kafka-metrics-total. The old consumer will have one additional metric named yammer-metrics-total. I have tested the patch with the server, new producer, and old consumer. The patch works as expected. Can you please have another look?

Test failure is unrelated to this patch.
force-pushed from 60c923c to d83f19c
@guozhangwang Would you have time to review this patch? @onurkaraman has reviewed this patch, and @jjkoshy will be on leave soon.

Went through the code and it looks good to me overall, but I would like to borrow another pair of eyes from @junrao to take a look, as my knowledge of this module is shallow.
groupedMetrics creates a new TreeMap on every invocation. Also, the conversion makes yet another copy of these in scala. Can you instead just do defaultRegistry().allMetrics().size()? That does not create any unnecessary intermediate copies.
Thank you @guozhangwang @jjkoshy for your review. I have updated the patch as suggested.
Currently, we add the default tags (e.g., client-id) explicitly for each metric. With the default tags, I am wondering if it's simpler to add the following method in Metrics that automatically adds the default tags to the metric name. Then, we can use that method to create the metric name and remove the code that explicitly sets the default tags.
MetricName metricName(...)
Yes, it is a good point and should remove quite a few lines of code. But this will touch a lot of files and potentially cause conflicts with other commits. Let me give it a shot.
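For illustration, the tag handling such a Metrics.metricName(...) helper might perform can be sketched with plain maps. The class and method names here are hypothetical and the real MetricName construction is not shown; only the merge semantics (default tags first, per-metric tags winning on conflict) are the point:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class DefaultTagMerger {
    // Start from the registry's default tags (e.g. client-id) and let
    // per-metric tags override them on key conflicts.
    public static Map<String, String> mergeTags(Map<String, String> defaultTags,
                                                Map<String, String> metricTags) {
        Map<String, String> combined = new LinkedHashMap<>(defaultTags);
        combined.putAll(metricTags);
        return combined;
    }
}
```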
force-pushed from 75cb495 to b1acb1a
@junrao My last commit removes the code that explicitly sets the default tags and instead uses the new method. I still kept the explicit usage in a few places. Could you have a look and let me know if this is what you originally suggested? Thanks!
Instead of using MeasurableStat(), it seems that we can just use Measurable like the following.
new Measurable() {
    public double measure(MetricConfig config, long now) {
        return metrics.size();
    }
};
Ah I misunderstood your comment on this earlier. Fixed now.
@lindong28 It seems some of the indentation does not match the checkstyle rules in the Jenkins job; could you verify?

@guozhangwang I was wondering why the Jenkins test failed without providing failure information. I have fixed the indentation and removed unused imports. Could you have a look? Thanks!
@junrao Great! Thanks so much for your time and review. I have removed a bunch of unnecessary spaces and rebased the patch against trunk.
Thanks for the patch. LGTM.
@lindong28 I got some warnings when compiling with this patch; could you take a look? It seems "<>" cannot be directly used in a comment link:
@guozhangwang Thanks for the catch. I didn't notice the warning message. I have created a minor pull request at #651. Can you take a look?
…e#328) TICKET = KAFKA-13797 LI_DESCRIPTION = In the past, we found that some brokers in the venice cluster went through heavy load due to excessive Metadata requests. This PR adds a metric to show the rate of metadata outgoing bytes per second for easier troubleshooting of such issues. EXIT_CRITERIA = When KAFKA-13797 is resolved and the changes are pulled into this repo.
@onurkaraman @becketqin Do you have time to review this patch? It addresses the ticket that @jjkoshy filed in KAFKA-2668.