Skip to content

KAFKA-9106 make metrics exposed via jmx configurable#7674

Merged
cmccabe merged 3 commits intoapache:trunkfrom
xvrl:kafka-9106
Feb 13, 2020
Merged

KAFKA-9106 make metrics exposed via jmx configurable#7674
cmccabe merged 3 commits intoapache:trunkfrom
xvrl:kafka-9106

Conversation

@xvrl
Copy link
Copy Markdown
Member

@xvrl xvrl commented Nov 9, 2019

This change implements KIP-544

@xvrl xvrl closed this Nov 9, 2019
@xvrl xvrl reopened this Nov 9, 2019
@xvrl xvrl closed this Nov 12, 2019
@xvrl xvrl reopened this Nov 12, 2019
@xvrl
Copy link
Copy Markdown
Member Author

xvrl commented Nov 13, 2019

retest this please

@xvrl
Copy link
Copy Markdown
Member Author

xvrl commented Nov 13, 2019

cc @ijuma builds appear to be timing out

@ijuma
Copy link
Copy Markdown
Member

ijuma commented Nov 13, 2019

retest this please

1 similar comment
@xvrl
Copy link
Copy Markdown
Member Author

xvrl commented Nov 14, 2019

retest this please

@xvrl xvrl force-pushed the kafka-9106 branch 2 times, most recently from 852711e to 591a1b0 Compare November 20, 2019 01:23
@xvrl
Copy link
Copy Markdown
Member Author

xvrl commented Nov 20, 2019

retest this please

Copy link
Copy Markdown
Contributor

@omkreddy omkreddy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@xvrl Thanks for the PR. LGTM. Can we also include configs to docs (may be add to Monitoring section: http://kafka.apache.org/documentation/#monitoring ) and add a upgrade note about existing KafkaMetricsReporter implementations need to make changes to explicitly depend on new Kafka internal classes.

Comment thread core/src/main/java/kafka/metrics/KafkaYammerMetrics.java Outdated
Comment thread checkstyle/import-control-core.xml Outdated
Comment thread clients/src/main/java/org/apache/kafka/clients/admin/KafkaAdminClient.java Outdated
@cmccabe
Copy link
Copy Markdown
Contributor

cmccabe commented Dec 19, 2019

Hi @xvrl, thanks for working on this. Sorry that the review is taking a while. We were busy with the 2.4 release during the last few weeks.

I think there are some fundamental problems with how this works in the context of clients. As I understand it, there is only one set of JMX metrics that the process as a whole exposes. This is why JmxReporter#LOCK is a static variable.

So if I have a process that contains both a producer and a consumer, and they both want configure blacklists / whitelists on JMX metrics, who wins? It seems like the code in this PR will just let the last constructed object "win" and overwrite the other object's JMX whitelist / blacklist settings. Clearly, this is not ideal, since we have a lot of users that create consumers, producers, or even admin clients all in the same process.

If we want to support this configuration on the client side, we need to be more sophisticated than this. For example, perhaps instance A of JmxReporter is configured to never create the Foo mbean. OK, fair enough-- but it should also not delete the Foo mbean that instance B of JmxReporter has created. Of course, the A instance should refrain from attaching its own attributes to the Foo mbean, as configured.

@cmccabe
Copy link
Copy Markdown
Contributor

cmccabe commented Dec 19, 2019

By the way, I would suggest refactoring JmxReporter into two classes: a MetricsReporter class, and a JmxRegistry class (or something with a similar name) that is a singleton. Then the MetricsReporter instances can register what bean attributes they want to hold on to, and JmxRegistry can keep track of when to create, remove, reconfigure, etc. the underlying mbeans.

This will avoid the pervasive confusion that we have had in the JmxReporter class between what is global state, and what is local to an individual MetricsReporter. This confusion has caused many other bugs such as the recent memory leak we fixed in the consumer.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should make the MetricsReporter interface itself extend Reconfigurable, rather than just JmxReporter. We might want to reconfigure other metrics reporters at runtime, and there is no reason to special-case just this one.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We might be able to add default implementation for Reconfigurable methods to the MetricsReporter interface, but that would theoretically break binary compatibility for existing plugins.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding a new method with default implementation to a Java interface doesn't break binary compatibility. See https://docs.oracle.com/javase/tutorial/java/IandI/defaultmethods.html

Default methods enable you to add new functionality to the interfaces of your libraries and ensure binary compatibility with code written for older versions of those interfaces.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess there is a separate question as to whether having a superclass implement a new interface is a binary-compatible change. Section 13.4.4 of the Java spec seems to imply that it is. https://docs.oracle.com/javase/specs/jls/se8/html/jls-13.html#jls-13.4.4

Changing the direct superclass or the set of direct superinterfaces of a class type will not break compatibility with pre-existing binaries, provided that the total set of superclasses or superinterfaces, respectively, of the class type loses no members.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

According to https://wiki.eclipse.org/Evolving_Java-based_APIs_2#Evolving_API_Interfaces adding a default method is binary incompatible. The reasoning is that if the implementing class already implements a different interface with a default method matching the signature of the added new default methods it would break. In practice this will almost never be the case, but it is theoretically possible. I don't know how strict we are about those things in Kafka.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We do consider adding new default methods to interfaces to be a compatible change in Kafka. I agree that it is theoretically possible for an existing implementation to contain a different method with the same name, and have an issue that way. But if we treat every method addition as a compatibility break, it defeats the whole point of default methods in Java. This would make it very hard to evolve interfaces and would lead to some pretty ugly hacks.

In practice people choose descriptive methods names for public interfaces which makes this is a non-issue. For example, I think the chances that an existing MetricsReporter implements reconfigurableConfigs, validateReconfiguration, or reconfigure are basically zero. We also know what most of the widely-deployed MetricsReporters are, so we can check them.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

MetricsReporters have been optionally reconfigurable since 1.1.0, before Java 7 support was dropped. Any metrics reporter implementation that implements Reconfigurable is reconfigured by the broker when any of its configs changes. Since then, we have adopted the same approach for other interfaces like Authorizer as well.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rajinisivaram do I understand correctly that you suggest we leave the Reconfigurable interface optional then?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@xvrl: The only reason the interface was optional is that we needed to retain support for Java 7, which didn't have default methods. Since we don't have to worry about Java 7 any more, let's make the interface part of the type.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry, I misundertood; I updated the interface to be Reconfigurable and added default methods for backwards compatibility.

Comment thread clients/src/main/java/org/apache/kafka/common/metrics/JmxReporter.java Outdated
Comment thread clients/src/test/java/org/apache/kafka/common/metrics/JmxReporterTest.java Outdated
Comment thread clients/src/main/java/org/apache/kafka/common/metrics/JmxReporter.java Outdated
Comment thread core/src/main/scala/kafka/server/KafkaServer.scala Outdated
Comment thread core/src/test/scala/unit/kafka/server/AbstractFetcherThreadTest.scala Outdated
@xvrl
Copy link
Copy Markdown
Member Author

xvrl commented Dec 19, 2019

Thanks @cmccabe

So if I have a process that contains both a producer and a consumer, and they both want configure blacklists / whitelists on JMX metrics, who wins?

As far as I understand, each client has their own Kafka Metrics and JmxReporter instances, so they each manage their own configuration and individually track the set of metrics that are masked from JMX. If there are mbeans name collisions across clients, then those jmx metrics would already be meaningless, since each instance computes metrics individually.

As I understand it, there is only one set of JMX metrics that the process as a whole exposes. This is why JmxReporter#LOCK is a static variable.

I don't think there is a reason for this variable to be static, each client instance should be able to do its own locking since they manage their own metrics. You may be confusing it with Yammer metrics on the server, which are global, but those are not managed by JmxReporter.

@cmccabe
Copy link
Copy Markdown
Contributor

cmccabe commented Dec 20, 2019

As far as I understand, each client has their own Kafka Metrics and JmxReporter instances, so they each manage their own configuration and individually track the set of metrics that are masked from JMX. If there are mbeans name collisions across clients, then those jmx metrics would already be meaningless, since each instance computes metrics individually.

It is true that each client has its own org.apache.kafka.common.metrics.Metrics object, and those in turn have JmxReporter objects. But JMX metrics as a whole are global. There is one set of beans which is registered (or not) for the whole process.

I do think it's a good idea that different clients should not use the same mbean names, but there is no actual enforcement of that. We do know that some clients create other clients (like how streams creates producers, consumers, and admin clients). I wouldn't be surprised if there was at least some similarity in naming there.

I don't think there is a reason for this variable to be static, each client instance should be able to do its own locking since they manage their own metrics. You may be confusing it with Yammer metrics on the server, which are global, but those are not managed by JmxReporter.

I think we're talking about different things. You're describing using a Java/Scala singleton vs. not using that. I was talking about the fact that the set of beans which are registered is inherently a global property. I think global properties need to be represented appropriately in the code or else it will lead to confusion. Hence the static lock makes sense conceptually.

Relevant: https://issues.apache.org/jira/browse/KAFKA-1304

@xvrl
Copy link
Copy Markdown
Member Author

xvrl commented Dec 20, 2019

I was talking about the fact that the set of beans which are registered is inherently a global property. I think global properties need to be represented appropriately in the code or else it will lead to confusion. Hence the static lock makes sense conceptually.

I understand mbeans are global, but the lock only helps to synchronize updates to instance variables, in this case the local mbean map. Globally synchronizing access the the MBeanServer does not achieve anything.

If two clients registers the same mbean names, clients are still going to clobber each-other's mbeans, global lock or not, and there is no way to know which beans correspond to which client. We might as well reduce global lock contention and make the lock local.

Each client will bring its own metrics config and should be able to decide which metrics to expose individually. The fact that we have colliding bean names is not a new problem, and this change is not trying to solve that.

Copy link
Copy Markdown
Contributor

@rajinisivaram rajinisivaram left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@xvrl Thanks for the PR, looks good. Left a couple comments about configs.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

MetricsReporters have been optionally reconfigurable since 1.1.0, before Java 7 support was dropped. Any metrics reporter implementation that implements Reconfigurable is reconfigured by the broker when any of its configs changes. Since then, we have adopted the same approach for other interfaces like Authorizer as well.

Comment thread clients/src/main/java/org/apache/kafka/common/metrics/JmxReporter.java Outdated
@xvrl xvrl force-pushed the kafka-9106 branch 3 times, most recently from cd849e0 to d6b9ad3 Compare February 9, 2020 00:49
@cmccabe
Copy link
Copy Markdown
Contributor

cmccabe commented Feb 12, 2020

LGTM once we add the Reconfigurable interface as discussed

@xvrl
Copy link
Copy Markdown
Member Author

xvrl commented Feb 12, 2020

retest this please

@xvrl
Copy link
Copy Markdown
Member Author

xvrl commented Feb 12, 2020

@cmccabe I think the failing tests are unrelated, seems to be a known issue based on #6561

@cmccabe
Copy link
Copy Markdown
Contributor

cmccabe commented Feb 13, 2020

Agreed, the test failures are flaky tests. LGTM

@cmccabe cmccabe merged commit 7e1c39f into apache:trunk Feb 13, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants