Skip to content

KAFKA-8934: Introduce instance-level metrics for streams applications#7416

Merged
guozhangwang merged 5 commits intoapache:trunkfrom
cadonna:add_client_level_metrics
Oct 5, 2019
Merged

KAFKA-8934: Introduce instance-level metrics for streams applications#7416
guozhangwang merged 5 commits intoapache:trunkfrom
cadonna:add_client_level_metrics

Conversation

@cadonna
Copy link
Copy Markdown
Member

@cadonna cadonna commented Sep 30, 2019

  • Moves StreamsMetricsImpl from StreamThread to KafkaStreams
  • Adds instance-level metrics as specified in KIP-444, i.e.:
    -- version
    -- commit-id
    -- application-id
    -- topology-description
    -- state

Committer Checklist (excluded from commit message)

  • Verify design and implementation
  • Verify test coverage and CI build status
  • Verify documentation (including upgrade notes)

Copy link
Copy Markdown
Member Author

@cadonna cadonna left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Call for review: @guozhangwang @vvcephei @bbejeck

Sorry for the large PR.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here I add the instance-level metrics.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

StreamsMetricsImpl is now created at client level and not on thread level anymore.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not needed anymore since the RocksDBMetricsRecordingTrigger is contained in StreamsMetricsImpl.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The threadId needs to be passed to those methods since StreamsMetricsImpl was moved to KafkaStreams and cannot contain the thread ID anymore.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Verifications of state, topology description, and application ID metrics.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Verification of state metric.

@cadonna
Copy link
Copy Markdown
Member Author

cadonna commented Sep 30, 2019

JDK 11/Scala 2.13 and JDK 8/Scala 2.11 failed due to OOM.

In JDK 11/Scala 2.12, the following test failed:

kafka.admin.LeaderElectionCommandTest.testPathToJsonFile

Retest this, please

@cadonna
Copy link
Copy Markdown
Member Author

cadonna commented Oct 1, 2019

In JDK 8/Scala 2.11, the following test failed:

kafka.api.SaslOAuthBearerSslEndToEndAuthorizationTest.testNoDescribeProduceOrConsumeWithoutTopicDescribeAcl

Retest this, please

@bbejeck bbejeck added the streams label Oct 1, 2019
Copy link
Copy Markdown
Member

@bbejeck bbejeck left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the Metrics PR @cadonna. I've made a pass, overall looks good, I just have minor comments.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: can we log the stacktrace here? IMHO it's better for debugging.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

FYI @cadonna , all you have to do is:

Suggested change
log.warn("Error while loading kafka-streams-version.properties: {}", exception.getMessage());
log.warn("Error while loading kafka-streams-version.properties", exception);

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed!

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We seem to use StreamsMetricsImpl in several places, would it be better to accept a type StreamsMetrics and cast it internally to StreamsMetricsImpl? Same comment applies to here and elsewhere. I might be missing some context here though.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

StreamsMetrics is our public interface to interact with metrics. StreamsMetricsImpl is our internal implementation that also has methods to create our built-in metrics. As far as I know, we pass StreamsMetricsImpl only to internal objects where we need the methods for built-in metrics. A Streams user would only see the StreamsMetrics interface. Said that, I think it is fine as it is.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My two cents: it's better not to cast. If we need the "internal interface" (i.e., Impl) here, then we should simply declare that fact in the parameter type. Then, the calling code would always know it has to pass an Impl, and we wouldn't ever get surprise ClassCastExceptions in tests or runtime code.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: break args on a separate line

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ack

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

super nit: Maybe add a brief comment that this is used for metrics like app-id etc. I needed to do a search through the code to see why it was necessary.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer to rename the class to ImmutableMetricValue.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

super nit: make string task a public final variable.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ack

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as above

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also fixed in PR #7429.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: same here for string literals and below, I won't repeat this comment anymore.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will tackle this in the PR for the store-level metrics.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is multi-line here for checkstyle to be happy? Honestly I like the previous one as it is easier to compare multi-lines in case there's any parameter mis-set, would be great if we can get back to oneliner.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is not checkstyle, but our coding style guidelines. I see what you mean. I guess we should refactor this anyways to hide most of those parameters behind a method. Looking at the other refactorings, this should become something along the lines of putLatencySensor(threadId, taskId, metricsScope, storeName) which would fit again on one line.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: parameter on separate line

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ack

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: parameters on separate line

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ack

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just going out on a limb here... should we also log the topology description here?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought, we already do, but I couldn't find any output in a log file that I have locally. I would be in favour of logging the topology because it helps us during on-call. On the other hand, it may pollute the logs. Anyways, could we postpone this discussion to after the release?

Copy link
Copy Markdown
Contributor

@vvcephei vvcephei left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a couple of quick comments.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

FYI @cadonna , all you have to do is:

Suggested change
log.warn("Error while loading kafka-streams-version.properties: {}", exception.getMessage());
log.warn("Error while loading kafka-streams-version.properties", exception);

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems to be more like "InstanceMetrics" than "client" metrics, where "client" usually means a Producer or Consumer. In Streams, we sometimes conflate "client" with "thread" because each StreamThread has exactly one Consumer.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In KIP-444, "client" is used instead of "instance". Also the tag for instance-level is "client-id" whereas for thread it will be "thread-id" with PR #7429. I agree that clients is somehow overloaded, but since Streams is also a client (that uses other clients) it seems consistent to me.

@cadonna cadonna force-pushed the add_client_level_metrics branch from 73ab316 to bf3ec27 Compare October 2, 2019 10:21
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this always right? It would set the threadId as the name of the thread that constructed the sensor, but is this always the StreamThread? Even if it is today, it wouldn't be outrageous to think in the future that another thread might build a task and then pass it to the thread to execute. Such a refactoring would break this logic, but it would be very subtle.

In contrast, the old code here explicitly passed the thread name via the context, so it would never be set incorrectly just by changing the way that the code is invoked. Perhaps we could preserve this property, for example by passing the thread name as a method argument.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just checked the code path, passing the thread name via GlobalStreamThread / StreamThread -> StreamTask -> ProcessorContextImpl maybe fine.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@vvcephei, you are right. This is not general enough. On thread-level -- for instance -- sensors are created by the main thread and then used in the stream thread. Here, instead of passing it through InternalProcessorContext, I would pass it as a separate parameter to lateRecordDropSensor(). According to KIP-444 late-record-drop will be superseded with dropped-records and lateRecordDropSensor() will most probably disappear. Actually, since it is currently not a problem at all because the calling thread is the stream thread, I would leave it as it is.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, sounds fine to me, as long as we're aware of the risk.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My two cents: it's better not to cast. If we need the "internal interface" (i.e., Impl) here, then we should simply declare that fact in the parameter type. Then, the calling code would always know it has to pass an Impl, and we wouldn't ever get surprise ClassCastExceptions in tests or runtime code.

Copy link
Copy Markdown
Contributor

@guozhangwang guozhangwang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Made a pass. One meta comment: lots of LOCs are actually extending the single line to multi-lines because of the insertion of thread-id, if it can be avoided I'd prefer the old way since if there are multi sensors to be added together it's actually easier to review.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After the other PR is merged, this part can be leveraged / rebased, just a reminder :)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just checked the code path, passing the thread name via GlobalStreamThread / StreamThread -> StreamTask -> ProcessorContextImpl maybe fine.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we rely on ImmutableValue from o.a.k.common?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ImmutableMetricValue is basically a copy of ImmutableValue. ImmutableValue in o.a.k.common is package private and within AppInfoParser. Additionally, I needed to add hashCode() and equals() for testing purposes. What we can do after the release is refactoring the ImmutableValue in common and use it in streams.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SG.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is multi-line here for checkstyle to be happy? Honestly I like the previous one as it is easier to compare multi-lines in case there's any parameter mis-set, would be great if we can get back to oneliner.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto here.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See my comment above.

FYI: We have have a bug here. We use fetch instead of get and users have already asked a question about this on Stackoverflow. KAFKA-8906 documents this.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto here, I think besides the threadId having other paired values in oneline is better to read / review.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is refactored in PR #7429.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In KIP-444 the one below would be stream-thread-metrics.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is refactored in the thread-level refactoring PR that I could not yet open because it is based on PR #7429.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Learned new ways of functional interfaces :)

@bbejeck
Copy link
Copy Markdown
Member

bbejeck commented Oct 3, 2019

Java 11/2.12 failed and Java 8 failed; results already cleared out

retest this please

@bbejeck
Copy link
Copy Markdown
Member

bbejeck commented Oct 3, 2019

@cadonna failures seem related Execution failed for task ':streams:compileTestJava'.

@guozhangwang
Copy link
Copy Markdown
Contributor

The new commit lgtm. Leave to @bbejeck to merge.

Let's do this one and then #7429

@cadonna cadonna force-pushed the add_client_level_metrics branch from bf3ec27 to 64a5fba Compare October 3, 2019 20:06
- Moves `StreamsMetricsImpl` from `StreamThread` to `KafkaStreams`
- Adds instance-level metrics as specified in KIP-444, i.e.:
-- version
-- commit-id
-- application-id
-- topology-description
-- state
Also adds comments to tests that possibly fail when started from an IDE due
to the missing version file on the classpath.
Fixes compile error introduced by merging apache#5527
@cadonna cadonna force-pushed the add_client_level_metrics branch from ece0eda to ab3c043 Compare October 4, 2019 08:23
@cadonna
Copy link
Copy Markdown
Member Author

cadonna commented Oct 4, 2019

In JDK 11/Scala 2.12, the following test failed:

kafka.api.GroupAuthorizerIntegrationTest.testConsumeWithoutTopicDescribeAccess

In JDK 11/Scala 2.13, the following test failed:

org.apache.kafka.connect.integration.ExampleConnectIntegrationTest.testSourceConnector

In JDK 8/Scala 2.11, the following test failed:

kafka.api.SaslGssapiSslEndToEndAuthorizationTest.testNoDescribeProduceOrConsumeWithoutTopicDescribeAcl

Retest this, please

@guozhangwang
Copy link
Copy Markdown
Contributor

Jenkins failures are irrelevant.

@guozhangwang guozhangwang merged commit 52007e8 into apache:trunk Oct 5, 2019
@guozhangwang
Copy link
Copy Markdown
Contributor

Merged to trunk.

@mjsax mjsax added the kip Requires or implements a KIP label Jun 12, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

kip Requires or implements a KIP streams

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants