Skip to content

KAFKA-12272: Fix commit-interval metrics#10102

Merged
mjsax merged 3 commits intoapache:trunkfrom
mjsax:kafka-12272-fix-commit-latency-metric
Feb 12, 2021
Merged

KAFKA-12272: Fix commit-interval metrics#10102
mjsax merged 3 commits intoapache:trunkfrom
mjsax:kafka-12272-fix-commit-latency-metric

Conversation

@mjsax
Copy link
Copy Markdown
Member

@mjsax mjsax commented Feb 10, 2021

Call for review @ableegoldman

Fixes a regression bug. Should be cherry-picked back to 2.6 branch.

@mjsax mjsax added the streams label Feb 10, 2021

log.debug("{} punctuators ran.", punctuated);

final long beforeCommitTs = now;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

such a small nit that I don't think it's worth pushing a fix and restarting the build, but should this be beforeCommitMs rather than Ts?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Happy to address.

final int committed = maybeCommit();
totalCommittedSinceLastSummary += committed;
final long commitLatency = advanceNowAndComputeLatency();
final long commitLatency = Math.max(now - beforeCommitTs, 0);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also kind of a nit, since technically this does work, but wouldn't it make more sense to just remove the advanceNowAndComputeLatency call in maybeCommit, and then just call advancedNowAndComputeLatency here as before? Otherwise we're just computing the latency inside maybeCommit for no reason, and throwing out the result.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, we need to advance time within maybeCommit() to make sure we compute lastCommitMs correctly (what was exactly the fix that introduced the commit-latency metric regression).

Copy link
Copy Markdown
Member Author

@mjsax mjsax Feb 10, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could replace advancedNowAndComputeLatency by now = time.milliseconds() in maybeCommit() though. -- In the original PR, we just called advancedNowAndComputeLatency to keep the patter of a single place in the code that does call time.milliseconds

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But currently we call advanceNowAndComputeLatency pretty much at the end of the maybeCommit method, so there shouldn't be a noticeable difference between advancing now at the end of maybeCommit vs advancing it immediately after maybeCommit returns, right?
I'm also ok with just advancing now inside maybeCommit. The main thing that felt off was just that we compute the latency for no reason inside maybeCommit, and ignore the result

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(and similarly move the lastCommitMs = now to after maybeCommit returns)

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so there shouldn't be a noticeable difference between advancing now at the end of maybeCommit vs advancing it immediately after maybeCommit returns, right?

Well, not from a runtime point of view.

(and similarly move the lastCommitMs = now to after maybeCommit returns)

Exactly: we need to first advance now and than update lastCommitMs -- but I would rather not update lastCommitMs outside of maybeCommit, because it breaks encapsulation?

I'm also ok with just advancing now inside maybeCommit. The main thing that felt off was just that we compute the latency for no reason inside maybeCommit, and ignore the result

Yes, that is a little odd -- as I said, happy to change advancedNowAndComputeLatency to now = time.milliseconds() within maybeCommit().

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that is a little odd -- as I said, happy to change advancedNowAndComputeLatency to now = time.milliseconds() within maybeCommit().

Sounds good to me, let's do it

Copy link
Copy Markdown
Member

@ableegoldman ableegoldman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good enough to me, but I think it could be cleaned up just a bit to avoid calling advanceNowAndComputeLatency just to advance now, but then throwing out the latency. But I'm ok with it as is for now, especially since we want to get this merged quickly

@mjsax
Copy link
Copy Markdown
Member Author

mjsax commented Feb 11, 2021

Thanks @ableegoldman -- updated the PR. If there are no further concerns raised, I'll merge after Jenkins passed and cherry-pick to 2.8 branch.

Copy link
Copy Markdown
Member

@ableegoldman ableegoldman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! We just need to backport this to 2.8 right (edit: duh, didn't see your last comment)

@mjsax mjsax merged commit d4de383 into apache:trunk Feb 12, 2021
@mjsax mjsax deleted the kafka-12272-fix-commit-latency-metric branch February 12, 2021 01:00
@mjsax
Copy link
Copy Markdown
Member Author

mjsax commented Feb 12, 2021

Merged to trunk and cherry-picked to 2.8 branch.

ijuma added a commit to ijuma/kafka that referenced this pull request Feb 14, 2021
…e-allocations-lz4

* apache-github/trunk: (118 commits)
  KAFKA-12327: Remove MethodHandle usage in CompressionType (apache#10123)
KAFKA-12297: Make MockProducer return RecordMetadata with values as
per contract
  MINOR: Update zstd and use classes with no finalizers (apache#10120)
KAFKA-12326: Corrected regresion in MirrorMaker 2 executable
introduced with KAFKA-10021 (apache#10122)
KAFKA-12321 the comparison function for uuid type should be 'equals'
rather than '==' (apache#10098)
  MINOR: Add FetchSnapshot API doc in KafkaRaftClient (apache#10097)
  MINOR: KIP-631 KafkaConfig fixes and improvements (apache#10114)
  KAFKA-12272: Fix commit-interval metrics (apache#10102)
  MINOR: Improve confusing admin client shutdown logging (apache#10107)
  MINOR: Add BrokerMetadataListener (apache#10111)
  MINOR: Support Raft-based metadata quorums in system tests (apache#10093)
MINOR: add the MetaLogListener, LocalLogManager, and Controller
interface. (apache#10106)
  MINOR: Introduce the KIP-500 Broker lifecycle manager (apache#10095)
MINOR: Remove always-passing validation in
TestRecordTest#testProducerRecord (apache#9930)
KAFKA-5235: GetOffsetShell: Support for multiple topics and consumer
configuration override (KIP-635) (apache#9430)
MINOR: Prevent creating partition.metadata until ID can be written
(apache#10041)
  MINOR: Add RaftReplicaManager (apache#10069)
MINOR: Add ClientQuotaMetadataManager for processing QuotaRecord
(apache#10101)
  MINOR: Rename DecommissionBrokers to UnregisterBrokers (apache#10084)
MINOR: KafkaBroker.brokerState should be volatile instead of
AtomicReference (apache#10080)
  ...

clients/src/main/java/org/apache/kafka/common/record/CompressionType.java
core/src/test/scala/unit/kafka/coordinator/group/GroupMetadataManagerTest.scala
mjsax added a commit that referenced this pull request Feb 17, 2021
Reviewer: A. Sophie Blee-Goldman <sophie@confluent.io>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants