
MINOR: Change ordering of checks to prevent log spam on metadata updates #13447

Merged
jolshan merged 2 commits into apache:trunk from jolshan:reduce-log
Mar 30, 2023
Conversation

@jolshan
Member

@jolshan jolshan commented Mar 24, 2023

On startup, we always update the metadata, and the topic ID goes from null to defined. Move the "epoch is null" check before the topic ID check to prevent log spam.
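The effect of the reordering can be sketched as below. This is a minimal illustration, not the actual code from Metadata.java; the method and variable names are simplified placeholders. The point is that on the first metadata update (no previous epoch), the quiet debug branch is taken before the "topicId changed" info branch can fire.

```java
import java.util.Optional;

public class EpochCheckSketch {
    // Returns the log line that would be emitted for a metadata update,
    // or empty if nothing needs to be logged. Hypothetical helper.
    static Optional<String> onMetadataUpdate(Integer lastSeenEpoch, String oldTopicId, String newTopicId) {
        // Null-epoch check FIRST: the common startup case (no previous epoch,
        // topic ID going from null to defined) now logs at debug, not info.
        if (lastSeenEpoch == null) {
            return Optional.of("DEBUG: first update, no previous epoch to reset");
        } else if (oldTopicId == null || !oldTopicId.equals(newTopicId)) {
            // A genuine topic ID change on an established partition still
            // logs at info, since it may indicate topic re-creation.
            return Optional.of("INFO: resetting epoch since topicId changed from "
                    + oldTopicId + " to " + newTopicId);
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Startup: null epoch, topic ID null -> defined. Quiet debug branch.
        System.out.println(onMetadataUpdate(null, null, "lkaiBwHxQ6iQHE49_AVysQ").get());
        // Established partition whose topic ID actually changed: info branch.
        System.out.println(onMetadataUpdate(1286, "oldId", "newId").get());
    }
}
```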

Committer Checklist (excluded from commit message)

  • Verify design and implementation
  • Verify test coverage and CI build status
  • Verify documentation (including upgrade notes)

@jolshan
Member Author

jolshan commented Mar 24, 2023

alternative to:
#12541 and #12378

@jolshan jolshan changed the title from "MINOR: Change log level for resetting epoch when topic ID is null" to "MINOR: Change ordering of checks to prevent log spam on metadata updates" on Mar 29, 2023
Contributor

@hachikuji hachikuji left a comment


LGTM

Member

@dajac dajac left a comment


LGTM

@jolshan
Member Author

jolshan commented Mar 30, 2023

Took a look at the failures:
kafka.server.metadata.BrokerMetadataPublisherTest.testExceptionInUpdateCoordinator() is failing on trunk (or at least on other PR builds) as well.

The authorizer test looks flaky and/or unrelated.

@jolshan jolshan merged commit 6d9d65e into apache:trunk Mar 30, 2023
    log.info("Resetting the last seen epoch of partition {} to {} since the associated topicId changed from {} to {}",
        tp, newEpoch, oldTopicId, topicId);
} else {
    log.debug("Resetting the last seen epoch of partition {} to {} since the associated topicId was undefined but is now set to {}",
        tp, newEpoch, topicId);


@jolshan Why did you end up discarding this change? I'm seeing lots of noise from my Kafka producers because this is still logged at info level. The next commit, "Reorganize logic", whose description didn't suggest any functional changes, got rid of this change to the log levels. Was that intended? If this were logged at debug level, it would reduce a lot of noise for my producers. Thanks.

Member Author


The change should remove the logging on startup, which is where the main noise was found (i.e., going from null to having a topic ID).

Are you still seeing that log message? If so which version are you running on?



Thanks for the response @jolshan. I'm seeing this log statement on multiple Kafka producers. It is logged once every 5 minutes, for each topic partition, not just on startup. (This came to my attention because we exhausted our log quota and this was the biggest source of log volume.)

Here is an example log that is generated:

[Producer clientId=producer-1] Resetting the last seen epoch of partition my.topic-3 to 1286 since the associated topicId changed from null to lkaiBwHxQ6iQHE49_AVysQ

This is generated every 5 minutes for each topic partition (so a topic with 10 partitions generates 10 log entries). Each time the topic id is updated from null to lkaiBwHxQ6iQHE49_AVysQ. All partitions for the same topic show the same topic id.

Running kafka-client 3.4.0 or 3.5.0 on the producers. Brokers are running Kafka 2.8.1.
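The once-every-5-minutes cadence lines up with the Java client's default metadata.max.age.ms of 300000 ms: producers proactively refresh cluster metadata at least that often, even when idle, so any message logged on metadata update recurs on that interval. The property name is the real client config; the rest of this snippet is purely illustrative.

```java
import java.util.Properties;

public class MetadataRefreshInterval {
    public static void main(String[] args) {
        Properties props = new Properties();
        // Kafka clients force a metadata refresh at least every
        // metadata.max.age.ms. The default, 300000 ms (5 minutes),
        // matches the cadence of the log line reported above.
        props.put("metadata.max.age.ms", "300000");
        long intervalMs = Long.parseLong(props.getProperty("metadata.max.age.ms"));
        System.out.println("metadata refresh interval in minutes = " + intervalMs / 60_000);
    }
}
```

Raising metadata.max.age.ms would space the messages out, but the real fix is the log-level/ordering change discussed in this PR.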

Member Author


Hmmm. This fix would not be in the 3.4 release, right? Let's make sure it still happens on 3.5.

Member Author


As for seeing it every 5 minutes -- that's a bit strange. Once the client receives the topic ID in the cache, it should stay. Are you creating new topics, restarting the clients, or moving partitions?



Nope, not creating new topics, restarting clients, or moving partitions. Looking at our organisation's logs across all services, it appears to affect a lot of producers across different teams.

Member Author

@jolshan jolshan Jun 23, 2023


That is quite strange. Can you isolate the issue to a single producer? I have not heard of this being an issue outside of starting producers for the first time. Please ensure you are running on 3.5 and let me know if it's still an issue.
