Skip to content

MINOR: Log client disconnect events at INFO level#11449

Merged
hachikuji merged 3 commits intoapache:trunkfrom
hachikuji:increase-disconnect-logging-verbosity
Nov 10, 2021
Merged

MINOR: Log client disconnect events at INFO level#11449
hachikuji merged 3 commits intoapache:trunkfrom
hachikuji:increase-disconnect-logging-verbosity

Conversation

@hachikuji
Copy link
Copy Markdown
Contributor

Client disconnects are crucial events for debugging. The fact that we only log them at DEBUG/TRACE means we rarely have them available outside of a testing context. This patch therefore increases verbosity to INFO level. In practice, we already have backoff configurations which should prevent these logs from getting too spammy.

Committer Checklist (excluded from commit message)

  • Verify design and implementation
  • Verify test coverage and CI build status
  • Verify documentation (including upgrade notes)

@hachikuji
Copy link
Copy Markdown
Contributor Author

The main thing I am interested in here is disconnects when there are inflight requests, so we can probably be more selective about the logging if there are any concerns.

Comment thread clients/src/main/java/org/apache/kafka/clients/NetworkClient.java Outdated
@hachikuji hachikuji force-pushed the increase-disconnect-logging-verbosity branch 2 times, most recently from 34cb715 to 3a46c11 Compare October 29, 2021 17:50
@hachikuji
Copy link
Copy Markdown
Contributor Author

It's worth noting that we already have the WARN log here when a connection cannot be established: https://github.com/apache/kafka/blob/trunk/clients/src/main/java/org/apache/kafka/clients/NetworkClient.java#L766. This is probably the most likely case to be spammy. The other additions from this PR are less common situations such as request timeouts and client-side requests to disconnect.

@hachikuji hachikuji force-pushed the increase-disconnect-logging-verbosity branch from 3a46c11 to 4072760 Compare October 29, 2021 17:58
if (connectionStates.isDisconnected(nodeId))
return;

log.info("Client requested disconnect from node {} (invoking callbacks for in-flight requests)", nodeId);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems we only have two types of callers for this: 1) admin-client proactively disconnecting, where the caller always log an info explaining why, 2) consumer-client async disconnect the coordinator.

If our target is primarily covering 2) here, what about: 1) make this function disconnect(nodeId, reason) where reason is a formatted string; 2) rename the pendingDisconnects (which is only used for coordinator disconnects) as pendingCoordinatorDisconnects and pass in a reason like "since the node was previously recognized as the group coordinator but now it become unknown".

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is useful to cover both cases (as well as any future cases). Let me consider adding a disconnect reason.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found it a little simpler to add a separate log message for the coordinator case. Let me know if that seems reasonable. In general, I'm not too worried about the verbosity of this message since client-initiated disconnects should be relatively rare (e.g. after session or request timeouts).

Comment thread clients/src/main/java/org/apache/kafka/clients/NetworkClient.java Outdated
for (Map.Entry<String, ChannelState> entry : this.selector.disconnected().entrySet()) {
String node = entry.getKey();
log.debug("Node {} disconnected.", node);
log.info("Node {} disconnected.", node);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It makes me thinking: why not consolidating disconnect to call processDisconnections but have two call paths here?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree there is room for consolidation, but I was trying to avoid refactoring here. Could we consider this separately?

@guozhangwang
Copy link
Copy Markdown
Contributor

Sounds good. LGTM!

@hachikuji hachikuji merged commit 79d97bd into apache:trunk Nov 10, 2021
@dajac
Copy link
Copy Markdown
Member

dajac commented Nov 10, 2021

@hachikuji Is it worth cherry-picking this one into 3.1 branch as well?

stanislavkozlovski added a commit to stanislavkozlovski/kafka that referenced this pull request Nov 11, 2021
…ntegration-11-nov

* ak/trunk: (15 commits)
  KAFKA-13429: ignore bin on new modules (apache#11415)
  KAFKA-12648: introduce TopologyConfig and TaskConfig for topology-level overrides (apache#11272)
  KAFKA-12487: Add support for cooperative consumer protocol with sink connectors (apache#10563)
  MINOR: Log client disconnect events at INFO level (apache#11449)
  MINOR: Remove topic null check from `TopicIdPartition` and adjust constructor order (apache#11403)
  KAFKA-13417; Ensure dynamic reconfigurations set old config properly (apache#11448)
  MINOR: Adding a constant to denote UNKNOWN leader in LeaderAndEpoch (apache#11477)
  KAFKA-10543: Convert KTable joins to new PAPI (apache#11412)
  KAFKA-12226: Commit source task offsets without blocking on batch delivery (apache#11323)
  KAFKA-13396: Allow create topic without partition/replicaFactor (apache#11429)
  ...
hachikuji added a commit that referenced this pull request Nov 15, 2021
Client disconnects are crucial events for debugging. The fact that we only log them at DEBUG/TRACE means we rarely have them available outside of a testing context. This patch therefore increases verbosity to INFO level. In practice, we already have backoff configurations which should prevent these logs from getting too spammy. 

Reviewers: Guozhang Wang <wangguoz@gmail.com>
xdgrulez pushed a commit to xdgrulez/kafka that referenced this pull request Dec 22, 2021
Client disconnects are crucial events for debugging. The fact that we only log them at DEBUG/TRACE means we rarely have them available outside of a testing context. This patch therefore increases verbosity to INFO level. In practice, we already have backoff configurations which should prevent these logs from getting too spammy. 

Reviewers: Guozhang Wang <wangguoz@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants