KAFKA-7211: MM should handle TimeoutException in commitSync #5492
huxihx wants to merge 5 commits into apache:trunk from huxihx:KAFKA-7211
Conversation
With KIP-266 introduced, MirrorMaker should handle TimeoutException thrown in commitSync(). Besides, MM should only commit offsets for existing topics.
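For orientation, the kind of handling this implies might look roughly like the following (a minimal sketch, not the merged patch; the commitOffsetsOnce helper and its parameters are hypothetical):

```scala
import scala.collection.JavaConverters._
import org.apache.kafka.clients.consumer.{KafkaConsumer, OffsetAndMetadata}
import org.apache.kafka.common.TopicPartition
import org.apache.kafka.common.errors.TimeoutException

// Hypothetical helper: commit a map of offsets once, surfacing the
// KIP-266 timeout instead of blocking forever.
def commitOffsetsOnce(consumer: KafkaConsumer[Array[Byte], Array[Byte]],
                      offsets: Map[TopicPartition, Long]): Unit = {
  try {
    consumer.commitSync(offsets.map { case (tp, offset) =>
      (tp, new OffsetAndMetadata(offset))
    }.asJava)
  } catch {
    // With KIP-266, commitSync throws TimeoutException after
    // default.api.timeout.ms instead of retrying indefinitely.
    case e: TimeoutException =>
      System.err.println(s"Offset commit timed out: ${e.getMessage}")
  }
}
```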
@hachikuji Please review this patch. Thanks.

@hachikuji Could you take some time to review this patch? Thanks.
  def commit() {
-    consumer.commitSync(offsets.map { case (tp, offset) => (tp, new OffsetAndMetadata(offset, "")) }.asJava)
+    val existingTopics: Set[String] = Try(consumer.listTopics) match {
Should we do this only in the case of a Timeout error, and clear the offsets map then? Checking on every commit may be costly.
@omkreddy Thanks for the response. Currently, commitSync will retry indefinitely when committing offsets for a nonexistent topic, and users can do nothing to get rid of it. That's why I always check that the topics exist before commitSync is called. Does that make sense?
With KIP-266, even commitSync will eventually time out (see the use of default.api.timeout.ms). I prefer @omkreddy's suggestion since listing all the topics in the cluster can be expensive.
With the trunk code, commitSync retried indefinitely instead of throwing TimeoutException when the topic was deleted. Here are the steps I used to reproduce it:
- Create a test topic named test
- Write a simple consumer that commits offsets for this topic every 3 seconds using commitSync
- Delete this topic
Later, the consumer complained "Offset commit failed on partition test-0 at offset...." forever. Did I miss anything?
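For reference, a repro along those lines might look like the following (a sketch only; the broker address, group id, and the CommitRepro wrapper are assumptions, not taken from the report):

```scala
import java.util.{Collections, Properties}
import org.apache.kafka.clients.consumer.{ConsumerConfig, KafkaConsumer, OffsetAndMetadata}
import org.apache.kafka.common.TopicPartition

object CommitRepro extends App {
  val props = new Properties()
  props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092")
  props.put(ConsumerConfig.GROUP_ID_CONFIG, "repro-group")
  props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG,
    "org.apache.kafka.common.serialization.ByteArrayDeserializer")
  props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG,
    "org.apache.kafka.common.serialization.ByteArrayDeserializer")

  val consumer = new KafkaConsumer[Array[Byte], Array[Byte]](props)
  val tp = new TopicPartition("test", 0)
  consumer.assign(Collections.singletonList(tp))

  // Delete the topic while this loop runs and observe whether commitSync
  // blocks forever or eventually throws TimeoutException.
  while (true) {
    consumer.commitSync(Collections.singletonMap(tp, new OffsetAndMetadata(0L)))
    Thread.sleep(3000)
  }
}
```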
We should get a Timeout error after default.api.timeout.ms (default 60 secs).
Verified with trunk code and it is working as expected.
      case Failure(_) => Set.empty
    }
    if (existingTopics.nonEmpty)
      consumer.commitSync(offsets.filterKeys(tp => existingTopics.contains(tp.topic)).map { case (tp, offset) => (tp, new OffsetAndMetadata(offset, "")) }.asJava)
nit: we can use OffsetAndMetadata(long offset) constructor
Also, split to multiple lines. Typical line width is 120 characters.
    if (existingTopics.nonEmpty)
      consumer.commitSync(offsets.filterKeys(tp => existingTopics.contains(tp.topic)).map { case (tp, offset) => (tp, new OffsetAndMetadata(offset, "")) }.asJava)
    else
      consumer.commitSync(offsets.map { case (tp, offset) => (tp, new OffsetAndMetadata(offset, "")) }.asJava)
nit: we can use the OffsetAndMetadata(long offset) constructor
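Applied together, the two nits above might give something like this (a sketch of the suggested shape only, not the committed change; offsets, existingTopics, and consumer are the names from the diff):

```scala
import scala.collection.JavaConverters._
import org.apache.kafka.clients.consumer.OffsetAndMetadata

// Single-arg constructor, with the long line split across multiple lines.
if (existingTopics.nonEmpty) {
  val toCommit = offsets
    .filterKeys(tp => existingTopics.contains(tp.topic))
    .map { case (tp, offset) => (tp, new OffsetAndMetadata(offset)) }
  consumer.commitSync(toCommit.asJava)
} else {
  consumer.commitSync(offsets.map { case (tp, offset) =>
    (tp, new OffsetAndMetadata(offset))
  }.asJava)
}
```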
hachikuji left a comment
Thanks for the patch. Do you have any ideas for how we can test the fix?
@hachikuji Please review again. Thanks.

@lindong28 You may want to review this since LinkedIn tends to trigger a bunch of MirrorMaker edge cases.

@ijuma Sure. I will review this PR.

@huxihx It seems that the 4 commits in this patch are interleaved with other unrelated commits. Can you rebase the patch so that all commits in this PR are at the top of the branch?
If there is a performance issue and the request queue time is more than 60 seconds, MM previously kept waiting until the offset commit succeeded. Now MM just times out the offset commit operation and continues to consume/produce messages. This may cause data duplication if commitOffsets(...) is called in onPartitionsRevoked() or when MM is shut down.
Would it make sense to change the default configuration of the consumers in MM so that default.api.timeout.ms is set to MAX_LONG, to keep the old behavior as the default behavior?
Going further along this line, in the scenario where an offset commit cannot finish within 60 seconds in onPartitionsRevoked(), what is the benefit of letting MM give up waiting and hand partition ownership to another consumer, instead of retrying/waiting indefinitely?
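For concreteness, the override being discussed might look roughly like this in MM's consumer properties (a sketch of the proposal, not merged behavior; note that default.api.timeout.ms is an int config, so Int.MaxValue stands in for MAX_LONG):

```scala
import java.util.Properties
import org.apache.kafka.clients.consumer.ConsumerConfig

val consumerProps = new Properties()
// Effectively restore the pre-KIP-266 "block until the commit succeeds"
// behavior by making the API timeout practically unbounded.
consumerProps.put(ConsumerConfig.DEFAULT_API_TIMEOUT_MS_CONFIG, Int.MaxValue.toString)
```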
Good question. I don't think this PR guarantees the absence of duplicate messages. Two possible selling points of this PR are: 1. explicitly capturing TimeoutException and warning gives users a hint that they might need to adjust the value of default.api.timeout.ms; 2. it makes MM able to remove offsets for already-deleted topics from offsets.
Because of point 1 above, I don't think setting default.api.timeout.ms to MAX_LONG is reasonable, since it was introduced precisely to give users a way to get rid of the indefinite waiting.
Does that make sense?
Hey @huxihx, just to clarify, you agree that we should set default.api.timeout.ms to MAX_LONG in MM, right?
BTW, if the default.api.timeout.ms is set to MAX_LONG and a topic is deleted, what is the behavior of offset commit?
@lindong28 Thanks for the review. Firstly, I don't quite agree with setting default.api.timeout.ms to MAX, since that is sort of against the intention behind introducing it. Secondly, the consumer will complain "Offset commit failed on partition test-0 at offset...." forever if default.api.timeout.ms is set to MAX and the topic is deleted.
Just to clarify, I am not suggesting that we need to set default.api.timeout.ms to MAX_LONG. I am currently thinking that we may want to keep the 60 seconds as the timeout value but additionally keep retrying the offset commit in e.g. onPartitionsRevoked(), and if MM cannot finish the offset commit after a certain number of retries, MM should fail fast.
We can discuss this based on the reason for the offset commit timeout. If the offset commit times out due to long request queue time in the broker, then even after the rebalance completes, the next consumer will also not be able to commit offsets, and eventually all consumed data will be duplicated. In this case it seems reasonable to just let MM fail fast and let SREs investigate the performance issue in the broker and possibly increase the default.api.timeout.ms value.
If the offset commit failed due to a persistent network error on the given MM host, then regardless of whether this MM host times out or keeps retrying, its behavior should not affect the behavior of the other MM hosts. And since this MM cannot communicate with other hosts, we would like it to fail fast, which the alternative approach achieves after a certain number of retries.
I have to admit that I cannot figure out a better solution than yours... How about we separate the commit in onPartitionsRevoked from the others? That is to say, only fail fast for the commit in onPartitionsRevoked, but keep the rest unchanged.
@huxihx There are currently three cases where MM commits offsets: 1) the regular offset commit, 2) onPartitionsRevoked(), and 3) MM shutdown.
It seems that users probably also want MM to retry more if the offset commit times out during MM shutdown, since otherwise it just means data duplication, which is not good. And it may be reasonable to ask users to explicitly kill -9 MM if they decide that the offset commit won't succeed and they are willing to tolerate message duplication.
The motivation for infinite offset commit retries for the regular offset commit is less strong, because if a regular offset commit times out, it does not directly mean message duplication. On the other hand, it does increase the impact of message duplication if the user finally decides the offset commit cannot pass due to a constant backlog on the broker side. I am inclined to also keep retrying in this case. The con is that MM may get stuck on offset commit forever if the broker's request queue time is constantly large. But the resulting degraded performance seems reasonable in the event that the broker's performance has degraded. On the other hand, message duplication is about correctness, and it may cause problems for downstream applications if MM accidentally duplicates hours' worth of data just due to e.g. misconfiguration or a broker performance issue. In this case correctness seems more important than performance.
Hey @huxihx, thanks for all the work! Do you think it is reasonable to retry the offset commit infinitely by default in MM, per the explanation above? I am happy to discuss more.
Sure, I will think about it and resubmit a patch later.
Related to the concern about duplication, would it be useful to also keep track of and log the time of the last successful commit in the warning message, so that SREs can gauge how much time's worth of data has been duplicated?
@lindong28 Thanks for the comments. Please review again.

@huxihx Could you rebase your commits so that all commits are consecutively appended on top of the latest trunk? It looks like there are a lot of commits between the first commit and the remaining commits of this patch.

@lindong28 Please review again.
  }

-  def commitOffsets(consumerWrapper: ConsumerWrapper) {
+  def commitOffsets(consumerWrapper: ConsumerWrapper, retry: Int = Integer.MAX_VALUE): Unit = {
It seems that MirrorMaker.commitOffsets() is public, and thus it is a public API of an existing tool. Maybe we need a KIP in order to change this API. For now it seems simpler to just keep the old, compatible behavior, such that MM will block indefinitely if the commit cannot pass due to something other than topic deletion. If there is a good use case for other behavior, then we can have a KIP. Does this sound OK?
        throw e

      case _: TimeoutException if retry > 0 =>
        if (retry == Integer.MAX_VALUE) { // only try to remove offsets for nonexistent topics once
What if the topic deletion happens after a few retries? It seems safer to filter topics on every retry. It is probably OK because the time used to filter topics in MM should be much smaller than the backoff time and the time needed to wait for the offset commit to pass, right?
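A sketch of what per-retry filtering could look like (a hypothetical helper under the suggestion above, not the merged code; the commitWithRefiltering name and parameters are assumptions):

```scala
import scala.collection.JavaConverters._
import scala.util.{Failure, Success, Try}
import org.apache.kafka.clients.consumer.{KafkaConsumer, OffsetAndMetadata}
import org.apache.kafka.common.TopicPartition
import org.apache.kafka.common.errors.TimeoutException

def commitWithRefiltering(consumer: KafkaConsumer[Array[Byte], Array[Byte]],
                          initialOffsets: Map[TopicPartition, Long],
                          maxRetries: Int): Unit = {
  var remaining = initialOffsets
  var attempts = 0
  var committed = false
  while (!committed) {
    try {
      consumer.commitSync(remaining.map { case (tp, offset) =>
        (tp, new OffsetAndMetadata(offset))
      }.asJava)
      committed = true
    } catch {
      case e: TimeoutException =>
        attempts += 1
        if (attempts > maxRetries)
          throw e // fail fast once the retry budget is exhausted
        // Re-check topic existence on every retry, in case a topic was
        // deleted after a previous attempt.
        Try(consumer.listTopics) match {
          case Success(topics) =>
            val existing = topics.keySet.asScala.toSet
            remaining = remaining.filter { case (tp, _) => existing.contains(tp.topic) }
          case Failure(_) => // listing failed; keep the current offsets and retry as-is
        }
    }
  }
}
```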
Thanks for the update. LGTM. Left only one minor comment.
@hachikuji There is some change in the MM behavior since your last review. Do you want to take another look?
        retry += 1
        warn("Failed to commit offsets because the offset commit request processing can not be completed in time. " +
          s"If you see this regularly, it could indicate that you need to increase the consumer's ${ConsumerConfig.DEFAULT_API_TIMEOUT_MS_CONFIG} " +
          s"Last successful offset commit timestamp=${lastSuccessfulCommitTime}, retry count=${retry}")
nits: we can replace ${lastSuccessfulCommitTime} with $lastSuccessfulCommitTime. Same for ${retry}
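For instance, a trivial sketch of the nit:

```scala
val lastSuccessfulCommitTime = System.currentTimeMillis()
val retry = 1
// Braces are only needed around compound expressions; bare identifiers
// can be interpolated directly.
println(s"Last successful offset commit timestamp=$lastSuccessfulCommitTime, retry count=$retry")
```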
@@ -69,6 +70,8 @@ object MirrorMaker extends Logging with KafkaMetricsGroup {
  private var offsetCommitIntervalMs = 0
Not related to this PR, but can we update the comment above:
"There are N mirror maker threads each having one KafkaConsumer instance"?
@huxihx Thanks much for the PR. LGTM. The PR is merged to trunk.
With KIP-266 introduced, MirrorMaker should handle TimeoutException thrown in commitSync(). Besides, MM should only commit offsets for existing topics.

Author: huxihx <huxi_2b@hotmail.com>
Reviewers: Dong Lin <lindong28@gmail.com>

Closes apache#5492 from huxihx/KAFKA-7211