KAFKA-5075; Defer exception to the next pollOnce() if consumer's fetch position has already increased (#2859)

lindong28 wants to merge 5 commits into apache:trunk
Conversation
In Fetcher.fetchRecords() we iterate over the partition data to collect the ConsumerRecords. After we collect some consumer records from a partition, we advance the position of that partition and then move on to the next partition. If the next partition throws an exception (e.g. OffsetOutOfRangeException), the messages that have already been read out of the buffer will not be delivered to the user. Since the positions of the previous partitions have already been updated, those messages will not be consumed again either. This patch fixes the problem by deferring the exception to the next pollOnce() if the consumer's fetch position has already increased. Ping @becketqin for review.
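The core idea of the fix can be sketched with a simplified model (this is not the real Fetcher; the map-of-lists buffer, the `deferred` field, and a `null` value standing in for an OffsetOutOfRangeException are all illustrative assumptions):

```java
import java.util.*;

// Simplified sketch of the deferred-exception pattern: if some records were
// already drained (and positions advanced) before a partition fails, return
// those records now and re-throw the exception on the next poll instead.
public class DeferredExceptionSketch {
    static RuntimeException deferred; // cached until the next poll

    // Each entry: partition -> records; a null value simulates a partition
    // whose fetch raises OffsetOutOfRangeException.
    static Map<String, List<String>> fetchedRecords(Map<String, List<String>> buffer) {
        if (deferred != null) {            // re-throw on the next poll
            RuntimeException e = deferred;
            deferred = null;
            throw e;
        }
        Map<String, List<String>> drained = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> entry : buffer.entrySet()) {
            if (entry.getValue() == null) { // partition in error state
                RuntimeException e =
                    new RuntimeException("offset out of range: " + entry.getKey());
                if (drained.isEmpty())
                    throw e;                // nothing collected yet: throw now
                deferred = e;               // records collected: defer, return them
                return drained;
            }
            drained.put(entry.getKey(), entry.getValue()); // position would advance here
        }
        return drained;
    }

    public static void main(String[] args) {
        Map<String, List<String>> buffer = new LinkedHashMap<>();
        buffer.put("t-0", Arrays.asList("a", "b"));
        buffer.put("t-1", null);            // this partition fails
        System.out.println(fetchedRecords(buffer).size()); // t-0's records are not lost
        try {
            fetchedRecords(new LinkedHashMap<>());
        } catch (RuntimeException e) {
            System.out.println("deferred: " + e.getMessage());
        }
    }
}
```

Without the deferral, the exception would propagate immediately and the records already drained from `t-0` (whose position had advanced) would be silently lost.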
 * the defaultResetPolicy is NONE
 */
public Map<TopicPartition, List<ConsumerRecord<K, V>>> fetchedRecords() {
    if (nextInLineException != null) {
What if the subscription changes and the partition which had the exception is not assigned anymore? Or if the user seeks to a new position for the partition in question, which should no longer throw an exception? It seems we need to verify whether the partition state has changed before throwing the cached exception.
Good point. I have updated the patch to throw exception only if the partition is fetchable and the subscription position equals the fetched offset of the record that causes the exception.
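The relevance check described above can be sketched as follows (the `ExceptionMetadata` shape and the helper names are illustrative assumptions, not the real Fetcher internals):

```java
import java.util.*;

// Hedged sketch: a cached exception is only re-thrown if the partition is
// still fetchable and the consumer's position still equals the fetched
// offset that produced the exception; otherwise it is stale and dropped.
public class CachedExceptionCheck {
    static class ExceptionMetadata {
        final String partition;
        final long fetchedOffset;
        final RuntimeException exception;
        ExceptionMetadata(String partition, long fetchedOffset, RuntimeException exception) {
            this.partition = partition;
            this.fetchedOffset = fetchedOffset;
            this.exception = exception;
        }
    }

    static boolean shouldRethrow(ExceptionMetadata meta,
                                 Set<String> fetchablePartitions,
                                 Map<String, Long> positions) {
        // Subscription changed and the partition is gone: drop the exception.
        if (!fetchablePartitions.contains(meta.partition))
            return false;
        // User called seek() to a new position: the exception is stale.
        Long position = positions.get(meta.partition);
        return position != null && position == meta.fetchedOffset;
    }

    public static void main(String[] args) {
        Map<String, Long> positions = new HashMap<>();
        positions.put("t-0", 5L);
        ExceptionMetadata meta =
            new ExceptionMetadata("t-0", 5L, new RuntimeException("offset out of range"));
        System.out.println(shouldRethrow(meta,
            new HashSet<>(Arrays.asList("t-0")), positions)); // still relevant
    }
}
```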
            }
        }
    } catch (KafkaException e) {
        if (fetched.isEmpty()) {
Kafka does not use {} when the if statement has only one line.
    assertEquals(0, fetcherNoAutoReset.fetchedRecords().size());
}

@Test
Can we add some simple comments so readers can understand the test more easily?
Sure. I added the following comment:
"Verify that the advancement in the next fetch offset equals the number of fetched records when some of the fetched partitions cause an exception. This ensures that the consumer won't lose records upon exception."
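The invariant that comment describes can be illustrated with a small self-contained check (purely illustrative, not the real FetcherTest; the two "partitions" and the defer flag are assumptions):

```java
import java.util.*;

// Simulates one fetchRecords() pass over two partitions where the second
// throws. With the old behavior the exception propagates immediately and the
// records already drained from partition 0 (whose position has advanced) are
// dropped; with the fix they are delivered and the exception is deferred.
public class NoRecordLossCheck {
    static int deliveredRecords(boolean deferException) {
        List<String> p0 = Arrays.asList("r0", "r1"); // parses fine, position advances by 2
        List<String> delivered = new ArrayList<>(p0);
        // partition 1 now "throws" OffsetOutOfRangeException
        if (!deferException)
            delivered.clear(); // old behavior: exception propagates, drained records lost
        return delivered.size();
    }

    public static void main(String[] args) {
        System.out.println(deliveredRecords(true));  // fixed: 2 records delivered
        System.out.println(deliveredRecords(false)); // buggy: 0 records delivered
    }
}
```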
@lindong28 The newly added test has also failed. Can you check?

@becketqin I have fixed the newly added tests. I am still investigating why testPatternSubscriptionMatchingInternalTopicWithDescribeOnlyPermission fails randomly. Can you check whether the patch addresses the problem you mentioned?
private final Deserializer<V> valueDeserializer;

private PartitionRecords nextInLineRecords = null;
private ExceptionMetadata nextInLineExceptionMetadata = null;
I am wondering whether it would be simpler if we just peek at the first CompletedFetch in completedFetches, and only remove it after it is parsed. The logic would be:

- If parseCompleted(completedFetches.peek()) did not throw an exception, remove the CompletedFetch.
- If parseCompleted(completedFetches.peek()) threw an exception and fetched.isEmpty() == false, catch the exception and return the ConsumerRecords that have already been fetched.
- If parseCompleted(completedFetches.peek()) threw an exception and fetched.isEmpty() == true, catch the exception, remove the first entry in completedFetches, and re-throw the exception.
This way we don't need to cache any result.
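The peek-based alternative proposed above can be sketched with a plain Deque standing in for the real completedFetches queue (the Supplier elements and names here are illustrative assumptions):

```java
import java.util.*;
import java.util.function.Supplier;

// Sketch of the peek-first drain: a failing fetch stays at the head of the
// queue when some records were already drained, so the exception naturally
// resurfaces on the next call without caching it anywhere.
public class PeekBasedDrain {
    static List<String> drain(Deque<Supplier<List<String>>> completedFetches) {
        List<String> fetched = new ArrayList<>();
        while (!completedFetches.isEmpty()) {
            try {
                // Peek first: only remove after the fetch parses cleanly.
                fetched.addAll(completedFetches.peek().get());
                completedFetches.poll();
            } catch (RuntimeException e) {
                if (fetched.isEmpty()) {
                    completedFetches.poll(); // consume the bad fetch
                    throw e;                 // nothing to return: rethrow now
                }
                return fetched; // return what we have; the bad fetch stays
                                // queued and throws on the next call
            }
        }
        return fetched;
    }

    public static void main(String[] args) {
        Deque<Supplier<List<String>>> q = new ArrayDeque<>();
        q.add(() -> Arrays.asList("a", "b"));
        q.add(() -> { throw new RuntimeException("bad fetch"); });
        System.out.println(drain(q)); // first call returns the good records
    }
}
```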
But I think we still need to cache these to handle the exception thrown from Fetcher.fetchRecords(), which in turn comes from Fetcher.maybeEnsureValid(). What do you think?
BTW, it is probably OK to cache this exception since we already cache results such as nextInLineRecords.
If an InvalidRecordException is thrown in Fetcher.fetchRecords(), the position of that partition won't advance and no message for that partition would be added to fetched either. So it seems we can apply the same rule there as well?
Discussed offline. We will keep the fix as is.
try {
  consumer.subscribe(Pattern.compile(".*"), new NoOpConsumerRebalanceListener)
  consumeRecords(consumer)
  consumer.subscribe(singletonList(kafka.common.Topic.GroupMetadataTopicName), new NoOpConsumerRebalanceListener)
It seems the test was trying to test the pattern subscription, so we probably want to keep that unchanged.
Sure. It is fixed now.
    assertEquals(0, fetcherNoAutoReset.fetchedRecords().size());
}

@Test
Can we add a unit test about subscription change / offset seek after an exception is cached?
Thanks for the patch. LGTM.
Good catch @lindong28. Thanks for fixing!
@hachikuji My pleasure :) |
…h position has already increased

Author: Dong Lin <lindong28@gmail.com>
Author: Dong Lin <lindong28@users.noreply.github.com>

Reviewers: Jiangjie Qin <becket.qin@gmail.com>

Closes #2859 from lindong28/KAFKA-5075

This is a backport patch for 0.10.2 after resolving the following conflict.

Conflicts:
    clients/src/main/java/org/apache/kafka/clients/consumer/internals/Fetcher.java