KAFKA-10223; Use NOT_LEADER_OR_FOLLOWER instead of non-retriable REPLICA_NOT_AVAILABLE for consumers by rajinisivaram · Pull Request #8979 · apache/kafka

rajinisivaram · 2020-07-03T16:56:35Z

Brokers currently return NOT_LEADER_FOR_PARTITION to producers and REPLICA_NOT_AVAILABLE to consumers if a replica is not available on the broker during reassignments. Non-Java clients treat REPLICA_NOT_AVAILABLE as a non-retriable exception, Java consumers handle this error by explicitly matching the error code even though it is not an InvalidMetadataException. This PR renames NOT_LEADER_FOR_PARTITION to NOT_LEADER_OR_FOLLOWER and uses the same error for producers and consumers. This is compatible with both Java and non-Java clients since all clients handle this error code (6) as retriable exception. The PR also makes ReplicaNotAvailableException a subclass of InvalidMetadataException.

ALTER_REPLICA_LOG_DIRS continues to return REPLICA_NOT_AVAILABLE. Retained this for compatibility since this request never returned NOT_LEADER_FOR_PARTITION earlier. Did not deprecate ReplicaNotAvailableException because of this.
MetadataRequest version 0 also returns REPLICA_NOT_AVAILABLE for compatibility. Newer versions filter these out and
return Errors.NONE, so didn't change this.
Partition responses in MetadataRequest return REPLICA_NOT_AVAILABLE to indicate that one of the replicas is not available. Did not change this since NOT_LEADER_FOR_PARTITION is not suitable in this case.

Committer Checklist (excluded from commit message)

Verify design and implementation
Verify test coverage and CI build status
Verify documentation (including upgrade notes)

ijuma · 2020-07-03T20:36:54Z

I think it makes sense to do this, but we also need to figure out a way to:

Make it clear that this is retriable for non Java clients via a KIP
Consider not returning this for older protocol versions.

hachikuji · 2020-07-06T17:10:04Z

I'm not sure the REPLICA_NOT_AVAILABLE error is super helpful as distinguished from NOT_LEADER_FOR_PARTITION. One option here is just to use the latter in all cases.

ijuma · 2020-07-06T17:58:51Z

I think the confusing part is returning NOT_LEADER_FOR_PARTITION if you are trying to fetch from a follower.

ijuma · 2020-07-06T17:59:23Z

Maybe we could rename NOT_LEADER_FOR_PARTITION.

hachikuji · 2020-07-06T18:59:27Z

I agree it's a little weird. Renaming NOT_LEADER is not a bad idea. I think what we want the error to convey is that the recipient is not a valid request target for the given partition. Maybe INVALID_REQUEST_TARGET or something like that.

rajinisivaram · 2020-07-07T09:36:50Z

@ijuma @hachikuji Thank you, I guess we need a KIP anyway even for renaming NOT_LEADER_FOR_PARTITION. We can discuss the options during KIP discussion. My only concern about renaming is that every one knows NOT_LEADER_FOR_PARTITION since it has been there forever, it may take a while to get used to a different name. Perhaps not an issue.

ijuma · 2020-07-07T13:03:54Z

Renaming the error code name has no compatibility implications so in theory it could be done without a KIP. We could make this really boring and safe and call it NOT_LEADER_OR_FOLLOWER. I think this would not cause any confusion since the name is so similar and the length is similar since I trimmed the FOR_PARTITION.

ijuma · 2020-07-07T13:05:32Z

If you both agree that the name above is reasonable, I would suggest sending a follow up to the KIP-320 thread covering what we've learned and how we're addressing it. It would be good to update the KIP text too so that we have something to refer to.

ijuma · 2020-07-07T13:07:41Z

One issue is that the exception NotLeaderForPartitionException is in a public package. We could introduce a subclass, alwayys throw the subclass and deprecate NotLeaderForPartitionException.

rajinisivaram · 2020-07-07T14:21:10Z

@ijuma Sounds good to me. Will see what @hachikuji thinks as well.

ijuma · 2020-07-07T14:24:48Z

Also cc @edenhill as this affects non Java clients too.

ijuma · 2020-07-07T14:31:15Z

We may also want to deprecate the error and exception for ReplicaNotAvailable since it would not serve any purpose (apart from clients retaining compatibility with some brokers), I think.

edenhill · 2020-07-07T14:41:44Z

Renaming to NOT_LEADER_OR_FOLLOWER and reusing that error code in place of REPLICA_NOT_AVAILABLE sounds like a good idea. 👍

hachikuji · 2020-07-07T15:55:46Z

NOT_LEADER_OR_FOLLOWER sounds good to me. If we're looking to attach the change to a KIP, then KIP-392 might be a better option.

ijuma · 2020-07-07T16:06:28Z

KIP-392 sounds good.

rajinisivaram · 2020-07-08T08:52:36Z

@ijuma @hachikuji @edenhill Thanks for the responses. I will follow up on KIP-392 discussion thread and update the PR.

rajinisivaram · 2020-07-10T10:00:34Z

@ijuma @hachikuji Updated PR as discussed.

ijuma

Thanks for the PR. Some quick comments.

ijuma · 2020-07-13T22:18:34Z

Can we remove this now?

ijuma · 2020-07-13T22:19:55Z

Would we want to change this for new IBPs?

Updated to change to NOT_LEADER_OR_FOLLOWER from 2.7 onwards. Also added note to upgrade docs.

rajinisivaram · 2020-07-14T12:36:08Z

@ijuma Thanks for the review. Another usage of REPLICA_NOT_AVAILABLE is in MetadataCache: https://github.com/rajinisivaram/kafka/blob/eaea63ec201fefd32256d75ed308aca311b42d1f/core/src/main/scala/kafka/server/MetadataCache.scala#L105 and a couple of lines below that one. These get returned for partition metadata in metadata responses and NOT_LEADER_OR_FOLLOWER doesn't look suitable. What do you think?

hachikuji · 2020-07-15T23:02:32Z

+ * This could a transient exception during reassignments.
+ */
+@SuppressWarnings("deprecation")
+public class NotLeaderOrFollowerException extends NotLeaderForPartitionException {


One nitpick on the naming. It seems a little strange that followers might return NOT_LEADER_OR_FOLLOWER for requests which are intended for the leader. I think my suggestion to frame the name around the broker not being the intended target of the request was a little more accurate, though I admit the name was clumsier. Perhaps this class would be a good place to document the semantics? Maybe worth mentioning the Produce/Fetch behavior explicitly.

Good point.

Updated javadoc.

hachikuji · 2020-07-15T23:05:25Z

            LeaderNotAvailableException::new),
-    NOT_LEADER_FOR_PARTITION(6, "This server is not the leader for that topic-partition.",
-            NotLeaderForPartitionException::new),
+    NOT_LEADER_OR_FOLLOWER(6, "This server is not the leader or follower for that topic-partition.",


This would also be a good place to be clearer about the interpretation of this error. Maybe something like this?

For requests intended only for the leader, this error indicates that the broker is not the current leader. For requests intended for any replica, this error indicates that the broker is not a replica of the topic partition.

hachikuji · 2020-07-15T23:20:19Z

+          case e: NotLeaderOrFollowerException =>
+            warn(s"Unable to alter log dirs for $topicPartition", e)
+            // Retaining REPLICA_NOT_AVAILABLE exception for ALTER_REPLICA_LOG_DIRS for older versions for compatibility
+            (topicPartition, if (config.interBrokerProtocolVersion >= KAFKA_2_7_IV0) Errors.NOT_LEADER_OR_FOLLOWER else Errors.REPLICA_NOT_AVAILABLE)


Hmm.. Wouldn't it make more sense to tie this to a version bump to this protocol rather than the IBP?

Yeah, this was a bad suggestion on my part as I didn't realize that we were using this from clients/tools versus for inter broker protocols. Maybe we should revert to using REPLICA_NOT_AVAILABLE here.

rajinisivaram · 2020-07-16T09:21:07Z

@hachikuji @ijuma Thanks for the reviews, have addressed the comments so far.

hachikuji

LGTM. Just one minor comment.

hachikuji · 2020-07-16T19:04:21Z

 * - {@link Errors#TOPIC_AUTHORIZATION_FAILED} If the user does not have READ access to a requested topic
- * - {@link Errors#REPLICA_NOT_AVAILABLE} If the request is received by a broker which is not a replica
- * - {@link Errors#NOT_LEADER_FOR_PARTITION} If the broker is not a leader and either the provided leader epoch
+ * - {@link Errors#REPLICA_NOT_AVAILABLE} If the request is received by a broker with version 2.4 to 2.6 which is not a replica


Is the note about the range from 2.4 to 2.6 correct? I think this error has always been possible in the case of a reassignment.

Updated to say < 2.6.

ijuma · 2020-07-16T20:59:13Z

    BROKER_NOT_AVAILABLE(8, "The broker is not available.",
            BrokerNotAvailableException::new),
-    REPLICA_NOT_AVAILABLE(9, "The replica is not available for the requested topic-partition.",
+    REPLICA_NOT_AVAILABLE(9, "The replica is not available for the requested topic-partition. This is used for backward compatibility for " +


@hachikuji @rajinisivaram Since this is still used outside of produce/fetch, maybe the backwards compatibility message needs to be qualified?

Updated the doc here and added javadoc for ReplicaNotAvailableException with more detail.

ijuma · 2020-07-16T21:01:40Z

@rajinisivaram With regards to MetadataResponse, it's currently documented as:

/**
 * Possible topic-level error codes:
 *  UnknownTopic (3)
 *  LeaderNotAvailable (5)
 *  InvalidTopic (17)
 *  TopicAuthorizationFailed (29)

 * Possible partition-level error codes:
 *  LeaderNotAvailable (5)
 *  ReplicaNotAvailable (9)
 */

I don't think we should change it, but it does raise the question of whether we should make ReplicaNotAvailable retriable.

hachikuji · 2020-07-16T21:13:00Z

I think it would make sense to change ReplicaNotAvailableException to extend InvalidMetadataException.

rajinisivaram · 2020-07-17T10:41:19Z

@hachikuji @ijuma Thanks for the reviews. I have updated the docs and made ReplicaNotAvailableException a subclass of InvalidMetadataException. Also moved upgrade doc under 2.6. If this doesn't go into 2.6, I will change all the javadocs and upgrade note to say 2.7.

ijuma · 2020-07-17T13:31:14Z

+/**
+ * The replica is not available for the requested topic partition. This may be
+ * a transient exception during reassignments. From version 2.6 onwards, Fetch requests
+ * and other requests intended only for the leader of the topic partition return


Should it be leader or follower her and in the ReplicaNotAvailable message?

@ijuma Thanks for the review, updated.

ijuma

LGTM, thanks.

rajinisivaram · 2020-07-17T19:01:22Z

@ijuma @hachikuji @bob-barrett Thanks for the reviews, merging to trunk and 2.6.

…ICA_NOT_AVAILABLE for consumers (#8979) Brokers currently return NOT_LEADER_FOR_PARTITION to producers and REPLICA_NOT_AVAILABLE to consumers if a replica is not available on the broker during reassignments. Non-Java clients treat REPLICA_NOT_AVAILABLE as a non-retriable exception, Java consumers handle this error by explicitly matching the error code even though it is not an InvalidMetadataException. This PR renames NOT_LEADER_FOR_PARTITION to NOT_LEADER_OR_FOLLOWER and uses the same error for producers and consumers. This is compatible with both Java and non-Java clients since all clients handle this error code (6) as retriable exception. The PR also makes ReplicaNotAvailableException a subclass of InvalidMetadataException. - ALTER_REPLICA_LOG_DIRS continues to return REPLICA_NOT_AVAILABLE. Retained this for compatibility since this request never returned NOT_LEADER_FOR_PARTITION earlier. - MetadataRequest version 0 also returns REPLICA_NOT_AVAILABLE as topic-level error code for compatibility. Newer versions filter these out and return Errors.NONE, so didn't change this. - Partition responses in MetadataRequest return REPLICA_NOT_AVAILABLE to indicate that one of the replicas is not available. Did not change this since NOT_LEADER_FOR_PARTITION is not suitable in this case. Reviewers: Ismael Juma <ismael@juma.me.uk>, Jason Gustafson <jason@confluent.io>, Bob Barrett <bob.barrett@confluent.io>

* MINOR: Filter out quota configs for ConfigCommand using --bootstrap-server (apache#9030) Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com>, David Jacot <djacot@confluent.io>, Ron Dagostino <rdagostino@confluent.io> * MINOR: Fix flaky system test assertion after static member fencing (apache#9033) The test case `OffsetValidationTest.test_fencing_static_consumer` fails periodically due to this error: ``` Traceback (most recent call last): File "/home/jenkins/workspace/system-test-kafka_2.6/kafka/venv/lib/python2.7/site-packages/ducktape-0.7.8-py2.7.egg/ducktape/tests/runner_client.py", line 134, in run data = self.run_test() File "/home/jenkins/workspace/system-test-kafka_2.6/kafka/venv/lib/python2.7/site-packages/ducktape-0.7.8-py2.7.egg/ducktape/tests/runner_client.py", line 192, in run_test return self.test_context.function(self.test) File "/home/jenkins/workspace/system-test-kafka_2.6/kafka/venv/lib/python2.7/site-packages/ducktape-0.7.8-py2.7.egg/ducktape/mark/_mark.py", line 429, in wrapper return functools.partial(f, *args, **kwargs)(*w_args, **w_kwargs) File "/home/jenkins/workspace/system-test-kafka_2.6/kafka/tests/kafkatest/tests/client/consumer_test.py", line 257, in test_fencing_static_consumer assert len(consumer.dead_nodes()) == num_conflict_consumers AssertionError ``` When a consumer stops, there is some latency between when the shutdown is observed by the service and when the node is added to the dead nodes. This patch fixes the problem by giving some time for the assertion to be satisfied. Reviewers: Boyang Chen <boyang@confluent.io> * MINOR: Improve log4j for per-consumer assignment (apache#8997) Add log4j entry summarizing the assignment (previous owned and assigned) at the consumer level. Reviewers: A. Sophie Blee-Goldman <sophie@confluent.io>, Boyang Chen <boyang@confluent.io> * KAFKA-10223; Use NOT_LEADER_OR_FOLLOWER instead of non-retriable REPLICA_NOT_AVAILABLE for consumers (apache#8979) Brokers currently return NOT_LEADER_FOR_PARTITION to producers and REPLICA_NOT_AVAILABLE to consumers if a replica is not available on the broker during reassignments. Non-Java clients treat REPLICA_NOT_AVAILABLE as a non-retriable exception, Java consumers handle this error by explicitly matching the error code even though it is not an InvalidMetadataException. This PR renames NOT_LEADER_FOR_PARTITION to NOT_LEADER_OR_FOLLOWER and uses the same error for producers and consumers. This is compatible with both Java and non-Java clients since all clients handle this error code (6) as retriable exception. The PR also makes ReplicaNotAvailableException a subclass of InvalidMetadataException. - ALTER_REPLICA_LOG_DIRS continues to return REPLICA_NOT_AVAILABLE. Retained this for compatibility since this request never returned NOT_LEADER_FOR_PARTITION earlier. - MetadataRequest version 0 also returns REPLICA_NOT_AVAILABLE as topic-level error code for compatibility. Newer versions filter these out and return Errors.NONE, so didn't change this. - Partition responses in MetadataRequest return REPLICA_NOT_AVAILABLE to indicate that one of the replicas is not available. Did not change this since NOT_LEADER_FOR_PARTITION is not suitable in this case. Reviewers: Ismael Juma <ismael@juma.me.uk>, Jason Gustafson <jason@confluent.io>, Bob Barrett <bob.barrett@confluent.io> * MINOR: Improved code quality for various files. (apache#9037) Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com> * MINOR: Fixed some resource leaks. (apache#8922) Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com> * MINOR: Enable broker/client compatibility tests for 2.5.0 release - Add missing broker/client compatibility tests for 2.5.0 release Author: Manikumar Reddy <manikumar.reddy@gmail.com> Reviewers: Rajini Sivaram <rajinisivaram@googlemail.com> Closes apache#9041 from omkreddy/compat * KAFKA-10286: Connect system tests should wait for workers to join group (apache#9040) Currently, the system tests `connect_distributed_test` and `connect_rest_test` only wait for the REST api to come up. The startup of the worker includes an asynchronous process for joining the worker group and syncing with other workers. There are some situations in which this sync takes an unusually long time, and the test continues without all workers up. This leads to flakey test failures, as worker joins are not given sufficient time to timeout and retry without waiting explicitly. This changes the `ConnectDistributedTest` to wait for the Joined group message to be printed to the logs before continuing with tests. I've activated this behavior by default, as it's a superset of the checks that were performed by default before. This log message is present in every version of DistributedHerder that I could find, in slightly different forms, but always with `Joined group` at the beginning of the log message. This change should be safe to backport to any branch. Signed-off-by: Greg Harris <gregh@confluent.io> Author: Greg Harris <gregh@confluent.io> Reviewer: Randall Hauch <rhauch@gmail.com> * KAFKA-10295: Wait for connector recovery in test_bounce (apache#9043) Signed-off-by: Greg Harris <gregh@confluent.io> Co-authored-by: Rajini Sivaram <rajinisivaram@googlemail.com> Co-authored-by: Jason Gustafson <jason@confluent.io> Co-authored-by: Guozhang Wang <wangguoz@gmail.com> Co-authored-by: Leonard Ge <62600326+leonardge@users.noreply.github.com> Co-authored-by: Manikumar Reddy <manikumar.reddy@gmail.com> Co-authored-by: Greg Harris <gregh@confluent.io>

Nevon · 2021-07-23T06:59:33Z

I was recently made aware of this change, and am looking to make the corresponding change in the NodeJS client tulios/kafkajs#1155.

One thing I don't really understand about this change is how REPLICA_NOT_AVAILABLE could change to being a retriable error. What does this mean for backwards compatibility with clusters pre-2.6? Was it always supposed to be retriable, and this just fixed an oversight, or does it mean that we will now end up retrying an operation that actually isn't retriable on clusters from before 2.6?

rajinisivaram · 2021-07-23T09:48:13Z

@Nevon REPLICA_NOT_AVAILABLE was a transient error during reassignments with older clusters as well and should have been marked retriable. But it was not being handled as a retriable error by clients apart from Java consumers which did handle this specially as a retriable error. With the changes in this PR, we don't return REPLICA_NOT_AVAILABLE to clients except in the cases listed in the PR description. This ensures that both Java and existing non-Java consumers without any changes retry transient errors during reassignment (now NOT_LEADER_OR_FOLLOWER).

Nevon · 2021-07-23T11:34:03Z

That clears it right up. Thanks for the info!

rajinisivaram requested a review from hachikuji July 3, 2020 16:56

rajinisivaram force-pushed the KAFKA-10223-replica-not-available branch from a7b5bf9 to 8dc37b7 Compare July 10, 2020 09:41

rajinisivaram changed the title ~~KAFKA-10223; Make ReplicaNotAvailableException retriable metadata exception~~ KAFKA-10223; Use NOT_LEADER_OR_FOLLOWER instead of non-retriable REPLICA_NOT_AVAILABLE for consumers Jul 10, 2020

rajinisivaram requested a review from ijuma July 10, 2020 10:00

ijuma reviewed Jul 13, 2020

View reviewed changes

rajinisivaram added 2 commits July 14, 2020 13:36

KAFKA-10223; Replace REPLICA_NOT_AVAILABLE with NOT_LEADER_OR_FOLLOWER

116b7b0

Address review comments, remove some more usages of ReplicaNotAvailable

8bcda5e

rajinisivaram force-pushed the KAFKA-10223-replica-not-available branch from 4707714 to 8bcda5e Compare July 14, 2020 12:41

hachikuji reviewed Jul 15, 2020

View reviewed changes

Address review comments

eedd6e2

hachikuji approved these changes Jul 16, 2020

View reviewed changes

ijuma reviewed Jul 16, 2020

View reviewed changes

bob-barrett approved these changes Jul 17, 2020

View reviewed changes

Address review comments

991b61d

ijuma reviewed Jul 17, 2020

View reviewed changes

Address review comment

65d4f62

ijuma approved these changes Jul 17, 2020

View reviewed changes

rajinisivaram merged commit 9c8f75c into apache:trunk Jul 17, 2020

showuon mentioned this pull request Jan 14, 2021

MINOR: replace NotLeaderForPartitionException with NotLeaderOrFollowerException #9885

Merged

3 tasks

Nevon mentioned this pull request Jul 23, 2021

Support retriable REPLICA_NOT_AVAILABLE error tulios/kafkajs#1155

Closed

Conversation

rajinisivaram commented Jul 3, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Committer Checklist (excluded from commit message)

Uh oh!

ijuma commented Jul 3, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

hachikuji commented Jul 6, 2020

Uh oh!

ijuma commented Jul 6, 2020

Uh oh!

ijuma commented Jul 6, 2020

Uh oh!

hachikuji commented Jul 6, 2020

Uh oh!

rajinisivaram commented Jul 7, 2020

Uh oh!

ijuma commented Jul 7, 2020

Uh oh!

ijuma commented Jul 7, 2020

Uh oh!

ijuma commented Jul 7, 2020

Uh oh!

rajinisivaram commented Jul 7, 2020

Uh oh!

ijuma commented Jul 7, 2020

Uh oh!

ijuma commented Jul 7, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

edenhill commented Jul 7, 2020

Uh oh!

hachikuji commented Jul 7, 2020

Uh oh!

ijuma commented Jul 7, 2020

Uh oh!

rajinisivaram commented Jul 8, 2020

Uh oh!

rajinisivaram commented Jul 10, 2020

Uh oh!

ijuma left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

rajinisivaram commented Jul 14, 2020

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

ijuma Jul 16, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

rajinisivaram commented Jul 16, 2020

Uh oh!

hachikuji left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

rajinisivaram commented Jul 3, 2020 •

edited

Loading

ijuma commented Jul 3, 2020 •

edited

Loading

ijuma commented Jul 7, 2020 •

edited

Loading

ijuma Jul 16, 2020 •

edited

Loading