Skip to content

KAFKA-10223; Use NOT_LEADER_OR_FOLLOWER instead of non-retriable REPLICA_NOT_AVAILABLE for consumers#8979

Merged
rajinisivaram merged 5 commits intoapache:trunkfrom
rajinisivaram:KAFKA-10223-replica-not-available
Jul 17, 2020
Merged

KAFKA-10223; Use NOT_LEADER_OR_FOLLOWER instead of non-retriable REPLICA_NOT_AVAILABLE for consumers#8979
rajinisivaram merged 5 commits intoapache:trunkfrom
rajinisivaram:KAFKA-10223-replica-not-available

Conversation

@rajinisivaram
Copy link
Copy Markdown
Contributor

@rajinisivaram rajinisivaram commented Jul 3, 2020

Brokers currently return NOT_LEADER_FOR_PARTITION to producers and REPLICA_NOT_AVAILABLE to consumers if a replica is not available on the broker during reassignments. Non-Java clients treat REPLICA_NOT_AVAILABLE as a non-retriable exception, Java consumers handle this error by explicitly matching the error code even though it is not an InvalidMetadataException. This PR renames NOT_LEADER_FOR_PARTITION to NOT_LEADER_OR_FOLLOWER and uses the same error for producers and consumers. This is compatible with both Java and non-Java clients since all clients handle this error code (6) as retriable exception. The PR also makes ReplicaNotAvailableException a subclass of InvalidMetadataException.

  • ALTER_REPLICA_LOG_DIRS continues to return REPLICA_NOT_AVAILABLE. Retained this for compatibility since this request never returned NOT_LEADER_FOR_PARTITION earlier. Did not deprecate ReplicaNotAvailableException because of this.
  • MetadataRequest version 0 also returns REPLICA_NOT_AVAILABLE for compatibility. Newer versions filter these out and
    return Errors.NONE, so didn't change this.
  • Partition responses in MetadataRequest return REPLICA_NOT_AVAILABLE to indicate that one of the replicas is not available. Did not change this since NOT_LEADER_FOR_PARTITION is not suitable in this case.

Committer Checklist (excluded from commit message)

  • Verify design and implementation
  • Verify test coverage and CI build status
  • Verify documentation (including upgrade notes)

@rajinisivaram rajinisivaram requested a review from hachikuji July 3, 2020 16:56
@ijuma
Copy link
Copy Markdown
Member

ijuma commented Jul 3, 2020

I think it makes sense to do this, but we also need to figure out a way to:

  1. Make it clear that this is retriable for non Java clients via a KIP
  2. Consider not returning this for older protocol versions.

@hachikuji
Copy link
Copy Markdown
Contributor

I'm not sure the REPLICA_NOT_AVAILABLE error is super helpful as distinguished from NOT_LEADER_FOR_PARTITION. One option here is just to use the latter in all cases.

@ijuma
Copy link
Copy Markdown
Member

ijuma commented Jul 6, 2020

I think the confusing part is returning NOT_LEADER_FOR_PARTITION if you are trying to fetch from a follower.

@ijuma
Copy link
Copy Markdown
Member

ijuma commented Jul 6, 2020

Maybe we could rename NOT_LEADER_FOR_PARTITION.

@hachikuji
Copy link
Copy Markdown
Contributor

I agree it's a little weird. Renaming NOT_LEADER is not a bad idea. I think what we want the error to convey is that the recipient is not a valid request target for the given partition. Maybe INVALID_REQUEST_TARGET or something like that.

@rajinisivaram
Copy link
Copy Markdown
Contributor Author

@ijuma @hachikuji Thank you, I guess we need a KIP anyway even for renaming NOT_LEADER_FOR_PARTITION. We can discuss the options during KIP discussion. My only concern about renaming is that every one knows NOT_LEADER_FOR_PARTITION since it has been there forever, it may take a while to get used to a different name. Perhaps not an issue.

@ijuma
Copy link
Copy Markdown
Member

ijuma commented Jul 7, 2020

Renaming the error code name has no compatibility implications so in theory it could be done without a KIP. We could make this really boring and safe and call it NOT_LEADER_OR_FOLLOWER. I think this would not cause any confusion since the name is so similar and the length is similar since I trimmed the FOR_PARTITION.

@ijuma
Copy link
Copy Markdown
Member

ijuma commented Jul 7, 2020

If you both agree that the name above is reasonable, I would suggest sending a follow up to the KIP-320 thread covering what we've learned and how we're addressing it. It would be good to update the KIP text too so that we have something to refer to.

@ijuma
Copy link
Copy Markdown
Member

ijuma commented Jul 7, 2020

One issue is that the exception NotLeaderForPartitionException is in a public package. We could introduce a subclass, alwayys throw the subclass and deprecate NotLeaderForPartitionException.

@rajinisivaram
Copy link
Copy Markdown
Contributor Author

@ijuma Sounds good to me. Will see what @hachikuji thinks as well.

@ijuma
Copy link
Copy Markdown
Member

ijuma commented Jul 7, 2020

Also cc @edenhill as this affects non Java clients too.

@ijuma
Copy link
Copy Markdown
Member

ijuma commented Jul 7, 2020

We may also want to deprecate the error and exception for ReplicaNotAvailable since it would not serve any purpose (apart from clients retaining compatibility with some brokers), I think.

@edenhill
Copy link
Copy Markdown
Contributor

edenhill commented Jul 7, 2020

Renaming to NOT_LEADER_OR_FOLLOWER and reusing that error code in place of REPLICA_NOT_AVAILABLE sounds like a good idea. 👍

@hachikuji
Copy link
Copy Markdown
Contributor

NOT_LEADER_OR_FOLLOWER sounds good to me. If we're looking to attach the change to a KIP, then KIP-392 might be a better option.

@ijuma
Copy link
Copy Markdown
Member

ijuma commented Jul 7, 2020

KIP-392 sounds good.

@rajinisivaram
Copy link
Copy Markdown
Contributor Author

@ijuma @hachikuji @edenhill Thanks for the responses. I will follow up on KIP-392 discussion thread and update the PR.

@rajinisivaram rajinisivaram force-pushed the KAFKA-10223-replica-not-available branch from a7b5bf9 to 8dc37b7 Compare July 10, 2020 09:41
@rajinisivaram rajinisivaram changed the title KAFKA-10223; Make ReplicaNotAvailableException retriable metadata exception KAFKA-10223; Use NOT_LEADER_OR_FOLLOWER instead of non-retriable REPLICA_NOT_AVAILABLE for consumers Jul 10, 2020
@rajinisivaram rajinisivaram requested a review from ijuma July 10, 2020 10:00
@rajinisivaram
Copy link
Copy Markdown
Contributor Author

@ijuma @hachikuji Updated PR as discussed.

Copy link
Copy Markdown
Member

@ijuma ijuma left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR. Some quick comments.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we remove this now?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, done.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would we want to change this for new IBPs?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated to change to NOT_LEADER_OR_FOLLOWER from 2.7 onwards. Also added note to upgrade docs.

@rajinisivaram
Copy link
Copy Markdown
Contributor Author

@ijuma Thanks for the review. Another usage of REPLICA_NOT_AVAILABLE is in MetadataCache: https://github.com/rajinisivaram/kafka/blob/eaea63ec201fefd32256d75ed308aca311b42d1f/core/src/main/scala/kafka/server/MetadataCache.scala#L105 and a couple of lines below that one. These get returned for partition metadata in metadata responses and NOT_LEADER_OR_FOLLOWER doesn't look suitable. What do you think?

@rajinisivaram rajinisivaram force-pushed the KAFKA-10223-replica-not-available branch from 4707714 to 8bcda5e Compare July 14, 2020 12:41
* This could a transient exception during reassignments.
*/
@SuppressWarnings("deprecation")
public class NotLeaderOrFollowerException extends NotLeaderForPartitionException {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One nitpick on the naming. It seems a little strange that followers might return NOT_LEADER_OR_FOLLOWER for requests which are intended for the leader. I think my suggestion to frame the name around the broker not being the intended target of the request was a little more accurate, though I admit the name was clumsier. Perhaps this class would be a good place to document the semantics? Maybe worth mentioning the Produce/Fetch behavior explicitly.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated javadoc.

LeaderNotAvailableException::new),
NOT_LEADER_FOR_PARTITION(6, "This server is not the leader for that topic-partition.",
NotLeaderForPartitionException::new),
NOT_LEADER_OR_FOLLOWER(6, "This server is not the leader or follower for that topic-partition.",
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would also be a good place to be clearer about the interpretation of this error. Maybe something like this?

For requests intended only for the leader, this error indicates that the broker is not the current leader. For requests intended for any replica, this error indicates that the broker is not a replica of the topic partition.

Comment thread clients/src/main/java/org/apache/kafka/common/protocol/Errors.java
case e: NotLeaderOrFollowerException =>
warn(s"Unable to alter log dirs for $topicPartition", e)
// Retaining REPLICA_NOT_AVAILABLE exception for ALTER_REPLICA_LOG_DIRS for older versions for compatibility
(topicPartition, if (config.interBrokerProtocolVersion >= KAFKA_2_7_IV0) Errors.NOT_LEADER_OR_FOLLOWER else Errors.REPLICA_NOT_AVAILABLE)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm.. Wouldn't it make more sense to tie this to a version bump to this protocol rather than the IBP?

Copy link
Copy Markdown
Member

@ijuma ijuma Jul 16, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, this was a bad suggestion on my part as I didn't realize that we were using this from clients/tools versus for inter broker protocols. Maybe we should revert to using REPLICA_NOT_AVAILABLE here.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reverted.

@rajinisivaram
Copy link
Copy Markdown
Contributor Author

@hachikuji @ijuma Thanks for the reviews, have addressed the comments so far.

Copy link
Copy Markdown
Contributor

@hachikuji hachikuji left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Just one minor comment.

* - {@link Errors#TOPIC_AUTHORIZATION_FAILED} If the user does not have READ access to a requested topic
* - {@link Errors#REPLICA_NOT_AVAILABLE} If the request is received by a broker which is not a replica
* - {@link Errors#NOT_LEADER_FOR_PARTITION} If the broker is not a leader and either the provided leader epoch
* - {@link Errors#REPLICA_NOT_AVAILABLE} If the request is received by a broker with version 2.4 to 2.6 which is not a replica
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the note about the range from 2.4 to 2.6 correct? I think this error has always been possible in the case of a reassignment.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated to say < 2.6.

BROKER_NOT_AVAILABLE(8, "The broker is not available.",
BrokerNotAvailableException::new),
REPLICA_NOT_AVAILABLE(9, "The replica is not available for the requested topic-partition.",
REPLICA_NOT_AVAILABLE(9, "The replica is not available for the requested topic-partition. This is used for backward compatibility for " +
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@hachikuji @rajinisivaram Since this is still used outside of produce/fetch, maybe the backwards compatibility message needs to be qualified?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated the doc here and added javadoc for ReplicaNotAvailableException with more detail.

@ijuma
Copy link
Copy Markdown
Member

ijuma commented Jul 16, 2020

@rajinisivaram With regards to MetadataResponse, it's currently documented as:

/**
 * Possible topic-level error codes:
 *  UnknownTopic (3)
 *  LeaderNotAvailable (5)
 *  InvalidTopic (17)
 *  TopicAuthorizationFailed (29)

 * Possible partition-level error codes:
 *  LeaderNotAvailable (5)
 *  ReplicaNotAvailable (9)
 */

I don't think we should change it, but it does raise the question of whether we should make ReplicaNotAvailable retriable.

@hachikuji
Copy link
Copy Markdown
Contributor

I think it would make sense to change ReplicaNotAvailableException to extend InvalidMetadataException.

@rajinisivaram
Copy link
Copy Markdown
Contributor Author

@hachikuji @ijuma Thanks for the reviews. I have updated the docs and made ReplicaNotAvailableException a subclass of InvalidMetadataException. Also moved upgrade doc under 2.6. If this doesn't go into 2.6, I will change all the javadocs and upgrade note to say 2.7.

/**
* The replica is not available for the requested topic partition. This may be
* a transient exception during reassignments. From version 2.6 onwards, Fetch requests
* and other requests intended only for the leader of the topic partition return
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should it be leader or follower her and in the ReplicaNotAvailable message?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ijuma Thanks for the review, updated.

Copy link
Copy Markdown
Member

@ijuma ijuma left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks.

@rajinisivaram
Copy link
Copy Markdown
Contributor Author

@ijuma @hachikuji @bob-barrett Thanks for the reviews, merging to trunk and 2.6.

@rajinisivaram rajinisivaram merged commit 9c8f75c into apache:trunk Jul 17, 2020
rajinisivaram added a commit that referenced this pull request Jul 17, 2020
…ICA_NOT_AVAILABLE for consumers (#8979)

Brokers currently return NOT_LEADER_FOR_PARTITION to producers and REPLICA_NOT_AVAILABLE to consumers if a replica is not available on the broker during reassignments. Non-Java clients treat REPLICA_NOT_AVAILABLE as a non-retriable exception, Java consumers handle this error by explicitly matching the error code even though it is not an InvalidMetadataException. This PR renames NOT_LEADER_FOR_PARTITION to NOT_LEADER_OR_FOLLOWER and uses the same error for producers and consumers. This is compatible with both Java and non-Java clients since all clients handle this error code (6) as retriable exception. The PR also makes ReplicaNotAvailableException a subclass of InvalidMetadataException.
    - ALTER_REPLICA_LOG_DIRS continues to return REPLICA_NOT_AVAILABLE. Retained this for compatibility since this request never returned NOT_LEADER_FOR_PARTITION earlier.
   -  MetadataRequest version 0 also returns REPLICA_NOT_AVAILABLE as topic-level error code for compatibility. Newer versions filter these out and return Errors.NONE, so didn't change this.
   - Partition responses in MetadataRequest return REPLICA_NOT_AVAILABLE to indicate that one of the replicas is not available. Did not change this since NOT_LEADER_FOR_PARTITION is not suitable in this case.

Reviewers: Ismael Juma <ismael@juma.me.uk>, Jason Gustafson <jason@confluent.io>, Bob Barrett <bob.barrett@confluent.io>
splett2 added a commit to confluentinc/kafka that referenced this pull request Jul 20, 2020
* MINOR: Filter out quota configs for ConfigCommand using --bootstrap-server (apache#9030)

Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com>, David Jacot <djacot@confluent.io>, Ron Dagostino <rdagostino@confluent.io>

* MINOR: Fix flaky system test assertion after static member fencing (apache#9033)

The test case `OffsetValidationTest.test_fencing_static_consumer` fails periodically due to this error:
```
Traceback (most recent call last):
  File "/home/jenkins/workspace/system-test-kafka_2.6/kafka/venv/lib/python2.7/site-packages/ducktape-0.7.8-py2.7.egg/ducktape/tests/runner_client.py", line 134, in run
    data = self.run_test()
  File "/home/jenkins/workspace/system-test-kafka_2.6/kafka/venv/lib/python2.7/site-packages/ducktape-0.7.8-py2.7.egg/ducktape/tests/runner_client.py", line 192, in run_test
    return self.test_context.function(self.test)
  File "/home/jenkins/workspace/system-test-kafka_2.6/kafka/venv/lib/python2.7/site-packages/ducktape-0.7.8-py2.7.egg/ducktape/mark/_mark.py", line 429, in wrapper
    return functools.partial(f, *args, **kwargs)(*w_args, **w_kwargs)
  File "/home/jenkins/workspace/system-test-kafka_2.6/kafka/tests/kafkatest/tests/client/consumer_test.py", line 257, in test_fencing_static_consumer
    assert len(consumer.dead_nodes()) == num_conflict_consumers
AssertionError
```
When a consumer stops, there is some latency between when the shutdown is observed by the service and when the node is added to the dead nodes. This patch fixes the problem by giving some time for the assertion to be satisfied.

Reviewers: Boyang Chen <boyang@confluent.io>

* MINOR: Improve log4j for per-consumer assignment (apache#8997)

Add log4j entry summarizing the assignment (previous owned and assigned) at the consumer level.

Reviewers: A. Sophie Blee-Goldman <sophie@confluent.io>, Boyang Chen <boyang@confluent.io>

* KAFKA-10223; Use NOT_LEADER_OR_FOLLOWER instead of non-retriable REPLICA_NOT_AVAILABLE for consumers (apache#8979)

Brokers currently return NOT_LEADER_FOR_PARTITION to producers and REPLICA_NOT_AVAILABLE to consumers if a replica is not available on the broker during reassignments. Non-Java clients treat REPLICA_NOT_AVAILABLE as a non-retriable exception, Java consumers handle this error by explicitly matching the error code even though it is not an InvalidMetadataException. This PR renames NOT_LEADER_FOR_PARTITION to NOT_LEADER_OR_FOLLOWER and uses the same error for producers and consumers. This is compatible with both Java and non-Java clients since all clients handle this error code (6) as retriable exception. The PR also makes ReplicaNotAvailableException a subclass of InvalidMetadataException.
    - ALTER_REPLICA_LOG_DIRS continues to return REPLICA_NOT_AVAILABLE. Retained this for compatibility since this request never returned NOT_LEADER_FOR_PARTITION earlier. 
   -  MetadataRequest version 0 also returns REPLICA_NOT_AVAILABLE as topic-level error code for compatibility. Newer versions filter these out and return Errors.NONE, so didn't change this.
   - Partition responses in MetadataRequest return REPLICA_NOT_AVAILABLE to indicate that one of the replicas is not available. Did not change this since NOT_LEADER_FOR_PARTITION is not suitable in this case.

Reviewers: Ismael Juma <ismael@juma.me.uk>, Jason Gustafson <jason@confluent.io>, Bob Barrett <bob.barrett@confluent.io>

* MINOR: Improved code quality for various files. (apache#9037)

Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com>

* MINOR: Fixed some resource leaks. (apache#8922)

Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com>

* MINOR: Enable broker/client compatibility tests for 2.5.0 release

- Add missing broker/client compatibility tests for 2.5.0 release

Author: Manikumar Reddy <manikumar.reddy@gmail.com>

Reviewers: Rajini Sivaram <rajinisivaram@googlemail.com>

Closes apache#9041 from omkreddy/compat

* KAFKA-10286: Connect system tests should wait for workers to join group (apache#9040)

Currently, the system tests `connect_distributed_test` and `connect_rest_test` only wait for the REST api to come up.
The startup of the worker includes an asynchronous process for joining the worker group and syncing with other workers.
There are some situations in which this sync takes an unusually long time, and the test continues without all workers up.
This leads to flakey test failures, as worker joins are not given sufficient time to timeout and retry without waiting explicitly.

This changes the `ConnectDistributedTest` to wait for the Joined group message to be printed to the logs before continuing with tests. I've activated this behavior by default, as it's a superset of the checks that were performed by default before.

This log message is present in every version of DistributedHerder that I could find, in slightly different forms, but always with `Joined group` at the beginning of the log message. This change should be safe to backport to any branch.

Signed-off-by: Greg Harris <gregh@confluent.io>
Author: Greg Harris <gregh@confluent.io>
Reviewer: Randall Hauch <rhauch@gmail.com>

* KAFKA-10295: Wait for connector recovery in test_bounce (apache#9043)

Signed-off-by: Greg Harris <gregh@confluent.io>

Co-authored-by: Rajini Sivaram <rajinisivaram@googlemail.com>
Co-authored-by: Jason Gustafson <jason@confluent.io>
Co-authored-by: Guozhang Wang <wangguoz@gmail.com>
Co-authored-by: Leonard Ge <62600326+leonardge@users.noreply.github.com>
Co-authored-by: Manikumar Reddy <manikumar.reddy@gmail.com>
Co-authored-by: Greg Harris <gregh@confluent.io>
@Nevon
Copy link
Copy Markdown

Nevon commented Jul 23, 2021

I was recently made aware of this change, and am looking to make the corresponding change in the NodeJS client tulios/kafkajs#1155.

One thing I don't really understand about this change is how REPLICA_NOT_AVAILABLE could change to being a retriable error. What does this mean for backwards compatibility with clusters pre-2.6? Was it always supposed to be retriable, and this just fixed an oversight, or does it mean that we will now end up retrying an operation that actually isn't retriable on clusters from before 2.6?

@rajinisivaram
Copy link
Copy Markdown
Contributor Author

@Nevon REPLICA_NOT_AVAILABLE was a transient error during reassignments with older clusters as well and should have been marked retriable. But it was not being handled as a retriable error by clients apart from Java consumers which did handle this specially as a retriable error. With the changes in this PR, we don't return REPLICA_NOT_AVAILABLE to clients except in the cases listed in the PR description. This ensures that both Java and existing non-Java consumers without any changes retry transient errors during reassignment (now NOT_LEADER_OR_FOLLOWER).

@Nevon
Copy link
Copy Markdown

Nevon commented Jul 23, 2021

That clears it right up. Thanks for the info!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants