
KAFKA-15425: Fail fast in Admin::listOffsets when topic (but not partition) metadata is not found #14314

Merged: jolshan merged 3 commits into apache:trunk from C0urante:kafka-12879 on Sep 7, 2023

Conversation

C0urante (Contributor) commented Aug 30, 2023

Jira: KAFKA-15425

This restores previous behavior for Admin::listOffsets, which was to fail immediately if topic metadata could not be found, and only retry if metadata for one or more specific partitions could not be found.

There is a subtle difference here: prior to #13432, the operation would be retried if any metadata error was reported for any individual topic partition, even if an error was also reported for the entire topic. With this change, the operation always fails if an error is reported for the entire topic, even if an error is also reported for one or more individual topic partitions.

I am not aware of any cases where brokers might return both topic- and topic partition-level errors for a metadata request, and if there are none, then this change should be safe. However, if there are such cases, we may need to refine this PR to remove the discrepancy in behavior.

Committer Checklist (excluded from commit message)

  • Verify design and implementation
  • Verify test coverage and CI build status
  • Verify documentation (including upgrade notes)

@C0urante C0urante added the admin AdminClient label Aug 30, 2023
@C0urante C0urante changed the title KAFKA-12879: Fail fast in Admin::listOffsets when topic (but not partition) metadata is not found KAFKA-15425: Fail fast in Admin::listOffsets when topic (but not partition) metadata is not found Aug 31, 2023
jolshan (Member) commented Aug 31, 2023

What was the original rationale for https://issues.apache.org/jira/browse/KAFKA-12339? My understanding is that when a topic is first created, there is sometimes a delay to the metadata propagation. It seems like connect was encountering this issue and rather than adding code to retry they wanted it to be done automatically.

I can see the issue with the compatibility break, but I also see the issue with the metadata propagation. Is there any way to configure whether we prefer to retry on such an error? I see tolerateUnknownTopics, but I'm not sure exactly how it works. Or is the argument that this should always be left to the client to handle the manual retries?

C0urante (Contributor, author) commented Sep 1, 2023

Thanks @jolshan, great questions!

What was the original rationale for https://issues.apache.org/jira/browse/KAFKA-12339?

That ticket was filed in response to a change in behavior caused by #9780, which altered some parts of Kafka Connect to use an admin client instead of a consumer to get the end offsets of internal topics. Kafka Connect already had support for automatically creating these topics on startup, so the switch to using an admin client to list end offsets for internal topics caused some workers to fail on startup after trying to list end offsets for a topic they'd just created. This is because consumers retried when listing end offsets for topics that were unknown to the broker, whereas admin clients did not.

My understanding is that when a topic is first created, there is sometimes a delay to the metadata propagation. It seems like connect was encountering this issue and rather than adding code to retry they wanted it to be done automatically.

This is correct, although in #11797, we reverted the automatic retry logic for Admin::listOffsets and added it to TopicAdmin::retryEndOffsets (a Connect-specific class and method) instead.

I can see the issue with the compatibility break, but I also see the issue with the metadata propagation. Is there anyway to configure whether we prefer to retry on such an error?

There's no user-facing way to handle this scenario in isolation. Users can obviously tweak generic retry settings for their Kafka clients, but that has other, much-wider implications. The tolerateUnknownTopics field is limited to the internal API we use for the admin client and is just a bit of a hack to allow us to fail fast in a specific scenario (topic is unknown to the broker) in some contexts (trying to list topic offsets using an admin client) but retry in others (everything else using the PartitionLeaderStrategy).

Or is the argument this should always be left to the client to handle the manual retries?

Yes, that's the idea--at least for Admin::listOffsets. Every other API remains unaffected, so, e.g., Producer::send will still retry on attempts to produce to an unknown topic partition (which is its own source of headaches, but that's out of scope for this PR).

To summarize: if a topic has just been created, it's possible that Admin::listOffsets will throw a spurious UnknownTopicOrPartitionException with this change, which is less desirable than an automatic retry. However, if the topic genuinely doesn't exist, Admin::listOffsets throwing that exception immediately instead of retrying until the retry timeout expires is better behavior. And, if the topic has just been created, it's debatable whether it's fair to retry for the full retry timeout anyways, instead of a shorter period. This was one of the points in the final rationale for reverting this behavioral change originally:

For example, a likely root cause of KAFKA-12339 was a Connect worker instantiates its KafkaConfigBackingStore (and other internal topic stores), which creates a KafkaBasedLog that as part of start() creates the topic if it doesn't exist and then immediately tries to read the offsets. That reading of offsets can fail if the metadata for the newly created topic hasn't been propagated to all of the brokers. We can solve this particular root cause easily by retrying the reading of offsets within the KafkaBasedLog's start() method, and since topic metadata should be propagated relatively quickly, we don't need to retry for that long – and most of the time we'd probably successfully retry within a few retries.
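The caller-side retry described in that rationale can be sketched generically. This is an illustrative helper, not the actual TopicAdmin::retryEndOffsets or KafkaBasedLog code; the method name, attempt counts, and exception types are all assumptions chosen for the sketch:

```java
import java.util.function.Supplier;

public class BoundedRetry {

    /**
     * Invokes the action, retrying only when it throws the given retriable
     * exception type, for at most maxAttempts attempts with a fixed backoff.
     * Any other exception propagates immediately, mirroring the fail-fast
     * contract: the caller, not the client, owns the retry policy.
     */
    public static <T> T retryOn(Class<? extends RuntimeException> retriable,
                                int maxAttempts,
                                long backoffMs,
                                Supplier<T> action) throws InterruptedException {
        RuntimeException last = null;
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            try {
                return action.get();
            } catch (RuntimeException e) {
                if (!retriable.isInstance(e)) {
                    throw e; // fail fast on unexpected errors
                }
                last = e;
                Thread.sleep(backoffMs);
            }
        }
        throw last;
    }

    public static void main(String[] args) throws InterruptedException {
        // Simulates topic metadata that only propagates after two failed lookups.
        int[] calls = {0};
        long offset = retryOn(IllegalStateException.class, 5, 1L, () -> {
            if (++calls[0] < 3) {
                throw new IllegalStateException("metadata not yet propagated");
            }
            return 42L;
        });
        System.out.println("end offset " + offset + " after " + calls[0] + " attempts");
    }
}
```

Because topic metadata usually propagates quickly, a short bounded loop like this recovers the common just-created-topic case without masking a genuinely missing topic for the full retry timeout.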

jolshan (Member) commented Sep 5, 2023

Sorry for the delay @C0urante and thanks for the explanation. It makes sense.

Given that we moved the retries to TopicAdmin::retryEndOffsets, does this mean there isn't any known wide usage of, or need for, this retry?

I guess that was sort of the question in the final part of the PR description.

    ) {
        switch (topicError) {
            case UNKNOWN_TOPIC_OR_PARTITION:
                if (!tolerateUnknownTopics) {

jolshan (Member) commented on this code:

the operation would be retried if any metadata error was reported for any individual topic partition, even if an error was also reported for the entire topic. With this change, the operation always fails if an error is reported for the entire topic, even if an error is also reported for one or more individual topic partitions.

This change only affects the behavior for unknown topic or partition errors, right? Leader/broker-not-available errors continue to be retriable, and topic authorization, invalid topic, and other errors continue to fail the request?

jolshan (Member):

I am not aware of any cases where brokers might return both topic- and topic partition-level errors for a metadata request, and if there are none, then this change should be safe.

Asking because my understanding is the only case where we would be concerned about topic and partition level errors is when the topic level error is unknown topic or partition but somehow the partitions for that topic have a different error?

C0urante (Contributor, author):

This change only changes the behavior for unknown topic or partition errors right? Leader/broker not available continue to be retriable, topic auth, invalid topic, and other errors continue to fail the request?

Exactly, and we have unit testing coverage to demonstrate this.

Asking because my understanding is the only case where we would be concerned about topic and partition level errors is when the topic level error is unknown topic or partition but somehow the partitions for that topic have a different error?

This is correct, and I'm optimistic that no such case exists, but wanted to note the potential change in behavior just to be safe.
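The error handling agreed on in this exchange can be modeled in a few lines. This is a minimal sketch of the semantics described above, not the actual PartitionLeaderStrategy code; the enum names mirror the error names discussed in the thread but are local to the example:

```java
public class TopicErrorPolicy {

    // Error names mirror those discussed in the thread; not the real Errors enum.
    enum TopicError { NONE, UNKNOWN_TOPIC_OR_PARTITION, LEADER_NOT_AVAILABLE,
                      TOPIC_AUTHORIZATION_FAILED, INVALID_TOPIC_EXCEPTION }

    enum Outcome { PROCEED, RETRY, FAIL }

    /**
     * Topic-level decision as described in this PR: an unknown topic is fatal
     * unless tolerateUnknownTopics is set, leader/broker unavailability stays
     * retriable, and authorization or invalid-topic errors fail the request.
     */
    static Outcome handleTopicError(TopicError error, boolean tolerateUnknownTopics) {
        switch (error) {
            case NONE:
                return Outcome.PROCEED;
            case UNKNOWN_TOPIC_OR_PARTITION:
                return tolerateUnknownTopics ? Outcome.RETRY : Outcome.FAIL;
            case LEADER_NOT_AVAILABLE:
                return Outcome.RETRY;
            default:
                return Outcome.FAIL;
        }
    }

    public static void main(String[] args) {
        System.out.println(handleTopicError(TopicError.UNKNOWN_TOPIC_OR_PARTITION, false)); // FAIL
        System.out.println(handleTopicError(TopicError.UNKNOWN_TOPIC_OR_PARTITION, true));  // RETRY
    }
}
```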


    try (AdminClientUnitTestEnv env = new AdminClientUnitTestEnv(cluster)) {
        env.kafkaClient().setNodeApiVersions(NodeApiVersions.create());
        Map<MetadataResponse, Class<? extends Throwable>> responsesAndFailures = new HashMap<>();

jolshan (Member) commented on this test code:

Extremely minor nit (we could do this in a follow-up; it's not a blocker): could we parameterize this?

C0urante (Contributor, author):

Leave it cleaner than you found it!

Pushed a commit that adds parameterization; if you'd prefer, I can revert and publish it as a follow-up PR after we merge this one.

C0urante (Contributor, author) commented Sep 7, 2023

No worries about the delay, thanks for continuing to work on this @jolshan!

Given that we moved the retries to TopicAdmin::retryEndOffsets does this mean there is not any known wide usage or need for this retry?

Not that I'm aware of, though if anyone believes otherwise, we can discuss later. Considering the change in behavior that we're targeting here seems to have been made accidentally, I think we should err on the side of preserving existing behavior until we have a rationale for changing it and do so intentionally.

I guess that was sort of the question in the final part of the PR description.

(Discussed further in inline comments)

    Arguments.of(
        // We fail fast when the entire topic is unknown
        Errors.UNKNOWN_TOPIC_OR_PARTITION,
        Errors.NONE,

jolshan (Member) commented on this test code:

For my understanding: if the partition error were also unknown topic or partition, we would fail fast there too.

With this change, the operation always fails if an error is reported for the entire topic, even if an error is also reported for one or more individual topic partitions.

Would we want to add this test case too just so the behavior is also shown in tests?

C0urante (Contributor, author):

Sure, done. I've tried to clarify in the comments that these are unusual cases we're testing for lest anyone think that this is normal broker behavior.
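The precedence rule these test cases pin down (a topic-level error always wins over partition-level errors) can be shown in a small table-driven sketch. All names here are illustrative, not the actual KafkaAdminClientTest code:

```java
public class ErrorPrecedence {

    enum Err { NONE, UNKNOWN_TOPIC_OR_PARTITION, LEADER_NOT_AVAILABLE }
    enum Outcome { SUCCEED, RETRY, FAIL_FAST }

    /**
     * The topic-level error is evaluated first and wins; partition-level
     * errors are only consulted when the topic itself reports no error.
     */
    static Outcome resolve(Err topicError, Err partitionError) {
        if (topicError == Err.UNKNOWN_TOPIC_OR_PARTITION) return Outcome.FAIL_FAST;
        if (topicError == Err.LEADER_NOT_AVAILABLE) return Outcome.RETRY;
        if (partitionError != Err.NONE) return Outcome.RETRY;
        return Outcome.SUCCEED;
    }

    public static void main(String[] args) {
        // Table-driven check of the cases discussed above, including the
        // unusual combined topic-plus-partition error.
        Err[][] cases = {
            { Err.UNKNOWN_TOPIC_OR_PARTITION, Err.NONE },
            { Err.UNKNOWN_TOPIC_OR_PARTITION, Err.LEADER_NOT_AVAILABLE },
            { Err.NONE, Err.UNKNOWN_TOPIC_OR_PARTITION },
            { Err.NONE, Err.NONE },
        };
        for (Err[] c : cases) {
            System.out.println(c[0] + " / " + c[1] + " -> " + resolve(c[0], c[1]));
        }
    }
}
```

The second case is the one flagged as unusual in the review: even with a retriable partition-level error present, the unknown-topic error at the topic level fails the operation immediately.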

jolshan (Member) left a review:

Thanks Chris!

jolshan (Member) commented Sep 7, 2023

Also -- I assume we want to pick this to 3.6 as well. Is that the only branch @C0urante?

jolshan (Member) commented Sep 7, 2023

Tests look unrelated and like flakes I've seen before, so I will go ahead and merge + pick to 3.6

@jolshan jolshan merged commit 77a91be into apache:trunk Sep 7, 2023
jolshan pushed a commit that referenced this pull request Sep 7, 2023
…ition) metadata is not found (#14314)


Reviewers: Justine Olshan <jolshan@confluent.io>
C0urante (Contributor, author) commented Sep 7, 2023

Sorry! Yes, that's correct--3.6 is all we need to backport to. Thanks @jolshan!

@C0urante C0urante deleted the kafka-12879 branch September 11, 2023 14:06
mjsax pushed a commit to confluentinc/kafka that referenced this pull request Nov 22, 2023
…ition) metadata is not found (apache#14314)


Reviewers: Justine Olshan <jolshan@confluent.io>