
KAFKA-9999: Make internal topic creation error non-fatal#8677

Closed
abbccdda wants to merge 4 commits into apache:trunk from abbccdda:KAFKA-9999

Conversation

@abbccdda

As of today, an internal topic creation failure can shut down a stream thread. Instead of failing hard, we could take a more conservative approach: trigger another rebalance and retry the topic creation, so that a thread does not die just because the broker is temporarily unavailable.

Committer Checklist (excluded from commit message)

  • Verify design and implementation
  • Verify test coverage and CI build status
  • Verify documentation (including upgrade notes)

    "You can increase admin client config `retries` to be resilient against this error.", retries);
    log.error(timeoutAndRetryError);
-   throw new StreamsException(timeoutAndRetryError);
+   throw new TaskMigratedException("Time out for creating internal topics", new TimeoutException(timeoutAndRetryError));
Member


For KIP-441 we added a nextScheduledRebalance field to the assignment in order to signal when a followup rebalance is needed. Can we leverage that here as well, so we don't have to go through the whole ordeal of onPartitionsLost?
Check out the call to fetchEndOffsets in StreamsPartitionAssignor#populateClientStatesMap, where we schedule a followup rebalance on the leader if the listOffsets request fails. I think we can reuse the same logic/code path and keep track of a general flag like adminClientRequestSuccessful so the assignor can still finish the assignment.
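For illustration, a minimal hypothetical sketch of that flag-based approach (the class and method names here are invented for this example; only `nextScheduledRebalance` and `adminClientRequestSuccessful` come from the discussion, and this is not the actual StreamsPartitionAssignor code):

```java
// Hypothetical sketch: instead of throwing when an admin-client request
// fails during assignment, record the failure and schedule an immediate
// followup rebalance so the assignor can still finish the assignment.
public class AssignmentSketch {
    // Sentinel meaning "no followup rebalance needed".
    static final long NO_FOLLOWUP = Long.MAX_VALUE;

    // If topic creation (or listOffsets) failed, ask the members to rejoin
    // immediately; otherwise leave the schedule empty.
    public static long nextScheduledRebalanceMs(boolean adminClientRequestSuccessful,
                                                long nowMs) {
        return adminClientRequestSuccessful ? NO_FOLLOWUP : nowMs;
    }
}
```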

Author


Sounds good, let me check that.

Author


I took a look at the code path, and I have some doubts about whether we would be able to complete the assignment: if the internal topic creation failed, we don't have valid subtopology partitions to assign, correct?

Member


Good point... what if we just fall back to the FallbackPriorTaskAssignor, as we do when listOffsets fails, remove any tasks that involve internal topics we failed to create, and schedule the followup rebalance for "immediately"?

Member


Actually, I'm not sure we necessarily even need to call the FallbackPriorTaskAssignor; we just need to schedule the followup and remove the affected tasks from the assignment.

Contributor


I was tempted to say we should just return an empty assignment, which would prompt everyone to rejoin again immediately, but I think the FallbackPriorTaskAssignor is a preferable alternative.

IIUC, we should be able to rely on the precondition that any previously assigned tasks were correctly initialized before they were first assigned, right? So we know they are all safe to keep working (if possible) while we wait a suitable backoff period before trying to create these topics again.

I can see the appeal of just removing any tasks we couldn't initialize instead of calling the FallbackPriorTaskAssignor, but if I'm reading this code right, we might merely have failed to verify that the topics exist, not only failed to create topics we know didn't exist. So we might actually remove tasks that were previously assigned.

It's not clear which strategy is better, since it depends on the exact nature of the failure, but at a very high level it's probably better to continue processing existing work and delay starting new work than to start new work but delay processing existing work.

Or we could try for the "best of both worlds", where we assign the union of all previously assigned tasks and any new tasks we were able to set up.

Finally, even if we re-assign previously assigned tasks, I'm not sure if we actually need/want to use the FallbackPriorTaskAssignor in particular. There doesn't seem to be anything wrong with just computing a new assignment for a subset of the tasks while we also schedule a re-attempt to set up the rest of the tasks after a back-off period.
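The "best of both worlds" option could look something like this sketch (task ids are modeled as plain strings and the class name is invented; this is not the actual assignor code):

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch: assign the union of all previously assigned tasks
// (safe to keep running, since their topics were validated when they were
// first assigned) and any new tasks whose internal topics we did set up.
public class UnionAssignment {
    public static Set<String> tasksToAssign(Set<String> previouslyAssigned,
                                            Set<String> newlyInitialized) {
        Set<String> union = new HashSet<>(previouslyAssigned);
        union.addAll(newlyInitialized);
        return union;
    }
}
```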

Member


> assign the union of all previously assigned tasks and any new tasks we were able to set up

I was worrying about the case where some internal topics got deleted, and we would cause trouble for the previous owner of the corresponding task. But I suppose if the topic was deleted randomly in the middle of processing, the thread would die anyway, so the odds of the original owner not dying on an internal topic deletion are pretty low.

With this strategy, we would at least contain the blast radius to just the current owner, since once it dies, that task has no previous owner and would not be assigned. So I'm pretty strongly in favor of this idea. Arguably we could just incorporate this into the existing FallbackPriorTaskAssignor, since it would reduce to the current behavior in the case where all topics have been validated. I'm not sure if that would be more or less work, though.

@abbccdda
Author

I have updated the PR with my current understanding of the proposal, @vvcephei @ableegoldman. The part that needs more discussion is the prepareRepartitionTopics case, which could also fail to create internal topics. Should we continue in that case?

@vvcephei
Contributor

Hey @abbccdda, I've recently been investigating these timeouts as part of #8738, and we're also planning to implement KIP-572 as a general solution to all timeouts that can happen in Streams.

Given the complexities that came to light in the discussion above, and all the edge cases that can happen, I'm wondering if we should really try to be this smart in the assignor.

What do you think about just leaving the current behavior as-is, and then in the future, changing it to throw the TimeoutException out of assign() so that the KIP-572 logic can catch it and gracefully retry from the outer loop? The downside of that approach is that all the instances would be blocked for the whole poll interval, and then they would have to repeat their attempt to join the group.
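As a rough illustration of that outer-loop recovery (a generic sketch under the KIP-572 idea, not the actual Streams code; all names here are invented):

```java
import java.util.concurrent.TimeoutException;

// Hypothetical sketch: let assign() throw TimeoutException, and have the
// outer loop catch it and retry (in Streams this would mean waiting out
// the poll interval and rejoining the group) instead of killing the thread.
public class RetryLoop {
    interface Assignor<T> {
        T assign() throws TimeoutException;
    }

    public static <T> T assignWithRetry(Assignor<T> assignor, int maxAttempts)
            throws TimeoutException {
        TimeoutException last = null;
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            try {
                return assignor.assign();
            } catch (TimeoutException e) {
                last = e; // back off and retry from the outer loop
            }
        }
        throw last; // give up after maxAttempts, surfacing the original error
    }
}
```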

I'm just concerned that it doesn't sound from the above like we're very sure that any specific choice of tasks is going to be the right one, and if we leave some tasks out of the assignment, it's going to be harder to debug than if we just let the thread crash (for now) or recover holistically (after KIP-572).

WDYT?

@abbccdda
Author

abbccdda commented Jun 3, 2020

@vvcephei Sounds good, let's wait for KIP-572 PR.

@abbccdda abbccdda closed this Nov 3, 2020

