KAFKA-5950: AdminClient should retry based on returned error codes #4167
adyach wants to merge 2 commits into apache:trunk
cc @cmccabe
Thanks for this contribution, @adyach. As you probably guessed, the reason for the original design is that Kafka has no "generic" error response message that can be deserialized without knowing the RPC type. Every RPC error therefore has to be handled by RPC-type-specific code, even when the handling should be common to all RPCs. The original intention was that all errors would be handled by each Call subclass, in the …

It might seem obvious that the right response to a retriable exception is always to retry, as this patch does. Unfortunately, that is not quite true. I would divide up the RetriableExceptions like so: …

Retrying the exceptions in group 2 without refreshing the metadata will not help. The exceptions in group 3 should not be retried automatically.
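A minimal Java sketch of the three-way split described above follows. The original list of groups did not survive, so group membership here is partly an assumption: group 2 is taken to be the metadata-related errors (modelled by a stand-in `InvalidMetadataException`), group 3 uses the two examples named later in this thread (`CorruptRecordException`, `KafkaStorageException`), and group 1 is assumed to be every other retriable error. The exception classes, `RetryAction` and `RetryPolicy` are hypothetical local stand-ins with a simplified hierarchy, not Kafka's actual classes or AdminClient code.

```java
// Hypothetical stand-ins named after classes in org.apache.kafka.common.errors.
// The hierarchy is simplified so the sketch compiles without Kafka on the classpath.
class RetriableException extends RuntimeException {
    RetriableException(String message) { super(message); }
}

class InvalidMetadataException extends RetriableException {
    InvalidMetadataException(String message) { super(message); }
}

class NotLeaderForPartitionException extends InvalidMetadataException {
    NotLeaderForPartitionException(String message) { super(message); }
}

class CorruptRecordException extends RetriableException {
    CorruptRecordException(String message) { super(message); }
}

class KafkaStorageException extends RetriableException {
    KafkaStorageException(String message) { super(message); }
}

// What a (hypothetical) caller should do with a failed call.
enum RetryAction { RETRY, REFRESH_METADATA_THEN_RETRY, FAIL }

final class RetryPolicy {
    private RetryPolicy() {}

    static RetryAction classify(Throwable error) {
        // Group 3 is checked first so it wins even if a class also matches a
        // broader retriable category: retrying cannot fix on-disk corruption.
        if (error instanceof CorruptRecordException || error instanceof KafkaStorageException) {
            return RetryAction.FAIL;
        }
        // Group 2 (assumed): stale metadata, so retrying without a refresh will not help.
        if (error instanceof InvalidMetadataException) {
            return RetryAction.REFRESH_METADATA_THEN_RETRY;
        }
        // Group 1 (assumed): any other retriable error is retried as-is.
        if (error instanceof RetriableException) {
            return RetryAction.RETRY;
        }
        // Anything that is not retriable at all fails immediately.
        return RetryAction.FAIL;
    }
}
```

Because Kafka has no generic error response, a real client would first have to map each RPC's error codes to exceptions inside that RPC's own handler before a shared policy like this could be applied.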
@cmccabe The current design is that all retriable exceptions should be in categories …
Here's one example that I think we can agree on: if someone invokes …
Well, the Producer should certainly retry when it gets this exception. I don't think AdminClient can currently receive this exception, so it was a bad example on my part. Similarly, I don't think AdminClient can currently receive CorruptRecordException or KafkaStorageException either. I only included them in category 3 because I can't think of any reason why retrying would help (we have no process that fixes on-disk corruption right now).
Unfortunately, I don't think the approach in the current patch is going to work. The problem is that most of our RPCs are batch requests, and each element in the batch can fail or succeed independently. For example, let's say you have a … If …
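The per-element point can be sketched the same way. Suppose, hypothetically, that a batch request creating three topics comes back with an independent result for each topic. A handler then has to split the batch rather than retry it as a unit: re-sending a topic that was already created, for example, would come back with a topic-already-exists error on the second attempt. `ElementError`, `BatchOutcome` and `BatchResponseHandler` are hypothetical names for illustration, not Kafka's AdminClient code.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical outcome of one element (e.g. one topic) in a batch RPC.
enum ElementError { NONE, RETRIABLE, FATAL }

// Result of splitting one batch response into its independent parts.
final class BatchOutcome {
    final List<String> succeeded = new ArrayList<>();
    final List<String> toRetry = new ArrayList<>();
    final Map<String, ElementError> failed = new LinkedHashMap<>();
}

final class BatchResponseHandler {
    private BatchResponseHandler() {}

    // Splits a batch response keyed by element name so that only retriable
    // failures are re-sent; successes are completed and fatal errors are
    // reported without affecting the other elements.
    static BatchOutcome split(Map<String, ElementError> perElement) {
        BatchOutcome outcome = new BatchOutcome();
        for (Map.Entry<String, ElementError> entry : perElement.entrySet()) {
            switch (entry.getValue()) {
                case NONE:
                    outcome.succeeded.add(entry.getKey());
                    break;
                case RETRIABLE:
                    outcome.toRetry.add(entry.getKey());
                    break;
                default:
                    outcome.failed.put(entry.getKey(), entry.getValue());
                    break;
            }
        }
        return outcome;
    }
}
```

With topic-a created, topic-b failing with a retriable error and topic-c failing fatally, only topic-b goes into the next request.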
See #4295 for another approach, which I think will work better.
Closing this since the issue was resolved in a separate patch. |
The PR will add retries for KafkaAdminClient's retriable error responses.