KAFKA-6897: Prevent producer from blocking indefinitely after close (#5027)
hachikuji merged 1 commit into apache:trunk
Conversation
Force-pushed 521ae07 to 594758e
The concept seems good. I'm not sure that IllegalStateException is the right one to throw, since the state is legal and reachable. Maybe something like TimeoutException?
Do we need to overload version for this purpose? Would a separate flag work? Maybe a more conventional approach would just use a close() method and have an isClosed field or something like that?
I thought it might be a bit awkward to have a close() method for a class like Metadata that has no underlying resources really. That said, I also introduced close() for MetadataUpdater, so maybe we should do the same for Metadata as well. I will update the PR.
@cmccabe I agree that …
Probably not a big deal since awaitUpdate will not block in wait() for longer than 100 ms, but we may as well call notifyAll()?
Should Metadata extend Closeable?
I am wondering if we should throw KafkaException. This is an expected state since the producer is designed to block in send() to await metadata and there is not really any way for a user to avoid it. To be consistent, we could also raise KafkaException from RecordAccumulator in a similar scenario.
yeah, the existing IllegalStateException is confusing and we should fix it.
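The design discussed in this thread — a closed flag checked inside a bounded wait loop, with close() calling notifyAll() to wake blocked waiters — can be sketched as follows. This is an illustrative standalone sketch, not the actual patch; `MetadataSketch` and the local `KafkaException` class stand in for Kafka's `Metadata` and `org.apache.kafka.common.KafkaException`.

```java
// Local stand-in for org.apache.kafka.common.KafkaException (illustrative only).
class KafkaException extends RuntimeException {
    KafkaException(String message) { super(message); }
}

// Sketch of a Metadata-like class: awaitUpdate re-checks the closed flag each
// time it wakes, so a blocked send() fails fast instead of waiting out
// max.block.ms after the producer has been closed.
class MetadataSketch implements java.io.Closeable {
    private boolean isClosed = false;
    private int version = 0;

    public synchronized void awaitUpdate(int lastVersion, long maxWaitMs)
            throws InterruptedException {
        long deadline = System.currentTimeMillis() + maxWaitMs;
        while (this.version <= lastVersion) {
            if (isClosed)
                throw new KafkaException("Requested metadata update after close");
            long remaining = deadline - System.currentTimeMillis();
            if (remaining <= 0)
                throw new KafkaException("Timed out waiting for metadata update");
            // bounded wait so the closed flag is re-checked periodically
            wait(Math.min(remaining, 100));
        }
    }

    public synchronized void update() {
        version++;
        notifyAll(); // wake waiters after a metadata update
    }

    @Override
    public synchronized void close() {
        isClosed = true;
        notifyAll(); // wake any thread blocked in awaitUpdate
    }
}
```

With this shape, a thread blocked in awaitUpdate observes the close within one bounded wait interval rather than blocking for the full max.block.ms.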
Force-pushed 002525a to 2406694
Retest this please
hachikuji left a comment
Thanks, left a few comments. Also note the build failure. Might be a good idea to rebase and verify that the patch still compiles.
Not sure it makes sense to change this one. In fact, close() should really be idempotent, so maybe we can just remove this check?
I think it's fine to leave this unchanged since it is only invoked at the start of the MockProducer APIs.
Probably not a big deal, but I do see this being called from MockProducer#send for example so it might be worth keeping things consistent by throwing KafkaException as we do when KafkaProducer#send is called after close.
Not a big deal, but perhaps we could let MetadataUpdater implement Closeable? We can still override close() so that it doesn't throw an exception.
Don't we need to update this?
Related to the other comment - Metadata#update throws IllegalStateException when invoked after close.
Since we have the notify in close(), do we still need this change?
Good point, reverted.
Should we use KafkaException here as well?
Hmm, I feel IllegalStateException is more appropriate in this case. We expect NetworkClient to not invoke Metadata#update after it has called Metadata#close().
This message seems a little low level for something which will get propagated back to the user. An alternative to consider would be to let awaitUpdate return a boolean indicating whether the update happened or not. That would allow us to raise an exception with a producer-specific message from send().
I am now catching this exception in KafkaProducer#send and rethrowing with a more appropriate message.
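The catch-and-rethrow approach described here can be illustrated with a small standalone sketch. The class and method names (`SendSketch`, `waitOnMetadata`) are hypothetical, and the local `TimeoutException` stands in for `org.apache.kafka.common.errors.TimeoutException`; the point is only the translation of a low-level metadata message into a producer-facing one.

```java
// Local stand-in for org.apache.kafka.common.errors.TimeoutException (illustrative only).
class TimeoutException extends RuntimeException {
    TimeoutException(String message) { super(message); }
    TimeoutException(String message, Throwable cause) { super(message, cause); }
}

class SendSketch {
    static final long MAX_BLOCK_MS = 60_000;

    // Stands in for Metadata#awaitUpdate timing out with a low-level message.
    static void awaitUpdate(long maxWaitMs) {
        throw new TimeoutException("Metadata update not received within " + maxWaitMs + " ms");
    }

    // Sketch of the send()-side translation: rethrow with a message that makes
    // sense to a producer user, chaining the original cause so the underlying
    // reason (timeout, close, auth failure) is not lost.
    static void waitOnMetadata() {
        try {
            awaitUpdate(MAX_BLOCK_MS);
        } catch (TimeoutException e) {
            throw new TimeoutException("Failed to update metadata after " + MAX_BLOCK_MS + " ms.", e);
        }
    }
}
```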
Does this need to be public?
I have KafkaProducer calling into this now, so needs to be public.
Hmm.. The old logic would let us continue fetching in the case of a timeout. Do you think that was not intentional?
Even if we continued fetching, we would have failed the test at the end. tearDown() checks if we saw any background errors and fails the test if we did, so I thought that this change should be reasonable. Let me know if you think otherwise.
Force-pushed ed4be25 to efd3e9a
Discussed offline, but we should try to distinguish legitimate illegal state errors when a producer method is called after …
Force-pushed 19794a5 to 8e47fe3
hachikuji left a comment
Sorry for the delay. Left a few more comments.
Can you explain why we need to catch this? It's generally a bad practice to ignore interrupts, so usually we either let the exception propagate or we reset the interrupt so that the caller has a chance to observe it.
I am rethrowing as KafkaException if the interrupt was because of producer close; close() calls notifyAll() which could interrupt the wait() in this method. Does this seem reasonable?
The problem is that we are losing the indication that the interrupt has occurred. A caller up the stack may depend on seeing it. I think I would just let the exception be raised in all cases even if the producer is being closed.
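The convention the reviewer describes — never swallow an interrupt silently; if you translate `InterruptedException` into another exception, restore the thread's interrupt status first — can be sketched as follows. The class and method names here are illustrative, not from the patch.

```java
class InterruptRestoreSketch {
    private final Object lock = new Object();

    void awaitBriefly() {
        synchronized (lock) {
            try {
                lock.wait(100); // returns when notified, timed out, or interrupted
            } catch (InterruptedException e) {
                // Restore the interrupt status so a caller up the stack can
                // still observe that the thread was interrupted.
                Thread.currentThread().interrupt();
                throw new RuntimeException("Interrupted while awaiting metadata update", e);
            }
        }
    }
}
```

Without the `interrupt()` call, the interrupt flag would be cleared by the `wait()` and the fact that an interrupt occurred would be lost to callers.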
nit: "after producer has been closed"?
Should we chain the caught exception? We expect this to be the close exception, but it could also be a timeout or an authentication failure. Might be useful to know in some scenarios.
Since it's trace level anyway, maybe we should just print the stacktrace instead of just the message.
I know it is from the original code, but asserting the error message seems dubious. Maybe we can just verify the exception type is KafkaException?
Force-pushed 1180a2b to 0cf48e4
FAILURE
Retest this please
hachikuji left a comment
Thanks for the updates. Just one additional comment.
This feels a little brittle. If there is a delay before executing the task, then send() may raise the wrong exception. I think we could make it more reliable by waiting until the topic "test" has been added to Metadata.
Good point. There is still some degree of uncertainty (even if much smaller than before) so I retained the sleep.
Can you elaborate? I don't see any point in the code where we would return between adding the topic and awaiting the update.
Can we use waitForCondition?
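The waitForCondition suggestion — poll a condition until it holds or a deadline expires, instead of relying on a fixed sleep — can be sketched as below. This mirrors the spirit of Kafka's `TestUtils.waitForCondition` but is a standalone illustration, not the actual test utility.

```java
import java.util.function.BooleanSupplier;

class WaitSketch {
    // Polls the condition every 10 ms until it returns true, failing the test
    // with the given message if the deadline passes first. This avoids both
    // flaky fixed sleeps and unbounded waits in tests.
    static void waitForCondition(BooleanSupplier condition, long maxWaitMs, String message)
            throws InterruptedException {
        long deadline = System.currentTimeMillis() + maxWaitMs;
        while (!condition.getAsBoolean()) {
            if (System.currentTimeMillis() > deadline)
                throw new AssertionError(message);
            Thread.sleep(10); // poll interval
        }
    }
}
```

A test could then wait for, say, the topic "test" to appear in metadata rather than sleeping for a guessed duration.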
Force-pushed cb3c69d to a2ba5e2
hachikuji left a comment
LGTM. Thanks for the patch.
@hachikuji is this ready to be merged? And is it just trunk or 2.0 as well?
@ijuma Yes, I will merge to trunk and 2.0.
… closed (#5027) After successful completion of KafkaProducer#close, it is possible that an application calls KafkaProducer#send. If the send is invoked for a topic for which we do not have any metadata, the producer will block until `max.block.ms` elapses - we do not expect to receive any metadata update in this case because Sender (and NetworkClient) has already exited. It is only when RecordAccumulator#append is invoked that we notice that the producer has already been closed and throw an exception. If `max.block.ms` is set to Long.MaxValue (or a sufficiently high value in general), the producer could block awaiting metadata indefinitely. This patch makes sure `Metadata#awaitUpdate` periodically checks if the network client has been closed, and if so bails out as soon as possible.
* apache-github/2.0: MINOR: Close ZooKeeperClient if waitUntilConnected fails during construction (apache#5411) KAFKA-6897; Prevent KafkaProducer.send from blocking when producer is closed (apache#5027)