KAFKA-3720: Deprecated BufferExhaustedException and also removed its use and the related sensor metric #1417
MayureshGharat wants to merge 3 commits into apache:trunk from
Conversation
Thanks @MayureshGharat. During the request timeout KIP, was there a discussion about a metric to replace the buffer exhausted one?
cc @junrao
@ijuma I don't think KIP-19 had any discussion about that. Since we now fail a request by default if there is not enough memory, it's the same behavior as BufferExhaustedException, and I think we should probably add another metric to replace the old one.
Thanks for the patch. Could you fix the compilation error? [ant:checkstyle] intellij/kafka/clients/src/main/java/org/apache/kafka/clients/producer/internals/RecordAccumulator.java:27:8: Unused import - org.apache.kafka.common.metrics.stats.Rate.
@junrao any thoughts on whether we should add a metric to replace the buffer exhausted one?
@ijuma: Yes, instead of removing the buffer-exhausted-records sensor, we can probably just update it when BufferPool throws a TimeoutException.
@junrao Thanks. I will upload a new PR.
c5b726d to e32ca9a
Are all cases of this exception due to buffer exhaustion? I thought that it could also happen in other cases? One option would be to keep BufferExhaustedException and have it inherit from TimeoutException. Thoughts?
@ijuma Actually yeah, it can even happen when updating metadata. My bad. This seems like a viable solution, or we can throw a BufferExhaustedException from the BufferPool.allocate() method instead of TimeoutException and have a proper message about what happened. What do you think?
If I understand you correctly, that's indeed what I had in mind. We throw BufferExhaustedException from BufferPool.allocate. To make it compatible with code that expects a TimeoutException, we make it a subclass of TimeoutException. And then here we catch BufferExhaustedException.
That would mean not deprecating BufferExhaustedException, but updating its documentation to say that it's thrown when a buffer allocation times out.
What do you think @junrao?
Sounds good to me. Will wait for @junrao to comment.
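The subclassing compromise being discussed can be sketched as follows. This is a hypothetical simplification, not the actual Kafka source (the real classes live in org.apache.kafka.common.errors, and the class names here are stand-ins): because BufferExhaustedException extends TimeoutException, code written against the old TimeoutException contract keeps working.

```java
// Hypothetical, simplified stand-ins for the Kafka exception classes,
// illustrating the proposed inheritance: BufferExhaustedException
// extends TimeoutException, so existing catch (TimeoutException)
// clauses continue to work unchanged.
class TimeoutException extends RuntimeException {
    TimeoutException(String message) { super(message); }
}

class BufferExhaustedException extends TimeoutException {
    BufferExhaustedException(String message) { super(message); }
}

public class CompatSketch {
    // Stand-in for BufferPool.allocate: throws the more specific type
    // with a message describing what actually happened.
    static void allocate() {
        throw new BufferExhaustedException(
                "Failed to allocate memory within the configured max blocking time");
    }

    public static void main(String[] args) {
        try {
            allocate();
        } catch (TimeoutException e) {
            // Callers that only know about TimeoutException still catch it.
            System.out.println("caught " + e.getClass().getSimpleName());
        }
    }
}
```

A caller catching the broader TimeoutException sees the more specific subclass, which is exactly the compatibility property the thread is after.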
Can we proceed with this PR? The idea you two are discussing sounds reasonable to me too, +1.
@MayureshGharat: Sorry for the delay. The approach that you and @ijuma described sounds good to me.
…related sensor metric
… thrown on BufferExhaustion
e32ca9a to d137aff
@ijuma I have updated the PR. Would you mind taking another look?
    this.metrics.sensor("buffer-exhausted-records").record();
    if (this.interceptors != null)
        this.interceptors.onSendError(record, tp, e);
    throw e;
I think this is gonna be a slight spec change. Before this, a TimeoutException thrown either while waiting for metadata or while waiting for buffer allocation was caught by the following clause for ApiException (since TimeoutException extends RetriableException, which is an ApiException), so a FutureFailure was returned instead of throwing, and the callback was triggered too.
As a result of this change, the two TimeoutException cases are treated differently:
- timeout occurred while waiting for a metadata update => callback called, FutureFailure returned instead of throwing
- timeout occurred while waiting for buffer allocation => callback not called, exception thrown

This sounds confusing and inconsistent. I think we can either 1. always return a FutureFailure for TimeoutException or 2. always throw for a timeout exception.
IMO, by design of KafkaProducer (method blocking is part of the interface), the latter makes more sense to me.
WDYT?
This is a good point. The current implementation is a bit misleading with regards to the javadoc which states:
* @throws TimeoutException If the time taken for fetching metadata or allocating memory for the record has surpassed <code>max.block.ms</code>.

We don't actually throw that exception unless you do Future.get. It seems to me that the change in this PR actually fixes the implementation to match the specified contract.
Another option is to change the javadoc to match the implementation (probably less likely to break users) and then we would simply have a check in the ApiException catch block to record the metric for BufferExhaustedException. This seems safer.
Thoughts?
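The "safer" option can be sketched like this. It is a hypothetical simplification of the catch clause in KafkaProducer.send (class names are stand-ins, and a plain counter stands in for the Kafka sensor): the exception is still swallowed into a failed future, but a type check inside the catch block records the buffer-exhausted metric.

```java
// Hypothetical sketch of the safer alternative: keep converting
// ApiException into a failed-future result, but check for
// BufferExhaustedException inside the catch block and bump the
// metric there. An AtomicLong stands in for the Kafka sensor.
import java.util.concurrent.atomic.AtomicLong;

class ApiException extends RuntimeException {}
class TimeoutException extends ApiException {}
class BufferExhaustedException extends TimeoutException {}

public class CatchBlockSketch {
    // Stand-in for metrics.sensor("buffer-exhausted-records")
    static final AtomicLong bufferExhaustedRecords = new AtomicLong();

    // Mimics the catch clause in send(): the exception becomes a
    // failed-future result rather than propagating to the caller.
    static String send(RuntimeException failure) {
        try {
            throw failure;
        } catch (ApiException e) {
            if (e instanceof BufferExhaustedException)
                bufferExhaustedRecords.incrementAndGet();
            return "FutureFailure"; // stand-in for new FutureFailure(e)
        }
    }

    public static void main(String[] args) {
        System.out.println(send(new BufferExhaustedException())
                + " count=" + bufferExhaustedRecords.get());
        System.out.println(send(new TimeoutException())
                + " count=" + bufferExhaustedRecords.get());
    }
}
```

Both calls return a failed-future result, but only the buffer-exhaustion case increments the counter, which is why this variant preserves the old caller-visible behavior while keeping the metric alive.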
Right, it's indeed an option which is much safer.
Still, I think we'd better throw here, for the following reasons:
1. It is semantically more correct. Callback and Future should be used for providing the result of asynchronous (background) processing. However, these two TimeoutExceptions occur while the KafkaProducer is still doing synchronous (foreground) processing, and that (the producer has to do some foreground work before it appends the record to the accumulator) is the reason a caller of producer#send is forced to wait until the result turns out, so the caller should receive the result of that call in a synchronous way.
2. Assuming this is going to be shipped with 0.10.2.0, making a breaking change in behavior isn't preferred but is allowed if we leave a correct note in the "breaking changes" section.
3. We can expect this breaking change to be relatively low-harm, since most users who use the producer in sensitive situations would already be using it like below. Users may see some new error logs, but it still doesn't break the whole processing (maybe I'm biased, feel free to leave an objection if any :D)
try {
    producer.send(record, (metadata, exception) -> {
        if (exception != null) {
            // logging
        }
    });
} catch (RuntimeException /* or maybe KafkaException, TimeoutException, whatever */ e) {
    // logging
}
kawamuray left a comment
Left just one suggestion.
Superseded by #8399.
BufferExhaustedException is no longer thrown by the new producer. Removed it from the catch clause, deprecated the exception class, and removed the corresponding metrics.
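The deprecation route the original PR description outlines would look roughly like this. A sketch only, not the actual Kafka source: the class stays on the classpath so existing user code still compiles and runs, but referencing it triggers a compiler warning unless suppressed.

```java
// Sketch of deprecating an exception class while keeping it available
// for compatibility: the @Deprecated annotation makes the compiler
// warn at any use site without breaking existing code.
@Deprecated
class BufferExhaustedException extends RuntimeException {
    BufferExhaustedException(String message) { super(message); }
}

public class DeprecationSketch {
    @SuppressWarnings("deprecation")
    public static void main(String[] args) {
        // Existing user code that still references the class works as
        // before; only a deprecation warning is emitted at compile time.
        RuntimeException e = new BufferExhaustedException("buffer exhausted");
        System.out.println(e.getMessage());
    }
}
```

As the thread concluded, the PR ultimately moved away from deprecation in favor of subclassing TimeoutException, so this sketch reflects the description's original intent rather than the final design.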