KAFKA-8729, pt 2: Add error_records and error_message to PartitionResponse#7150
KAFKA-8729, pt 2: Add error_records and error_message to PartitionResponse#7150guozhangwang merged 17 commits intoapache:trunkfrom
Conversation
|
@guozhangwang could you take a look at this? |
|
retest this please |
guozhangwang
left a comment
There was a problem hiding this comment.
Made a pass over this PR. Also ping @rayokota @hachikuji for another look.
|
|
||
| override def toString = s"[acksPending: $acksPending, error: ${responseStatus.error.code}, " + | ||
| s"startOffset: ${responseStatus.baseOffset}, requiredOffset: $requiredOffset]" | ||
| s"startOffset: ${responseStatus.baseOffset}, requiredOffset: $requiredOffset, " + |
There was a problem hiding this comment.
I think we do not need to print them here since if the error records / message are ever not null / none there will be no delayer produce generated.
| "If LogAppendTime is used for the topic, the timestamp will be the broker local " + | ||
| "time when the messages are appended."), | ||
| LOG_START_OFFSET_FIELD, | ||
| new Field(ERROR_RECORDS_KEY_NAME, new ArrayOf(INT32)), |
There was a problem hiding this comment.
Why we do not have a description string for error records as well?
|
retest this please |
| import org.apache.kafka.common.errors.ApiException; | ||
|
|
||
| public class InvalidRecordException extends CorruptRecordException { | ||
| public class InvalidRecordException extends ApiException { |
There was a problem hiding this comment.
Hmm.. Isn't this an incompatible change? Does this exception not get exposed anywhere currently? Maybe it would be better to deprecate common/record/InvalidRecordException, but still leave it around.
There was a problem hiding this comment.
Yeah I think InvalidRecordException was not exposed as a public API before, and now we are moving its package as well to make it a public class.
hachikuji
left a comment
There was a problem hiding this comment.
Left a few comments. It's worth considering this in the context of KIP-482. I think we would make these fields optional?
|
I'd recommend we do not block this PR on KIP-482, since that is not merged in yet. If KIP-482 did gets in before the release, we can have another PR incorporating the feature and declare those additional fields as optional. |
|
Other than @hachikuji 's comments and the failed jenkins test, this PR LGTM. |
|
|
|
@guozhangwang I noticed that in |
SGTM, throwing and catching Maybe do a quick call-trace checking to make sure such exception thrown changes does not affect any public facing functions which would then throw different exceptions as well -- I did a quick look and it seems callers / callees are all internal classes, but always better to have another pair of eyes. |
guozhangwang
left a comment
There was a problem hiding this comment.
Nit comments, otherwise LGTM. cc @hachikuji
| * {@link Errors#NOT_LEADER_FOR_PARTITION} (6) | ||
| * {@link Errors#MESSAGE_TOO_LARGE} (10) | ||
| * {@link Errors#INVALID_TOPIC} (17) | ||
| * {@link Errors#INVALID_TOPIC_EXCEPTION} (17) |
There was a problem hiding this comment.
Why we want to add the _EXCEPTION suffix? Others do not have it.
There was a problem hiding this comment.
Someone actually changed it from INVALID_TOPIC to INVALID_TOPIC_EXCEPTION: https://github.com/apache/kafka/blob/trunk/clients/src/main/java/org/apache/kafka/common/protocol/Errors.java#L162
There was a problem hiding this comment.
Got it, I think it was in this case long time ago, we can fix it in another PR if we want.
| ]}, | ||
| { "name": "ErrorMessage", "type": "string", "versions": "8+", "default": "", "ignorable": true, | ||
| "about": "The error message of the records that cause the batch to be dropped"} | ||
| "about": "The error message of records that caused the batch to be dropped"} |
There was a problem hiding this comment.
nit: How about The global error message summarizing the common root cause of the records that makes the batch to be dropped.
hachikuji
left a comment
There was a problem hiding this comment.
Thanks, left a few comments.
| "about": "The relative offsets of records that caused the batch to be dropped", "fields": [ | ||
| { "name": "RelativeOffset", "type": "int32", "versions": "8+", | ||
| "about": "The relative offset of the record that cause the batch to be dropped" }, | ||
| { "name": "RelativeOffsetErrorMessage", "type": "string", "versions": "8+", |
There was a problem hiding this comment.
Do we want this to be nullable? Same for ErrorMessage below. We'll need to make a similar change to the old style spec.
There was a problem hiding this comment.
The error message is great, but it can't be used programmatically on the client/app, so we'll also need the 16-bit error code here.
There was a problem hiding this comment.
@edenhill I've thought about this when proposing KIP-467 and the reason I did not try to encode the 16 bit error code is that:
-
At the moment, from all the possible errors that can be returned from the broker, except the InvalidRecordException all others would cause the response to be returned immediately, which means that we would not have more than one error code that would trigger the response to be rejected.
-
For the only possible InvalidRecordException that may be co-exist with other exceptions in the rejected response, that would cause the error records to be trimmed, whereas all other error codes would still cause the whole batch to be dropped. Therefore we've decided to make the broker-side logic to execute the following: when any non-InvalidRecordException is thrown, since that would cause the whole batch to be dropped we would just return that error code on the per-partition and ignore any other InvalidRecordException; if the only exception thrown is InvalidRecordException on specific record(s), then we encode it on per-partition level.
That being said, if people feel that in the future or even now we would like to change the broker-side logic and it's worth paying the 16bits in addition to 8bit for encoding an empty string, then I'm fine doing that in this PR as well.
@hachikuji @tuvtran @junrao please let me know.
There was a problem hiding this comment.
Thanks for explaining your reasoning @guozhangwang.
I think the key here is "At the moment": having an error code field in this response would allow any future error code to be added without having to bump the protocol version, more specifically the range of returned error codes for the per-message errors should be left undefined as this is a means for communication between some future message verification mechanism on the broker and the application producing the data, rather than between the broker and the producer.
For space efficiency I'd suggest using a varint, but I doubt it matters in practice since these errors should be rare, and the two bytes of error code are just a fraction of the error message. And the erroneous Produce response is most likely far smaller than the request, or a fetch response for that matter.
There was a problem hiding this comment.
@edenhill Honestly I'm still not convinced of the future usage of per-record error codes -- atm we have not observed concrete error codes that would split within a single batch that would like to be handled differently by clients. If we do find out that become a common scenario to support we can still always bump up the version, which is not a very costly evolution anyways. On the other hand, since we do not have varint in place yet we would still need to encode 2bytes with the empty string: note that the per-record error message would be commonly empty in practice also.
Another thing worth mentioning is that even if we add the fields now, any new error code beyond InvalidRecord that we'd like to add in the future would still require the clients to upgrade because otherwise they cannot recognize them anyways.
There was a problem hiding this comment.
@guozhangwang Okay, sounds good, I'll rest my case.
| } | ||
|
|
||
| public PartitionResponse(Errors error, long baseOffset, long logAppendTime, long logStartOffset) { | ||
| this(error, baseOffset, logAppendTime, logStartOffset, new HashMap<>(), ""); |
There was a problem hiding this comment.
nit: use Collections.emptyMap()?
|
retest this please |
|
retest this please |
|
retest this please |
guozhangwang
left a comment
There was a problem hiding this comment.
Made another pass, LGTM!
| if (partStruct.hasField(LOG_APPEND_TIME_KEY_NAME)) | ||
| partStruct.set(LOG_APPEND_TIME_KEY_NAME, part.logAppendTime); | ||
| partStruct.setIfExists(LOG_START_OFFSET_FIELD, part.logStartOffset); | ||
|
|
There was a problem hiding this comment.
nit: we can probably do some cleanup in pr.3, e.g. line 269 above can be replaced with setIfExist as well.
As noted in the KIP-467, the updated
ProduceResponseiswith a new error code:
Committer Checklist (excluded from commit message)