
KAFKA-8729, pt 3: Add broker-side logic to handle the case when there are record_errors and error_message (#7167)

Merged
guozhangwang merged 20 commits into apache:trunk from tuvtran:kip-467-broker-logic on Oct 10, 2019
Conversation

@tuvtran (Contributor) commented Aug 6, 2019

All the changes are in ReplicaManager.appendToLocalLog and ReplicaManager.appendRecords. Also, replaced LogAppendInfo.unknownLogAppendInfoWithLogStartOffset with LogAppendInfo.unknownLogAppendInfoWithAdditionalInfo to include those two new fields.

Committer Checklist (excluded from commit message)

  • Verify design and implementation
  • Verify test coverage and CI build status
  • Verify documentation (including upgrade notes)
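The helper replacement described in the PR summary could be sketched roughly as follows. This is a hedged sketch only: the field names, types, and signatures are assumptions inferred from the description, not the actual kafka.log.LogAppendInfo code (which is Scala; Java is used here for illustration).

```java
import java.util.Collections;
import java.util.List;

// Assumed shape: one per-record error, identified by its position in the batch.
class RecordError {
    final int batchIndex;
    final String message;

    RecordError(int batchIndex, String message) {
        this.batchIndex = batchIndex;
        this.message = message;
    }
}

// Assumed, simplified stand-in for the real LogAppendInfo.
class LogAppendInfo {
    final long logStartOffset;
    final List<RecordError> recordErrors;
    final String errorMessage;

    LogAppendInfo(long logStartOffset, List<RecordError> recordErrors, String errorMessage) {
        this.logStartOffset = logStartOffset;
        this.recordErrors = recordErrors;
        this.errorMessage = errorMessage;
    }

    // Old helper: only the log start offset was carried.
    static LogAppendInfo unknownLogAppendInfoWithLogStartOffset(long logStartOffset) {
        return new LogAppendInfo(logStartOffset, Collections.emptyList(), "");
    }

    // New helper: also carries the two new fields (record errors and message).
    static LogAppendInfo unknownLogAppendInfoWithAdditionalInfo(
            long logStartOffset, List<RecordError> recordErrors, String errorMessage) {
        return new LogAppendInfo(logStartOffset, recordErrors, errorMessage);
    }
}
```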

@tuvtran (Contributor, Author) commented Aug 6, 2019

This depends on #7150 being merged first.

@tuvtran force-pushed the kip-467-broker-logic branch from 37f1d53 to a42ef2f on September 4, 2019
brokerTopicStats.allTopicsStats.failedProduceRequestRate.mark()
error(s"Error processing append operation on partition $topicPartition", t)
- (topicPartition, LogAppendResult(LogAppendInfo.unknownLogAppendInfoWithLogStartOffset(logStartOffset), Some(t)))
+ (topicPartition, LogAppendResult(LogAppendInfo.unknownLogAppendInfoWithAdditionalInfo(logStartOffset, List(), ""), Some(t)))

I think we can improve this by doing the following:

  1. inside LogValidator, when throwing InvalidTimestampException and InvalidRecordException, encode the relative offset inside the exception object.

  2. then here we can add that relative offset as a singleton to the response.

This way, on the client side we can improve our error handling so that we only try to drop that single record rather than the whole batch.
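The suggestion above might look roughly like this. It is a minimal sketch: the class RecordValidationError and the helper failedOffsets are hypothetical names, not Kafka's actual API.

```java
import java.util.Collections;
import java.util.List;

// Hypothetical sketch: encode the failing record's relative (batch-local)
// offset inside the validation exception when it is thrown by the validator.
class RecordValidationError extends RuntimeException {
    final int relativeOffset;

    RecordValidationError(String message, int relativeOffset) {
        super(message);
        this.relativeOffset = relativeOffset;
    }
}

class BrokerErrorReporting {
    // Broker side: add that single relative offset as a singleton to the
    // response, instead of rejecting the whole batch opaquely.
    static List<Integer> failedOffsets(RecordValidationError e) {
        return Collections.singletonList(e.relativeOffset);
    }
}
```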

@tuvtran (Contributor, Author) commented Oct 3, 2019

Ready for another round of reviews @guozhangwang. The failed tests pass on my machine, so they are likely flaky.

@guozhangwang (Contributor): retest this please.

@guozhangwang (Contributor) left a comment:

ping @hachikuji for another look.

@hachikuji (Contributor) left a comment:
Thanks, left a few comments.

@tuvtran force-pushed the kip-467-broker-logic branch from 1d3c187 to 4fcfd7a on October 7, 2019
@tuvtran (Contributor, Author) commented Oct 7, 2019

I have added a new RecordValidationException that includes a list of ErrorRecord objects together with the underlying InvalidTimestampException or InvalidRecordException. Ready for another round of reviews @guozhangwang @hachikuji
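A hedged sketch of the shape described here: the constructor and field names are assumptions based on the comment, simplified from the merged code, and written in Java for illustration.

```java
import java.util.List;

// Assumed shape of one per-record error entry.
class ErrorRecord {
    final int batchIndex;
    final String message;

    ErrorRecord(int batchIndex, String message) {
        this.batchIndex = batchIndex;
        this.message = message;
    }
}

// Sketch: the exception bundles the underlying cause (an
// InvalidTimestampException or InvalidRecordException) with all the
// per-record errors gathered during validation.
class RecordValidationException extends RuntimeException {
    final RuntimeException invalidException;
    final List<ErrorRecord> recordErrors;

    RecordValidationException(RuntimeException invalidException, List<ErrorRecord> recordErrors) {
        super(invalidException);
        this.invalidException = invalidException;
        this.recordErrors = recordErrors;
    }
}
```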

@hachikuji (Contributor) left a comment:
Thanks, left a few more comments.

@guozhangwang (Contributor): retest this please.

@tuvtran force-pushed the kip-467-broker-logic branch from 28d75a0 to 877cde8 on October 8, 2019
@tuvtran changed the title on Oct 9, 2019 from "KAFKA-8729, pt 3: Add broker-side logic to handle the case when there are error_records and error_message" to "KAFKA-8729, pt 3: Add broker-side logic to handle the case when there are record_errors and error_message"
@tuvtran (Contributor, Author) commented Oct 9, 2019

All tests have run and succeeded. Please let me know what you think about the latest changes @guozhangwang @hachikuji


val expectedOffset = expectedInnerOffset.getAndIncrement()

// inner record offsets should always be contiguous
Nice catch here.

@guozhangwang (Contributor): Just a minor comment, otherwise LGTM! Thanks @tuvtran

@guozhangwang guozhangwang merged commit f41a5c2 into apache:trunk Oct 10, 2019
guozhangwang pushed a commit that referenced this pull request Oct 10, 2019
… are record_errors and error_message (#7167)

All the changes are in ReplicaManager.appendToLocalLog and ReplicaManager.appendRecords. Also, replaced LogAppendInfo.unknownLogAppendInfoWithLogStartOffset with LogAppendInfo.unknownLogAppendInfoWithAdditionalInfo to include those 2 new fields.

Reviewers: Guozhang Wang <wangguoz@gmail.com>, Jason Gustafson <jason@confluent.io>
@guozhangwang (Contributor): Merged to trunk, also pushed to 2.4.

@tuvtran (Contributor, Author) commented Oct 10, 2019

Thanks @hachikuji @guozhangwang for the reviews!

omkreddy pushed a commit that referenced this pull request Nov 2, 2019
…batch does have continuous incremental offsets

#7167 added a check for non-incremental offsets in `assignOffsetsNonCompressed`, which is not applicable for message format V0 and V1. Therefore, I added a condition to disable the check if the record version precedes V2.

Author: Tu Tran <tu@confluent.io>

Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com>

Closes #7628 from tuvtran/KAFKA-9080
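The follow-up fix above amounts to a magic-value guard on the continuity check. A rough sketch under assumed names (the constant and method names are illustrations, not the merged Scala code):

```java
class OffsetContinuity {
    // Message format V2 is the first version that guarantees contiguous
    // inner offsets; the constant name here is an assumption.
    static final byte MAGIC_VALUE_V2 = 2;

    // Only enforce the non-incremental-offset check from V2 onwards;
    // for V0/V1 the check is disabled, matching the fix described above.
    static boolean innerOffsetsValid(byte magic, long[] offsets) {
        if (magic < MAGIC_VALUE_V2)
            return true; // check disabled for message formats V0 and V1
        for (int i = 1; i < offsets.length; i++) {
            if (offsets[i] != offsets[i - 1] + 1)
                return false;
        }
        return true;
    }
}
```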
KetkiT pushed a commit to KetkiT/kafka that referenced this pull request Nov 5, 2019
…batch does have continuous incremental offsets

apache#7167 added a check for non-incremental offsets in `assignOffsetsNonCompressed`, which is not applicable for message format V0 and V1. Therefore, I added a condition to disable the check if the record version precedes V2.

Author: Tu Tran <tu@confluent.io>

Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com>

Closes apache#7628 from tuvtran/KAFKA-9080
guozhangwang pushed a commit that referenced this pull request Nov 6, 2019
…ds (#7612)

Background:
Currently, whenever a batch is dropped because of an InvalidRecordException or InvalidTimestampException, only the culprit record appears in ProduceResponse.PartitionResponse.recordErrors. However, when users resend that batch excluding the rejected record, the remaining records are not guaranteed to be free of problems.

Changes:
To address this issue, I changed the function signatures of validateKey, validateRecord and validateTimestamp to return a Scala Option. Specifically, this object holds the reason/message for which the current record in the iteration fails, leaving it to the callers (convertAndAssignOffsetsNonCompressed, assignOffsetsNonCompressed, validateMessagesAndAssignOffsetsCompressed) to gather all failing records in one place. All these records are then returned along with the PartitionResponse object. As a result, if a batch contains more than one record error, users see exactly which records caused the failure. PartitionResponse.recordErrors is a list of RecordError objects introduced by #7167, each including a batchIndex denoting the relative position of a record in the batch and a message indicating the reason for the failure.

Gotchas:
Things are particularly tricky when a batch has records rejected because of both InvalidRecordException and InvalidTimestampException. In this case, the InvalidTimestampException takes higher precedence. Therefore, the Error field in PartitionResponse will be encoded with INVALID_TIMESTAMP.

Reviewers: Guozhang Wang <wangguoz@gmail.com>
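The gathering and precedence behaviour described in this commit message can be sketched as follows. The types and names are simplified assumptions for illustration, not the merged code.

```java
import java.util.List;
import java.util.Optional;

class RecordFailure {
    enum Kind { INVALID_TIMESTAMP, INVALID_RECORD }

    final Kind kind;
    final int batchIndex;
    final String message;

    RecordFailure(Kind kind, int batchIndex, String message) {
        this.kind = kind;
        this.batchIndex = batchIndex;
        this.message = message;
    }

    // Per-record failures are gathered into one list by the caller; the
    // batch-level error code gives InvalidTimestampException precedence,
    // so a mixed batch is reported as INVALID_TIMESTAMP.
    static Optional<String> batchErrorCode(List<RecordFailure> failures) {
        if (failures.stream().anyMatch(f -> f.kind == Kind.INVALID_TIMESTAMP))
            return Optional.of("INVALID_TIMESTAMP");
        if (!failures.isEmpty())
            return Optional.of("INVALID_RECORD");
        return Optional.empty();
    }
}
```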