KAFKA-13472: Correct last-committed offsets tracking for sink tasks after partial revocation#11526
Conversation
…fter partial revocation
|
@kkonstantine would you mind taking a look at this 3.1 blocker? There was a small issue in #10563 that causes a regression in some edge cases. |
kkonstantine
left a comment
There was a problem hiding this comment.
Thanks for the fix @C0urante.
LGTM overall, I have a comment on the removal method.
Also, by reading your description, I understand this issue was discovered through system tests. Would it be possible to have run these system tests before merging the original PR? Where these AK system tests or connector system tests?
| topicPartitions.forEach(currentOffsets::remove); | ||
| } | ||
| updatePartitionCount(); | ||
| topicPartitions.forEach(lastCommittedOffsets::remove); |
There was a problem hiding this comment.
Is removeAll a supported operation by the underlying set of the hashmap here? Is it applicable?
(same question above for currentOffsets)
There was a problem hiding this comment.
Ah, neat trick. Ran unit+integration tests locally after making this change and they all passed; looks like the operation is supported, and yes, it does appear applicable.
|
Thanks @kkonstantine. My description may have been a little misleading; to clarify, none of this was caught by automated testing (unit, integration, or system); it was discovered by re-reading the code base while investigating an unrelated failure. I made note of system tests because the fix to the |
…fter partial revocation (#11526) The `WorkerSinkTask.lastCommittedOffsets` field is now added to (via `Map::putAll`) after a successful offset commit, instead of being completely overwritten. In order to prevent this collection from growing indefinitely, elements are removed from it after topic partitions are revoked from the task's consumer. Two test cases are added to `WorkerSinkTaskTest`: - A basic test to verify the "rewind for redelivery" behavior when a task throws an exception from `SinkTask::preCommit`; surprisingly, no existing test cases appear to cover this scenario - A more sophisticated test to verify this same behavior, but with a few rounds of cooperative consumer rebalancing beforehand that expose a bug in the current logic for the `WorkerSinkTask` class The `VerifiableSinkTask` class is also updated to only flush the requested topic partitions in its `flush` method. This is technically unrelated to the issue addressed by this PR and can be moved to a separate PR if necessary; including it here as the original context for identifying this bug was debugging failed system tests and the logic in this part of the tests was originally suspected as a cause of the test failure. Reviewers: Konstantine Karantasis <k.karantasis@gmail.com>
…fter partial revocation (#11526) The `WorkerSinkTask.lastCommittedOffsets` field is now added to (via `Map::putAll`) after a successful offset commit, instead of being completely overwritten. In order to prevent this collection from growing indefinitely, elements are removed from it after topic partitions are revoked from the task's consumer. Two test cases are added to `WorkerSinkTaskTest`: - A basic test to verify the "rewind for redelivery" behavior when a task throws an exception from `SinkTask::preCommit`; surprisingly, no existing test cases appear to cover this scenario - A more sophisticated test to verify this same behavior, but with a few rounds of cooperative consumer rebalancing beforehand that expose a bug in the current logic for the `WorkerSinkTask` class The `VerifiableSinkTask` class is also updated to only flush the requested topic partitions in its `flush` method. This is technically unrelated to the issue addressed by this PR and can be moved to a separate PR if necessary; including it here as the original context for identifying this bug was debugging failed system tests and the logic in this part of the tests was originally suspected as a cause of the test failure. Reviewers: Konstantine Karantasis <k.karantasis@gmail.com>
|
Thanks @kkonstantine, appreciate the turnaround. |
…fter partial revocation (apache#11526) The `WorkerSinkTask.lastCommittedOffsets` field is now added to (via `Map::putAll`) after a successful offset commit, instead of being completely overwritten. In order to prevent this collection from growing indefinitely, elements are removed from it after topic partitions are revoked from the task's consumer. Two test cases are added to `WorkerSinkTaskTest`: - A basic test to verify the "rewind for redelivery" behavior when a task throws an exception from `SinkTask::preCommit`; surprisingly, no existing test cases appear to cover this scenario - A more sophisticated test to verify this same behavior, but with a few rounds of cooperative consumer rebalancing beforehand that expose a bug in the current logic for the `WorkerSinkTask` class The `VerifiableSinkTask` class is also updated to only flush the requested topic partitions in its `flush` method. This is technically unrelated to the issue addressed by this PR and can be moved to a separate PR if necessary; including it here as the original context for identifying this bug was debugging failed system tests and the logic in this part of the tests was originally suspected as a cause of the test failure. Reviewers: Konstantine Karantasis <k.karantasis@gmail.com>
…fter partial revocation (apache#11526) The `WorkerSinkTask.lastCommittedOffsets` field is now added to (via `Map::putAll`) after a successful offset commit, instead of being completely overwritten. In order to prevent this collection from growing indefinitely, elements are removed from it after topic partitions are revoked from the task's consumer. Two test cases are added to `WorkerSinkTaskTest`: - A basic test to verify the "rewind for redelivery" behavior when a task throws an exception from `SinkTask::preCommit`; surprisingly, no existing test cases appear to cover this scenario - A more sophisticated test to verify this same behavior, but with a few rounds of cooperative consumer rebalancing beforehand that expose a bug in the current logic for the `WorkerSinkTask` class The `VerifiableSinkTask` class is also updated to only flush the requested topic partitions in its `flush` method. This is technically unrelated to the issue addressed by this PR and can be moved to a separate PR if necessary; including it here as the original context for identifying this bug was debugging failed system tests and the logic in this part of the tests was originally suspected as a cause of the test failure. Reviewers: Konstantine Karantasis <k.karantasis@gmail.com>
Jira
The
WorkerSinkTask.lastCommittedOffsetsfield is now added to (viaMap::putAll) after a successful offset commit, instead of being completely overwritten. In order to prevent this collection from growing indefinitely, elements are removed from it after topic partitions are revoked from the task's consumer.Two test cases are added to
WorkerSinkTaskTest:SinkTask::preCommit; surprisingly, no existing test cases appear to cover this scenarioWorkerSinkTaskclassThe
VerifiableSinkTaskclass is also updated to only flush the requested topic partitions in itsflushmethod. This is technically unrelated to the issue addressed by this PR and can be moved to a separate PR if necessary; including it here as the original context for identifying this bug was debugging failed system tests and the logic in this part of the tests was originally suspected as a cause of the test failure.Committer Checklist (excluded from commit message)