KAFKA-9742: Fix broken StandbyTaskEOSIntegrationTest #8330
```diff
@@ -82,29 +82,27 @@ public void surviveWithOneTaskAsStandby() throws ExecutionException, Interrupted
         final CountDownLatch instanceLatch = new CountDownLatch(1);

-        final String stateDirPathOne = stateDirPath + "/" + appId + "-1/";
-        final KafkaStreams streamInstanceOne =
-            buildStreamWithDirtyStateDir(appId, stateDirPathOne, instanceLatch);
-
-        final String stateDirPathTwo = stateDirPath + "/" + appId + "-2/";
-        final KafkaStreams streamInstanceTwo =
-            buildStreamWithDirtyStateDir(appId, stateDirPathTwo, instanceLatch);
+        try (
+            final KafkaStreams streamInstanceOne = buildStreamWithDirtyStateDir(appId, stateDirPath + "/" + appId + "-1/", instanceLatch);
+            final KafkaStreams streamInstanceTwo = buildStreamWithDirtyStateDir(appId, stateDirPath + "/" + appId + "-2/", instanceLatch);
+        ) {
```

Contributor (author): This test isn't logically changed, but I noticed a couple of problems with it that I fixed on the side.

Contributor (author), on lines +85 to +88: I moved this into a try-with-resources so that the instances would get closed even if the assertions fail.

```diff
-        streamInstanceOne.start();
-        streamInstanceTwo.start();
+            streamInstanceOne.start();
+            streamInstanceTwo.start();

-        // Wait for the record to be processed
-        assertTrue(instanceLatch.await(15, TimeUnit.SECONDS));
+            // Wait for the record to be processed
+            assertTrue(instanceLatch.await(15, TimeUnit.SECONDS));

-        waitForCondition(() -> streamInstanceOne.state().equals(KafkaStreams.State.RUNNING),
-            "Stream instance one should be up and running by now");
-        waitForCondition(() -> streamInstanceTwo.state().equals(KafkaStreams.State.RUNNING),
-            "Stream instance one should be up and running by now");
+            waitForCondition(() -> streamInstanceOne.state().equals(KafkaStreams.State.RUNNING),
+                "Stream instance one should be up and running by now");
+            waitForCondition(() -> streamInstanceTwo.state().equals(KafkaStreams.State.RUNNING),
+                "Stream instance two should be up and running by now");

-        streamInstanceOne.close(Duration.ofSeconds(30));
-        streamInstanceTwo.close(Duration.ofSeconds(30));
+            streamInstanceOne.close(Duration.ZERO);
+            streamInstanceTwo.close(Duration.ZERO);
+        }
     }

     private KafkaStreams buildStreamWithDirtyStateDir(final String appId,
```

Contributor (author), on lines +103 to +104: Close them both asynchronously so we can tear them down in parallel. The try-with-resources will block until they are really closed.
```diff
@@ -123,14 +121,14 @@ private KafkaStreams buildStreamWithDirtyStateDir(final String appId,
             .write(Collections.singletonMap(new TopicPartition("unknown-topic", 0), 5L));

         assertTrue(new File(stateDirectory.directoryForTask(taskId),
-            "rocksdb/KSTREAM-AGGREGATE-STATE-STORE-0000000001").mkdirs());
+                            "rocksdb/KSTREAM-AGGREGATE-STATE-STORE-0000000001").mkdirs());

         builder.stream(inputTopic,
-            Consumed.with(Serdes.Integer(), Serdes.Integer()))
-            .groupByKey()
-            .count()
-            .toStream()
-            .peek((key, value) -> recordProcessLatch.countDown());
+                       Consumed.with(Serdes.Integer(), Serdes.Integer()))
+               .groupByKey()
+               .count()
+               .toStream()
+               .peek((key, value) -> recordProcessLatch.countDown());

         return new KafkaStreams(builder.build(), props);
     }
```
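The teardown idiom discussed in the review comments above can be sketched on its own: signal shutdown on every instance with a zero timeout so they stop in parallel, and let try-with-resources block until each close has actually completed. The topology and configuration below are placeholders (a trivial pass-through app on hypothetical topics), not the test's real setup, which additionally gives each instance its own state directory:

```java
import java.time.Duration;
import java.util.Properties;

import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.Topology;

public class ParallelShutdownSketch {

    public static void main(final String[] args) {
        // Placeholder topology and config, just enough to construct the instances.
        final StreamsBuilder builder = new StreamsBuilder();
        builder.stream("input-topic").to("output-topic");
        final Topology topology = builder.build();

        final Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "parallel-shutdown-sketch");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

        // try-with-resources guarantees both instances are closed even if an
        // assertion or exception fires in the body; the implicit no-arg close()
        // blocks until shutdown has completed.
        try (
            final KafkaStreams instanceOne = new KafkaStreams(topology, props);
            final KafkaStreams instanceTwo = new KafkaStreams(topology, props)
        ) {
            instanceOne.start();
            instanceTwo.start();

            // ... exercise the instances, run assertions ...

            // Signal shutdown with a zero timeout: close(Duration.ZERO) returns
            // immediately, so both instances begin tearing down in parallel rather
            // than each blocking in turn for its own full shutdown.
            instanceOne.close(Duration.ZERO);
            instanceTwo.close(Duration.ZERO);
        } // blocks here until both instances have really closed
    }
}
```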
This is the fix, to return an accurate estimate instead of throwing an exception. I felt a warning was appropriate, given that this does indicate task corruption, or some other unexpected situation.
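For reference, the behavior described in this comment has roughly the following shape. The names here (lagFor, the offset-sum parameters, the logger) are illustrative stand-ins, not the actual Streams internals:

```java
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

// Illustrative sketch of the described fix; names are placeholders.
public class LagEstimateSketch {

    private static final Logger LOG = LoggerFactory.getLogger(LagEstimateSketch.class);

    /**
     * @param task         a label for the task (illustrative)
     * @param offsetSum    sum of the checkpointed changelog offsets the client reported for the task
     * @param endOffsetSum sum of the changelog end offsets the leader fetched from the brokers
     * @return the estimated restore lag for the task
     */
    public static long lagFor(final String task, final long offsetSum, final long endOffsetSum) {
        if (offsetSum > endOffsetSum) {
            // The client claims to be ahead of the end of the changelog. This indicates
            // task corruption (e.g. a truncated changelog) or some other unexpected
            // situation, so log a warning but return an accurate estimate instead of
            // throwing: report the lag as the whole log.
            LOG.warn("Task {} reported offset sum {} greater than end offset sum {}; reporting lag as {}.",
                     task, offsetSum, endOffsetSum, endOffsetSum);
            return endOffsetSum;
        }
        return endOffsetSum - offsetSum;
    }
}
```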
Should we report the lag as the whole log in this case? Even if the log is truncated, it is not guaranteed to throw the invalid-offset exception, and hence the task-corruption logic would not necessarily be triggered.
Do we know the exact reason why the offset sum on the client is larger than the end offset sum on the broker? Before we change this invariant, we should make sure we understand why it is not satisfied. Is it our code that breaks it, or some external influence that we cannot control?
@guozhangwang if the end offset is less than the checkpointed offset, how is it possible not to throw a TaskCorruptedException? I thought that was thrown after checking this exact condition?

Edit: what I mean is, do we think this is a possible state? If so, we should explicitly check for it and throw TaskCorrupted if detected. (If not, it's an illegal state and thus the check here is appropriate.)
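For concreteness, the check being proposed here (on the member that owns the task, before restoring) could look roughly like the sketch below. All names are illustrative; in real Streams code this would surface as a TaskCorruptedException, but a local stand-in exception keeps the sketch self-contained:

```java
import java.util.Map;

import org.apache.kafka.common.TopicPartition;

public class CorruptionCheckSketch {

    // Stand-in for Streams' TaskCorruptedException, just to keep the sketch self-contained.
    public static class CorruptedStateException extends RuntimeException {
        public CorruptedStateException(final String message) {
            super(message);
        }
    }

    /**
     * Before restoring a standby/active task, verify that the locally checkpointed offset
     * does not exceed the offset the task is allowed to restore up to. If it does, the local
     * state cannot be trusted (e.g. the changelog was truncated), so the task should be
     * treated as corrupted, wiped, and restored from scratch.
     */
    public static void verifyCheckpointedOffsets(final Map<TopicPartition, Long> checkpointedOffsets,
                                                 final Map<TopicPartition, Long> restoreEndOffsets) {
        for (final Map.Entry<TopicPartition, Long> entry : checkpointedOffsets.entrySet()) {
            final TopicPartition partition = entry.getKey();
            final long checkpointed = entry.getValue();
            final long endOffset = restoreEndOffsets.getOrDefault(partition, 0L);
            if (checkpointed > endOffset) {
                throw new CorruptedStateException(
                    "Checkpointed offset " + checkpointed + " for " + partition
                        + " is beyond the changelog end offset " + endOffset
                        + "; local state is corrupted and must be re-initialized.");
            }
        }
    }
}
```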
Also, @vvcephei, if a single task is corrupted and we do hit the TaskCorruptedException, don't we close, wipe, and recreate the stores of all tasks in the thread? Seems we would need to keep track of the thread ownership, and if we detect this condition, then set the lag for all tasks owned by that thread to the endOffset.

Edit: never mind, I checked the TaskCorruptedException PR; it looks like we ended up just setting the invalid offsets to 0 instead of wiping everything. So, disregard this comment 🙂
Whoops, edited my previous comment before refreshing and seeing your response. Makes sense, although I still believe we should be checking for this and throwing TaskCorrupted if we detect this case (in the owning StreamThread, not in the leader during assignment).
If the corruption comes from our code, we should throw. The leader fetches the end offsets shortly before that code runs, so they are most probably up to date. If the corruption is caused by a failure, we should handle it gracefully.
Let's think about cases where this could possibly go wrong. We know it must reach, or already be in, the standby/restoring phase. I will use logEndOffset for the actual end offset of the partition, and endOffset for the offset a standby/restoring task is allowed to restore up to.

Newly assigned: we read the invalid offset from the checkpoint, and determine the endOffset as either logEndOffset (normal) or min(logEndOffset, LCO) (optimized source changelog). In both cases, endOffset <= logEndOffset, so the invalid offset should also satisfy offset > endOffset, which we can detect and throw as TaskCorrupted.

Already assigned: in this case, we must have been the ones who wrote the invalid offset and/or have it in our checkpointableOffsets map. But we still need to fetch the endOffset to know when to stop, so the above applies here as well and we can detect the corruption.

In either case, I'd agree the leader should not throw and should just treat that task's offset sum as 0. But we should make sure that on the other end we actually do detect this case and handle it the way the assignor expects.
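To make the terminology above concrete, here is a small sketch of the restore bound and the corruption check it enables. The names are illustrative only, and LCO stands for the last committed offset of an optimized source-changelog topic, as used in the comment:

```java
// Sketch of the restore bound described above: a plain changelog restores up to the
// log-end offset, while an optimized source-changelog topic restores only up to the
// committed offset, so the bound is min(logEndOffset, LCO). In both cases
// endOffset <= logEndOffset, so a checkpointed offset greater than endOffset is
// detectable as corruption. Not the actual Streams internals.
public final class RestoreBoundSketch {

    public static long restoreEndOffset(final boolean optimizedSourceChangelog,
                                        final long logEndOffset,
                                        final long lastCommittedOffset) {
        return optimizedSourceChangelog
            ? Math.min(logEndOffset, lastCommittedOffset)
            : logEndOffset;
    }

    public static boolean looksCorrupted(final long checkpointedOffset, final long restoreEndOffset) {
        // Offsets beyond the restore bound cannot have come from a valid restore/commit.
        return checkpointedOffset > restoreEndOffset;
    }
}
```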
@ableegoldman If the log is truncated and then immediately more records are appended to go beyond the original log-end offset, in old versions we would not throw the exception.

After some thought, I think it makes sense to report the lag as the whole log.
Thanks for the conversation, all. I think where it nets out for me is that we don't really expect this to happen with non-buggy code, but there are a few edge cases where it could. Thus, logging a warning actually seems reasonable.

Since the graceful handling option is still safe in any case, and since there do exist edge cases where this condition doesn't indicate a bug in Streams, we'll just keep the proposed graceful handling of reporting the lag as the whole log. Either the member in question won't get assigned the task (because it's de-prioritized with respect to other members that report valid positions) and it winds up cleaning up its state dir on its own, or it gets assigned the task, detects the corruption on its side, and recovers appropriately.
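The de-prioritization mentioned here can be pictured as a plain ranking by reported lag: a member that reports the whole log as its lag sorts behind any member with a smaller, valid lag, so it only receives the task when no better candidate exists. A toy illustration of that idea, not the actual assignor logic:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class LagRankingSketch {

    static final class Candidate {
        final String memberId;
        final long reportedLag;

        Candidate(final String memberId, final long reportedLag) {
            this.memberId = memberId;
            this.reportedLag = reportedLag;
        }

        @Override
        public String toString() {
            return memberId + " (lag=" + reportedLag + ")";
        }
    }

    /** Members reporting smaller lag are preferred when placing a stateful task. */
    static List<Candidate> rankByLag(final List<Candidate> candidates) {
        final List<Candidate> ranked = new ArrayList<>(candidates);
        ranked.sort(Comparator.comparingLong((Candidate c) -> c.reportedLag));
        return ranked;
    }

    public static void main(final String[] args) {
        final List<Candidate> candidates = new ArrayList<>();
        // A member whose state looked corrupted reports the whole log (say 1000 records) as its lag ...
        candidates.add(new Candidate("member-with-suspect-state", 1000L));
        // ... so it sorts behind a member that reports a small, valid lag and is unlikely to get the task.
        candidates.add(new Candidate("member-with-valid-state", 10L));
        System.out.println(rankByLag(candidates));
    }
}
```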
Since the graceful handling option is still safe in any case, and since there do exist edge cases where this condition doesn't indicate a bug in Streams, we'll just keep the proposed graceful handling of reporting the lag as the whole log. Either the member in question won't get assigned the task (because it's de-prioritized wrt to other members who report valid positions), and it winds up cleaning up its state dir on its own, or it gets assigned the task, detects the corruption on its side, and recovers appropriately.