KAFKA-12497: Skip periodic offset commits for failed source tasks#10528
KAFKA-12497: Skip periodic offset commits for failed source tasks#10528C0urante merged 4 commits intoapache:trunkfrom
Conversation
|
@ncliang @gharris1727 @kpatelatwork @ddasarathan could one or two of you take a look at this when you have time? |
|
@C0urante overall looks good and very good job on the tests but as I am new to the code, I tried my best to review but I recommend getting a LGTM from one more reviewer also who knows this code better than me to ensure we don't miss something obvious? |
ncliang
left a comment
There was a problem hiding this comment.
Thanks @C0urante . I also agree with @kpatelatwork that it is better to encapsulate the should offset commit logic in WorkerSourceTask . Otherwise I think generally LGTM.
|
@rhauch @tombentley could either of you take a look? It'd be nice to get this merged in time for the upcoming 3.1 release; I know I've seen plenty of people led astray by continued offset commit messages for failed tasks and it'd be great if we could improve their experience. |
tombentley
left a comment
There was a problem hiding this comment.
@C0urante I've taken a very quick initial look. Given the JIRA is explicitly about logging it would be nice if there were assertions in the tests that directly checked for logging. Could we use a LogCaptureAppender to make such assertions?
There was a problem hiding this comment.
The method used in the if is called shouldCommitOffsets, not areThereOffsetsToBeCommitted, so is this message accurate, or perhaps the method name is slightly misleading? Maybe amending the message to "...there are no offsets that should be committed"?
There was a problem hiding this comment.
Sure, the message rewording is fair. Renaming to areThereOffsetsToBeCommitted is a little verbose and although it does technically capture the case where the task's producer has failed to send a record with a non-retriable error, I think it may still be a little misleading.
There was a problem hiding this comment.
Wouldn't assertEquals(shouldFlush, writer.willFlush()) and assertEquals(shouldFlush, writer.beginFlush()) be clearer?
There was a problem hiding this comment.
Yeah, good point. Thought it would make messages clearer for failed assertions but the difference isn't worth the unreadable code.
|
Thanks @tombentley. Interesting point about the log capture appender but it's a little tricky with the current setup. I've hacked together a prototype that tries to provide realistic coverage by simulating genuine offset commits with calls to If the test cases themselves look acceptable, I suspect we may want to either develop a Connect-specific log capture appender (similar to how Connect has its own embedded testing framework that was largely copied from Streams) or move the Streams-specific |
Agreed about he undesirability of the dependency, and if possible a common LogCaptureAppender would be better than have several copies floating about. |
tombentley
left a comment
There was a problem hiding this comment.
@C0urante the extra assertions in the test look reasonable to me.
There was a problem hiding this comment.
Any reason not to use multi-resource try-with-resources?
There was a problem hiding this comment.
The "timed out" part of the error is a little misleading now.
9aafadd to
2ba00bc
Compare
|
I've rebased onto the latest trunk. I think with the latest changes (especially these logging improvements) most of the changes in this PR were made redundant. The only remaining room for improvement IMO is skipping log messages for failed tasks; the other issues (squatting on the source task offset commit thread too long for failed messages to be acknowledged, and misleading users with messages about flushing 0 records) have already been addressed. I've force-pushed a single commit that brings this PR up to date with the latest trunk; going to push an additional commit later this week that addresses the review comments that have been left on it. |
|
Thanks for the review @tombentley, and apologies for the delay. I've addressed your comments and moved the I hope this looks alright given the changes in #11323 that drastically alter the logic touched on here. We're probably past the point for 3.1 so this isn't particularly urgent anymore but I believe it'd still be useful to have at some point if you can spare the time. No rush, though. |
|
CC @vvcephei -- this touches on the |
There was a problem hiding this comment.
Since you've moved the util, do you still need this dependency?
There was a problem hiding this comment.
Good catch, do not need this!
There was a problem hiding this comment.
Interesting. We had an import restriction on pulling in a Streams dependency?
Actually, it looks like the util isn't in this package anymore at all, so this is probably not needed.
There was a problem hiding this comment.
Yep, good call, thanks for catching this.
I could be wrong but I think the import control logic requires everything to be explicitly allowed; don't think there's an explicit restriction on the Streams package name.
tombentley
left a comment
There was a problem hiding this comment.
Thanks @C0urante. I guess we might be able to refactor kafka.utils.LogCaptureAppender to use the moved LogCaptureAppender, but if so we can do that in another PR.
eb3cddc to
e6c5c92
Compare
|
I see 2 committers approved the changes but this was not merged. |
|
This will create other conflicts with #11780. Would it be possible to prioritize that PR and then, once it's merged, fix the conflicts on this one and merge it? |
|
Thanks, that makes sense. I've started reviewing #11780 |
|
@mimaison I've finally gotten around to rebasing this one, mind taking another look? |
|
@tombentley @vvcephei Can you take another look? |
|
@mimaison @vvcephei @tombentley I'd like to merge this in order to unblock #12434, which adds a second use case for the |
|
It's also been proposed in #12434 that we eliminate the Java Any thoughts? |
…ith-resources block
|
Test failures are unrelated; going to merge this as-is. @vvcephei @tombentley Please let me know if you'd like to follow up on this at any point. |
…ache#10528) Also moves the Streams LogCaptureAppender class into the clients module so that it can be used by both Streams and Connect. Reviewers: Nigel Liang <nigel@nigelliang.com>, Kalpesh Patel <kpatel@confluent.io>, John Roesler <vvcephei@apache.org>, Tom Bentley <tbentley@redhat.com>
…ache#10528) Also moves the Streams LogCaptureAppender class into the clients module so that it can be used by both Streams and Connect. Reviewers: Nigel Liang <nigel@nigelliang.com>, Kalpesh Patel <kpatel@confluent.io>, John Roesler <vvcephei@apache.org>, Tom Bentley <tbentley@redhat.com>
…ache#10528) (#55) Also moves the Streams LogCaptureAppender class into the clients module so that it can be used by both Streams and Connect. Reviewers: Nigel Liang <nigel@nigelliang.com>, Kalpesh Patel <kpatel@confluent.io>, John Roesler <vvcephei@apache.org>, Tom Bentley <tbentley@redhat.com> Co-authored-by: Chris Egerton <chrise@aiven.io>
Jira
This change serves two purposes:
Existing unit tests for the
OffsetStorageWriterare tweaked to verify the small change made to it. Several new unit tests are added for theWorkerSourceTaskthat cover various cases where offset commits should not be attempted, and some existing tests are modified to cover cases where offset commits either should or should not be attempted.Committer Checklist (excluded from commit message)