KAFKA-9274: Mark retries config as deprecated and add new task.timeout.ms config#8864
KAFKA-9274: Mark retries config as deprecated and add new task.timeout.ms config#8864mjsax merged 2 commits intoapache:trunkfrom
retries config as deprecated and add new task.timeout.ms config#8864Conversation
There was a problem hiding this comment.
@cmccabe I think we can keep most of those test as-is until retries is removed -- the ones with @SuppressWarning should be rewritten if retries is removed IMHO, while the ones with @Deprecated should be remove. Let me know if this makes sense or if you think we should rewrite those tests right away.
There was a problem hiding this comment.
Thanks @mjsax. I agree we should keep testing the deprecated option. It does kind of seem like we should rewrite the tests you want to rewrite, now, though. I don't think this scheme of deprecating some tests and suppressing the warning on others is going to be remembered for very long after this is merged.
There was a problem hiding this comment.
We can do that. Wanted to get input from @hachikuji and @cmccabe about it. Frankly, I am not always 100% sure atm how to exactly rewrite the tests and what test coverage is desired?
There was a problem hiding this comment.
I dug into the test and I think I found ways to get rid for retries for the methods that suppress the deprecation warning. Let me know what you think.
There was a problem hiding this comment.
@rhauch @kkonstantine -- We deprecate retries via KIP-572. Not sure if we should rewrite this test right away so it does not use retries any longer?
There was a problem hiding this comment.
Not sure if we should rewrite this test right away
First of all, this is not test code. It's actually part of the Connect runtime.
Second, I'd have to look into what we would do differently and how we would set deliver.timeout.ms to work with the existing logic that handles retries. Since it's not clear, I suggest we keep the current behavior of setting retries=0 as long as that still function. We're using the ProducerConfig.RETRIES constant, so that can't be removed without changing this code.
There was a problem hiding this comment.
@rhauch @kkonstantine As discussed in person, I create a ticket that is marked as blocker for 2.7 release to address this issue: https://issues.apache.org/jira/browse/KAFKA-10297
There was a problem hiding this comment.
Btw: @hachikuji raise a concern about this issue, too: #8864 (review)
I just had one question about the proposal. Using retries=0 in the producer allows the user to have "at-most-once" delivery. This allows the application to implement its own retry logic for example. Do we still have a way to do this once this configuration is gone?
So maybe we need to do some follow up work in the Producer to make it work for Connect. But I would defer this to the follow up work.
My original though was, that setting max.deliver.timeout.ms := request .timeout.ms might prevent internal retries. But we need to verify this. It was also brought to my attention, that this might not work if the network disconnects -- only retries=0 would prevent to re-open the connection but a low timeout would not prevent retries.
In KIP-572, we proposed for Kafka Streams itself, to treat task.timeout.ms = 0 as "no retries" -- maybe we can do a similar thing for the producer?
There is also max.block.ms that we should consider. Unfortunately, I am not an expert on the Producer. But I would like to move forward with KIP-572 for now and are happy to help to resolve those questions.
There was a problem hiding this comment.
@hachikuji @cmccabe @vvcephei -- I think we should deprecate this flag (guess we should update the KIP accordingly) -- Thoughts?
There was a problem hiding this comment.
Updated the KIP and sent a mail to the VOTE thread on the dev list.
There was a problem hiding this comment.
Just some side cleanup. Method does not throw anything
aa74cfa to
7d82538
Compare
retries config as deprecatedretries config as deprecated and add new task.timeout.ms config
abbccdda
left a comment
There was a problem hiding this comment.
Looks like the only remaining question is the deprecation path, and no actual production logic change? If so, I could rely on reviews from other tech leads to unblock.
There was a problem hiding this comment.
I just had one question about the proposal. Using retries=0 in the producer allows the user to have "at-most-once" delivery. This allows the application to implement its own retry logic for example. Do we still have a way to do this once this configuration is gone?
There was a problem hiding this comment.
nit: feels like overkill to deprecate test cases. Since they reference RETRIES directly, I don't think we need to worry about them not getting removed
There was a problem hiding this comment.
I think the deprecation warning was part of a scheme to mark which tests should be rewritten: https://github.com/apache/kafka/pull/8864/files#r439685696
There was a problem hiding this comment.
Correct. To make the build pass, we either need to add a Suppress annotation of just deprecate the test itself. Deprecating the test itself is "cleaner" IMHO.
There was a problem hiding this comment.
Thanks @mjsax. I agree we should keep testing the deprecated option. It does kind of seem like we should rewrite the tests you want to rewrite, now, though. I don't think this scheme of deprecating some tests and suppressing the warning on others is going to be remembered for very long after this is merged.
There was a problem hiding this comment.
I think the deprecation warning was part of a scheme to mark which tests should be rewritten: https://github.com/apache/kafka/pull/8864/files#r439685696
There was a problem hiding this comment.
Did you mean to fill this in as part of this PR?
There was a problem hiding this comment.
Hey, sorry if this is picky, but are these also to be removed when we remove the config? If so, maybe we should just say it on every suppression.
There was a problem hiding this comment.
Not 100% sure yet how the test will be rewritten. My expectation is, that we remove the retries config and also remove the Suppress tag -- however, we can only do this, after streams code was changes.
There was a problem hiding this comment.
Thanks for adding the comment!
There was a problem hiding this comment.
Yes :) Added it as a reminder only first. Was expecting to get review comment anyway.
b58d7a1 to
e6c10d8
Compare
|
Updates this PR. |
vvcephei
left a comment
There was a problem hiding this comment.
Thanks for the update, @mjsax !
There are still multiple comments like TODO revisit in follow up PR. Should we file a blocker to actually follow up on this? Otherwise, I'm afraid we'll just forget about them. If I understand the change correctly, these tests are all using the deprecated retries parameter, which means we're lacking coverage on the non-deprecated path.
Otherwise, it LGTM!
|
Thanks @vvcephei -- I don't think we need tickets. I will use this PR as reference itself to make sure we address all TODO that are added. |
| } | ||
| } | ||
|
|
||
| @SuppressWarnings("deprecation") // TODO revisit in follow up PR |
| } | ||
| } | ||
|
|
||
| @SuppressWarnings("deprecation") // TODO revisit in follow up PR |
| private final int retries; | ||
| private final long retryBackOffMs; | ||
|
|
||
| @SuppressWarnings("deprecation") // TODO: remove in follow up PR when `RETRIES` is removed |
| assertEquals(100, returnedProps.get(StreamsConfig.topicPrefix(TopicConfig.SEGMENT_BYTES_CONFIG))); | ||
| } | ||
|
|
||
| @SuppressWarnings("deprecation") // TODO revisit in follow up PR |
| private MockAdminClient mockAdminClient; | ||
| private InternalTopicManager internalTopicManager; | ||
|
|
||
| @SuppressWarnings("deprecation") // TODO revisit in follow up PR |
Fixes flakiness in `KafkaAdminClientTest` as a result of #8864. Addresses the following flaky tests: - testAlterReplicaLogDirsPartialFailure - testDescribeLogDirsPartialFailure - testMetadataRetries Reviewers: Chia-Ping Tsai <chia7712@gmail.com>, Jason Gustafson <jason@confluent.io>
|
|
||
| private static final String SINK_TOPIC = "streamsResilienceSink"; | ||
|
|
||
| @SuppressWarnings("deprecation") // TODO revisit in follow up PR |
| }); | ||
| } | ||
|
|
||
| @SuppressWarnings("deprecation") // TODO revisit in follow up PR |
| public class StreamsOptimizedTest { | ||
|
|
||
|
|
||
| @SuppressWarnings("deprecation") // TODO revisit in follow up PR |
|
|
||
| public class StreamsStandByReplicaTest { | ||
|
|
||
| @SuppressWarnings("deprecation") // TODO revisit in follow up PR |
| streams.close(Duration.ofSeconds(10)); | ||
| } | ||
|
|
||
| @SuppressWarnings("deprecation") // TODO revisit in follow up PR |
Fixes flakiness in `KafkaAdminClientTest` as a result of apache#8864. Addresses the following flaky tests: - testAlterReplicaLogDirsPartialFailure - testDescribeLogDirsPartialFailure - testMetadataRetries Reviewers: Chia-Ping Tsai <chia7712@gmail.com>, Jason Gustafson <jason@confluent.io>
First PR for KIP-572 for public changes only. Producer/Admin client won't have any functional change as they still will respect
retries. For KS, we will get rid or the usage orretriesin follow up PRs.Call for review: @cmccabe @hachikuji @abbccdda @vvcephei