MINOR: remove unnecessary timeout for admin request by ableegoldman · Pull Request #8738 · apache/kafka

ableegoldman · 2020-05-28T04:12:14Z

Turns out future.get() actually does apply the admin's default.api.timeout.ms config internally, so we don't need to worry about providing a timeout of our own. Who knew?
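As a rough illustration of why no caller-side timeout is needed, here is a stdlib-only analogy. It is not Kafka's actual implementation: `InternalDeadlineDemo` and its methods are hypothetical names, and `CompletableFuture.orTimeout` (Java 9+) merely stands in for whatever the admin client does internally. The point is that once the producer of a future enforces a deadline, a plain `get()` with no arguments cannot block past it.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

final class InternalDeadlineDemo {

    // Hypothetical stand-in for a client that bounds every call by a default timeout.
    static CompletableFuture<String> callWithInternalDeadline(final long defaultApiTimeoutMs) {
        // Nobody ever completes this future normally; only the deadline can finish it.
        final CompletableFuture<String> neverAnswered = new CompletableFuture<>();
        // orTimeout completes the future exceptionally with a TimeoutException after the deadline.
        return neverAnswered.orTimeout(defaultApiTimeoutMs, TimeUnit.MILLISECONDS);
    }

    // Returns true if a plain get() (no timeout argument) failed because of the internal deadline.
    static boolean plainGetTimesOut(final long defaultApiTimeoutMs) {
        try {
            callWithInternalDeadline(defaultApiTimeoutMs).get();
            return false;
        } catch (final ExecutionException e) {
            return e.getCause() instanceof TimeoutException;
        } catch (final InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

Calling `InternalDeadlineDemo.plainGetTimesOut(50L)` returns after roughly 50 ms even though the caller never passed a timeout to `get()`.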

ableegoldman · 2020-05-28T04:13:04Z

    private final static String INTERRUPTED_ERROR_MESSAGE = "Thread got interrupted. This indicates a bug. " +
        "Please report at https://issues.apache.org/jira/projects/KAFKA or dev-mailing list (https://kafka.apache.org/contact).";

-    private static final class InternalAdminClientConfig extends AdminClientConfig {


Moved to ClientUtil

ableegoldman · 2020-05-28T04:16:08Z

                if (nextProbingRebalanceMs.get() < time.milliseconds()) {
                    log.info("Triggering the followup rebalance scheduled for {} ms.", nextProbingRebalanceMs.get());
                    mainConsumer.enforceRebalance();
+                    nextProbingRebalanceMs.set(Long.MAX_VALUE);


This is neither relevant to this PR nor required for correctness, but I noticed the log message above tends to spam the logs in some tests. Since this gets set/reset at the end of every rebalance, we may as well reset it here to avoid an avalanche of "Triggering the followup rebalance..." messages.

enforceRebalance is guaranteed not to actually run the assignment logic, right? That will only run during a call to poll, I'm hoping. Otherwise, this line should go before the call.

Yep. It just provides a notice to the consumer to enforce that a rebalance will occur on the next poll
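For readers following along, a minimal sketch of the reset-to-sentinel idea discussed above, under stated assumptions: `ProbingRebalanceSketch` is a hypothetical class, `nowMs` stands in for `time.milliseconds()`, and the counter stands in for the log line plus the `enforceRebalance()` call. It only shows that setting the schedule to `Long.MAX_VALUE` after the first trigger stops later loop iterations from triggering again.

```java
import java.util.concurrent.atomic.AtomicLong;

final class ProbingRebalanceSketch {

    private final AtomicLong nextProbingRebalanceMs;
    private int triggerCount = 0;

    ProbingRebalanceSketch(final long scheduledMs) {
        this.nextProbingRebalanceMs = new AtomicLong(scheduledMs);
    }

    // One pass of a simplified loop body.
    void maybeTriggerRebalance(final long nowMs) {
        if (nextProbingRebalanceMs.get() < nowMs) {
            // Stands in for the log message and mainConsumer.enforceRebalance().
            triggerCount++;
            // Sentinel: nothing is scheduled until the next rebalance sets a new time.
            nextProbingRebalanceMs.set(Long.MAX_VALUE);
        }
    }

    int triggerCount() {
        return triggerCount;
    }
}
```

Without the `set(Long.MAX_VALUE)` line, every pass after the scheduled time would trigger (and log) again until the rebalance completed.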

chia7712 · 2020-05-28T10:01:08Z

    public static Map<TopicPartition, ListOffsetsResultInfo> fetchEndOffsets(final Collection<TopicPartition> partitions,
                                                                             final Admin adminClient,
-                                                                             final Duration timeout) {
+                                                                             final long timeoutMs) {


It seems to me Duration is more readable than long. Is there a reason to make this change?

I think it's because we're now also calling it right after calling getAdminDefaultApiTimeoutMs, so it seems a bummer to create a Duration from millis and then immediately convert it back to millis.

chia7712 · 2020-05-28T10:06:04Z

+            fetchEndOffsets(
+                allPartitions,
+                adminClient,
+                getAdminDefaultApiTimeoutMs(config)


The admin configs is built in KafkaStreams construction. Could we reuse it?

adminClient = clientSupplier.getAdmin(config.getAdminConfigs(ClientUtils.getSharedAdminClientId(clientId)));

I did it this way since we need to get the admin's default.api.timeout in other places where we only have the streamsConfig, so we may as well just pass that as the argument to getAdminDefaultApiTimeoutMs

We should update the JavaDocs to state that this method may now throw a TimeoutException. Which makes me wonder whether this is a public API change. Was there any discussion on the original KIP about the behavior of allLocalStorePartitionLags?

This change might require a KIP... \cc @vvcephei @guozhangwang WDYT?

chia7712 · 2020-05-28T10:07:41Z

-                                                                                           final Admin adminClient) {
-        return fetchEndOffsets(partitions, adminClient, null);
+    public static int getAdminDefaultApiTimeoutMs(final StreamsConfig streamsConfig) {
+        final InternalAdminClientConfig dummyAdmin = new InternalAdminClientConfig(streamsConfig.getAdminConfigs("dummy"));


Could you add comment for the "dummy"?

vvcephei

LGTM! Just a few minor comments.

vvcephei · 2020-05-28T16:29:19Z

    public static Map<TopicPartition, ListOffsetsResultInfo> fetchEndOffsets(final Collection<TopicPartition> partitions,
                                                                             final Admin adminClient,
-                                                                             final Duration timeout) {
+                                                                             final long timeoutMs) {


I think it's because we're now also calling it right after calling getAdminDefaultApiTimeoutMs, so it seems a bummer to create a Duration from millis and then immediately convert it back to millis.

vvcephei · 2020-05-28T16:32:42Z

                mkEntry(StreamsConfig.PROBING_REBALANCE_INTERVAL_MS_CONFIG, "480000"),
                mkEntry(StreamsConfig.InternalConfig.ASSIGNMENT_LISTENER, configuredAssignmentListener),
-                mkEntry(AdminClientConfig.REQUEST_TIMEOUT_MS_CONFIG, 9)
+                mkEntry(AdminClientConfig.DEFAULT_API_TIMEOUT_MS_CONFIG, 90_000)


Just curious, why the change from 9 to 90,000?

I mean, because the value has to be larger than the request.timeout.ms which defaults to 30,000

Ah, thanks.
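The constraint mentioned above can be sketched as a small check. This is a hypothetical helper, not the admin client's own validation code: the class and method names are made up, and treating equal values as acceptable is an assumption. The 30,000 ms default for request.timeout.ms is the figure cited in this thread.

```java
final class AdminTimeoutCheck {

    // Default for request.timeout.ms as cited in the review thread.
    static final int DEFAULT_REQUEST_TIMEOUT_MS = 30_000;

    // Hypothetical validation: the overall API timeout must not be shorter than a single request's timeout.
    static int validatedDefaultApiTimeoutMs(final int defaultApiTimeoutMs, final int requestTimeoutMs) {
        if (defaultApiTimeoutMs < requestTimeoutMs) {
            throw new IllegalArgumentException(
                "default.api.timeout.ms (" + defaultApiTimeoutMs
                    + ") must not be smaller than request.timeout.ms (" + requestTimeoutMs + ")");
        }
        return defaultApiTimeoutMs;
    }
}
```

Under this check, the old test value of 9 is rejected against the 30,000 ms default, while 90,000 passes.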

vvcephei · 2020-05-28T17:04:16Z

Test this please

vvcephei · 2020-05-28T19:36:23Z

It looks like all the review comments are addressed, and all the tests passed, so I'll proceed to merge.

vvcephei · 2020-05-28T19:54:32Z

Oh, actually @ableegoldman , it looks like there was a conflict with the other PR I just merged.

vvcephei · 2020-05-28T19:56:20Z

I just resolved the conflicts.

vvcephei · 2020-05-28T19:56:31Z

Test this please

vvcephei · 2020-05-28T19:56:35Z

Ok to test

vvcephei · 2020-05-28T19:56:41Z

Retest this please

vvcephei · 2020-05-28T19:56:52Z

... or not

vvcephei · 2020-05-28T20:28:22Z

Test this please

vvcephei · 2020-05-28T20:28:29Z

Test this please

vvcephei · 2020-05-28T20:28:34Z

Test this please

vvcephei · 2020-05-28T20:28:40Z

Retest this please

vvcephei · 2020-05-28T20:56:41Z

Test this please

vvcephei · 2020-05-28T20:56:54Z

\o/

mjsax · 2020-05-28T21:01:51Z

        EasyMock.expect(adminClient.listOffsets(EasyMock.anyObject())).andThrow(new RuntimeException());
        replay(adminClient);
-        assertThrows(StreamsException.class, () ->  fetchEndOffsetsWithoutTimeout(emptyList(), adminClient));
+        assertThrows(StreamsException.class, () ->  fetchEndOffsets(emptyList(), adminClient, 60_000L));


Should we pass in MAX_VALUE to avoid introducing test flakiness?

mjsax · 2020-05-28T21:02:43Z

        EasyMock.expect(adminClient.listOffsets(EasyMock.anyObject())).andStubReturn(result);
        EasyMock.expect(result.all()).andStubReturn(allFuture);
-        EasyMock.expect(allFuture.get()).andThrow(new InterruptedException());
+        EasyMock.expect(allFuture.get(60000L, TimeUnit.MILLISECONDS)).andThrow(new InterruptedException());


As above (also below)

Also nit: 60_000L (or just 60 and TimeUnit.SECONDS?)

vvcephei

Thanks for the update @ableegoldman

vvcephei · 2020-05-28T22:06:41Z

Test this please

vvcephei · 2020-05-28T22:06:50Z

Test this please

mjsax · 2020-05-28T22:13:33Z


        log.debug("Current changelog positions: {}", allChangelogPositions);
-        final Map<TopicPartition, ListOffsetsResultInfo> allEndOffsets = fetchEndOffsetsWithoutTimeout(allPartitions, adminClient);
+        final Map<TopicPartition, ListOffsetsResultInfo> allEndOffsets = fetchEndOffsets(allPartitions, adminClient);


Can we please update the JavaDocs of allLocalStorePartitionLags to state that a StreamsException could be thrown?
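To make the JavaDoc request concrete, here is a stdlib-only sketch of the general pattern of surfacing failures from a blocking `get()` as a single unchecked exception, which is what the tests in this PR assert for `fetchEndOffsets`. The names `WrapFutureErrors`, `FetchException` (standing in for StreamsException), and `getOrWrap` are hypothetical, and restoring the interrupt flag is this sketch's choice rather than something shown in the diff.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;

final class WrapFutureErrors {

    // Hypothetical unchecked exception standing in for StreamsException.
    static final class FetchException extends RuntimeException {
        FetchException(final String message, final Throwable cause) {
            super(message, cause);
        }
    }

    // Blocks on the future and converts every failure mode into FetchException.
    static <T> T getOrWrap(final Future<T> future) {
        try {
            return future.get();
        } catch (final InterruptedException e) {
            // Preserve the interrupt status for callers further up the stack.
            Thread.currentThread().interrupt();
            throw new FetchException("Interrupted while waiting for the result", e);
        } catch (final ExecutionException | RuntimeException e) {
            throw new FetchException("Unable to obtain the result", e);
        }
    }
}
```

Because callers of such a method only ever see the wrapping exception, that is the one worth naming in an `@throws` clause.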

mjsax · 2020-05-28T22:17:51Z

Checkstyle error:

[ant:checkstyle] [ERROR] /home/jenkins/jenkins-slave/workspace/kafka-pr-jdk8-scala2.12/streams/src/test/java/org/apache/kafka/streams/processor/internals/StreamsPartitionAssignorTest.java:21:8: Unused import - org.apache.kafka.clients.admin.AdminClientConfig. [UnusedImports]

mjsax · 2020-05-28T22:35:20Z

Retest this please.

mjsax · 2020-05-28T22:35:49Z

Retest this please.

Reviewers: John Roesler <john@confluent.io>, Matthias J. Sax <matthias@confluent.io>

mjsax · 2020-05-29T01:37:46Z

Merged to trunk and cherry-picked to 2.6. Do we want this in 2.5, too? If yes, we would need a new PR. Not possible to cherry-pick.

ableegoldman · 2020-05-29T01:54:22Z

Thanks! I actually don't think there ended up being anything relevant to 2.5 in the final form of this PR. Except maybe adding @throws StreamsException to the allLocalStorePartitionLags javadocs, but not sure that warrants an entire PR

mjsax · 2020-05-29T18:52:35Z

Ack. In 2.5 we use the AdminClient directly in allLocalStorePartitionLags and don't apply a timeout on get(). -- Might still be worth doing a quick PR to update the JavaDocs :)

* 'trunk' of github.com:apache/kafka: (36 commits)
  Remove redundant `containsKey` call in KafkaProducer (apache#8761)
  KAFKA-9494; Include additional metadata information in DescribeConfig response (KIP-569) (apache#8723)
  KAFKA-10061; Fix flaky `ReassignPartitionsIntegrationTest.testCancellation` (apache#8749)
  KAFKA-9130; KIP-518 Allow listing consumer groups per state (apache#8238)
  KAFKA-9501: convert between active and standby without closing stores (apache#8248)
  KAFKA-10056; Ensure consumer metadata contains new topics on subscription change (apache#8739)
  MINOR: Log the reason for coordinator discovery failure (apache#8747)
  KAFKA-10029; Don't update completedReceives when channels are closed to avoid ConcurrentModificationException (apache#8705)
  MINOR: remove unnecessary timeout for admin request (apache#8738)
  MINOR: Relax Percentiles test (apache#8748)
  MINOR: regression test for task assignor config (apache#8743)
  MINOR: Update documentation.html to refer to 2.6 (apache#8745)
  MINOR: Update documentation.html to refer to 2.5 (apache#8744)
  KAFKA-9673: Filter and Conditional SMTs (apache#8699)
  KAFKA-9971: Error Reporting in Sink Connectors (KIP-610) (apache#8720)
  KAFKA-10052: Harden assertion of topic settings in Connect integration tests (apache#8735)
  MINOR: Slight MetadataCache tweaks to avoid unnecessary work (apache#8728)
  KAFKA-9802; Increase transaction timeout in system tests to reduce flakiness (apache#8736)
  KAFKA-10050: kafka_log4j_appender.py fixed for JDK11 (apache#8731)
  KAFKA-9146: Add option to force delete active members in StreamsResetter (apache#8589)
  ...

# Conflicts:
#	core/src/main/scala/kafka/log/Log.scala

ableegoldman · 2020-06-01T17:50:16Z

Fair enough: #8772

ableegoldman added 3 commits May 27, 2020 21:09

fix timeout config

7c1ef4d

reset nextProbingRebalanceMs

7676de2

extract internal admin configs

702187a

ableegoldman commented May 28, 2020

ableegoldman force-pushed the MINOR-remove-extra-admin-timeout-config branch from 45d3dc4 to 702187a Compare May 28, 2020 05:29

ableegoldman added 2 commits May 27, 2020 22:31

default.api.timeout must be larger than request.timeout

ef807c2

fix javadocs missing end delimiter

8b5766a

chia7712 reviewed May 28, 2020

vvcephei reviewed May 28, 2020

github review suggestions

1e03bd3

vvcephei approved these changes May 28, 2020

Merge branch 'trunk' into MINOR-remove-extra-admin-timeout-config

c55e557

mjsax reviewed May 28, 2020

ableegoldman added 2 commits May 28, 2020 14:49

remove timeout instead

c290156

Merge branch 'MINOR-remove-extra-admin-timeout-config' of https://git…

54ea20b

…hub.com/ableegoldman/kafka into MINOR-remove-extra-admin-timeout-config

ableegoldman changed the title ~~MINOR: apply default.api.timeout.ms to fetchEndOffsets~~ MINOR: remove unnecessary timeout for admin request May 28, 2020

vvcephei approved these changes May 28, 2020

mjsax reviewed May 28, 2020

mjsax approved these changes May 28, 2020

ableegoldman added 2 commits May 28, 2020 15:24

checkstyle

0edafb6

add StreamsException to javadocs

f7aad3a

mjsax merged commit 36ca33f into apache:trunk May 29, 2020

mjsax pushed a commit that referenced this pull request May 29, 2020

MINOR: remove unnecessary timeout for admin request (#8738)

3229e90

Reviewers: John Roesler <john@confluent.io>, Matthias J. Sax <matthias@confluent.io>

vvcephei mentioned this pull request May 29, 2020

KAFKA-9999: Make internal topic creation error non-fatal #8677

Closed

3 tasks

ableegoldman deleted the MINOR-remove-extra-admin-timeout-config branch May 29, 2020 20:49

ableegoldman mentioned this pull request Jun 1, 2020

MINOR: clarify exception throwing in allStorePartitionLags javadocs #8772

Merged
