Skip to content

MINOR: remove unnecessary timeout for admin request#8738

Merged
mjsax merged 11 commits intoapache:trunkfrom
ableegoldman:MINOR-remove-extra-admin-timeout-config
May 29, 2020
Merged

MINOR: remove unnecessary timeout for admin request#8738
mjsax merged 11 commits intoapache:trunkfrom
ableegoldman:MINOR-remove-extra-admin-timeout-config

Conversation

@ableegoldman
Copy link
Copy Markdown
Member

@ableegoldman ableegoldman commented May 28, 2020

Turns out future.get() actually does apply the admin's default.api.timeout.ms config internally, so we don't need to worry about providing a timeout of our own. Who knew

private final static String INTERRUPTED_ERROR_MESSAGE = "Thread got interrupted. This indicates a bug. " +
"Please report at https://issues.apache.org/jira/projects/KAFKA or dev-mailing list (https://kafka.apache.org/contact).";

private static final class InternalAdminClientConfig extends AdminClientConfig {
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved to ClientUtil

if (nextProbingRebalanceMs.get() < time.milliseconds()) {
log.info("Triggering the followup rebalance scheduled for {} ms.", nextProbingRebalanceMs.get());
mainConsumer.enforceRebalance();
nextProbingRebalanceMs.set(Long.MAX_VALUE);
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is neither relevant to this PR nor required for correctness, but I noticed the log message above tends to spam the logs in some tests. Since this gets set/reset at the end of every rebalance, we may as well reset it here to avoid an avalanche of Triggering the followup rebalance...

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

enforceRebalance is guaranteed not to actually run the assignment logic, right? That will only run during a call to poll, I'm hoping. Otherwise, this line should go before the call.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep. It just provides a notice to the consumer to enforce that a rebalance will occur on the next poll

@ableegoldman ableegoldman force-pushed the MINOR-remove-extra-admin-timeout-config branch from 45d3dc4 to 702187a Compare May 28, 2020 05:29
public static Map<TopicPartition, ListOffsetsResultInfo> fetchEndOffsets(final Collection<TopicPartition> partitions,
final Admin adminClient,
final Duration timeout) {
final long timeoutMs) {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems to me Duration is more readable than long. Is there a reason to make this change?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's because we're now also calling it right after calling getAdminDefaultApiTimeoutMs, so it seems a bummer to create a Duration from millis and then immediately convert it back to millis.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep

fetchEndOffsets(
allPartitions,
adminClient,
getAdminDefaultApiTimeoutMs(config)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The admin configs is built in KafkaStreams construction. Could we reuse it?

        adminClient = clientSupplier.getAdmin(config.getAdminConfigs(ClientUtils.getSharedAdminClientId(clientId)));

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did it this way since we need to get the admin's default.api.timeout in other places where we only have the streamsConfig, so we may as well just pass that as the argument to getAdminDefaultApiTimeoutMs

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should update the JavaDocs that this method may throw a TimeoutException now. What make we wondering if this is a public API change? Was there any discussion on the original KIP about the behavior of allLocalStorePartitionLags ?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change might require a KIP... \cc @vvcephei @guozhangwang WDYT?

final Admin adminClient) {
return fetchEndOffsets(partitions, adminClient, null);
public static int getAdminDefaultApiTimeoutMs(final StreamsConfig streamsConfig) {
final InternalAdminClientConfig dummyAdmin = new InternalAdminClientConfig(streamsConfig.getAdminConfigs("dummy"));
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add comment for the "dummy"?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ack

Copy link
Copy Markdown
Contributor

@vvcephei vvcephei left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Just a few minor comments.

public static Map<TopicPartition, ListOffsetsResultInfo> fetchEndOffsets(final Collection<TopicPartition> partitions,
final Admin adminClient,
final Duration timeout) {
final long timeoutMs) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's because we're now also calling it right after calling getAdminDefaultApiTimeoutMs, so it seems a bummer to create a Duration from millis and then immediately convert it back to millis.

mkEntry(StreamsConfig.PROBING_REBALANCE_INTERVAL_MS_CONFIG, "480000"),
mkEntry(StreamsConfig.InternalConfig.ASSIGNMENT_LISTENER, configuredAssignmentListener),
mkEntry(AdminClientConfig.REQUEST_TIMEOUT_MS_CONFIG, 9)
mkEntry(AdminClientConfig.DEFAULT_API_TIMEOUT_MS_CONFIG, 90_000)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just curious, why the change from 9 to 90,000?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For fun

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean, because the value has to be larger than the request.timeout.ms which defaults to 30,000

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, thanks.

@vvcephei
Copy link
Copy Markdown
Contributor

Test this please

@vvcephei
Copy link
Copy Markdown
Contributor

It looks like all the review comments are addressed, and all the tests passed, so I'll proceed to merge.

@vvcephei
Copy link
Copy Markdown
Contributor

Oh, actually @ableegoldman , it looks like there was a conflict with the other PR I just merged.

@vvcephei
Copy link
Copy Markdown
Contributor

I just resolved the conflicts.

@vvcephei
Copy link
Copy Markdown
Contributor

Test this please

@vvcephei
Copy link
Copy Markdown
Contributor

Ok to test

@vvcephei
Copy link
Copy Markdown
Contributor

Retest this please

@vvcephei
Copy link
Copy Markdown
Contributor

... or not

@vvcephei
Copy link
Copy Markdown
Contributor

Test this please

2 similar comments
@vvcephei
Copy link
Copy Markdown
Contributor

Test this please

@vvcephei
Copy link
Copy Markdown
Contributor

Test this please

@vvcephei
Copy link
Copy Markdown
Contributor

Retest this please

@vvcephei
Copy link
Copy Markdown
Contributor

Test this please

@vvcephei
Copy link
Copy Markdown
Contributor

\o/

EasyMock.expect(adminClient.listOffsets(EasyMock.anyObject())).andThrow(new RuntimeException());
replay(adminClient);
assertThrows(StreamsException.class, () -> fetchEndOffsetsWithoutTimeout(emptyList(), adminClient));
assertThrows(StreamsException.class, () -> fetchEndOffsets(emptyList(), adminClient, 60_000L));
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we pass in MAX_VALUE to avoid introducing test flakyness?

EasyMock.expect(adminClient.listOffsets(EasyMock.anyObject())).andStubReturn(result);
EasyMock.expect(result.all()).andStubReturn(allFuture);
EasyMock.expect(allFuture.get()).andThrow(new InterruptedException());
EasyMock.expect(allFuture.get(60000L, TimeUnit.MILLISECONDS)).andThrow(new InterruptedException());
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As above (also below)

Also nit: 60_000L (if just 60 and TimeUnit.SECONDS?)

@ableegoldman ableegoldman changed the title MINOR: apply default.api.timeout.ms to fetchEndOffsets MINOR: remove unnecessary timeout for admin request May 28, 2020
Copy link
Copy Markdown
Contributor

@vvcephei vvcephei left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the update @ableegoldman

@vvcephei
Copy link
Copy Markdown
Contributor

Test this please

1 similar comment
@vvcephei
Copy link
Copy Markdown
Contributor

Test this please


log.debug("Current changelog positions: {}", allChangelogPositions);
final Map<TopicPartition, ListOffsetsResultInfo> allEndOffsets = fetchEndOffsetsWithoutTimeout(allPartitions, adminClient);
final Map<TopicPartition, ListOffsetsResultInfo> allEndOffsets = fetchEndOffsets(allPartitions, adminClient);
Copy link
Copy Markdown
Member

@mjsax mjsax May 28, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we please update the JavaDocs of allLocalStorePartitionLags to state that a StreamsException could be thrown?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ack, done

@mjsax
Copy link
Copy Markdown
Member

mjsax commented May 28, 2020

Checkstyle error:

[ant:checkstyle] [ERROR] /home/jenkins/jenkins-slave/workspace/kafka-pr-jdk8-scala2.12/streams/src/test/java/org/apache/kafka/streams/processor/internals/StreamsPartitionAssignorTest.java:21:8: Unused import - org.apache.kafka.clients.admin.AdminClientConfig. [UnusedImports]

@mjsax
Copy link
Copy Markdown
Member

mjsax commented May 28, 2020

Retest this please.

1 similar comment
@mjsax
Copy link
Copy Markdown
Member

mjsax commented May 28, 2020

Retest this please.

@mjsax mjsax merged commit 36ca33f into apache:trunk May 29, 2020
mjsax pushed a commit that referenced this pull request May 29, 2020
Reviewers: John Roesler <john@confluent.io>, Matthias J. Sax <matthias@confluent.io>
@mjsax
Copy link
Copy Markdown
Member

mjsax commented May 29, 2020

Merged to trunk and cherry-picked to 2.6. Do we want this in 2.5, too? If yes, we would need a new PR. Not possible to cherry-pick.

@ableegoldman
Copy link
Copy Markdown
Member Author

Thanks! I actually don't think there ended up being anything relevant t o2.5 in the final form of this PR. Except maybe adding @throws StreamsException to the allLocalStorePartitionLags javadocs, but not sure that warrants an entire PR

@mjsax
Copy link
Copy Markdown
Member

mjsax commented May 29, 2020

Ack. In 2.5 we use the AdminClient directly in allLocalStorePartitionLags and don't apply a timeout on get(). -- Might still be worth to do quick PR to update the JavaDocs :)

@ableegoldman ableegoldman deleted the MINOR-remove-extra-admin-timeout-config branch May 29, 2020 20:49
Kvicii pushed a commit to Kvicii/kafka that referenced this pull request May 30, 2020
* 'trunk' of github.com:apache/kafka: (36 commits)
  Remove redundant `containsKey` call in KafkaProducer (apache#8761)
  KAFKA-9494; Include additional metadata information in DescribeConfig response (KIP-569) (apache#8723)
  KAFKA-10061; Fix flaky `ReassignPartitionsIntegrationTest.testCancellation` (apache#8749)
  KAFKA-9130; KIP-518 Allow listing consumer groups per state (apache#8238)
  KAFKA-9501: convert between active and standby without closing stores (apache#8248)
  KAFKA-10056; Ensure consumer metadata contains new topics on subscription change (apache#8739)
  MINOR: Log the reason for coordinator discovery failure (apache#8747)
  KAFKA-10029; Don't update completedReceives when channels are closed to avoid ConcurrentModificationException (apache#8705)
  MINOR: remove unnecessary timeout for admin request (apache#8738)
  MINOR: Relax Percentiles test (apache#8748)
  MINOR: regression test for task assignor config (apache#8743)
  MINOR: Update documentation.html to refer to 2.6 (apache#8745)
  MINOR: Update documentation.html to refer to 2.5 (apache#8744)
  KAFKA-9673: Filter and Conditional SMTs (apache#8699)
  KAFKA-9971: Error Reporting in Sink Connectors (KIP-610) (apache#8720)
  KAFKA-10052: Harden assertion of topic settings in Connect integration tests (apache#8735)
  MINOR: Slight MetadataCache tweaks to avoid unnecessary work (apache#8728)
  KAFKA-9802; Increase transaction timeout in system tests to reduce flakiness (apache#8736)
  KAFKA-10050: kafka_log4j_appender.py fixed for JDK11 (apache#8731)
  KAFKA-9146: Add option to force delete active members in StreamsResetter (apache#8589)
  ...

# Conflicts:
#	core/src/main/scala/kafka/log/Log.scala
@ableegoldman
Copy link
Copy Markdown
Member Author

Fair enough: #8772

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants