Skip to content

KAFKA-8503: Ignore retries config if a custom timeout is provided#6913

Closed
huxihx wants to merge 4 commits intoapache:trunkfrom
huxihx:KAFKA-8503
Closed

KAFKA-8503: Ignore retries config if a custom timeout is provided#6913
huxihx wants to merge 4 commits intoapache:trunkfrom
huxihx:KAFKA-8503

Conversation

@huxihx
Copy link
Copy Markdown
Contributor

@huxihx huxihx commented Jun 11, 2019

https://issues.apache.org/jira/browse/KAFKA-8503

When custom timeout is provided, retries config could be ignored for individual APIs in KafkaAdminClient. Tweaked code path for Call.fail to pass NPathComplexity check.

More detailed description of your change,
if necessary. The PR title and PR message become
the squashed commit message, so use a separate
comment to ping reviewers.

Summary of testing strategy (including rationale)
for the feature or bug fix. Unit and/or integration
tests are expected for any behaviour change and
system tests should be considered for larger changes.

Committer Checklist (excluded from commit message)

  • Verify design and implementation
  • Verify test coverage and CI build status
  • Verify documentation (including upgrade notes)

@huxihx
Copy link
Copy Markdown
Contributor Author

huxihx commented Jun 12, 2019

@hachikuji Please review this patch. Thanks.

@huxihx
Copy link
Copy Markdown
Contributor Author

huxihx commented Sep 6, 2019

retest this please.

@huxihx
Copy link
Copy Markdown
Contributor Author

huxihx commented Sep 12, 2019

@hachikuji Could you take some time to review this patch? Thanks.

@hachikuji
Copy link
Copy Markdown
Contributor

@huxihx I wrote this KIP: https://cwiki.apache.org/confluence/display/KAFKA/KIP-533%3A+Add+default+api+timeout+to+AdminClient. Once this is approved, we can complete this work.

@hachikuji
Copy link
Copy Markdown
Contributor

@huxihx KIP-533 has been approved: https://cwiki.apache.org/confluence/display/KAFKA/KIP-533%3A+Add+default+api+timeout+to+AdminClient. Would you like to update this patch with the proposal?

@huxihx
Copy link
Copy Markdown
Contributor Author

huxihx commented Nov 29, 2019

@hachikuji Please review this patch. Thanks.

Copy link
Copy Markdown
Contributor

@hachikuji hachikuji left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. Left a few comments. It would be helpful to have a basic unit or integration test.

+ "than 1/3 of that value. It can be adjusted even lower to control the expected time for normal rebalances.";

public static final String DEFAULT_API_TIMEOUT_MS_CONFIG = "default.api.timeout.ms";
public static final String DEFAULT_API_TIMEOUT_MS_DOC = "Specifies the timeout (in milliseconds) for client APIs that could block. " +
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We might need to generalize this description a little bit. The AdminClient APIs do not block.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Simply dropping could block is a good option?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, that sounds fine.

*/
public Integer timeoutMs() {
return timeoutMs;
public Integer apiTimeoutMs() {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These changes are incompatible; we need to keep the existing API. I think it's fine to update the javadoc.

this.defaultTimeoutMs = config.getInt(AdminClientConfig.REQUEST_TIMEOUT_MS_CONFIG);
this.defaultTimeoutMs = config.getInt(AdminClientConfig.DEFAULT_API_TIMEOUT_MS_CONFIG);
// Normally, the default api timeout should be equal or larger than the request timeout.
this.requestTimeoutMs = Math.min(defaultTimeoutMs, config.getInt(AdminClientConfig.REQUEST_TIMEOUT_MS_CONFIG));
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So our options here are either to raise an error to the user or adjust one of the configurations. Since default.api.timeout.ms is a new configuration, it is possible that a user has explicitly provided a request.timeout.ms which conflicts with the default default.api.timeout.ms. I think the logic should be something like the following:

  1. If a default.api.timeout.ms has been explicitly specified, raise an error if it conflicts with request.timeout.ms.
  2. If no default.api.timeout.ms has been configured, then set its value as the max of the default and request.timeout.ms. Also we should probably log a warning.
  3. Otherwise, use the provided values for both configurations.

@huxihx
Copy link
Copy Markdown
Contributor Author

huxihx commented Dec 6, 2019

retest this please.

Copy link
Copy Markdown
Contributor

@hachikuji hachikuji left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the updates. Left one more comment. Note we still need test a test case which verifies basic timeout behavior.

}
Call call = calls.remove(0);
int timeoutMs = calcTimeoutMsRemainingAsInt(now, call.deadlineMs);
int timeoutMs = Math.min(requestTimeoutMs, calcTimeoutMsRemainingAsInt(now, call.deadlineMs));
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In addition to passing through the timeout to createRequest, we also need to pass it through in newClientRequest. Only some of the APIs support an explicit request timeout. This appears to have been a pre-existing bug.

@huxihx
Copy link
Copy Markdown
Contributor Author

huxihx commented Dec 11, 2019

testHandleTimeout which is now disabled due to flaky failures seems to be a good starting point for testing the timeout behavior. Do we need a test case which should cover the new-added config or the request timeout or even both?

@hachikuji
Copy link
Copy Markdown
Contributor

hachikuji commented Dec 12, 2019

Mainly we need a test case which verifies that the config is actually respected. We don't have to do this for every API. I would say pick one and verify 1) that the default api timeout works, and 2) that it can be overridden by the Options.

@huxihx
Copy link
Copy Markdown
Contributor Author

huxihx commented Dec 13, 2019

retest this please.

3 similar comments
@huxihx
Copy link
Copy Markdown
Contributor Author

huxihx commented Dec 16, 2019

retest this please.

@huxihx
Copy link
Copy Markdown
Contributor Author

huxihx commented Dec 17, 2019

retest this please.

@huxihx
Copy link
Copy Markdown
Contributor Author

huxihx commented Dec 19, 2019

retest this please.

@hachikuji
Copy link
Copy Markdown
Contributor

@huxihx I opened a PR into your branch with some of the missing test cases. Please take a look: huxihx#1.

@hachikuji
Copy link
Copy Markdown
Contributor

@huxihx Do you have time to follow up here? I would like to get this merged for 2.5 if possible.

@hachikuji
Copy link
Copy Markdown
Contributor

@huxihx Closing this. We will merge #8011 with co-author attribution. Thanks for the contribution!

@hachikuji hachikuji closed this Jan 30, 2020
hachikuji added a commit that referenced this pull request Jan 31, 2020
This PR implements `default.api.timeout.ms` as documented by KIP-533. This is a rebased version of #6913 with some additional test cases and small cleanups.

Reviewers: David Arthur <mumrah@gmail.com>

Co-authored-by: huxi <huxi_2b@hotmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants