KAFKA-15859: Add timeout field to the ListOffsets request #17112
satishd merged 3 commits into apache:trunk
This PR is not dependent on #16602 and can be reviewed separately. PTAL.
    -  "validVersions": "0-9",
    +  //
    +  // Version 10 enables async remote list offsets support (KIP-1075)
    +  "validVersions": "0-10",
since we have flexible versions from v6, is it required to bump the version to 10?
This is the final part (part 3) of KIP-1075. The PR is ready for review. PTAL. Thanks!
showuon left a comment:
Do you think we should do some guard like this one: KAFKA-17331?
If the client doesn't provide the timeout, then we will use the

Verified that the timeout propagates as expected by running the server, admin, and consumer locally:
@satishd @showuon this PR broke some tests and has caused trunk builds to fail. The latest CI run for this PR was https://github.com/apache/kafka/actions/runs/11027485627 which shows both of the JUnit steps failing. Unlike with Jenkins, we need to look at every failed test build, especially before merging. Looking at these two failed jobs, we can see we have actual failed tests and not just flaky ones: https://github.com/apache/kafka/actions/runs/11027485627/job/30626323008

We want to see all the status checks in the PR to be green before merging. I would like to have branch protections in place to prevent this sort of regression, but until we sort out the flaky tests we can't really. Anyways, I'm not meaning to pick on this PR, just trying to raise awareness of our "new normal" for build expectations :)
The failed tests are fixed in #17287.

Sorry for missing the tests (thought those were the flaky ones). Thanks @mumrah for following up and getting the fix merged.
@kamalcph We were using current trunk with brokers that do not have your change yet, and AdminClient's ListOffset call is running into It seems this could be related to this PR. Could you please take a look?
Opened #17358 to address this issue. PTAL. |
This is part 3 of KIP-1075.

Added a `timeoutMs` field to the ListOffsets request. This timeout applies only to topic partitions that have remote storage enabled.

When the timeout is defined in the request, we use it as the delay timeout for the `DelayedRemoteListOffsets` request. When the timeout is not defined (requests from older clients), we take the dynamic `remote.list.offsets.request.timeout.ms` server config as the timeout.

Consumer and AdminClient behavior differ. The consumer retries the LIST_OFFSETS request in case of an error, but the AdminClient does not. Also, the consumer times out the request if the response takes longer than `request.timeout.ms`, whereas the AdminClient times out the request when it exceeds `default.api.timeout.ms`.

To retain the same behavior, we pass `requestTimeoutMs` as the timeout from the consumer, and `defaultApiTimeout` or the overridden `ListOffsetsOptions` timeout from the admin client.

Reviewers: Satish Duggana <satishd@apache.org>, Luke Chen <showuon@gmail.com>
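
The timeout selection described in the commit message can be sketched roughly as follows. This is a minimal, dependency-free illustration, not the actual broker or client code: the class name `ListOffsetsTimeoutSketch`, all three method names, and the convention that a non-positive value means "no timeout in the request" are assumptions made for this example.

```java
// Illustrative sketch only; none of these names come from the Kafka code base.
final class ListOffsetsTimeoutSketch {

    // Broker side: pick the delay timeout for the delayed remote list-offsets operation.
    // Assumption: a non-positive requestTimeoutMs stands for "field not set" (older client).
    static long resolveDelayMs(int requestTimeoutMs, long remoteListOffsetsRequestTimeoutMs) {
        if (requestTimeoutMs > 0) {
            return requestTimeoutMs;              // timeout supplied by the client
        }
        return remoteListOffsetsRequestTimeoutMs; // fall back to the dynamic server config
    }

    // Consumer side: the consumer sends its request.timeout.ms value.
    static int consumerTimeoutMs(int requestTimeoutMs) {
        return requestTimeoutMs;
    }

    // Admin side: a timeout set on the options object overrides default.api.timeout.ms.
    static int adminTimeoutMs(Integer optionsTimeoutMs, int defaultApiTimeoutMs) {
        return optionsTimeoutMs != null ? optionsTimeoutMs : defaultApiTimeoutMs;
    }

    public static void main(String[] args) {
        long serverDefaultMs = 30_000L; // stand-in for remote.list.offsets.request.timeout.ms

        // A newer consumer passes its request timeout along with the request.
        System.out.println(resolveDelayMs(consumerTimeoutMs(15_000), serverDefaultMs));

        // An admin client without a per-call override uses its default API timeout.
        System.out.println(resolveDelayMs(adminTimeoutMs(null, 60_000), serverDefaultMs));

        // An older client sends no timeout, so the server config applies.
        System.out.println(resolveDelayMs(-1, serverDefaultMs));
    }
}
```

Keeping the fallback on the broker means older clients need no changes, while newer clients can bound the remote lookup by the same timeout they already apply locally.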
