Refresh DruidLeaderClient cache selectively for non-200 responses #14092
gianm merged 8 commits into apache:master
Conversation
Instead of each caller passing through an acceptable response code, why not just do a retry (along with cache invalidation) on any 5xx? I think that's what happens after your change anyway. We wouldn't want to invalidate the cache if there is a 401. When the client needs to find a new leader doesn't seem to be something that a class like LookupReferenceManager should decide. And if it's just retries, we can do that for 503 and 504 no matter where the request comes from.
503 and 504 are more likely to be thrown by the proxy than by the leader service. Is there anything else in the response from the proxy that could be used to detect such a setup, so that we can use it for a targeted retry? If not, we can still retry (with cache invalidation) for 503/504.
@abhishekagarwal87 Thanks for taking a look at this.
I was in two minds about this as well. Giving callers an option to override this behavior seemed to provide flexibility: if in the future they want to handle a 5xx selectively, without a retry, they'd have that option. Otherwise there would be no way to handle or expect a 5xx response from the node; internally the client would always treat it as inappropriate and make additional requests before handing it back to the caller. I do agree that it makes things a bit more complicated for classes like LookupReferenceManager.
@abhishekagarwal87 Made changes so that we do implicit retries for 503/504. Added a boolean flag to disable retries for 503/504 if needed for a particular request.
@abhishek-chouhan - We can get rid of the function signature with the boolean flag and always retry for 503/504. The false flag is not being used anywhere anyway. We can add it later if we ever need it.
Done!
    ));
    request = withUrl(request, redirectUrl);
    } else if (HttpResponseStatus.SERVICE_UNAVAILABLE.equals(responseStatus)
I missed this, but if a retry happens in response to a 503/504, there will be no way to see it in the logs. We should be logging a warning here. What do you think?
Makes sense. Added a log.
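To make the behavior discussed above concrete, here is a minimal, self-contained sketch of a client that retries on 503/504, invalidates its cached leader, and records a warning for each retry. It is not the code from this PR: the class name, the `leaderLookup` and `send` hooks, the `warnings` list standing in for a logger, and the `MAX_ATTEMPTS` bound are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;
import java.util.function.Supplier;

// Illustrative sketch only: none of these names come from Druid.
class LeaderRetrySketch {
    static final int SERVICE_UNAVAILABLE = 503;
    static final int GATEWAY_TIMEOUT = 504;
    static final int MAX_ATTEMPTS = 3; // assumed bound, chosen for the example

    private final Supplier<String> leaderLookup;  // hypothetical: discovers the current leader URL
    private final Function<String, Integer> send; // hypothetical: sends the request, returns the HTTP status
    final List<String> warnings = new ArrayList<>(); // stands in for log.warn(...)
    private String cachedLeader;

    LeaderRetrySketch(Supplier<String> leaderLookup, Function<String, Integer> send) {
        this.leaderLookup = leaderLookup;
        this.send = send;
    }

    // Returns the cached leader, looking it up only when the cache is empty.
    private String currentLeader() {
        if (cachedLeader == null) {
            cachedLeader = leaderLookup.get();
        }
        return cachedLeader;
    }

    // Best-effort: returns the last status seen, even if every attempt got a 503/504.
    int go() {
        int status = -1;
        for (int attempt = 1; attempt <= MAX_ATTEMPTS; attempt++) {
            String leader = currentLeader();
            status = send.apply(leader);
            if (status != SERVICE_UNAVAILABLE && status != GATEWAY_TIMEOUT) {
                return status; // e.g. 200 or 401: hand back to the caller, cache untouched
            }
            // Make the retry visible, then drop the cached leader so the next attempt rediscovers it.
            warnings.add("Got " + status + " from [" + leader + "] on attempt " + attempt
                         + "; invalidating cached leader");
            cachedLeader = null;
        }
        return status;
    }

    public static void main(String[] args) {
        List<String> leaders = List.of("http://old-leader:8081", "http://new-leader:8081");
        int[] lookups = {0};
        LeaderRetrySketch client = new LeaderRetrySketch(
            () -> leaders.get(Math.min(lookups[0]++, leaders.size() - 1)),
            url -> url.contains("new-leader") ? 200 : 503
        );
        System.out.println("status=" + client.go());
        client.warnings.forEach(System.out::println);
    }
}
```

Note that a 401 falls through the first check and is returned as-is, which matches the concern raised earlier in the thread about not invalidating the cache on authentication failures.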
A note: there's also the newer ServiceClient machinery, which #12696 introduced and discusses in more detail. That PR calls out some issues with DruidLeaderClient.
So, we can likely improve further by migrating things to ServiceClient.
Thanks for the reviews and commit @abhishekagarwal87 @gianm!
Thank you for your contribution @abhishek-chouhan. Looking forward to many more 😉
Fixes #14090.
Description
Introduces retries for non-200 responses along with cache invalidation. The caller can still control the behavior by passing in the expected response codes (in addition to 200), in which case the response is returned without any internal retries. The internal retries with cache invalidation are best-effort: in case of continuous failure, the failed response is returned to the caller. This keeps the existing behavior of the API as is.
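The decision rule in this description can be sketched as a single predicate. This is an illustration under stated assumptions, not the PR's code: `ExpectedStatusSketch`, `shouldRetryInternally`, and the `callerExpected` parameter are hypothetical names, and redirect handling is deliberately left out.

```java
import java.util.Set;

// Illustrative sketch only: names are hypothetical and redirects are ignored.
class ExpectedStatusSketch {
    static final int OK = 200;

    // True when the client should invalidate its cached leader and retry internally;
    // false when the response goes straight back to the caller.
    static boolean shouldRetryInternally(int status, Set<Integer> callerExpected) {
        return status != OK && !callerExpected.contains(status);
    }

    public static void main(String[] args) {
        System.out.println(shouldRetryInternally(200, Set.of()));    // 200 is always acceptable
        System.out.println(shouldRetryInternally(404, Set.of(404))); // caller declared 404 as expected
        System.out.println(shouldRetryInternally(503, Set.of()));    // unexpected non-200
    }
}
```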