Skip to content

DruidLeaderClient should refresh cache for non-200 responses #14090

@abhishek-chouhan

Description

@abhishek-chouhan

Currently DruidLeaderClient invalidates the cache when it encounters an IOException or a ChannelException (here). In environments where proxies/sidecars are involved in communication between Druid components, there is also the possibility of getting errors from the proxy itself such as a 503, when the requested endpoint for outbound communication is unavailable.
In this case DruidLeaderClient treats 503 as a valid response and propogates it back to the caller, even if the caller retries, the internal cache is never refreshed and the requested node is not re-resolved. This leads to continuous failures for scenarios such as rolling restarts/upgrades when the cache needs to be re-resolved.
One of the options of dealing with this is to extend the API and allow callers to pass in the HTTP responses which they want to handle. If we get a response with any of these, we pass it onwards to the caller, for anything else we do a best-effort retry inside the DruidLeaderClient (similar to IOException) by clearing the cache, which if its still unsuccessful is returned to the caller to deal with(existing behavior).

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions