[Proposal] Shutdown druid processes upon complete loss of ZK connectivity

Currently, if connectivity between the Druid nodes and ZooKeeper is lost, Curator retries the connection and eventually gives up. At that point the Druid node is left in an inconsistent state. If this happens to a broker, it keeps serving queries but may return incorrect results.
Historicals that have lost ZK connectivity do not show up on the coordinator console even though the process is still running, which can be tricky for cluster operators to identify.

The proposal I'm working on is to shut down the Druid process once the connection retries to ZK are exhausted. Shutting down makes more sense than leaving the node in an unstable state: a process exit can trigger configured process alerts, and if a supervisor process is configured, it can restart the Druid process.
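
To make the intended behavior concrete, here is a minimal, dependency-free sketch of "retry a bounded number of times, then run a shutdown action". It is not Druid or Curator code: the class `ZkShutdownSketch`, its methods, and the retry parameters are all hypothetical names chosen for illustration. In a real node the connection attempts are driven by Curator's retry policy, and the shutdown action would probably be wired to a Curator callback (for example a `ConnectionStateListener` reacting to `ConnectionState.LOST`); that wiring is an assumption and is not shown here.

```java
import java.util.function.BooleanSupplier;

// Hypothetical sketch, not Druid or Curator code.
public class ZkShutdownSketch {

    // Tries to connect once, then retries up to maxRetries more times,
    // sleeping sleepMs between attempts. Returns true on the first success.
    static boolean connectWithRetries(BooleanSupplier connectAttempt, int maxRetries, long sleepMs) {
        for (int attempt = 0; attempt <= maxRetries; attempt++) {
            if (connectAttempt.getAsBoolean()) {
                return true;
            }
            if (attempt < maxRetries) {
                try {
                    Thread.sleep(sleepMs);
                } catch (InterruptedException e) {
                    // Preserve the interrupt flag and treat this as a failed connection.
                    Thread.currentThread().interrupt();
                    return false;
                }
            }
        }
        return false;
    }

    // Runs onRetriesExhausted when no attempt succeeded, instead of letting
    // the node keep running without a ZK connection.
    static boolean superviseConnection(BooleanSupplier connectAttempt, int maxRetries,
                                       long sleepMs, Runnable onRetriesExhausted) {
        boolean connected = connectWithRetries(connectAttempt, maxRetries, sleepMs);
        if (!connected) {
            onRetriesExhausted.run();
        }
        return connected;
    }

    public static void main(String[] args) {
        // Simulated connection attempt that never succeeds.
        BooleanSupplier alwaysFails = () -> false;
        // A real process would call System.exit with a non-zero status here so
        // that process alerts fire or a supervisor restarts it; the demo only logs.
        Runnable shutdown = () ->
            System.err.println("ZK connection retries exhausted; the process would exit here");
        superviseConnection(alwaysFails, 3, 10L, shutdown);
    }
}
```

Passing the shutdown action in as a `Runnable` keeps the decision (exit, alert, or both) separate from the retry loop, so the same loop can be exercised without actually terminating the JVM.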

[Proposal] Shutdown druid processes upon complete loss of ZK connectivity #6518
