Coordinator loses watch on /druid/announcements due to unreasonable usage of PathChildrenCache #6597

@warren288

Description

When a coordinator node becomes leader, it starts a PathChildrenCache for /druid/announcements in CuratorInventoryManager.start. If no historical node has started yet, /druid/announcements does not exist, so the PathChildrenCache created by the coordinator creates /druid/announcements in CONTAINER mode, and the historicals then announce themselves under it.
If all historicals later shut down because of some exception, /druid/announcements becomes empty, the ZooKeeper server deletes the empty container node, and the coordinator leader loses its watch on it.
When the historicals come back and create /druid/announcements again, the coordinator leader never notices.

So, in CuratorInventoryManager.start, the coordinator leader should check whether the /druid/announcements node exists and, if it does not, create it in PERSISTENT mode before starting the PathChildrenCache for /druid/announcements.
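
A rough sketch of the proposed check, to make the idea concrete. This is not the actual Druid code: `ZkOps`, `InMemoryZk`, and `ensurePersistentPath` are hypothetical names, and the in-memory fake only imitates the "empty container node gets deleted" behaviour described above. With Curator, `exists` would presumably map to `checkExists().forPath(path) != null` and `createPersistent` to `create().creatingParentsIfNeeded().withMode(CreateMode.PERSISTENT).forPath(path)`, treating a `NodeExistsException` as "someone else created it first".

```java
import java.util.HashMap;
import java.util.Map;

// Sketch only: every type here is a hypothetical stand-in, not a Druid or Curator class.
class AnnouncementsPathSketch {

    enum Mode { PERSISTENT, CONTAINER }

    /** Hypothetical minimal view of the ZooKeeper client used by the check. */
    interface ZkOps {
        boolean exists(String path);

        /** Returns false if the node already existed (for example, a lost creation race). */
        boolean createPersistent(String path);
    }

    /** In-memory fake that tracks only the node itself and assumes it has no children. */
    static final class InMemoryZk implements ZkOps {
        private final Map<String, Mode> nodes = new HashMap<>();

        @Override
        public boolean exists(String path) {
            return nodes.containsKey(path);
        }

        @Override
        public boolean createPersistent(String path) {
            return nodes.putIfAbsent(path, Mode.PERSISTENT) == null;
        }

        /** Imitates what happens today: the cache creates the missing path as a container. */
        boolean createContainer(String path) {
            return nodes.putIfAbsent(path, Mode.CONTAINER) == null;
        }

        /** Imitates the server deleting empty container nodes; persistent nodes are kept. */
        void reapEmptyContainers() {
            nodes.values().removeIf(mode -> mode == Mode.CONTAINER);
        }
    }

    /** The proposed step: run this before starting the PathChildrenCache on the path. */
    static void ensurePersistentPath(ZkOps zk, String path) {
        if (!zk.exists(path)) {
            // A false return means another node created the path in the meantime,
            // which is fine: the path exists either way.
            zk.createPersistent(path);
        }
    }

    public static void main(String[] args) {
        String path = "/druid/announcements";

        InMemoryZk current = new InMemoryZk();
        current.createContainer(path);        // current behaviour
        current.reapEmptyContainers();        // all historicals gone
        System.out.println("container survives: " + current.exists(path));

        InMemoryZk proposed = new InMemoryZk();
        ensurePersistentPath(proposed, path); // proposed behaviour
        proposed.reapEmptyContainers();
        System.out.println("persistent survives: " + proposed.exists(path));
    }
}
```

Note that this check only helps when the node does not exist yet. If /druid/announcements was already created in CONTAINER mode, the check finds it and leaves it as a container.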
