Gracefully handle mixed consensus configuration #559

@jtuple

Description

Currently, when not all nodes in a cluster have consensus configured (strong_consistency = on), nodes with consensus enabled (particularly the claimant as well as the root leader) have various processes crash while trying to send messages (e.g., via gen_server:call) to processes that aren't running on the remote nodes (e.g., riak_ensemble_manager).
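One way to stop these crashes would be to wrap the remote calls defensively. The sketch below is illustrative only, assuming Erlang/OTP semantics for gen_server:call: the helper name safe_call and the error atoms are hypothetical, not part of the Riak codebase.

```erlang
%% Hypothetical sketch: wrap a remote gen_server:call so that a node
%% without consensus enabled (where e.g. riak_ensemble_manager is not
%% running) yields an error tuple instead of crashing the caller.
safe_call(Node, Name, Msg) ->
    try gen_server:call({Name, Node}, Msg) of
        Reply ->
            {ok, Reply}
    catch
        %% Target process not registered on the remote node,
        %% i.e. consensus is likely not enabled there.
        exit:{noproc, _} ->
            {error, not_running};
        %% Remote node is unreachable entirely.
        exit:{{nodedown, Node}, _} ->
            {error, nodedown};
        exit:{timeout, _} ->
            {error, timeout}
    end.
```

Callers would then handle the error tuples (or log once at a low level) rather than letting the exit propagate and crash supervised processes.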

Test this scenario in more detail and make things not crash.

In the future, we should consider adding a proper consensus capability, but I'm not sure that's necessary, nor is it something we should add this late in the 2.0 game. But feel free to argue that point if you disagree.

It's entirely fine for 2.0 to ship with the condition that users must configure all nodes the same as far as consensus goes for things to work properly (e.g., reads/writes for consistent operations, ensembles coming up, etc.). However, for a user to activate consensus cluster-wide, they'll need to do a rolling restart to change the configuration, and we should gracefully handle the mixed case during this window (e.g., operations may fail, but we shouldn't be spamming the log with errors or crashing processes for no good reason).

/cc #536
