Currently, when not all nodes in a cluster have consensus configured (strong_consistency = on), nodes with consensus enabled (particularly the claimant as well as the root leader) have various processes crash when trying to send messages (e.g. gen_server:call) to processes that aren't running on the remote nodes (e.g. riak_ensemble_manager).
We should test this scenario in more detail and make things not crash.
In the future, we should consider adding a proper consensus capability, but I'm not sure that's necessary, nor something we should add this late in the 2.0 game. But feel free to argue that point if you disagree.
It's entirely fine for 2.0 to ship with the condition that users must configure all nodes identically with respect to consensus for things to work properly (e.g. reads/writes for consistent operations, ensembles coming up, etc). However, for a user to activate consensus cluster-wide, they'll need to do a rolling restart to change the configuration, and we should gracefully handle the mixed case during this window (e.g. operations may fail, but we shouldn't be spamming the log with errors or crashing processes for no good reason).
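One possible shape for the fix, sketched below: rather than letting `gen_server:call` exits propagate and kill the caller, wrap remote calls to consensus processes in a defensive helper that catches the well-known exit reasons (`noproc` when the target process isn't registered on the remote node, `nodedown`, `timeout`) and turns them into error tuples the caller can handle. The function and module names here are hypothetical, not existing Riak APIs:

```erlang
%% Hypothetical helper (sketch, not existing Riak code): call a named
%% process on a remote node without crashing when that process isn't
%% running there, e.g. riak_ensemble_manager on a node that has
%% strong_consistency = off.
safe_remote_call(Node, Name, Msg) ->
    try
        {ok, gen_server:call({Name, Node}, Msg)}
    catch
        %% Process not registered on the remote node
        exit:{noproc, _} ->
            {error, not_running};
        %% Remote node is unreachable
        exit:{{nodedown, Node}, _} ->
            {error, {nodedown, Node}};
        %% Call timed out (default 5s)
        exit:{timeout, _} ->
            {error, timeout}
    end.
```

Callers in the claimant and root leader could then match on `{error, not_running}` and skip (or retry later) rather than crash, which also keeps the logs quiet during a mixed-configuration rolling restart.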
/cc #536