Description
What is missing?
If an entire availability zone (AZ) becomes disconnected, we want to be able to replace the whole rack concurrently.
One option I can think of right now is to allow ReplaceNode CassandraTasks to run concurrently and re-bootstrap multiple nodes in parallel, for example by allowing multiple jobs in a single task.
The startup sequence needs to be reviewed so that we can have a "fast startup path" for replacements, like we have for restarts.
We need to verify that Cassandra allows multiple concurrent replacements, whether that requires additional JVM options to be set, and whether those options reduce the overall safety of the system.
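To make the "multiple jobs in a single task" idea concrete, here is a minimal sketch of bounded-parallel replacement for one rack. It is not cass-operator code: `replace_rack`, `replace_node`, and `max_parallel` are hypothetical names used only for illustration, and the sketch assumes that Cassandra tolerates concurrent replacements, which is exactly what still needs to be verified.

```python
from concurrent.futures import ThreadPoolExecutor


def replace_rack(pod_names, replace_node, max_parallel=3):
    """Run replace_node(pod) for every pod, at most max_parallel at a time.

    replace_node is a hypothetical callable standing in for whatever
    actually triggers a node replacement. Returns a dict mapping each
    pod name to the callable's return value, or to the exception it raised.
    """
    if max_parallel < 1:
        raise ValueError("max_parallel must be at least 1")

    results = {}
    # The pool size caps how many replacements are in flight at once.
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        futures = {pod: pool.submit(replace_node, pod) for pod in pod_names}
        for pod, future in futures.items():
            try:
                results[pod] = future.result()
            except Exception as exc:
                # Record the failure and keep going so that one failed
                # replacement does not hide the outcome of the others.
                results[pod] = exc
    return results
```

A cap such as `max_parallel` could give a way to trade replacement speed against streaming load on the surviving racks. Whether the limit belongs on the task, the rack, or the datacenter is an open design question.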
Why is this needed?
With very high densities, running repair can be more challenging than replacing nodes. However, CassandraTasks currently allow replacing only a single node at a time, which might not be efficient enough.