Make some Raft parameters tunable by environment variables #2566
chungers wants to merge 1 commit into
Having been bitten pretty hard by environment variables in testkit, I'm inclined to suggest we move checking these environment variables to the highest level in the code feasible. Probably the
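One way to read this suggestion, as a rough sketch rather than what the PR actually does: resolve the overrides once at the program's entry point and hand plain values to the lower layers, so tests never depend on process-wide environment state. Everything here except the `SWARMKIT_RAFT_HEARTBEAT_TICK` name is assumed for illustration: the `RaftTicks` struct, the `resolveTicks` and `describe` helpers, the election variable name, and the default values.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// RaftTicks is a hypothetical options struct passed down to lower-level code.
type RaftTicks struct {
	Heartbeat int
	Election  int
}

// resolveTicks overlays environment overrides on top of defaults.
// lookup has the same signature as os.LookupEnv, so tests can pass a fake.
func resolveTicks(defaults RaftTicks, lookup func(string) (string, bool)) RaftTicks {
	out := defaults
	// Variable name taken from the diff under review.
	if v, ok := lookup("SWARMKIT_RAFT_HEARTBEAT_TICK"); ok {
		if n, err := strconv.Atoi(v); err == nil && n > 0 {
			out.Heartbeat = n
		}
	}
	// Assumed name; the diff excerpt does not show the election variable's value.
	if v, ok := lookup("SWARMKIT_RAFT_ELECTION_TICK"); ok {
		if n, err := strconv.Atoi(v); err == nil && n > 0 {
			out.Election = n
		}
	}
	return out
}

// describe stands in for lower-level code: it only sees resolved values
// and never reads the environment itself.
func describe(t RaftTicks) string {
	return fmt.Sprintf("heartbeat=%d election=%d", t.Heartbeat, t.Election)
}

func main() {
	// Placeholder defaults, not the values chosen in the PR.
	ticks := resolveTicks(RaftTicks{Heartbeat: 1, Election: 10}, os.LookupEnv)
	fmt.Println(describe(ticks))
}
```

Because the lookup function is injected, a test can exercise overrides without calling `os.Setenv`, which is the kind of hidden global state the comment warns about.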
// EnvRaftHeartbeatTick is the env to set to customize the Raft heartbeat tick
EnvRaftHeartbeatTick = "SWARMKIT_RAFT_HEARTBEAT_TICK"

// EnvElectionTick is the env to set to customize the Raft leader election delay
CircleCI reports that this comment is missing "Raft" (the name should be EnvRaftElectionTick), and CI is failing because of it.
Sorry, will fix this.
If we are going this far, I'd really like to see this happen in the cluster configuration, so we can be certain the whole cluster is using consistent values. Let me know if you need more direction on that.
Also, what is the motivation behind making these tunable? Just want to make sure everyone's on the same page.
the motivation makes sense to me, which is that it would let us tweak raft parameters on a live cluster without rebuilds. i don't think it's worth exposing to end-users but could be useful for lots of kinds of debugging and testing.
I think making these parameters tunable is necessary to make things more serviceable. If we had this, we could have had customers / field make changes to tune the system without waiting for a new release. Also, as part of performance engineering, we need to come up with proper ranges of parameters for known/controlled environments, and those can be incorporated in the future as defaults or validation rules. I am ok with exposing these at the level that makes sense -- sorry, I am not familiar with how this is vendored/used in the engine, so please let me know of a place that makes the most sense. Maybe swarmkit-specific environment variables are actually the way to go without exposing this to the end user.
Here's my take on this: I am not yet convinced that these two params need to be tunable in the field. Currently, we don't fully understand how different values of these params affect the system's resilience to network partitions and to varying degrees of packet loss. I don't think doing this out in the field is a good way to test out the effects of these params.
I agree and my suggestion is to do that before we make this change.
LGTM. After looking at this, there is already the ability to tune these parameters in several quorum systems (https://coreos.com/etcd/docs/latest/tuning.html is one example; thanks, @cyli). These will provide a much-needed tool for tuning the resilience for varying environments.
Here are the docs from etcd on tuning:
So these are parameters that imho cannot be fixed at the factory. I agree that we shouldn't make the users make guesses, and we absolutely need to characterize the system in controlled environments to arrive at baseline recommendations; however, to make the system serviceable by people in the field (support engineers or technically qualified personnel), some degree of tunability is required. I had a chat with @stevvooe and perhaps making the environment variable names obvious that changing these could void the warranty (like calling it Thoughts?
Updated to use
Signed-off-by: David Chung <david.chung@docker.com>
Codecov Report
@@            Coverage Diff             @@
##           master    #2566      +/-   ##
==========================================
+ Coverage    61.4%   61.49%   +0.08%
==========================================
  Files         134       49      -85
  Lines       21800     6329   -15471
==========================================
- Hits        13387     3892    -9495
+ Misses       6968     2062    -4906
+ Partials     1445      375    -1070
This PR builds on the previous PR #2564 and makes some of the Raft parameters changeable via environment variables. If the environment variables are not specified, the default values established in that PR are used instead. This makes it possible to tune the leader election behavior without recompiling binaries, because these parameters don't seem to be configurable once swarmkit is vendored into another program.
Signed-off-by: David Chung <david.chung@docker.com>