KAFKA-14207; KRaft Operations documentation #12642
|
|
||
| <h5 class="anchor-heading"><a id="kraft_role" class="anchor-link"></a><a href="#kraft_role">Process Roles</a></h5> | ||
|
|
||
| <p>In KRaft mode each Kafka server can be configured as a controller, as a broker or as both using the <code>process.roles</code> property. This property can have the following values:</p> |
grammar: how about
as a controller, a broker, or both by using the process.roles property
| <ul> | ||
| <li>If <code>process.roles</code> is set to <code>broker</code>, the server acts as a broker.</li> | ||
| <li>If <code>process.roles</code> is set to <code>controller</code>, the server acts as a controller.</li> | ||
| <li>If <code>process.roles</code> is set to <code>broker,controller</code>, the server acts as a broker and a controller.</li> |
How about adding "both"?
the server acts as both a broker and a controller.
| <li>If <code>process.roles</code> is not set at all, it is assumed to be in ZooKeeper mode.</li> | ||
| </ul> | ||
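As a rough illustration of the four cases in the list above, here is a minimal Python sketch. It is not Kafka's own parsing logic: the function name `server_mode` and the returned labels are hypothetical, and rejecting any other value is a choice made for this sketch only.

```python
def server_mode(process_roles):
    """Classify a server by its process.roles value.

    process_roles is the raw property string, or None when the property
    is not set at all. The returned labels are hypothetical names used
    only in this sketch.
    """
    if process_roles is None:
        # Property absent: the docs say ZooKeeper mode is assumed.
        return "zookeeper"
    roles = {role.strip() for role in process_roles.split(",")}
    if roles == {"broker"}:
        return "broker"
    if roles == {"controller"}:
        return "controller"
    if roles == {"broker", "controller"}:
        return "combined"
    # Anything else is rejected here; this is a choice made for the sketch.
    raise ValueError(f"unsupported process.roles value: {process_roles!r}")
```

For example, `server_mode("broker,controller")` classifies the server as a combined node, and `server_mode(None)` falls back to ZooKeeper mode.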
|
|
||
| <p>Nodes that act as both brokers and controllers are referred to as "combined" nodes. Combined nodes are simpler to operate for simple use cases like a development environment. The key disadvantage is that the controller will be less isolated from the rest of the system. Combined mode is not recommended is critical deployment environments.</p> |
Maybe add an example of how operations are harder, such as "it is not possible to roll the controllers separately from the brokers when in combined mode"?
Use "Kafka servers" instead of "Nodes" to be consistent with the configs above, i.e.: Kafka servers that act as both brokers and controllers are referred to as "combined" nodes.
Typo in the last sentence: Combined mode is not recommended IN critical
|
|
||
| <p>In KRaft mode, only a small group of specially selected servers can act as controllers (unlike the ZooKeeper-based mode, where any server can become the Controller). The specially selected controller servers will participate in the metadata quorum. Each controller server is either active, or a hot standby for the current active controller server.</p> | ||
|
|
||
| <p>A Kafka cluster will typically select 3 or 5 servers for this role, depending on factors like cost and the number of concurrent failures your system should withstand without availability impact. A majority of the controllers must be alive in order to maintain availability. With 3 controllers, the cluster can tolerate 1 controller failure; with 5 controllers, the cluster can tolerate 2 controller failures.</p> |
How about
A Kafka admin will typically select...
the Kafka cluster hasn't achieved sentience .... YET. :)
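The failure-tolerance numbers in the paragraph under review follow from simple majority arithmetic. A minimal Python sketch of that arithmetic, with hypothetical helper names, assuming only that a strict majority of controllers must stay alive:

```python
def majority(controllers):
    """Smallest number of controllers that forms a strict majority."""
    if controllers < 1:
        raise ValueError("at least one controller is required")
    return controllers // 2 + 1


def tolerated_failures(controllers):
    """How many controllers can fail while a majority is still alive."""
    return controllers - majority(controllers)


for n in (3, 5):
    print(f"{n} controllers: majority {majority(n)}, "
          f"tolerates {tolerated_failures(n)} failure(s)")
```

This gives 1 tolerated failure for 3 controllers and 2 for 5, matching the text. An even count buys nothing extra: 4 controllers still tolerate only 1 failure.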
|
|
||
| <p>All of the servers in a Kafka cluster discover the quorum voters using the <code>controller.quorum.voters</code> property. This identifies the quorum controller servers that should be used. All the controllers must be enumerated. Each controller is identified with their <code>id</code>, <code>host</code> and <code>port</code> information. This is an example configuration: <code>controller.quorum.voters=id1@host1:port1,id2@host2:port2,id3@host3:port3</code></p> |
Can we put the example configuration in a PRE or CODE block, or whatever, so it shows as monospace? (Just a thought.)
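To make the `id@host:port` format concrete, here is a small Python sketch that splits a `controller.quorum.voters` value into its parts. It is an illustration only, not Kafka's parser: the `Voter` type and `parse_quorum_voters` function are hypothetical, the host names and ports in the usage note are made up, and it assumes ids and ports are integers.

```python
from typing import NamedTuple


class Voter(NamedTuple):
    node_id: int
    host: str
    port: int


def parse_quorum_voters(value):
    """Split 'id@host:port,id@host:port,...' into a list of Voter tuples."""
    voters = []
    for entry in value.split(","):
        node_id, _, endpoint = entry.strip().partition("@")
        # rpartition splits on the last ':' so the port is always the tail.
        host, _, port = endpoint.rpartition(":")
        if not node_id or not host or not port:
            raise ValueError(f"expected id@host:port, got {entry!r}")
        voters.append(Voter(int(node_id), host, int(port)))
    return voters
```

With a made-up value such as `1@controller1.example.com:9093,2@controller2.example.com:9093,3@controller3.example.com:9093`, the function returns three `Voter` tuples, one per enumerated controller.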
|
|
||
| <p>The <code>kafka-dump-log</code> tool can be used to debug the log segments and snapshots for the cluster metadata directory. The tool will scan the provided files and decode the metadata records. For example, this command decodes and prints the records in the first log segment:</p> | ||
|
|
||
| <pre class="line-numbers"><code class="language-bash"> > bin/kafka-dump-log.sh --cluster-metadata-decoder --skip-record-metadat --files metadata_log_dir/__cluster_metadata-0/00000000000000000000.log</code></pre> |
Can we leave off --skip-record-metadata? I recall it making the output a bit weird. Also, it's misspelled here.
|
|
||
| <p>This command decodes and prints the records in a cluster metadata snapshot:</p> | ||
|
|
||
| <pre class="line-numbers"><code class="language-bash"> > bin/kafka-dump-log.sh --cluster-metadata-decoder --skip-record-metadat --files metadata_log_dir/__cluster_metadata-0/00000000000000000100-0000000001.checkpoint</code></pre> |
Same as above: can we leave off --skip-record-metadata? I recall it making the output a bit weird. Also, it's misspelled here.
| * Modifying certain dynamic configurations on the standalone KRaft controller | ||
| * Support for some configurations, like enabling unclean leader election by default or dynamically changing broker endpoints | ||
| * Delegation tokens | ||
| * Upgrade from ZooKeeper mode |
Minor nit: should this list be in order of priority, e.g. with the upgrade from ZooKeeper mode as #1?
I think we should delete this file now that we have moved most of this information to ops.html. I can do that in a future PR. Didn't want to have this discussion in this PR.
|
|
||
| <h5 class="anchor-heading"><a id="kraft_voter" class="anchor-link"></a><a href="#kraft_voter">Controllers</a></h5> | ||
|
|
||
| <p>In KRaft mode, only a small group of specially selected servers can act as controllers (unlike the ZooKeeper-based mode, where any server can become the Controller). The specially selected controller servers will participate in the metadata quorum. Each controller server is either active, or a hot standby for the current active controller server.</p> |
Suggestion for better flow, if this is technically correct:
In KRaft mode, specific servers are selected to be controllers (unlike ZK....). The servers selected to be controllers will participate in the metadata quorum. Each controller is either active or a hot standby for the current active controller.
|
|
||
| <p>If the Kafka cluster has 3 controllers named controller1, controller2 and controller3 then controller3 may have the following:</p> |
How about "controller3 may have the following configuration:"?
|
|
||
| <h5 class="anchor-heading"><a id="kraft_metadata_tool" class="anchor-link"></a><a href="#kraft_metadata_tool">Metadata Quorum Tool</a></h5> | ||
|
|
||
| <p>The <code>kafka-metadata-quorum</code> tool can be used to describe the runtime state of the cluster metadata partition. For example, the following command display a summary of the metadata quorum:</p> |
Typo: "display" should be "displays".
|
|
||
| <ul> | ||
| <li>A Kafka server's <code>process.roles</code> should be set to either <code>broker</code> or <code>controller</code> but not both. Combined mode can be used in development environments, but it should be avoided in critical deployment environments.</li> | ||
| <li>For redundancy, a Kafka cluster should user 3 controllers. More than 3 servers is not recommended in critical environments. In the rare case of a partial network failure it is possible for the cluster metadata quorum to become unavailable. This limitation will be addresses in a future release of Kafka.</li> |
Suggest: "For redundancy, a Kafka cluster should have a minimum of 3 controllers." As for "More than 3 servers is not recommended in critical environments": what? If you mean fewer than three, then strike this sentence; my suggested change would cover it.
"In the rare case of a partial network failure it is possible for the cluster metadata quorum to become unavailable. This limitation will be addresses in a future release of Kafka." What does this mean? Does it affect ZooKeeper too, or is it a KRaft bug?
|
|
||
| <li>For redundancy, a Kafka cluster should user 3 controllers. More than 3 servers is not recommended in critical environments. In the rare case of a partial network failure it is possible for the cluster metadata quorum to become unavailable. This limitation will be addresses in a future release of Kafka.</li> | |
| <li>For redundancy, a Kafka cluster should have a minimum of 3 controllers. Fewer than 3 controllers is not recommended in critical environments. In the rare case of a partial network failure it is possible for the cluster metadata quorum to become unavailable. This limitation will be addressed in a future Kafka release.</li> |
I think we should only recommend 3 controllers. I think we need to implement Pre-vote before recommending using more than 3 controllers.
Co-authored-by: Chase Thomas <forlack@gmail.com>
Reviewers: Colin Patrick McCabe <cmccabe@apache.org>, Chase Thomas <forlack@users.noreply.github.com>
Co-authored-by: José Armando García Sancio <jsancio@users.noreply.github.com>