KAFKA-14207; KRaft Operations documentation#12642

Merged
jsancio merged 3 commits into apache:trunk from jsancio:kafka-14207-kraft-ops-limitation
Sep 26, 2022

Conversation

@jsancio jsancio commented Sep 14, 2022

Committer Checklist (excluded from commit message)

  • Verify design and implementation
  • Verify test coverage and CI build status
  • Verify documentation (including upgrade notes)

Comment thread docs/ops.html Outdated

<h5 class="anchor-heading"><a id="kraft_role" class="anchor-link"></a><a href="#kraft_role">Process Roles</a></h5>

<p>In KRaft mode each Kafka server can be configured as a controller, as a broker or as both using the <code>process.roles</code> property. This property can have the following values:</p>
Contributor

grammar: how about

as a controller, a broker, or both by using the process.roles property

Comment thread docs/ops.html Outdated
<ul>
<li>If <code>process.roles</code> is set to <code>broker</code>, the server acts as a broker.</li>
<li>If <code>process.roles</code> is set to <code>controller</code>, the server acts as a controller.</li>
<li>If <code>process.roles</code> is set to <code>broker,controller</code>, the server acts as a broker and a controller.</li>
Contributor

How about adding "both" ?

the server acts as both a broker and a controller.

Comment thread docs/ops.html Outdated
<li>If <code>process.roles</code> is not set at all, it is assumed to be in ZooKeeper mode.</li>
</ul>

<p>Nodes that act as both brokers and controllers are referred to as "combined" nodes. Combined nodes are simpler to operate for simple use cases like a development environment. The key disadvantage is that the controller will be less isolated from the rest of the system. Combined mode is not recommended is critical deployment environments.</p>
Contributor

Maybe add an example of how operations are harder like "it is not possible to roll the controllers separately from the brokers when in combined mode" ?

Contributor

Use "Kafka servers" instead of "Nodes" to be consistent with the configs above, i.e. "Kafka servers that act as both brokers and controllers are referred to as 'combined' nodes."

Contributor

typo: Combined mode is not recommended IN critical -- last sentence
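
For reference, the mapping from `process.roles` values to server behavior described in the hunks above can be sketched as a small shell function. This is only an illustration: `describe_roles` is a hypothetical helper, not part of Kafka, and treating `controller,broker` the same as `broker,controller` is an assumption of this sketch rather than something the quoted text states.

```shell
# Hypothetical helper: describe how a server behaves for a given
# process.roles value, following the list in docs/ops.html.
describe_roles() {
  case "$1" in
    broker)
      echo "broker" ;;
    controller)
      echo "controller" ;;
    broker,controller|controller,broker)
      # Order-insensitive matching is an assumption of this sketch.
      echo "combined (both a broker and a controller)" ;;
    "")
      # An unset property is modeled here as an empty string.
      echo "ZooKeeper mode (process.roles not set)" ;;
    *)
      echo "unrecognized value: $1" ;;
  esac
}

describe_roles "broker,controller"
describe_roles ""
```

The empty-string case stands in for a configuration file that has no `process.roles` entry at all.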

Comment thread docs/ops.html Outdated

<p>In KRaft mode, only a small group of specially selected servers can act as controllers (unlike the ZooKeeper-based mode, where any server can become the Controller). The specially selected controller servers will participate in the metadata quorum. Each controller server is either active, or a hot standby for the current active controller server.</p>

<p>A Kafka cluster will typically select 3 or 5 servers for this role, depending on factors like cost and the number of concurrent failures your system should withstand without availability impact. A majority of the controllers must be alive in order to maintain availability. With 3 controllers, the cluster can tolerate 1 controller failure; with 5 controllers, the cluster can tolerate 2 controller failures.</p>
@cmccabe (Contributor), Sep 14, 2022

How about

A Kafka admin will typically select...

the Kafka cluster hasn't achieved sentience .... YET. :)
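
As a side note on the quoted paragraph, the "1 failure with 3 controllers, 2 failures with 5" figures follow from the majority rule it states. A minimal sketch of that arithmetic in shell, where the function names `majority` and `tolerated_failures` are made up for this illustration:

```shell
# Smallest number of controllers that forms a majority of n voters.
majority() {
  echo $(( $1 / 2 + 1 ))
}

# Number of controller failures an n-voter quorum can tolerate while a
# majority is still alive.
tolerated_failures() {
  echo $(( $1 - ($1 / 2 + 1) ))
}

for n in 3 5; do
  echo "controllers=$n majority=$(majority "$n") tolerated_failures=$(tolerated_failures "$n")"
done
# prints:
# controllers=3 majority=2 tolerated_failures=1
# controllers=5 majority=3 tolerated_failures=2
```

The same arithmetic shows that an even count adds no tolerance: 4 controllers still tolerate only 1 failure.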

Comment thread docs/ops.html Outdated

<p>A Kafka cluster will typically select 3 or 5 servers for this role, depending on factors like cost and the number of concurrent failures your system should withstand without availability impact. A majority of the controllers must be alive in order to maintain availability. With 3 controllers, the cluster can tolerate 1 controller failure; with 5 controllers, the cluster can tolerate 2 controller failures.</p>

<p>All of the servers in a Kafka cluster discover the quorum voters using the <code>controller.quorum.voters</code> property. This identifies the quorum controller servers that should be used. All the controllers must be enumerated. Each controller is identified with their <code>id</code>, <code>host</code> and <code>port</code> information. This is an example configuration: <code>controller.quorum.voters=id1@host1:port1,id2@host2:port2,id3@host3:port3</code></p>
Contributor

can we put the example configuration in a PRE or CODE block, or whatever, so it shows as monospace? (Just a thought)
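
To make the `id@host:port` format in the quoted example concrete, here is a hedged sketch that splits a `controller.quorum.voters` value into its parts. `parse_voters` is a hypothetical helper written for this note, and the host names and port below are placeholders, not values from the PR.

```shell
# Hypothetical helper: print "id host port" for each entry of a
# controller.quorum.voters value such as "1@host1:9093,2@host2:9093".
# The body runs in a subshell so the IFS and globbing changes do not leak.
parse_voters() (
  set -f      # disable globbing while splitting the unquoted list
  IFS=','     # entries are comma-separated
  for entry in $1; do
    id="${entry%%@*}"          # text before the first '@'
    hostport="${entry#*@}"     # text after the first '@'
    host="${hostport%:*}"      # text before the last ':'
    port="${hostport##*:}"     # text after the last ':'
    printf '%s %s %s\n' "$id" "$host" "$port"
  done
)

# Placeholder values for illustration only.
parse_voters "1@host1:9093,2@host2:9093,3@host3:9093"
```

The sketch does no validation; a malformed entry is printed as-is rather than rejected.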

Comment thread docs/ops.html Outdated

<p>The <code>kafka-dump-log</code> tool can be used to debug the log segments and snapshots for the cluster metadata directory. The tool will scan the provided files and decode the metadata records. For example, this command decodes and prints the records in the first log segment:</p>

<pre class="line-numbers"><code class="language-bash"> &gt; bin/kafka-dump-log.sh --cluster-metadata-decoder --skip-record-metadat --files metadata_log_dir/__cluster_metadata-0/00000000000000000000.log</code></pre>
Contributor

can we leave off --skip-record-metadata? I recall it making the output a bit weird. also, it's misspelled here.

Comment thread docs/ops.html Outdated

<p>This command decodes and prints the records in a cluster metadata snapshot:</p>

<pre class="line-numbers"><code class="language-bash"> &gt; bin/kafka-dump-log.sh --cluster-metadata-decoder --skip-record-metadat --files metadata_log_dir/__cluster_metadata-0/00000000000000000100-0000000001.checkpoint</code></pre>
Contributor

same.

can we leave off --skip-record-metadata? I recall it making the output a bit weird. also, it's misspelled here.
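
One way to picture the two commands discussed above, with the flag left off as suggested: the sketch below prints, without running it, a `kafka-dump-log` invocation for every log segment (`.log`) and snapshot (`.checkpoint`) in a metadata directory. `print_dump_commands` is a hypothetical helper, and the directory in the usage line is the example path from the quoted docs.

```shell
# Hypothetical helper: print (do not run) a kafka-dump-log command for each
# log segment and snapshot found in a cluster metadata directory.
print_dump_commands() {
  dir="$1"
  for f in "$dir"/*.log "$dir"/*.checkpoint; do
    [ -e "$f" ] || continue   # skip patterns that matched nothing
    printf 'bin/kafka-dump-log.sh --cluster-metadata-decoder --files %s\n' "$f"
  done
}

print_dump_commands metadata_log_dir/__cluster_metadata-0
```

Printing the commands first makes it easy to review the file list before decoding anything.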

@cmccabe (Contributor) left a comment

Thanks, @jsancio. I left some comments. LGTM after they are addressed.

Comment thread config/kraft/README.md
* Modifying certain dynamic configurations on the standalone KRaft controller
* Support for some configurations, like enabling unclean leader election by default or dynamically changing broker endpoints
* Delegation tokens
* Upgrade from ZooKeeper mode
Contributor

Minor nit: should this list be in order of priority, e.g. with the upgrade from ZooKeeper as #1?

Member Author

I think we should delete this file now that we have moved most of this information to ops.html. I can do that in a future PR. Didn't want to have this discussion in this PR.

Comment thread docs/ops.html Outdated

<h5 class="anchor-heading"><a id="kraft_voter" class="anchor-link"></a><a href="#kraft_voter">Controllers</a></h5>

<p>In KRaft mode, only a small group of specially selected servers can act as controllers (unlike the ZooKeeper-based mode, where any server can become the Controller). The specially selected controller servers will participate in the metadata quorum. Each controller server is either active, or a hot standby for the current active controller server.</p>
Contributor

Suggestion for better flow, if this is technically correct:
In KRaft mode, specific servers are selected to be controllers (unlike ZK....). The servers selected to be controllers will participate in the metadata quorum. Each controller is either active or a hot standby for the current active controller.

Comment thread docs/ops.html Outdated

<p>All of the servers in a Kafka cluster discover the quorum voters using the <code>controller.quorum.voters</code> property. This identifies the quorum controller servers that should be used. All the controllers must be enumerated. Each controller is identified with their <code>id</code>, <code>host</code> and <code>port</code> information. This is an example configuration: <code>controller.quorum.voters=id1@host1:port1,id2@host2:port2,id3@host3:port3</code></p>

<p>If the Kafka cluster has 3 controllers named controller1, controller2 and controller3 then controller3 may have the following:</p>
Contributor

controller3 may have the following configuration: ??

Comment thread docs/ops.html Outdated

<h5 class="anchor-heading"><a id="kraft_metadata_tool" class="anchor-link"></a><a href="#kraft_metadata_tool">Metadata Quorum Tool</a></h5>

<p>The <code>kafka-metadata-quorum</code> tool can be used to describe the runtime state of the cluster metadata partition. For example, the following command display a summary of the metadata quorum:</p>
Contributor

typo: display should be displays

Comment thread docs/ops.html Outdated

<ul>
<li>A Kafka server's <code>process.roles</code> should be set to either <code>broker</code> or <code>controller</code> but not both. Combined mode can be used in development environments, but it should be avoided in critical deployment environments.</li>
<li>For redundancy, a Kafka cluster should user 3 controllers. More than 3 servers is not recommended in critical environments. In the rare case of a partial network failure it is possible for the cluster metadata quorum to become unavailable. This limitation will be addresses in a future release of Kafka.</li>
Contributor

How about: "For redundancy, a Kafka cluster should have a minimum of 3 controllers."
"More than 3 servers is not recommended in critical environments." What? If you mean fewer than three, then strike this sentence; the change I have would cover this.
"In the rare case of a partial network failure it is possible for the cluster metadata quorum to become unavailable. This limitation will be addresses in a future release of Kafka." What does this mean? Does it affect ZK too, or is it a KRaft bug?

Comment thread docs/ops.html Outdated

<ul>
<li>A Kafka server's <code>process.roles</code> should be set to either <code>broker</code> or <code>controller</code> but not both. Combined mode can be used in development environments, but it should be avoided in critical deployment environments.</li>
<li>For redundancy, a Kafka cluster should user 3 controllers. More than 3 servers is not recommended in critical environments. In the rare case of a partial network failure it is possible for the cluster metadata quorum to become unavailable. This limitation will be addresses in a future release of Kafka.</li>
Contributor

Suggested change
<li>For redundancy, a Kafka cluster should user 3 controllers. More than 3 servers is not recommended in critical environments. In the rare case of a partial network failure it is possible for the cluster metadata quorum to become unavailable. This limitation will be addresses in a future release of Kafka.</li>
<li>For redundancy, a Kafka cluster should have a minimum of 3 controllers. Less than 3 controllers is not recommended in critical environments. In the rare case of a partial network failure it is possible for the cluster metadata quorum to become unavailable. This limitation will be addresses in a future Kafka release.</li>

@jsancio (Member, Author), Sep 26, 2022

I think we should only recommend 3 controllers. I think we need to implement Pre-vote before recommending using more than 3 controllers.

jsancio and others added 2 commits September 26, 2022 10:53
Co-authored-by: Chase Thomas <forlack@gmail.com>
@jsancio jsancio merged commit 4dec656 into apache:trunk Sep 26, 2022
@jsancio jsancio deleted the kafka-14207-kraft-ops-limitation branch September 26, 2022 18:19
jsancio added a commit that referenced this pull request Sep 26, 2022
Reviewers: Colin Patrick McCabe <cmccabe@apache.org>, Chase Thomas <forlack@users.noreply.github.com>