CORS-4208: set default KUBELET_NODE_IPS for dualstack nodes #5384

Merged
openshift-merge-bot[bot] merged 1 commit into openshift:main from tthvo:CORS-4208 on Jan 16, 2026

Conversation

@tthvo
Member

@tthvo tthvo commented Oct 31, 2025

- What I did

This updates the master and worker kubelet service templates to set a default KUBELET_NODE_IPS.

  • DualStack: default to 0.0.0.0
  • DualStackIPv6Primary: default to ::

This sets the kubelet --node-ip argument (to 0.0.0.0 or ::) when dual-stack support is enabled on cloud providers, where the node IP is not known beforehand.
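
The defaulting described above can be sketched as a small lookup. This is only an illustration in Python, not the actual template change: the names `DEFAULT_NODE_IPS`, `default_node_ips`, and `node_ip_argument` are hypothetical, and the family names are the ones listed above.

```python
from typing import Optional

# Hypothetical lookup mirroring the defaults listed above; the real change
# lives in the master and worker kubelet service templates.
DEFAULT_NODE_IPS = {
    "DualStack": "0.0.0.0",        # IPv4-primary dual-stack
    "DualStackIPv6Primary": "::",  # IPv6-primary dual-stack
}


def default_node_ips(ip_families: str) -> Optional[str]:
    """Return the default KUBELET_NODE_IPS value, or None when no default is listed."""
    return DEFAULT_NODE_IPS.get(ip_families)


def node_ip_argument(ip_families: str) -> Optional[str]:
    """Build the kubelet --node-ip argument, if a default applies."""
    value = default_node_ips(ip_families)
    return f"--node-ip={value}" if value is not None else None


if __name__ == "__main__":
    for family in ("IPv4", "DualStack", "DualStackIPv6Primary"):
        print(family, node_ip_argument(family))
```

In this sketch, any family without a listed default yields no `--node-ip` argument at all; whether the real templates behave that way for single-stack clusters is an assumption.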

- Why I did it

When investigating failures related to dual-stack support on AWS, I noticed that kubelet ran without the --node-ip=<any-id> argument. As a result, the CNI never came online, and ovnkube complained that the node was missing an InternalIP address. For example, a failed attempt produced the following errors:

| Component | Failed log |
| --- | --- |
| ovnkube-controller container | F0903 17:41:46.149835 5622 ovnkube.go:138] failed to run ovnkube: [failed to start network controller: failed to start default network controller - while waiting for any node to have zone: "i-041d879bce674db11.ec2.internal", error: context canceled, failed to start node network controller: failed to init default node network controller: i-041d879bce674db11.ec2.internal doesn't have an address with type InternalIP or ExternalIP] |
| kubelet | Error syncing pod, skipping" err="network is not ready: container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: no CNI configuration file in /etc/kubernetes/cni/net.d/. Has your network provider started?" |
| kube-rbac-proxy-crio container | failed to initialize certificate reloader: error loading certificates: error loading certificate: open /var/lib/kubelet/pki/kubelet-server-current.pem: no such file or directory |

After some research and trial and error, I determined that the kubelet --node-ip argument is necessary. For dual-stack it must be set to 0.0.0.0, or to :: when IPv6 is primary. Once the argument was set, the node was assigned an InternalIP address and the CNI came up successfully.
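
One way to spot the failure mode above is to check whether a node reports an InternalIP or ExternalIP address at all. The following is a minimal sketch, assuming addresses in the `type`/`address` shape used by a Kubernetes Node's `status.addresses`; the helper name and the sample hostnames and IPs are hypothetical.

```python
from typing import Iterable, Mapping

# Address types that ovnkube's error message says it needs at least one of.
REQUIRED_TYPES = {"InternalIP", "ExternalIP"}


def has_usable_address(addresses: Iterable[Mapping[str, str]]) -> bool:
    """Return True if any entry is a non-empty InternalIP or ExternalIP."""
    return any(
        entry.get("type") in REQUIRED_TYPES and bool(entry.get("address"))
        for entry in addresses
    )


if __name__ == "__main__":
    # Hypothetical node that only reports a hostname, as in the failed run.
    broken = [{"type": "Hostname", "address": "node-a.example.internal"}]
    # Hypothetical node after --node-ip is set (documentation-range IPs).
    healthy = broken + [
        {"type": "InternalIP", "address": "192.0.2.10"},
        {"type": "InternalIP", "address": "2001:db8::10"},
    ]
    print("broken:", has_usable_address(broken))
    print("healthy:", has_usable_address(healthy))
```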

- How to verify it

Tested with openshift/installer#9930. Alternatively, the installer can lay down an environment file to set the env var (for example, openshift/installer@9fa264d), but that seems quite hacky 😞

- Description for the changelog

Update the master and worker kubelet service templates to set a default KUBELET_NODE_IPS (i.e. 0.0.0.0 for DualStack and :: for DualStackIPv6Primary)

openshift-ci-robot added the jira/valid-reference label on Oct 31, 2025
@openshift-ci-robot
Contributor

openshift-ci-robot commented Oct 31, 2025

@tthvo: This pull request references CORS-4208 which is a valid jira issue.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@tthvo
Member Author

tthvo commented Oct 31, 2025

/cc @sadasu @patrickdillon

@openshift-ci-robot
Contributor

openshift-ci-robot commented Oct 31, 2025

@tthvo: This pull request references CORS-4208 which is a valid jira issue.

@tthvo
Member Author

tthvo commented Oct 31, 2025

Not sure if I am doing the right thing 😓 , but with openshift/installer#9930, this change worked as expected. This was only tested with AWS.

Pending confirmations for other platforms 👀 PTAL 🙏

@tthvo
Member Author

tthvo commented Nov 3, 2025

/retest

@sadasu
Contributor

sadasu commented Nov 5, 2025

/cc @cybertron and @mkowalski Could you PTAL? This is required for adding DualStack support for AWS and Azure.

@cybertron
Member

While it feels a little weird to set KUBELET_NODE_IPS to a single value, all we're really doing is telling it whether to prefer v4 or v6, so I think this should be okay. It's also worth noting that for the on-prem platforms we override these values anyway, so it shouldn't affect us. Just to be sure though:

/test e2e-metal-ipi-ovn-dualstack
/test e2e-metal-ipi-ovn-ipv6
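
To illustrate the point about family preference: 0.0.0.0 and :: are the unspecified addresses of their respective families, so a single value only signals which family comes first. Below is a minimal sketch using Python's standard `ipaddress` module; the function name `preferred_family` is hypothetical, and treating any other value as an explicit node IP is an assumption made for this example.

```python
import ipaddress
from typing import Optional


def preferred_family(node_ip: str) -> Optional[str]:
    """Return 'IPv4' or 'IPv6' if node_ip is an unspecified address.

    Any other address is treated as an explicit node IP rather than a
    family preference, so None is returned for it.
    """
    addr = ipaddress.ip_address(node_ip)
    if not addr.is_unspecified:
        return None
    return "IPv4" if addr.version == 4 else "IPv6"


if __name__ == "__main__":
    for value in ("0.0.0.0", "::", "192.0.2.10"):
        print(value, "->", preferred_family(value))
```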

Comment thread templates/master/01-master-kubelet/_base/units/kubelet.service.yaml
@sadasu
Contributor

sadasu commented Nov 6, 2025

Are the changes to the on-prem files done to maintain consistency? Tests prove that the changes are fine. I am leaning towards not making any changes to on-prem files, even if harmless.

This updates the master and worker kubelet service templates to set the
defaults KUBELET_NODE_IPS.
- DualStack: default to "0.0.0.0"
- DualStackIPv6Primary: default to "::"
@tthvo
Member Author

tthvo commented Nov 6, 2025

> Are the changes to the on-prem files done to maintain consistency? Tests prove that the changes are fine. I am leaning towards not making any changes to on-prem files, even if harmless.

Right, it was done for consistency. I have now removed the changes to the on-prem unit files, as suggested 👍

@tthvo
Member Author

tthvo commented Nov 6, 2025

Thanks everyone for the reviews and insights! I addressed the comments just now. PTAL again 🙏

@patrickdillon
Contributor

/lgtm

openshift-ci Bot added the lgtm label on Nov 11, 2025
@tthvo
Member Author

tthvo commented Dec 3, 2025

/verified by @tthvo

openshift-ci-robot added the verified label on Dec 3, 2025
@openshift-ci-robot
Contributor

@tthvo: This PR has been marked as verified by @tthvo.

@cheesesashimi
Member

/lgtm
/approve

@openshift-ci
Contributor

openshift-ci Bot commented Jan 15, 2026

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: cheesesashimi, patrickdillon, tthvo

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

openshift-ci Bot added the approved label on Jan 15, 2026
@openshift-ci-robot
Contributor

/retest-required

Remaining retests: 0 against base HEAD c8a6c4f and 2 for PR HEAD 07ae9ad in total

@openshift-ci
Contributor

openshift-ci Bot commented Jan 16, 2026

@tthvo: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/e2e-openstack | a113f54 | link | false | /test e2e-openstack |
| ci/prow/bootstrap-unit | 07ae9ad | link | false | /test bootstrap-unit |
| ci/prow/okd-scos-e2e-aws-ovn | 07ae9ad | link | false | /test okd-scos-e2e-aws-ovn |

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@tthvo
Member Author

tthvo commented Jan 16, 2026

/retest-required

@openshift-merge-bot openshift-merge-bot Bot merged commit c20bcda into openshift:main Jan 16, 2026
12 of 15 checks passed
@tthvo tthvo deleted the CORS-4208 branch January 16, 2026 01:08
Labels

  • approved: Indicates a PR has been approved by an approver from all required OWNERS files.
  • jira/valid-reference: Indicates that this PR references a valid Jira ticket of any type.
  • lgtm: Indicates that a PR is ready to be merged.
  • verified: Signifies that the PR passed pre-merge verification criteria

6 participants