OCPBUGS-4101: Do not allow empty system reserved values #3439
openshift-merge-robot merged 1 commit into openshift:master
|
@harche: This pull request references Jira Issue OCPBUGS-4104, which is invalid:
The bug has been updated to refer to the pull request using the external bug tracker.
|
/jira referesh |
|
/jira refresh |
|
@harche: This pull request references Jira Issue OCPBUGS-4104, which is invalid:
|
/jira refresh |
|
@harche: This pull request references Jira Issue OCPBUGS-4104, which is invalid:
|
@harche: This pull request references Jira Issue OCPBUGS-4101, which is valid. The bug has been moved to the POST state. 3 validation(s) were run on this bug
Requesting review from QA contact: The bug has been updated to refer to the pull request using the external bug tracker.
|
/cc @yuqi-zhang |
|
/retest-required |
yuqi-zhang left a comment:
lgtm, do you want to have Rio from the MCO QE test this? Or is there someone from the node team's side that can help verify?
From the node team I have verified it, so it would be better if MCO QE also verifies it. Thanks. |
|
@rioliu-rh is this something you would like to verify pre-merge? |
|
/hold
They are manually overriding files used internally by auto node sizing through a machine config. Even if we handle this in MCO, their upgrade will still fail. I am not sure at this point whether this should even be supported. Internally used files are difficult to protect from external interference such as manual overrides. |
|
I will spend some more time on this to see if we can still fix it somehow for them. |
@yuqi-zhang let me try to verify it with cluster-bot image |
|
Tried to verify this fix, but it does not work for me.
- apply node sizing mc and kubelet config based on 4.11.16 successfully
- upgrade the cluster to cluster-bot built image; nodes are not ready
- from kubelet log I can see |
|
@rioliu-rh Thanks for testing, however I am not sure if the node sizing MC should be applied in the first place. That MC is modifying the files used by auto node sizing directly. Also, the system reserved values it is trying to set should be set using kubelet config. That's the only supported way. |
@harche So, I just need to apply the mc below and then upgrade the cluster to the image that contains this fix? |
|
@harche verified with the following steps:
- apply following mc
- mcp update is completed successfully
- upgrade cluster from 4.11.16 to the image that contains this PR
- upgrade is successful
- check file content of
- check file content of |
66eddc3 to 1eae18c
Signed-off-by: Harshal Patil <harpatil@redhat.com>
1eae18c to bd24f17
|
@rioliu-rh I made the final script more resilient to unexpected input.
sh-4.4# /usr/local/sbin/dynamic-system-reserved-calc.sh true
sh-4.4# cat /etc/node-sizing.env
SYSTEM_RESERVED_MEMORY=3Gi
SYSTEM_RESERVED_CPU=0.08
SYSTEM_RESERVED_ES=1Gi
sh-4.4#
sh-4.4#
sh-4.4# /usr/local/sbin/dynamic-system-reserved-calc.sh false
sh-4.4# cat /etc/node-sizing.env
SYSTEM_RESERVED_MEMORY=1Gi
SYSTEM_RESERVED_CPU=500m
SYSTEM_RESERVED_ES=1Gi
sh-4.4#
sh-4.4#
sh-4.4# /usr/local/sbin/dynamic-system-reserved-calc.sh false 3Gi 1000m 1.5Gi
sh-4.4# cat /etc/node-sizing.env
SYSTEM_RESERVED_MEMORY=3Gi
SYSTEM_RESERVED_CPU=1000m
SYSTEM_RESERVED_ES=1.5Gi
sh-4.4#
sh-4.4# /usr/local/sbin/dynamic-system-reserved-calc.sh false 3Gi 1000m
sh-4.4# cat /etc/node-sizing.env
SYSTEM_RESERVED_MEMORY=3Gi
SYSTEM_RESERVED_CPU=1000m
SYSTEM_RESERVED_ES=1Gi
sh-4.4#
sh-4.4# /usr/local/sbin/dynamic-system-reserved-calc.sh false 3Gi
sh-4.4# cat /etc/node-sizing.env
SYSTEM_RESERVED_MEMORY=3Gi
SYSTEM_RESERVED_CPU=500m
SYSTEM_RESERVED_ES=1Gi
sh-4.4#
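The fallback behaviour visible in the session above can be sketched with shell default expansion. This is only an illustrative sketch, not the script changed in this PR: the function name `write_node_sizing_env` and its argument order are hypothetical, and the defaults (1Gi, 500m, 1Gi) are simply the values the `false` runs above produce.

```shell
#!/usr/bin/env bash
# Illustrative sketch only; NOT the script shipped in this PR.
# write_node_sizing_env is a hypothetical helper name.
# Defaults (1Gi memory, 500m CPU, 1Gi ephemeral storage) are taken
# from the terminal session in this comment.

write_node_sizing_env() {
    local out_file="$1"
    # ${N:-default} substitutes the default when the argument is
    # unset OR set to an empty string, so no value ends up empty.
    local memory="${2:-1Gi}"
    local cpu="${3:-500m}"
    local es="${4:-1Gi}"

    cat > "$out_file" <<EOF
SYSTEM_RESERVED_MEMORY=${memory}
SYSTEM_RESERVED_CPU=${cpu}
SYSTEM_RESERVED_ES=${es}
EOF
}
```

For example, `write_node_sizing_env /tmp/node-sizing.env 3Gi 1000m ""` would write `SYSTEM_RESERVED_ES=1Gi` even though the third value was passed as an empty string.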
|
|
/hold cancel |
|
@harche: The following tests failed, say
|
@harche verified with latest code:
- apply below machine config with empty value of
- check
- trigger upgrade
- upgrade is completed successfully
- check |
|
@harche is this expected? If yes, I will add label |
Sounds good. Please add the I also did additional testing with the actual script that was generating empty value for |
|
I am +1 and can apply lgtm once we are satisfied |
|
/label qe-approved |
|
/lgtm |
|
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: harche, yuqi-zhang |
|
@harche: All pull requests linked via external trackers have merged: Jira Issue OCPBUGS-4101 has been moved to the MODIFIED state. |
|
/cherry-pick release-4.12 |
|
@harche: new pull request created: #3453 |
Signed-off-by: Harshal Patil <harpatil@redhat.com>
Fixes: https://issues.redhat.com/browse/OCPBUGS-4101
- What I did
Make sure none of the system reserved values end up empty, because an empty value makes the kubelet crash during startup.
- How to verify it
Even after deploying that MC, the kubelet should start up and fall back to the default value of ephemeral storage for system reserved (1Gi).
- Description for the changelog
Force default values of system reserved in case empty values were supplied.
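
As a rough aid for the verification step, the check below fails if any system reserved entry in the env file is empty or missing. It is a sketch under assumptions: `check_node_sizing_env` is a hypothetical helper, and the key names and the `/etc/node-sizing.env` path are taken from the terminal session in the conversation, not from the merged script.

```shell
#!/usr/bin/env bash
# Illustrative sketch only; check_node_sizing_env is a hypothetical helper.
# Key names come from the /etc/node-sizing.env output shown in this PR.

check_node_sizing_env() {
    local env_file="$1"
    local key status=0
    for key in SYSTEM_RESERVED_MEMORY SYSTEM_RESERVED_CPU SYSTEM_RESERVED_ES; do
        # A valid line has the form KEY=<at least one character>.
        if ! grep -Eq "^${key}=.+$" "$env_file"; then
            echo "empty or missing: ${key}" >&2
            status=1
        fi
    done
    return "$status"
}
```

On a node this could be run as `check_node_sizing_env /etc/node-sizing.env && echo "all system reserved values set"`.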