machines: Set defaults for machine instance types#5841
openshift-merge-robot merged 1 commit into openshift:master from
patrickdillon left a comment:
Nice. So it looks like you identified aws, gcp, nutanix, and vsphere. gcp I think is close enough so let's leave it alone.
For vsphere and nutanix let's get owner opinions: @thunderboltsid, @rvanderp3, @jcpowermac. I suspect resources might be tighter on these platforms.
For context, it has been pointed out to us that aws has fewer resources on compute nodes than gcp/azure. While we're fixing aws, we're also checking the other platforms.
Force-pushed 500e9d3 to cf9e527.
@rna-afk I saw that the aws default instance type was set to …
We need a lot of throughput for all our operations on the cluster. The general guideline is to use a premium disk type for the control-plane nodes and any disk type for the compute nodes, but we set the default to premium for better performance. I think gp3 for the root volume type of the VMs is fine.
This implies 16 CPU units (CPUs x Cores per CPU). Is that what we want? I actually have an open PR that reduces these even further. For 4 CPU units, I'd recommend setting NumCoresPerSocket to 1.
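The reviewer's arithmetic can be sketched as follows. This is a minimal illustration that assumes, as the comment states, that total CPU units are the product of the CPU count and the cores per CPU; the `nutanixCPUs` type and `cpuUnits` function are hypothetical stand-ins, not the installer's actual API.

```go
package main

import "fmt"

// nutanixCPUs is a hypothetical stand-in for a machine pool's CPU settings
// (assumption: this is NOT the installer's real type).
type nutanixCPUs struct {
	NumCPUs           int64 // number of CPUs
	NumCoresPerSocket int64 // cores on each CPU
}

// cpuUnits returns the total CPU units using the formula from the review
// comment: CPUs x cores per CPU.
func cpuUnits(c nutanixCPUs) int64 {
	return c.NumCPUs * c.NumCoresPerSocket
}

func main() {
	proposed := nutanixCPUs{NumCPUs: 4, NumCoresPerSocket: 4}
	recommended := nutanixCPUs{NumCPUs: 4, NumCoresPerSocket: 1}

	fmt.Println("proposed:", cpuUnits(proposed))       // 16 CPU units
	fmt.Println("recommended:", cpuUnits(recommended)) // 4 CPU units
}
```

With 4 CPUs and 4 cores per CPU the pool gets 16 units; dropping NumCoresPerSocket to 1 gives the 4 units the reviewer recommends.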
I just moved the code from [1]. The minimum requirement is 2 and 2, but the vSphere folks set it to 4 and 4 as part of their performance recommendation for vSphere. I'll remove the Nutanix change, though, so you can change it to the ideal value. Thanks!
Ah, I think I understand where I made a mistake. I'll remove those changes, then. Thanks!
@rna-afk I can also take care of bumping up the RAM to 16GB in my PR.
Force-pushed 2be3979 to 70c6afe.
In CI we have to go up to this value anyway to pass jobs, so lgtm. We should update the documentation, right?
Yeah, this is a good point. This may need more docs than the release note I originally mentioned. That said, looking at the docs, AWS/GCP don't mention memory in those sections, so they don't seem to need updates. I'm not sure whether anything else would need to be updated for AWS/GCP. @rna-afk, can you take a quick look?
/approve
Updating aws, gcp, & vsphere looks good to me.
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: patrickdillon
This actually depends on whether we want to increase our minimum requirements spec. Other than that, I don't see any other docs changes.
@patrickdillon @rna-afk Please let @bscott-rh and me know whether doc updates beyond the release note are required, and for which platforms. Based on the comments, I'm not sure which platforms this PR will ultimately land on.
@rna-afk it appears that we are changing the CPU and memory defaults only for AWS, GCP and vSphere. If that is the case, could you please update the commit message to reflect that, rather than saying all platforms? Also, #5841 (comment) needs to be updated to reflect that too. I'll defer to the other reviewers with more knowledge of the platforms to decide whether the defaults are the right ones.
Force-pushed 70c6afe to 40dbc3c.
No, this wouldn't change the minimum requirements.
/lgtm
@rna-afk, would you add a note in the Jira card for the docs team that states the new default instance type for each platform updated in this PR?
/hold
Don't we need to bump CPUs too?
/cc @jcpowermac
/hold cancel
Force-pushed 40dbc3c to daeb04b.
Above it is 4 NumCoresPerSocket and here it is 1. Which is right?
4 CPUs and 1 core per socket is right. I'll fix it.
Changed it to set masters to 4 vCPUs with 4 NumCoresPerSocket, and workers to 4 vCPUs with 1 NumCoresPerSocket.
Setting compute nodes to use the instance types that provide 4 vCPUs and 16 GB RAM for AWS, GCP and vSphere.
Force-pushed daeb04b to 16f2827.
/lgtm
We could update https://github.com/openshift/release/blob/master/ci-operator/step-registry/ipi/conf/vsphere/ipi-conf-vsphere-commands.sh to no longer explicitly set the parameters that are being updated here.
/retest-required
Please review the full test history for this PR and help us cut down flakes.
@patrickdillon Sure, good idea. I'll follow up on that.
@rna-afk: The following tests failed, say
Virtual machines running on vSphere should be configured for cores over sockets for NUMA. The change in openshift#5841 missed that. Changing vSphere back to using a single socket and multiple cores.
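On vSphere, the VM's CPU count is the total number of virtual CPUs, and the number of virtual sockets is that total divided by the cores per socket. The minimal sketch below uses a hypothetical `vsphereCPUs` type and `topology` helper (not the installer's API) and assumes the CPU count must be a positive multiple of the cores per socket. It shows why the worker setting from this PR (4 vCPUs, 1 core per socket) yields four single-core sockets, while 4 cores per socket gives the single-socket layout this commit restores.

```go
package main

import (
	"errors"
	"fmt"
)

// vsphereCPUs is a hypothetical stand-in for a vSphere machine pool's CPU
// settings. NumCPUs is the total number of virtual CPUs on the VM.
type vsphereCPUs struct {
	NumCPUs           int32
	NumCoresPerSocket int32
}

// topology returns the number of virtual sockets and the cores per socket.
// Assumption for this sketch: NumCPUs must be a positive multiple of
// NumCoresPerSocket.
func topology(c vsphereCPUs) (sockets, cores int32, err error) {
	if c.NumCPUs <= 0 || c.NumCoresPerSocket <= 0 || c.NumCPUs%c.NumCoresPerSocket != 0 {
		return 0, 0, errors.New("NumCPUs must be a positive multiple of NumCoresPerSocket")
	}
	return c.NumCPUs / c.NumCoresPerSocket, c.NumCoresPerSocket, nil
}

func main() {
	configs := []struct {
		name string
		cpus vsphereCPUs
	}{
		{"workers as set in this PR", vsphereCPUs{NumCPUs: 4, NumCoresPerSocket: 1}},
		{"single socket, multiple cores", vsphereCPUs{NumCPUs: 4, NumCoresPerSocket: 4}},
	}
	for _, cfg := range configs {
		sockets, cores, err := topology(cfg.cpus)
		if err != nil {
			fmt.Println(cfg.name, "error:", err)
			continue
		}
		// Prints e.g. "workers as set in this PR: 4 socket(s) x 1 core(s)".
		fmt.Printf("%s: %d socket(s) x %d core(s)\n", cfg.name, sockets, cores)
	}
}
```

Keeping NumCoresPerSocket equal to the CPU count gives the VM a single socket, which is the cores-over-sockets layout the commit message recommends for NUMA.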
Started a PR to address this: openshift/release#29234
Setting all compute nodes to use the instance types that provide
4 vCPUs and 16 GB RAM for AWS, GCP and vSphere platforms.