Skip to content

ci/installer-5327 : aws-ec2 - change instance to m6i#23031

Closed
mtulio wants to merge 3 commits intoopenshift:masterfrom
mtulio:fix-installer-5327
Closed

ci/installer-5327 : aws-ec2 - change instance to m6i#23031
mtulio wants to merge 3 commits intoopenshift:masterfrom
mtulio:fix-installer-5327

Conversation

@mtulio
Copy link
Copy Markdown
Contributor

@mtulio mtulio commented Oct 26, 2021

fixes for installer PR openshift/installer#5327

The idea is to choose the preferred instance type (in general newer gen) when it's available on the equal or higher number of zones on a given region, otherwise, the backup will be chosen.

/cc @staebler

Comment thread ci-operator/step-registry/ipi/conf/aws/ipi-conf-aws-commands.sh Outdated
@mtulio
Copy link
Copy Markdown
Contributor Author

mtulio commented Oct 28, 2021

/hold

@openshift-ci openshift-ci Bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Oct 28, 2021
@mtulio mtulio force-pushed the fix-installer-5327 branch from e8b7e5c to dcbbf02 Compare October 29, 2021 03:28
@rvanderp3
Copy link
Copy Markdown
Contributor

/retest

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In practice this probably ends up the same, but I would have expected that the preferred family be used as long as there was at least one zone (or perhaps 2 zones) offering the preferred family.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

iiuc, I am not sure if it will ends up the same, it will help to select the correct instance on the region which does not support the preferred one (see the original PR for the table of availability):

output of $ cat install-config.yaml |yq -r '. | (.platform.aws, .controlPlane.platform.aws)',

for LEASED_RESOURCE=sa-east-1:

{
  "region": "sa-east-1",
  "userTags": {
    "expirationDate": "2021-12-01T22:34+00:00"
  }
}
{
  "zones": [],
  "type": "m5.2xlarge"
}

for LEASED_RESOURCE=us-east-1:

{
  "region": "us-east-1",
  "userTags": {
    "expirationDate": "2021-12-01T22:35+00:00"
  }
}
{
  "zones": [],
  "type": "m6i.2xlarge"
}

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but I would have expected that the preferred family be used as long as there was at least one zone (or perhaps 2 zones)

if it's not a problem, we could change to something like:

if [[ ${zones_cnt_pref} -ge 1 ]]; then

wdyt?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updating the latest availability:

                count(m5.xlarge)  count(m6i.xlarge)  diff(m5_m6i)
region                                                           
af-south-1                     3                  0             3
ap-east-1                      3                  0             3
ap-northeast-1                 3                  2             1
ap-northeast-2                 4                  3             1
ap-northeast-3                 3                  0             3
ap-south-1                     3                  3             0
ap-southeast-1                 3                  3             0
ap-southeast-2                 3                  2             1
ca-central-1                   3                  0             3
eu-central-1                   3                  3             0
eu-north-1                     3                  0             3
eu-south-1                     3                  0             3
eu-west-1                      3                  3             0
eu-west-2                      3                  0             3
eu-west-3                      3                  3             0
me-south-1                     3                  0             3
sa-east-1                      3                  2             1
us-east-1                      5                  5             0
us-east-2                      3                  3             0
us-west-1                      2                  2             0
us-west-2                      6                  4             2

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this necessary?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's not. I just removed it, and replace the ${master_type} to ${MASTER_TYPE} used in this script. 👍🏼

Comment on lines 53 to 54
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This does not take into account whether AZs are actually available or whether the AZs were specified already in the install-config.yaml.

Copy link
Copy Markdown
Contributor Author

@mtulio mtulio Dec 1, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you're right. I just fixed it:

AWS just change the availability of sa-east-1, now I am changing the region to af-south-1 will get the expected results

$ yq .controlPlane.platform.aws install-config.yaml
{
  "zones": [
    "af-south-1a",
    "af-south-1b"
  ]
}

$ export LEASED_RESOURCE=af-south-1
$ bash  ipi-conf-aws-commands.sh
zones already set in install-config.yaml, skipped

$ yq .controlPlane.platform.aws install-config.yaml
{
  "zones": [
    "af-south-1a",
    "af-south-1b"
  ],
  "type": "m5.xlarge"
}

$ export LEASED_RESOURCE=us-east-1
$ cp install-config.yaml-use1 install-config.yaml
$ yq .controlPlane.platform.aws install-config.yaml
{
  "zones": [
    "us-east-1a",
    "us-east-1b"
  ]
}

$ bash  ipi-conf-aws-commands.sh
zones already set in install-config.yaml, skipped

$ yq .controlPlane.platform.aws install-config.yaml
{
  "zones": [
    "us-east-1a",
    "us-east-1b"
  ],
  "type": "m6i.xlarge"
}

@mtulio mtulio force-pushed the fix-installer-5327 branch from dcbbf02 to 64c47e6 Compare December 1, 2021 19:49
@openshift-ci openshift-ci Bot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Dec 1, 2021
@mtulio mtulio force-pushed the fix-installer-5327 branch from 64c47e6 to 6b2e3f3 Compare December 1, 2021 19:57
@openshift-ci openshift-ci Bot added needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. and removed needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. labels Dec 1, 2021
@mtulio mtulio force-pushed the fix-installer-5327 branch 4 times, most recently from 85e2ade to 44691fd Compare December 1, 2021 20:59
@mtulio mtulio marked this pull request as draft December 1, 2021 21:03
@openshift-ci openshift-ci Bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Dec 1, 2021
@mtulio mtulio force-pushed the fix-installer-5327 branch from 44691fd to 8bd275b Compare December 1, 2021 21:15
@mtulio mtulio marked this pull request as ready for review December 1, 2021 21:22
@openshift-ci openshift-ci Bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Dec 1, 2021
@mtulio
Copy link
Copy Markdown
Contributor Author

mtulio commented Dec 1, 2021

@staebler ptal?

@mtulio
Copy link
Copy Markdown
Contributor Author

mtulio commented Dec 16, 2021

/retest-required

@mtulio
Copy link
Copy Markdown
Contributor Author

mtulio commented Jan 11, 2022

/unhold
/retest-required

@openshift-ci openshift-ci Bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jan 11, 2022
| wc -l)
fi

if [[ ${zones_cnt_pref} -ge ${zones_cnt_backup} ]]; then
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When you switch this logic to check if there are enough zones for the preferred or backup type, then make sure that the zones with availability matches exactly when the zones are explicitly specified in the install config.

@openshift-bot
Copy link
Copy Markdown
Contributor

Issues in openshift/release go stale after 30d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 15d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

@openshift-ci openshift-ci Bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Mar 31, 2022
@openshift-bot
Copy link
Copy Markdown
Contributor

Stale issue in openshift/release rot after 15d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle rotten.
Rotten issues close after an additional 15d of inactivity.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle rotten
/remove-lifecycle stale

@openshift-ci openshift-ci Bot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Apr 15, 2022
@openshift-bot
Copy link
Copy Markdown
Contributor

Rotten issues in openshift/release close after 15d of inactivity.

Reopen the issue by commenting /reopen.
Mark the issue as fresh by commenting /remove-lifecycle rotten.
Exclude this issue from closing again by commenting /lifecycle frozen.

/close

@openshift-ci openshift-ci Bot closed this May 11, 2022
@openshift-ci
Copy link
Copy Markdown
Contributor

openshift-ci Bot commented May 11, 2022

@openshift-bot: Closed this PR.

Details

In response to this:

Rotten issues in openshift/release close after 15d of inactivity.

Reopen the issue by commenting /reopen.
Mark the issue as fresh by commenting /remove-lifecycle rotten.
Exclude this issue from closing again by commenting /lifecycle frozen.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@mtulio
Copy link
Copy Markdown
Contributor Author

mtulio commented May 11, 2022

I need to raise the priority on this. Patric, I will work on this next week. Please let me know if you also can see something else to change or any other suggestions.
/cc @patrickdillon
/reopen

@openshift-ci openshift-ci Bot requested a review from patrickdillon May 11, 2022 17:24
@openshift-ci openshift-ci Bot reopened this May 11, 2022
@openshift-ci
Copy link
Copy Markdown
Contributor

openshift-ci Bot commented May 11, 2022

@mtulio: Reopened this PR.

Details

In response to this:

I need to raise the priority on this. Patric, I will work on this next week. Please let me know if you also can see something else to change or any other suggestion.
/cc @patrickdillon
/reopen

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@mtulio
Copy link
Copy Markdown
Contributor Author

mtulio commented May 11, 2022

/remove-lifecycle rotten
/lifecycle frozen
/hold

@openshift-ci openshift-ci Bot added do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. and removed lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. labels May 11, 2022
@openshift-ci
Copy link
Copy Markdown
Contributor

openshift-ci Bot commented May 11, 2022

@mtulio: The lifecycle/frozen label cannot be applied to Pull Requests.

Details

In response to this:

/remove-lifecycle rotten
/lifecycle frozen
/hold

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci
Copy link
Copy Markdown
Contributor

openshift-ci Bot commented May 11, 2022

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: mtulio
To complete the pull request process, please assign vrutkovs after the PR has been reviewed.
You can assign the PR to them by writing /assign @vrutkovs in a comment when ready.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@patrickdillon
Copy link
Copy Markdown
Contributor

I need to raise the priority on this. Patric, I will work on this next week. Please let me know if you also can see something else to change or any other suggestions. /cc @patrickdillon /reopen

Sure Marco, but I think I need to get caught up. I don't understand the goal here. It looks like you are developing logic to choose the best instance type but what good does that do to put it in ci? Should this logic go in the installer?

@openshift-ci
Copy link
Copy Markdown
Contributor

openshift-ci Bot commented Jun 1, 2022

@mtulio: PR needs rebase.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci openshift-ci Bot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jun 1, 2022
@openshift-ci
Copy link
Copy Markdown
Contributor

openshift-ci Bot commented Jul 1, 2022

@mtulio: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/rehearse/openshift/openshift-controller-manager/release-4.10/openshift-e2e-aws-builds-techpreview dcbbf02d58ddfaac8a827c2f0a07dd162a2baabc link unknown /test pj-rehearse
ci/rehearse/periodic-ci-openshift-release-master-ci-4.9-upgrade-from-stable-4.8-e2e-aws-ovn-upgrade dcbbf02d58ddfaac8a827c2f0a07dd162a2baabc link unknown /test pj-rehearse
ci/rehearse/redhat-developer/jenkins-operator/main/e2e 8bd275b link unknown /test pj-rehearse
ci/rehearse/openshift/origin/release-4.5/e2e-aws-csi 8bd275b link unknown /test pj-rehearse
ci/rehearse/openshift/origin/release-4.1/e2e-aws-builds 8bd275b link unknown /test pj-rehearse
ci/rehearse/openshift/origin/release-4.1/e2e-aws-image-ecosystem 8bd275b link unknown /test pj-rehearse
ci/rehearse/openshift/cluster-kube-controller-manager-operator/release-4.11/e2e-aws-ccm-install 8bd275b link unknown /test pj-rehearse
ci/rehearse/openshift/cluster-kube-controller-manager-operator/release-4.11/e2e-aws-ccm 8bd275b link unknown /test pj-rehearse
ci/rehearse/openshift/csi-driver-shared-resource/release-4.11/e2e-aws-csi-driver 8bd275b link unknown /test pj-rehearse
ci/rehearse/kubevirt/hyperconverged-cluster-operator/release-4.11/hco-e2e-image-index-sno-aws 8bd275b link unknown /test pj-rehearse
ci/rehearse/openshift/openshift-controller-manager/release-4.11/openshift-e2e-aws-builds-techpreview 8bd275b link unknown /test pj-rehearse
ci/rehearse/openshift/installer/release-4.9/e2e-aws-shared-vpc 8bd275b link unknown /test pj-rehearse
ci/rehearse/openshift/windows-machine-config-operator/release-4.11/aws-e2e-ccm-install 8bd275b link unknown /test pj-rehearse
ci/rehearse/openshift/origin/release-4.9/e2e-aws-image-registry 8bd275b link unknown /test pj-rehearse
ci/rehearse/openshift/origin/release-4.9/e2e-aws-jenkins 8bd275b link unknown /test pj-rehearse
ci/rehearse/tnozicka/openshift-acme/master/e2e-cluster-wide 8bd275b link unknown /test pj-rehearse
ci/rehearse/openshift/cluster-logging-operator/tech-preview/e2e-operator 8bd275b link unknown /test pj-rehearse
ci/rehearse/openshift/origin/release-4.2/e2e-cmd 8bd275b link unknown /test pj-rehearse
ci/rehearse/openshift/aws-efs-csi-driver-operator/release-4.9/operator-e2e 8bd275b link unknown /test pj-rehearse
ci/rehearse/openshift/ovn-kubernetes/release-4.9/e2e-ovn-hybrid-step-registry 8bd275b link unknown /test pj-rehearse
ci/rehearse/openshift/cloud-credential-operator/release-4.9/e2e-aws-manual-oidc 8bd275b link unknown /test pj-rehearse
ci/rehearse/openshift/kubernetes/release-4.9/configmap-scale 8bd275b link unknown /test pj-rehearse
ci/rehearse/operator-framework/operator-marketplace/release-4.9/e2e-aws-serial 8bd275b link unknown /test pj-rehearse
ci/rehearse/openshift/origin/release-4.9/e2e-aws-disruptive 8bd275b link unknown /test pj-rehearse
ci/rehearse/eclipse-che/che-operator/main/v8-che-behind-proxy 8bd275b link unknown /test pj-rehearse
ci/prow/pj-rehearse 8bd275b link false /test pj-rehearse
ci/rehearse/openshift/openshift-tests-private/master/debug-disasterrecovery-aws-ipi 8bd275b link unknown /test pj-rehearse
ci/rehearse/kubevirt/hyperconverged-cluster-operator/main/hco-e2e-image-index-sno-aws 8bd275b link unknown /test pj-rehearse
ci/rehearse/backube/volsync/main/e2e-openshift 8bd275b link unknown /test pj-rehearse
ci/rehearse/openshift/csi-external-provisioner/release-4.5/e2e-aws-csi 8bd275b link unknown /test pj-rehearse
ci/rehearse/openshift/sdn/master/e2e-aws-multitenant 8bd275b link unknown /test pj-rehearse
ci/prow/check-gh-automation 8bd275b link true /test check-gh-automation
ci/prow/release-controller-config 8bd275b link true /test release-controller-config
ci/prow/jira-lifecycle-config 8bd275b link true /test jira-lifecycle-config

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@droslean
Copy link
Copy Markdown
Member

@mtulio What is the status of this PR?

@droslean
Copy link
Copy Markdown
Member

/close

@openshift-ci openshift-ci Bot closed this Jul 25, 2022
@openshift-ci
Copy link
Copy Markdown
Contributor

openshift-ci Bot commented Jul 25, 2022

@droslean: Closed this PR.

Details

In response to this:

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@mtulio
Copy link
Copy Markdown
Contributor Author

mtulio commented Jul 25, 2022

Sure Marco, but I think I need to get caught up. I don't understand the goal here. It looks like you are developing logic to choose the best instance type but what good does that do to put it in ci? Should this logic go in the installer?

@patrickdillon :
This is a follow-up of PR openshift/installer#5327, I am also not sure if there is a good place to implement that logic or if we just need to change the arch_instance_type=m5 to arch_instance_type=m6i.

The goal in this PR is to simulate the installer behavior when the m6i is not available in all zones within the region - but tbh it seems to be very specific and, as you can see, is becoming very complex to be implemented here.

@mtulio
Copy link
Copy Markdown
Contributor Author

mtulio commented Jul 25, 2022

@mtulio What is the status of this PR?

@droslean sorry I was out since your last message. Let's keep it closed if Patrick agrees with it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants