Bug 1774465: aws: pick instance types based on selected availability zones #3051

Merged
openshift-merge-robot merged 1 commit into openshift:master from abhinavdahiya:pick_instance_type
Feb 6, 2020

@abhinavdahiya
Contributor

Using the new AWS API DescribeInstanceTypeOfferings [1], we can now check which instance types are available in a specific region and its AZs, and therefore which instance types are available in all of the selected AZs.
We default to m4 for most regions and only override to m5 for certain regions. When new AZs are added to older regions like sa-east-1 (see BZ 1774465 [2]), we end up picking m4 when m5 is present in all the chosen AZs.

Now we can define a list of instance classes for a region, in decreasing priority order, and the installer can try to pick a default based on what works for all the selected AZs. We still do not support heterogeneous setups with different instance types in different regions, but this solves the case when the newer m5 needs to be used in certain regions based on the AZ selection.

The default order for instance classes stays m4 > m5.
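
The selection rule described above can be sketched in a few lines of Go. This is a minimal illustration, not the installer's code: `pickInstanceClass`, the shape of the `offerings` map, and the sample zone data are all assumptions made for the example. In the PR the availability data would come from DescribeInstanceTypeOfferings.

```go
package main

import "fmt"

// pickInstanceClass returns the first class in the priority list that is
// offered in every requested zone. offerings maps an instance class to the
// zones where it is available (hypothetical shape, not the installer's type).
func pickInstanceClass(classes, zones []string, offerings map[string][]string) (string, bool) {
	for _, class := range classes {
		// Index the zones that offer this class for quick membership checks.
		available := make(map[string]bool, len(offerings[class]))
		for _, z := range offerings[class] {
			available[z] = true
		}
		ok := true
		for _, z := range zones {
			if !available[z] {
				ok = false
				break
			}
		}
		if ok {
			return class, true
		}
	}
	return "", false
}

func main() {
	// Illustrative data only: m4 is missing from one of the selected zones.
	offerings := map[string][]string{
		"m4": {"sa-east-1a", "sa-east-1c"},
		"m5": {"sa-east-1a", "sa-east-1b", "sa-east-1c"},
	}
	zones := []string{"sa-east-1a", "sa-east-1b", "sa-east-1c"}
	class, ok := pickInstanceClass([]string{"m4", "m5"}, zones, offerings)
	fmt.Println(class, ok)
}
```

With the priority order m4 > m5, m4 is still chosen whenever it covers every selected zone; m5 is only picked when m4 is missing from at least one of them.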

/cc @wking @cuppett @sdodson

@openshift-ci-robot
Contributor

@abhinavdahiya: This pull request references Bugzilla bug 1774465, which is valid. The bug has been moved to the POST state. The bug has been updated to refer to the pull request using the external bug tracker.

Details

In response to this:

bug 1774465: aws: pick instance types based on selected availability zones

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci-robot openshift-ci-robot added bugzilla/valid-bug Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Feb 4, 2020
Comment thread pkg/asset/machines/master.go Outdated
@abhinavdahiya abhinavdahiya force-pushed the pick_instance_type branch 2 times, most recently from de04fff to 8a2d4eb Compare February 4, 2020 17:47
@sdodson
Member

sdodson commented Feb 4, 2020

/approve

Do you think we need to provide this priority mapping information in product documentation?

@openshift-ci-robot openshift-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Feb 4, 2020
@abhinavdahiya
Contributor Author

> /approve
>
> Do you think we need to provide this priority mapping information in product documentation?

@sdodson not really, they are internal defaults used when the user doesn't provide them. So I don't think these need to be documented in product docs...

@cuppett
Member

cuppett commented Feb 5, 2020

At some point, we'll need to or should default to m5. Do we have a timeline for when we'll swap, or a reason we won't (or potentially swap to a later one like m6/m7, etc.)?

@abhinavdahiya
Contributor Author

> At some point, we'll need to or should default to m5. Do we have a timeline for when we'll swap, or a reason we won't (or potentially swap to a later one like m6/m7, etc.)?

@cuppett some of the reasoning for not changing the defaults to m5 completely is in https://bugzilla.redhat.com/show_bug.cgi?id=1710981#c13

@jhixson74
Member

/approve
/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Feb 5, 2020
@openshift-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jhixson74, sdodson

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-bot
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@wking wking changed the title from "bug 1774465: aws: pick instance types based on selected availability zones" to "Bug 1774465: aws: pick instance types based on selected availability zones" Feb 5, 2020
Comment thread pkg/asset/machines/aws/instance_types.go Outdated
@openshift-ci-robot openshift-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Feb 5, 2020
Comment thread pkg/asset/machines/aws/instance_types.go Outdated
Member

nit: would be nice to show the constraints to help folks pick a reasonable instance type. Something like:

missing := []string{}
for _, t := range types {
  zoneDiff := reqZones.Difference(sets.NewString(found[t]...))
  if zoneDiff.Len() == 0 {
    return t, nil
  }
  missing = append(missing, fmt.Sprintf("%s not found in %v", t, strings.Join(zoneDiff.List(), ", ")))
}
return types[0], errors.Errorf("no instance type found for the zone constraint (%s)", strings.Join(missing, "; "))

Contributor Author

gonna keep for future work.

Comment thread pkg/asset/machines/master.go Outdated
Member

It's not clear to me why we want this to be a soft warning, instead of a fatal error. Can we make it a fatal error, because a failure to find a default instance type can be worked around by users setting explicit types in their install-config.yaml?

Contributor Author

If the user doesn't have access to run this command, or it's not supported in all the regions, or we are rate limited, I don't want the install to fail at this stage, with 4.4 feature complete.

I think for now the current behavior of using the defaults when we can't dynamically verify them is good enough. We can revisit the requirement in the next feature cycle.

@openshift-ci-robot openshift-ci-robot removed the lgtm Indicates that a PR is ready to be merged. label Feb 5, 2020
@abhinavdahiya
Contributor Author

going to cancel the hold as the nits were fixed.

/hold cancel

@openshift-ci-robot openshift-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Feb 5, 2020
@sdodson
Member

sdodson commented Feb 5, 2020

/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Feb 5, 2020
@openshift-bot
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

[1]: https://docs.aws.amazon.com/AWSEC2/latest/APIReference/API_DescribeInstanceTypeOfferings.html
[2]: https://bugzilla.redhat.com/show_bug.cgi?id=1774465
@openshift-ci-robot openshift-ci-robot removed the lgtm Indicates that a PR is ready to be merged. label Feb 5, 2020
@wking
Member

wking commented Feb 5, 2020

/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Feb 5, 2020
@openshift-bot
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@wking
Member

wking commented Feb 6, 2020

update was having trouble with etcdserver: request timed out. Not sure how that could be related to this PR. CI-search shows it occurring in 15% of the past 24h of upgrade CI, which also suggests it is broader than this PR.

@openshift-bot
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-merge-robot openshift-merge-robot merged commit 8285bc5 into openshift:master Feb 6, 2020
@openshift-ci-robot
Contributor

@abhinavdahiya: All pull requests linked via external trackers have merged. Bugzilla bug 1774465 has been moved to the MODIFIED state.

Details

In response to this:

Bug 1774465: aws: pick instance types based on selected availability zones

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci-robot
Contributor

@abhinavdahiya: The following tests failed, say /retest to rerun all failed tests:

| Test name | Commit | Details | Rerun command |
| --- | --- | --- | --- |
| ci/prow/e2e-libvirt | d45e881 | link | /test e2e-libvirt |
| ci/prow/e2e-aws-scaleup-rhel7 | d45e881 | link | /test e2e-aws-scaleup-rhel7 |

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@@ -24,8 +24,18 @@ func SetPlatformDefaults(p *aws.Platform) {
// region. We prefer m4 if available (more EBS volumes per node) but will use
// m5 in regions that don't have m4.
func InstanceClass(region string) string {
Contributor

@abhinavdahiya With the InstanceClasses function added, I am wondering why we are keeping the InstanceClass function around.

wking added a commit to wking/openshift-installer that referenced this pull request Jul 7, 2020
The guts were removed by 805a108 (platformtests: drop aws as no
longer required, 2020-03-11, openshift#3277), citing d45e881 (aws: pick
instance types based on selected availability zones, 2020-02-03, openshift#3051).
No need to keep the useless directory around now that it holds no
code.

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. bugzilla/valid-bug Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting. lgtm Indicates that a PR is ready to be merged. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.

9 participants