[OCPCLOUD-1107] External cloud-provider support via FeatureGate in post-install#2386
kikisdeliveryservice
left a comment
feel like the team/arch also needs to review the enhancement
/hold
Feels like this should leverage #2352 via https://github.com/openshift/api/blob/master/config/v1/types_feature.go#L121
Does this not need to be in the switch so that we can enable the external provider platform by platform?
Eventually, yes. Under TechPreviewNoUpgrade it is up to the feature gate to set an invariant configuration whenever the gate is found. I don't think anyone will risk setting such a gate on a production cluster for a platform we don't have configs for. Not making it a switch just means fewer changes during the development cycle.
I had thought we would want this to affect only the platforms we have explicitly allowed, hence making it provider-specific. Let's discuss on the call next week.
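The platform-by-platform selection discussed in this thread could look roughly like the sketch below. This is an illustration only, not the code in this PR: the helper name `cloudProviderFlag`, the platform strings, and the choice of which platforms are allowed are all hypothetical. The idea is that the gate changes the `--cloud-provider` value only for platforms listed in the switch, and every other platform keeps its in-tree provider.

```go
package main

import "fmt"

// cloudProviderFlag returns a value for the kubelet --cloud-provider flag.
// Hypothetical helper: the platform names and the set of platforms allowed
// to go external are assumptions made for illustration.
func cloudProviderFlag(platform string, externalGateEnabled bool) string {
	if !externalGateEnabled {
		// Gate not set: keep the in-tree provider for this platform.
		return platform
	}
	switch platform {
	case "aws":
		// Platforms listed here have opted in to the external provider.
		return "external"
	default:
		// Gate set, but this platform has not been allowed yet.
		return platform
	}
}

func main() {
	fmt.Println(cloudProviderFlag("aws", true))
	fmt.Println(cloudProviderFlag("azure", true))
	fmt.Println(cloudProviderFlag("aws", false))
}
```

With a switch like this, enabling a new platform is a one-line change to the `case` list, at the cost of touching the code once per platform.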
kikisdeliveryservice
left a comment
@Danil-Grigorev I've gone through the enhancements, but am having problems connecting the dots here.
Could you add a specific description of what this PR is trying to accomplish? For example: is this PR step 1 of X? If this is day 2 post-install, is it user initiated? Will there be any follow-up PRs, or are there prerequisite PRs for this?
Thanks! :)
@kikisdeliveryservice Sure.
- The prerequisite PR is an API change for FeatureGate in openshift/api#738.
- The name of the feature gate, its location, and the related changes in functionality may be corrected based on new data from kubernetes/enhancements#2443.
- What it will result in: we will have a way to deploy TechPreview clusters with out-of-tree provider code and to attempt migration from in-tree at post-install time, while identifying issues (CI use case).
- It is user initiated.
- Ideal scenario: follow-up PRs will only remove selections from the platform switch case, say: AWS is now by default
- Actual change, which we will probably need: implement our selection (or, more realistically, wait for some helper upstream implementation) in the MCO install phase, where the bootstrap methods are called. This way a fresh cluster will get out-of-tree providers by default with related
Thanks for the summary, @Danil-Grigorev!
/retest
- Add external-cloud-provider feature gate into render
- Add exclusion list to featureGate selection: a list of OpenShift-specific featureGates, which will not be passed to the default kubelet config
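The exclusion list described in this commit message can be pictured with a minimal sketch like the one below. It is not the PR's implementation: the function name `kubeletFeatureGates`, the variable `openshiftOnlyGates`, and the placeholder gate `ExampleOpenShiftOnlyGate` are hypothetical, and the actual contents of the list are not shown in this thread. The sketch only demonstrates the filtering step: enabled gates are copied into a kubelet-style map unless they appear on the exclusion list.

```go
package main

import (
	"fmt"
	"sort"
)

// openshiftOnlyGates is a hypothetical exclusion list. The gate name below is
// a placeholder, not a real feature gate.
var openshiftOnlyGates = map[string]bool{
	"ExampleOpenShiftOnlyGate": true,
}

// kubeletFeatureGates copies the enabled gates into a kubelet-style
// map[string]bool, skipping every gate found on the exclusion list.
func kubeletFeatureGates(enabled []string, excluded map[string]bool) map[string]bool {
	out := make(map[string]bool, len(enabled))
	for _, gate := range enabled {
		if excluded[gate] {
			continue // OpenShift-specific: do not pass to the kubelet
		}
		out[gate] = true
	}
	return out
}

func main() {
	enabled := []string{"RotateKubeletServerCertificate", "ExampleOpenShiftOnlyGate"}
	gates := kubeletFeatureGates(enabled, openshiftOnlyGates)

	// Sort the names so the printed order is stable.
	names := make([]string, 0, len(gates))
	for name := range gates {
		names = append(names, name)
	}
	sort.Strings(names)
	fmt.Println(names)
}
```

Keeping the exclusion list as data rather than as conditionals means a new OpenShift-only gate can be hidden from the kubelet by adding one map entry.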
@JoelSpeed @kikisdeliveryservice All comments addressed. I checked: the permission change happens on revendor, and it seems expected. Please review.
Since this seems unrelated to the contents of this PR, should we just drop these two files from the commit? That would make the history cleaner, and the vendor discrepancy can be sorted out at a later point.
JoelSpeed
left a comment
/lgtm
Thanks @Danil-Grigorev
just wanted to note that i have been test driving this patch for the last few days and it seems to be working for me. i have not thoroughly tested the negative case (e.g. when the gate is not applied), but i have done a basic smoke test on that.
/retest
@Danil-Grigorev, i talked with @kikisdeliveryservice in slack and it sounds like we could help here by adding a must-gather for a cluster that has had the feature gate enabled as a day 2 operation. so, basically:
i think we are already capturing this behavior in the openshift-e2e-aws-ccm step. but since we need this PR to make it work, none of those jobs have been passing. @Danil-Grigorev does that make sense?
just to follow up on the above, would this ever be disabled on day 2, @Danil-Grigorev?
@kikisdeliveryservice No. Applying the feature gate moves the cluster to the TechPreview state, where we officially do not support rollback. That said, the code does not prevent a rollback, and it works both ways.
quick update here, we have shared a must-gather with @kikisdeliveryservice |
kikisdeliveryservice
left a comment
Reviewed the must-gather and verified that we started off with --cloud-provider=aws and successfully switched to --cloud-provider=external.
Based on my understanding of https://github.com/openshift/enhancements/pull/463/files#diff-3e0e2c48e70215076dfe36c13768a823ab7080d929d80292f37db2ef5a2121e8R270, I also saw ExternalCloudProvider: true alongside --cloud-provider=aws, which helps force the upgrade from in-tree to out-of-tree.
Since this has also been extensively tested by that team, let's merge this to unblock work and move forward.
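For anyone repeating the check above against their own must-gather, a small helper along these lines could pull the `--cloud-provider` value out of a captured kubelet command line. This is a hedged sketch: `cloudProviderFromArgs` is a hypothetical helper, and the argument strings in `main` are made-up samples rather than output from the must-gather referenced here.

```go
package main

import (
	"fmt"
	"strings"
)

// cloudProviderFromArgs returns the value of the first --cloud-provider flag
// in a kubelet-style argument list. It accepts both "--cloud-provider=aws"
// and "--cloud-provider aws". The second result is false if the flag is absent.
func cloudProviderFromArgs(args []string) (string, bool) {
	const flag = "--cloud-provider"
	for i, arg := range args {
		if strings.HasPrefix(arg, flag+"=") {
			return strings.TrimPrefix(arg, flag+"="), true
		}
		if arg == flag && i+1 < len(args) {
			return args[i+1], true
		}
	}
	return "", false
}

func main() {
	// Sample command lines, invented for illustration.
	before := strings.Fields("kubelet --node-ip=10.0.0.1 --cloud-provider=aws")
	after := strings.Fields("kubelet --node-ip=10.0.0.1 --cloud-provider=external")

	for _, args := range [][]string{before, after} {
		if value, ok := cloudProviderFromArgs(args); ok {
			fmt.Println("cloud provider:", value)
		}
	}
}
```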
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: Danil-Grigorev, JoelSpeed, kikisdeliveryservice, rphillips
/hold cancel
/test e2e-agnostic-upgrade
- What I did
Implements external cloud provider selection forced by a feature gate, as described in the proposal: https://github.com/openshift/enhancements/pull/463/files#diff-3e0e2c48e70215076dfe36c13768a823ab7080d929d80292f37db2ef5a2121e8R201
This PR is the main blocker for other work on integrating CCM into OpenShift: https://issues.redhat.com/browse/OCPCLOUD-1107
- How to verify it
- Description for the changelog