Skip to content

Conversation

@guillaumerose
Copy link
Contributor

@guillaumerose guillaumerose commented Jul 9, 2020

Implementation of openshift/enhancements#200

This adds a CLUSTER_PROFILE env variable that is used to determine which manifests to apply based on the include.release.openshift.io/ annotation.

It is exactly what is written in the enhancement, not more.

This env. variable is not used by any components and, as it is empty, it defaults to the value of the default profile self-managed-high-availability.

@guillaumerose guillaumerose changed the title PoC: add cluster profile support Add cluster profile support Jul 21, 2020
@vrutkovs
Copy link

AWS failed to bootstrap

/retest

@guillaumerose
Copy link
Contributor Author

@vrutkovs yes it is because I didn't solve the issue of "where we should store the profile". Currently, the CVO expects to find it in a ConfigMap and it is still created by hand. Do you think it's a good idea to read it from a ConfigMap ? There is a chicken and egg issue with this: if the ConfigMap is not present, the CVO cannot run but the ns openshift-config is created by the CVO.

@vrutkovs
Copy link

CVO should fall back to default profile (so that upgrade would succeed) if the configmap is not present

@guillaumerose
Copy link
Contributor Author

guillaumerose commented Jul 22, 2020

OK, so I will change this PR to:

  • load the ConfigMap as soon as the CVO starts.
  • if the ConfigMap doesn't exist, use the default profile.

It will be the responsibility of the installer to have the ConfigMap in place before the CVO runs for the first time. Does it sound good ?

@guillaumerose
Copy link
Contributor Author

guillaumerose commented Jul 22, 2020

I experimented a bit more and found an interesting problem by making optional the ConfigMap.

During the boostrap, the CVO is created via a bootstrap-manifest and it runs before manifests are created (located in /manifests - even before the first one 0000_00_cluster-version-operator_00_namespace.yaml). It could not find the ConfigMap and therefore load the default profile.
Couple of minutes later, bootstrap ends, the ConfigMap is in place, the CVO reboot in the master and it's fine. But there was a period of time where the profile was not the good one.

Any thoughts on this ? How can we make the CVO waiting for the ConfigMap but still having the choice to not have it ?
Isn't it a problem to have the CVO in the bootstrap node not running the right profile ?

Perhaps if we use the ClusterVersion type to hold the profile, we will have the ConfigMap always defined.

@vrutkovs
Copy link

But there was a period of time where the profile was not the good one.

I'm not sure I'm following - why installed profile is incorrect? I assumed that no profile specified == "default" profile. If its not set then all operators (unless explicitly excluded) are enabled

@guillaumerose
Copy link
Contributor Author

I wanted to change the behavior to load once the profile at startup but it doesn't work. CVO is launched before manifests are created. At start the ClusterVersion object doesn't exist and locks the CVO.

To work correctly with a profile, the user have to ensure the ConfigMap is created before the ClusterVersion. For snc, I did something like:

openshift-install create manifests

cp myns.yaml ${INSTALL_DIR}/openshift/0000_00_cluster-version-operator_00_myns.yaml
cp myconfigmap.yaml ${INSTALL_DIR}/openshift/0000_00_cluster-version-operator_00_myconfigmap.yaml

openshift-install create cluster

@guillaumerose
Copy link
Contributor Author

/retest

2 similar comments
@guillaumerose
Copy link
Contributor Author

/retest

@guillaumerose
Copy link
Contributor Author

/retest

@jottofar
Copy link
Contributor

/test e2e-gcp-upgrade

@jottofar
Copy link
Contributor

/retest

@openshift-ci-robot
Copy link
Contributor

@jottofar: The specified target(s) for /test were not found.
The following commands are available to trigger jobs:

  • /test e2e
  • /test e2e-upgrade
  • /test images
  • /test integration
  • /test unit

Use /test all to run all jobs.

Details

In response to this:

/test e2e-gcp-upgrade

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci-robot openshift-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jul 25, 2020
@openshift-ci-robot openshift-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jul 27, 2020
@guillaumerose
Copy link
Contributor Author

/retest

Copy link
Contributor

@csrwng csrwng left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks fine to me, just a couple of comments

}

type ClusterProfileRetriever interface {
RetrieveClusterProfile(ctx context.Context) string
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should return an error in this function. It's probably safer to fail in the case you can't read the profile than to assume "default"

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to clarify, you can check the error for IsNotFound and return default... however other errors probably indicate an api server error and you should probably report.

Copy link
Contributor Author

@guillaumerose guillaumerose Jul 27, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes you're right, it's better now.
We can see in the log now: I0727 15:08:48.205731 1 updatepayload.go:336] Cannot get ConfigMap openshift-config/cluster-profile, using default profile: configmap "cluster-profile" not found

func (r *clusterProfileRetriever) RetrieveClusterProfile(_ context.Context) string {
configMap, err := r.cmConfigLister.Get(internal.ClusterProfileConfigMap)
if err != nil {
klog.Infof("cannot get configmap openshift-config/cluster-profile, using default profile")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see comment for the interface

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done 👍

@guillaumerose guillaumerose force-pushed the profilepoc branch 2 times, most recently from 494461c to a84ec7a Compare July 27, 2020 14:27
@LalatenduMohanty
Copy link
Member

/test unit

@wking
Copy link
Member

wking commented Dec 16, 2020

/hold

Blocked by operator-framework/operator-lifecycle-manager#1887 to label the InstallPlans CRD manifest (among other things) to avoid cannot list resource "installplans" in API group "operators.coreos.com" in the namespace "openshift".

@openshift-ci-robot openshift-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Dec 16, 2020
Copy link
Member

@LalatenduMohanty LalatenduMohanty left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@openshift-ci-robot openshift-ci-robot added lgtm Indicates that a PR is ready to be merged. approved Indicates a PR has been approved by an approver from all required OWNERS files. labels Dec 16, 2020
@guillaumerose
Copy link
Contributor Author

/retest

8 similar comments
@rkukura
Copy link

rkukura commented Jan 4, 2021

/retest

@guillaumerose
Copy link
Contributor Author

/retest

@guillaumerose
Copy link
Contributor Author

/retest

@guillaumerose
Copy link
Contributor Author

/retest

@guillaumerose
Copy link
Contributor Author

/retest

@guillaumerose
Copy link
Contributor Author

/retest

@guillaumerose
Copy link
Contributor Author

/retest

@guillaumerose
Copy link
Contributor Author

/retest

@openshift-ci-robot openshift-ci-robot removed the lgtm Indicates that a PR is ready to be merged. label Jan 11, 2021
@wking
Copy link
Member

wking commented Jan 11, 2021

/hold cancel

I'd held this for operator-framework/operator-lifecycle-manager#1887, which has since landed.

@openshift-ci-robot openshift-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jan 11, 2021
Copy link
Member

@LalatenduMohanty LalatenduMohanty left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Jan 11, 2021
@openshift-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: guillaumerose, LalatenduMohanty

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-merge-robot openshift-merge-robot merged commit 9f50a07 into openshift:master Jan 11, 2021
@openshift-ci-robot
Copy link
Contributor

@guillaumerose: All pull requests linked via external trackers have merged:

Bugzilla bug 1907329 has been moved to the MODIFIED state.

Details

In response to this:

Bug 1907329: Add cluster profile support

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. bugzilla/severity-medium Referenced Bugzilla bug's severity is medium for the branch this PR is targeting. bugzilla/valid-bug Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting. lgtm Indicates that a PR is ready to be merged.

Projects

None yet

Development

Successfully merging this pull request may close these issues.