Merge layering into master #3072

Closed · cgwalters wants to merge 43 commits into openshift:master from cgwalters:layering-merge-master

Conversation

@cgwalters
Member

Let's keep this PR as a running update. Goal: evaluate specific risky areas of code that might block a merge to master.

cheesesashimi and others added 30 commits March 7, 2022 16:10
…ering-test-package

adds e2e-layering test package
…layering-branch

sync layering with master
Part of openshift/enhancements#1032

We'll add the new-format image into the payload alongside the old
one until we can complete the transition.

(There may actually be a separate `rhel-coreos-extensions` image
 e.g. too, so this is just the start)

Note this PR is just laying groundwork; the new format container
will not be used by default.
Add `rhel-coreos` image in configmap and `oscontainer` in controllerconfig
This will keep layered and non-layered update logging consistent
To properly handle compression. Prep for using Butane.
Prep for using Butane APIs more directly as part of the layering
work.

The logic is also a bit reworked to generate a single Butane fragment
which we convert to Ignition in one go, instead of converting
Butane into Ignition repeatedly and using config merging.

There's only one wrinkle with doing this, which is that today in
the templates we have multiple files which are drop-ins for `crio.service`;
and we need to group those together.

(I think it would be cleaner to have them in a single file in the
 templates, but for clarity let's handle this)
We were testing this indirectly.
Update vendoring github.com/coreos/fcct → github.com/coreos/butane
Created MCONamespace constant and used in all *.go files except for
test/helpers/utils.go which would create a cyclic import
I don't know that we'll ultimately end up with imagestreams with these
names but for right now, I think these are at least the ones we think
we want that have purposes behind them.
Extends clientbuilder to build openshift Image and Build clients, as our
controllers will need to deal with those objects as part of the "build
controller" for layering.
Adds a shared ImageInformer to our shared controller context so we can watch imagestreams
in our controllers.
Allows using a label to designate a pool as a "layered" pool.

Adds an imagestream informer to watch for imagestream updates and
queue the pool again for image deployment.

Updates tests with new render controller signature.

Layered pools will create resources (buildconfigs/imagestreams owned by the pool) to build
in-cluster derived images.

Layered pools:
- Ensure that a base coreos imagestream exists
- Ensure that pool-specific imagestreams exist
- Ensure that pool-specific buildconfigs exist
- Render machineconfig into images in imagestreams
- Derive images from the coreos imagestream using the content from that
rendered-config imagestream via the aforementioned buildconfig
- Will be enqueued again (so node controller can sync them) when their corresponding imagestream gets
updated
For a "layered" pool, this makes the node controller:
- Wait for an image that is equivalent to the rendered config
- Once that image arrives, assign it as an additional (along with desiredConfig) desiredImage annotation on the
node

The "done" signal for a node at this point is still when currentConfig ==
desiredConfig, it's just that how we're going to get there now is via
the image rather than via applying the config directly.

Also refactored setDesiredMachineConfigAnnotation to a generic function that sets an arbitrary annotation
(because duplicating for image annotations made "verify" unhappy)
This adds some daemon constants for current/desired image annotations,
as well as extending our rpm-ostree "client bindings" to be able to:
- Execute a container rebase
- Parse more of the `rpm-ostree --status` JSON output so we can figure
out whether we're booted into a deployment or not (for live-apply
cases)
This gives RBAC permissions to the machine-config-controller service
account to manipulate Builds/BuildRequests and Imagestreams in the
machine-config-operator namespace, as well as push images into the internal registry.

This gives rbac permissions to the machine-config-daemon serviceaccount
to retrieve images from the internal registry.

This also updates the operator's sync so that these new registry role files will be
generated and deployed properly with the mco.
This creates a separate update flow for "layered" pools, so they will be
ignored/not be handled by the standard update function.

This will not impact any pool that is not layered.

This also tells the config drift detector to ignore OSImageURL for
layered pools because it will not match.
A lot of this is version bumps, but go-containerregistry (the thing I used to create my tiny machineconfig wrapper images without a docker build) additionally drags in the following dependencies that nothing else uses:
- github.com/spf13/viper/
- github.com/containerd/stargz-snapshotter/
- github.com/docker/cli/
- github.com/docker/docker/pkg/homedir/

This isn't terrible, so I'm leaving it for now since it works, but
at some point this might get obviated by buildah or something
…ntroller

In cluster build/"build controller" proof-of-concept
Use MCONamespace constant in getPullSecret()
Move log statement to UpdateTuningArgs
@cgwalters
Member Author

This is based on #3060 and then a git merge layering from master.

@cgwalters
Member Author

/test e2e-gcp-op
/test e2e-agnostic-upgrade

@openshift-ci
Contributor

openshift-ci Bot commented Apr 7, 2022

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: cgwalters

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details: Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci Bot added the `approved` label Apr 7, 2022
Comment thread install/image-references
from:
kind: DockerImage
name: registry.svc.ci.openshift.org/openshift:machine-os-content
- name: rhel-coreos
Member Author

Hmm, actually doing this bit blocks on https://issues.redhat.com/browse/ART-3883

@cgwalters marked this pull request as ready for review April 7, 2022 18:12
@openshift-ci Bot removed the `do-not-merge/work-in-progress` label Apr 7, 2022
@cgwalters
Member Author

/hold

@openshift-ci Bot added the `do-not-merge/hold` label Apr 7, 2022
@openshift-ci Bot requested review from mkenigs and sinnykumari April 7, 2022 18:13
Member

@jkyros jkyros left a comment


Some of the cans I know I kicked:

- apiGroups: ["operator.openshift.io"]
resources: ["etcds"]
verbs: ["get", "list", "watch"]
- apiGroups: ["build.openshift.io"]
Member


I put in a bunch of sloppy RBAC that we need to be more deliberate about.

Member Author

Hmm, right I think we can actually make these bits a namespaced role, not a clusterrole. But not a blocker IMO.

apiextinformers.WithNamespace(targetNamespace), apiextinformers.WithTweakListOptions(assignFilterLabels))
configSharedInformer := configinformers.NewSharedInformerFactory(configClient, resyncPeriod()())
operatorSharedInformer := operatorinformers.NewSharedInformerFactory(operatorClient, resyncPeriod()())
imageSharedInformer := imageinformers.NewSharedInformerFactory(imageClient, resyncPeriod()())
Member

@jkyros jkyros Apr 7, 2022


Would like to think about whether we should filter this to our namespace, there's probably a ton of image traffic. (although if we feature gate this and never start the informer, it doesn't matter)

Member Author

That's a great example of a potential risk! Let's tighten that up indeed.

Member

The number of watches and any unusual increases are being monitored so this is a good callout 👍


glog.Infof("Pool %s is s special pool, rendering config to imagestream", pool.Name)
// jkyros: Just put this here for now since it's convenient
err = ctrl.experimentalRenderToImageStream(pool, generated)
Member

Are we cool with the "push to imagestream approach" or do we need to figure out a service endpoint and retrieve this from inside the build?

Member Author

For the purposes of this PR, I'd like to just evaluate "risks to not-layered".

Member

(It does add and version-bump some dependencies, but nothing that seemed overtly dangerous.)

@openshift-ci
Contributor

openshift-ci Bot commented Apr 7, 2022

@cgwalters: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/e2e-gcp-op | b78d77a | link | true | /test e2e-gcp-op |
| ci/prow/e2e-agnostic-upgrade | b78d77a | link | true | /test e2e-agnostic-upgrade |
| ci/prow/bootstrap-unit | b78d77a | link | false | /test bootstrap-unit |
| ci/prow/images | b78d77a | link | true | /test images |
| ci/prow/e2e-aws | b78d77a | link | true | /test e2e-aws |

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@cheesesashimi
Member

There's a more intensive series of tests from TRT that we should run this PR against. I'll figure out how to do that and follow up once I know more.

Contributor

@yuqi-zhang yuqi-zhang left a comment


Did a general read. The changes seem pretty well gated behind a few annotations existing, etc.

A few notes:

  1. There's a decent amount of refactoring of our postConfigChange action code, which is a bit hard to read from the diff. I assume, though, that the idea is we don't actually change that flow, just how the functions are used, so we should be good as long as we can validate that.
  2. The changes above (and refactors to the update functions) will likely have a minor effect on the hypershift code, but we should be able to consolidate. Just something to consider when/if we merge.

Comment thread pkg/operator/sync.go
}
newformat, ok := cm.Data["baseOperatingSystemContainer"]
if !ok {
return "", "", fmt.Errorf("Missing baseOperatingSystemContainer from configmap")
Contributor

Is baseOperatingSystemContainer mandatory to have? On a non-layered system is it just empty (but the key exists?)

@openshift-ci
Contributor

openshift-ci Bot commented Apr 17, 2022

@cgwalters: PR needs rebase.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci Bot added the `needs-rebase` label Apr 17, 2022
@cgwalters
Member Author

cgwalters commented May 3, 2022

Just a note, I think we need to:

@openshift-bot
Contributor

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

@openshift-ci Bot added the `lifecycle/stale` label Aug 1, 2022
@openshift-bot
Contributor

Stale issues rot after 30d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle rotten
/remove-lifecycle stale

@openshift-ci Bot added the `lifecycle/rotten` label and removed the `lifecycle/stale` label Sep 1, 2022
@openshift-bot
Contributor

Rotten issues close after 30d of inactivity.

Reopen the issue by commenting /reopen.
Mark the issue as fresh by commenting /remove-lifecycle rotten.
Exclude this issue from closing again by commenting /lifecycle frozen.

/close

@openshift-ci Bot closed this Oct 1, 2022
@openshift-ci
Contributor

openshift-ci Bot commented Oct 1, 2022

@openshift-bot: Closed this PR.

Details

In response to this:

Rotten issues close after 30d of inactivity.

Reopen the issue by commenting /reopen.
Mark the issue as fresh by commenting /remove-lifecycle rotten.
Exclude this issue from closing again by commenting /lifecycle frozen.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.


Labels: approved · do-not-merge/hold · layering · lifecycle/rotten · needs-rebase
