[layering] Separate build controller to handle builds for layered pools #3071
cgwalters merged 18 commits into openshift:layering
Conversation
Skipping CI for Draft Pull Request.
I'm not familiar with the Image API or how the build controller is consuming this, so not sure if this would be possible...but I'm wondering if we could use generated.Name as a tag and then also tag that as latest. Then if I delete a MC, rolling back to the previously rendered config is faster, because you could just check that the image with that tag already exists and tag it as latest.
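For what it's worth, a rough sketch of that idea using the OpenShift image client (not code from this PR; the stream name, digest, and label are illustrative): tag the built image under the rendered-config name, then point `:latest` at the same digest, so a rollback is just a re-tag rather than a rebuild.

```go
import (
	"context"

	imagev1 "github.com/openshift/api/image/v1"
	imageclient "github.com/openshift/client-go/image/clientset/versioned"
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// tagAsLatest is a sketch only: make ":latest" point at an image that already
// exists in the stream under another tag. (Create the tag first if ":latest"
// does not exist yet.)
func tagAsLatest(ctx context.Context, client imageclient.Interface, namespace, stream, digest string) error {
	tag := &imagev1.ImageStreamTag{
		ObjectMeta: metav1.ObjectMeta{Name: stream + ":latest", Namespace: namespace},
		Tag: &imagev1.TagReference{
			From: &corev1.ObjectReference{
				Kind: "ImageStreamImage",
				Name: stream + "@" + digest, // e.g. "layered-base@sha256:..."
			},
		},
	}
	_, err := client.ImageV1().ImageStreamTags(namespace).Update(ctx, tag, metav1.UpdateOptions{})
	return err
}
```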
Could we do this once rather than in the sync loop?
Would it be possible to tag with the rendered config and see if an image has been built already rather than using time?
I think some of my other comments were based on the assumption that the base image won't have changed. Hmm may have to think about this more on Monday 🤔
OK sorry, I had only briefly skimmed this before. So, this is a large change that is highly likely to conflict with any other changes (e.g. #3083), so I think we should try to collaborate on this PR, push it forward, and get it into the layering branch.
This moves most of the layering machinery that had been added to the render controller into the build controller. As a result, the render controller for the most part goes back to the way it was. There is no avoiding some change to the render controller, though, as it needs to know to ignore the pools that the build controller is managing.
We need to specify what resources the pool should be creating/owning in order to manage builds and the relationships between them (triggers, which buildconfigs should source from which streams, etc). The pool needs some template/plan to follow when it gets triggered to create and verify that its resources exist. This is my not-quite-as-elegant-as-I-had-hoped attempt at that.
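For illustration only, the kind of per-pool "plan" this is getting at might look something like the sketch below (all type and field names here are hypothetical, not the PR's actual API):

```go
// Hypothetical sketch of a per-pool build plan: which imagestreams and
// buildconfigs the controller should ensure exist, and how they chain.
type poolBuildPlan struct {
	Pool string // pool name, e.g. "worker"

	// imagestreams the pool owns, keyed by role ("base", "rendered-config", ...)
	ImageStreams map[string]string

	// buildconfigs to create, each sourcing from one stream and pushing to another
	BuildConfigs []buildStep
}

type buildStep struct {
	Name       string // buildconfig name
	FromStream string // imagestream tag to build from
	ToStream   string // imagestream tag to push to
	TriggerOn  string // stream whose updates should trigger this build
}
```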
This confines the image and build informers to the MCO's namespace. There are potentially many builds and images in the cluster, and watching all of them and their events is potentially expensive. As our constraints currently sit, all of the images and builds we will be dealing with are in the MCO namespace.
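Concretely, that means constructing the build/image informer factories scoped to the namespace rather than cluster-wide, roughly like this sketch (assumes the generated openshift/client-go informer factories; the namespace string is written out here only for illustration):

```go
// Sketch: namespace-scoped build informers instead of cluster-wide ones.
// buildinformers = "github.com/openshift/client-go/build/informers/externalversions"
buildInformerFactory := buildinformers.NewSharedInformerFactoryWithOptions(
	buildClient,
	resyncPeriod,
	buildinformers.WithNamespace("openshift-machine-config-operator"),
)
buildInformer := buildInformerFactory.Build().V1().Builds()
```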
This adds helpers for getting and setting the current image stream for a pool, separates the per-pool suffix '-base' from the global 'coreos' imagestream, and adds a per-node stream name constant for later.
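Roughly along these lines (constant and helper names here are illustrative, not necessarily the ones the commit adds):

```go
// Illustrative naming helpers: a global base stream shared by all pools, and a
// per-pool "-base" stream derived from the pool's name.
const (
	globalBaseImageStream = "coreos" // shared base image (rhel-coreos)
	imageStreamSuffixBase = "-base"  // per-pool base stream suffix
)

// baseImageStreamForPool returns the per-pool base stream name, e.g. "layered-base".
func baseImageStreamForPool(poolName string) string {
	return poolName + imageStreamSuffixBase
}
```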
Force-pushed from 0c7e0d2 to f5f87e7.
// TODO(jkyros): This is a race I usually win, but I won't always, and we need a better
// way to get this metadata in
if imagestream.Name == pool.Name+ctrlcommon.ImageStreamSuffixRenderedConfig {
	ctrl.cheatMachineConfigLabelIntoBuildConfig(imagestream, pool)
Yes, this is silly. This is me trying to sneak the dynamic labels onto the buildconfig before it triggers.
I'm injecting the image labels via the Dockerfile + args right now, but if a user runs oc start-build, say to retrigger after a build fails, the user's build won't have the args, which means it won't get the label, which means we won't know what machineconfig an image contains. (I know we don't need it anymore because our config is the image, but I really do want to be able to show it back somehow.)
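One possible way around that (just a sketch, not what this PR does): set the label on the BuildConfig's output image labels instead of passing it as a build arg, so an image started manually with oc start-build still carries it. Field names below are from openshift/api build/v1; the label key is illustrative.

```go
// Sketch: stamp the rendered-config name onto every output image via the
// BuildConfig itself, so manually triggered builds keep the label too.
bc.Spec.Output.ImageLabels = []buildv1.ImageLabel{
	{
		Name:  "machineconfiguration.openshift.io/renderedConfig", // illustrative key
		Value: pool.Spec.Configuration.Name,
	},
}
```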
// experimentalAddEntitlements grabs the cluster entitlement certificates out of the openshift-config-managed namespace and
// stuffs them into a machineconfig for our layered pool, so we can have entitled builds. This is a terrible practice, and
// we should probably just sync the secrets into our namespace so our builds can use them directly rather than expose them via machineconfig.
func (ctrl *Controller) experimentalAddEntitlements(pool *mcfgv1.MachineConfigPool) {
Talking with Colin, we will need to think about the RHEL9 ramifications of this -- because it will probably be wrong 😄
Beyond just the RHEL8 versus RHEL9 versus e.g. C9S issues, I think we also need to handle the case of disconnected installs where a user already has a local RHEL mirror. Us trying to overwrite redhat.repo would then just be broken.
I think we need to de-couple "automatically set up repos" from layering; the former needs to be a different opt-in mechanism.
	Name: image.Image,
}
// TODO(jkyros): Kind of cheating using this as metadata showback for the user until we figure out our "level triggering" strategy
pool.Spec.Configuration.Source = []corev1.ObjectReference{
Still talking with Colin -- we were discussing how nice it would be to have some sort of history on the pool -- something like "this update changed X" and being able to step back through those.
if desiredConfigName, ok := dn.node.Annotations[constants.DesiredMachineConfigAnnotationKey]; ok {
	// TODO(jkyros): This works for now, but we should probably add another annotation or at least
	// do something more elegant than string prefix parsing
	if strings.HasPrefix(desiredConfigName, "sha256:") {
Preference has been expressed that we have separate config/image annotations and we clean up the ones we aren't using when we switch.
I tried this with the most recent push, but it seems a little more invasive than I'd have liked due to some of the assumptions in the node controller's status (see status.go comment below)
Force-pushed from f5f87e7 to 3443c17.
// Make sure the imagestreams exist so we can populate them with our image builds
for _, imageStreamName := range ensureImageStreams {
	_, err := ctrl.ensureImageStreamForPool(pool, imageStreamName, pbr)
Something else we talked about was whether the resource creation/ensuring should be done by the operator instead of by the build controller. I put them here because they were "dynamic" per-pool and not created from any sort of manifest, but I wanted to make sure we think through that.
I have maybe some hypershift-y "lego brick" concerns if I need the operator to spawn my objects in order to do a build, but maybe that doesn't really matter.
}
cconfig, ok := node.Annotations[daemonconsts.CurrentMachineConfigAnnotationKey]
if !ok || cconfig == "" {
	return false

cconfig, hasConfig := node.Annotations[daemonconsts.CurrentMachineConfigAnnotationKey]
cimage, hasImage := node.Annotations[daemonconsts.CurrentImageConfigAnnotationKey]

// if we have an image config present
if hasImage && cimage != "" {
	return true
}
Things like this are why I was previously hesitant to have a completely separate set of annotations -- I didn't want our changes to be too invasive -- there are a bunch of places where we assume that the presence/absence of daemonconsts.CurrentMachineConfigAnnotationKey means something.
Yeah...fair enough. Hmm.
So definitely a bit more risk to a master merge with this. I think we can do it though, just needs some auditing. I'm also pretty confident our e2es cover this.
## This makes a bunch of imagestreams, what are they for?

|imagestream|intended purpose|
|---|---|
|`coreos`|The global default base coreos image (rhel-coreos)|
|`layered-base`|The base image stream our buildconfigs draw from (defaults to coreos:latest)|
|`layered-external-base`|External base image stream. If its :latest tag is populated, layered-base uses this instead of coreos:latest|
|`layered-mco-content`|The image built by our MCO dockerfile where ignition-apply runs|
One of the other conversations we also had around this was whether or not the "custom pools" should derive from the worker pool image or if they should build their own. So right now on the layering branch:
- each pool in "layered/image mode" (custom or not) gets its own set of imagestreams and buildconfigs.
This is okay because the render controller is merging the config for us -- it's handling "config derivation" on the machineconfig front for custom pools, but the image content is still being built 'from base'.
I'm just writing this here so we remember to think through whether "all custom pools inherit/derive from worker" is a meme that we want to carry forward into our new derived-image world, or if it is something we only did out of necessity because the pool needed the config in worker (base, templates, etc.) to live.
Could probably share this with render.go; calling it poolKind could make sense too?
/test unit
/approve
The build controller "takes over" for pools that are marked as layered. It will more or less serve the function of the render controller but for image based pools. The render controller currently retains the functionality of generating the rendered config, and will push that rendered config into an imagestream every time it gets updated.
Since we're going straight into the configuration now (the configuration being an Image -- we can do this because the MachineConfigPool spec.Configuration is an ObjectRef), we don't need multiple annotations for desiredConfig and desiredImage anymore. The desiredConfig annotation contains the reference for whatever the configuration is supposed to be, and the daemon will figure out how to apply it. This removes the 'halfway-step' desiredImage annotation behavior and allows the node_controller to assign the image sha256 itself as desiredConfig.
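In other words, roughly (a sketch of the behavior described here; note that later commits in this PR move to separate image/config annotations instead):

```go
// Sketch: pool.Spec.Configuration is an ObjectReference, so its Name can be
// either a rendered MachineConfig name or an image digest, and the node
// controller can copy it straight into the node's desired annotation.
desired := pool.Spec.Configuration.Name // "rendered-worker-<hash>" or "sha256:<digest>"
node.Annotations[daemonconsts.DesiredMachineConfigAnnotationKey] = desired
```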
Before this, we had separate annotations for current/desired config and current/desired image. We were getting to a particular machineconfig level "via an image": the MachineConfig level was what we were trying to reach, not the image. This changes the approach such that the image is now the desired configuration we are trying to reach, and that is what will be contained in desiredConfig for pools that are performing layered/image builds. This removes the old dual-annotation behavior and changes the ordering slightly in a few places, because we used to assume that desiredConfig was a MachineConfig (so we tried to retrieve it right out of the gate without checking what it really was) and the MCD would degrade if it couldn't retrieve it as a machineconfig.
The changes being made to the render/build controllers and the daemon have resulted in them losing or gaining informers/listers/clients in their creation method signatures. This, of course, breaks all of the tests. This commit fixes the tests by adjusting the imports and method signatures in the tests to match the new controller and daemon signatures.
Previously we had put in some less careful RBAC rules to let the build controller watch images and builds. This replaces those cluster roles with a namespaced role that only lets the MCO watch builds and images inside its own namespace. This corresponds with the controller's build and image informers being namespaced to the MCO namespace. This also lets the daemon retrieve images, as the daemon is being passed a sha256: hash (because that's how openshift images are named) and it needs to convert it to an actual image reference so rpm-ostree can apply it.
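The shape of that namespaced rule, sketched as a Go literal purely for illustration (the PR defines its RBAC in the MCO's manifests, and the exact resource/verb list may differ):

```go
// Illustrative only: a namespace-scoped Role covering build and image
// resources in the MCO's own namespace, rather than a ClusterRole.
// rbacv1 = "k8s.io/api/rbac/v1"
role := rbacv1.Role{
	ObjectMeta: metav1.ObjectMeta{
		Name:      "machine-config-controller-builds", // illustrative name
		Namespace: "openshift-machine-config-operator",
	},
	Rules: []rbacv1.PolicyRule{
		{
			APIGroups: []string{"build.openshift.io"},
			Resources: []string{"builds", "buildconfigs"},
			Verbs:     []string{"get", "list", "watch"},
		},
		{
			APIGroups: []string{"image.openshift.io"},
			Resources: []string{"imagestreams", "imagestreamtags"},
			Verbs:     []string{"get", "list", "watch"},
		},
	},
}
```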
So after we talked about it: while the string matching on the sha256: prefix did work, the desire was to actually specify the kind of the config, so we're doing that by having separate annotations. This also cleans the currentConfig/desiredConfig off the node when it's set to done. I *think* I addressed most of the consequences of that, but there is no way I didn't miss at least one of them somewhere, since the MCO is heavily dependent in a lot of places on the assumption that currentConfig will always be there (which is why I had been avoiding doing this). I also removed the imagelister while I was in there and switched to a direct client call.
The method signature of the daemon's clusterconnect method changed when I took the lister out, the test was just adjusted to accommodate this.
I'll squash these out if this is what we end up wanting to do, but this embraces the "separate annotation" way of assigning either an image or a machineconfig to a node. There are a lot of places in the node controller status calculation where it assumes that currentConfig is always there, so I tried to address those so that either the image annotations OR the machineconfig annotations would satisfy things like the 'managed' check and the 'done' check. This makes the node controller assign current/desired image annotations to nodes in pools where the config is an image, and the standard current/desired config annotations to nodes in standard pools.
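Conceptually it ends up something like this sketch (the desired-image key name is guessed by analogy with the current-image key quoted earlier, and poolIsLayered is a stand-in):

```go
// Sketch: assign whichever annotation pair matches what the pool's
// configuration actually is, and clean up the other pair.
cfg := pool.Spec.Configuration.Name
if poolIsLayered { // the configuration references an image
	node.Annotations[daemonconsts.DesiredImageConfigAnnotationKey] = cfg
	delete(node.Annotations, daemonconsts.DesiredMachineConfigAnnotationKey)
} else {
	node.Annotations[daemonconsts.DesiredMachineConfigAnnotationKey] = cfg
	delete(node.Annotations, daemonconsts.DesiredImageConfigAnnotationKey)
}
```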
There was an issue where a node would fail to run some pods after it rebased to an image and rebooted. Somehow during a rebase, a blank 'resolv.conf' gets added at /usr/etc/resolv.conf even though it is not present in any config or image. Since we were previously defaulting all the files using rsync for the ostree 3-way merge, we were unfortunately blanking out resolv.conf. That would have been okay, except that as part of that 'rsync' process we were also losing the selinux context for /etc/resolv.conf, which meant that NetworkManager couldn't rewrite it because it was no longer net_t. This sidesteps the problem for now by only copying files that have definitely changed between images, because we are already calculating that file list and have it available. Furthermore, we run restorecon just to make sure that should a config image change one of the files with a special context, it will get set back properly. This also moves the drain before the file changes -- we don't want the pods to be affected by any of the config changes (that is why we are draining, after all).
Previously when we were handling kargs, we were failing to apply them on the initial transition because the mcdcontent.json file containing them did not exist in the initial image. We'd rebase then degrade; then the next time through, it would notice we rebased and skip the section. Then the next image update would work because the file was already on disk. Also, the Cat function that was retrieving the kargs data was assuming that the sequence number was always going to be '0' on deployments, so depending on what had happened (live-apply) it would sometimes fail to find the MCDContent file on disk. This:
- Decouples the rebase from the kernel arg handling
- Uses the sequence number when reading files out of deployments so the files are properly found
- Allows us to fall back to using the on-disk currentConfig to retrieve the kernel args during our first machineconfig-to-image rebase (since we delete existing, then add, we need to know what existing was)
This was just for playing with layering. We currently explode extensions into lists of packages, but we do not support arbitrary non-extension packages. Until we figure out an elegant in-cluster way (if any) for a user to supply arbitrary packages, I just hacked this in so they can be specified in the extensions field.
This is a half-step that gets us out of having to have extra RBAC perms to let the daemon read secrets. For now, this just uses the machine-config-daemon's serviceaccount token which is mounted in /var/run/secrets. This is also apparently what openshift-controller-manager does to generate the dockercfg tokens, so having the operator inject them isn't really any lower risk. (You can reverse-engineer the sa token from the dockercfg secret). I don't know that I want to write the serviceaccount token to disk? I think I'd rather generate some other pull-only secret, but I don't know if there is a way to do that yet, so this is the best we have.
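For context, that token lives at the standard in-pod mount path, so the daemon can read it and build a pull credential for the in-cluster registry along these lines (a sketch only; the dockercfg shape is the standard one, the registry host is the usual in-cluster service, and the username is arbitrary):

```go
// Sketch: use the daemon's own serviceaccount token (mounted at the standard
// path) as a pull credential for the internal registry.
const saTokenPath = "/var/run/secrets/kubernetes.io/serviceaccount/token"

token, err := os.ReadFile(saTokenPath)
if err != nil {
	return err
}
auth := base64.StdEncoding.EncodeToString([]byte("serviceaccount:" + string(token)))
dockercfg := fmt.Sprintf(`{"auths":{"image-registry.openshift-image-registry.svc:5000":{"auth":%q}}}`,
	auth)
// ...dockercfg would then be written wherever the container tooling expects auth.
```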
Force-pushed from 3443c17 to b895594.
	return err
}
// Fix the selinux context just in case it had special context
err = runCmdSync("restorecon", "-Fv", diffFile)
I think we can actually just use cp -Z btw.
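i.e. roughly, assuming the same runCmdSync helper and that srcFile is the changed file from the new image (illustrative):

```go
// Sketch of the cp -Z suggestion: copy and set the destination's SELinux
// context to the policy default in one step, instead of a separate restorecon.
if err := runCmdSync("cp", "-Z", srcFile, diffFile); err != nil {
	return err
}
```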
cgwalters left a comment:
Love the detailed commit messages! Overall, there's a lot of change here and I didn't review line by line, but direction looks good.
I think we'll be reviewing bits of this more carefully when we're getting ready to merge back to main/master.
Awesome work here!
}

func setNodeAnnotations(client corev1client.NodeInterface, lister corev1lister.NodeLister, nodeName string, m map[string]string) (*corev1.Node, error) {
func setNodeAnnotations(client corev1client.NodeInterface, lister corev1lister.NodeLister, nodeName string, add, remove map[string]string) (*corev1.Node, error) {
Ah yes, we're starting to accumulate more merge conflicts...in this case with #3141
Not a blocker of course, just noting
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: cgwalters, jkyros. The full list of commands accepted by this bot can be found here. The pull request process is described here.
/retest-required
Please review the full test history for this PR and help us cut down flakes.
@jkyros: The following tests failed:
Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.
This is the build controller code so far. I will type better words later.
Still needs a bunch of cleanup.
Works like before, where you create the "layered" pool.
The difference from a user perspective is that desiredConfig becomes a sha256:a7c711e2f7ff0608cfdc601bc03f21b0e6cb825aa7b75bd30da53ee4760b48eb instead of a rendered config. Build status/failure detection is still in progress.
This correlates with MCO-228
Current Release Image: https://quay.io/jkyros/ocp-release:pr-3071-may-11