[layering] Separate build controller to handle builds for layered pools #3071
cgwalters merged 18 commits into openshift:layering
Conversation
Skipping CI for Draft Pull Request.
I'm not familiar with the Image API or how the build controller is consuming this, so not sure if this would be possible...but I'm wondering if we could use generated.Name as a tag and then also tag that as latest. Then if I delete a MC, rolling back to the previously rendered config is faster, because you could just check that the image with that tag already exists and tag it as latest.
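For what it's worth, a rough sketch of that idea using the OpenShift image client (not code from this PR; the stream name, digest, and label are illustrative): tag the built image under the rendered-config name, then point `:latest` at the same digest, so a rollback is just a re-tag rather than a rebuild.

```go
import (
	"context"

	imagev1 "github.com/openshift/api/image/v1"
	imageclient "github.com/openshift/client-go/image/clientset/versioned"
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// tagAsLatest is a sketch only: make ":latest" point at an image that already
// exists in the stream under another tag. (Create the tag first if ":latest"
// does not exist yet.)
func tagAsLatest(ctx context.Context, client imageclient.Interface, namespace, stream, digest string) error {
	tag := &imagev1.ImageStreamTag{
		ObjectMeta: metav1.ObjectMeta{Name: stream + ":latest", Namespace: namespace},
		Tag: &imagev1.TagReference{
			From: &corev1.ObjectReference{
				Kind: "ImageStreamImage",
				Name: stream + "@" + digest, // e.g. "layered-base@sha256:..."
			},
		},
	}
	_, err := client.ImageV1().ImageStreamTags(namespace).Update(ctx, tag, metav1.UpdateOptions{})
	return err
}
```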
Could we do this once rather than in the sync loop?
Would it be possible to tag with the rendered config and see if an image has been built already rather than using time?
I think some of my other comments were based on the assumption that the base image won't have changed. Hmm may have to think about this more on Monday 🤔
OK sorry, I had only briefly skimmed this before. So, this is a large change that is highly likely to conflict with any other changes (e.g. #3083), so I think we should try to collaborate on this PR, push it forward, and get it into the layering branch.
This moves most of the layering machinery that had been added to the render controller into the build controller. As a result, the render controller for the most part goes back to the way it was. There is no avoiding some change to the render controller, though, as it needs to know to ignore the pools that the build controller is managing.
We need to specify what resources the pool should be creating/owning in order to manage builds and the relationships between them (triggers, which buildconfigs should source from which streams, etc). The pool needs some template/plan to follow when it gets triggered to create and verify that its resources exist. This is my not-quite-as-elegant-as-I-had-hoped attempt at that.
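For illustration only, the kind of per-pool "plan" this is getting at might look something like the sketch below (all type and field names here are hypothetical, not the PR's actual API):

```go
// Hypothetical sketch of a per-pool build plan: which imagestreams and
// buildconfigs the controller should ensure exist, and how they chain.
type poolBuildPlan struct {
	Pool string // pool name, e.g. "worker"

	// imagestreams the pool owns, keyed by role ("base", "rendered-config", ...)
	ImageStreams map[string]string

	// buildconfigs to create, each sourcing from one stream and pushing to another
	BuildConfigs []buildStep
}

type buildStep struct {
	Name       string // buildconfig name
	FromStream string // imagestream tag to build from
	ToStream   string // imagestream tag to push to
	TriggerOn  string // stream whose updates should trigger this build
}
```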
This confines the image and build informers to the MCO's namespace. There are potentially many builds and images in the cluster, and watching all of them and their events is potentially expensive. As our constraints currently sit, all of the images and builds we will be dealing with are in the MCO namespace.
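Concretely, that means constructing the build/image informer factories scoped to the namespace rather than cluster-wide, roughly like this sketch (assumes the generated openshift/client-go informer factories; the namespace string is written out here only for illustration):

```go
// Sketch: namespace-scoped build informers instead of cluster-wide ones.
// buildinformers = "github.com/openshift/client-go/build/informers/externalversions"
buildInformerFactory := buildinformers.NewSharedInformerFactoryWithOptions(
	buildClient,
	resyncPeriod,
	buildinformers.WithNamespace("openshift-machine-config-operator"),
)
buildInformer := buildInformerFactory.Build().V1().Builds()
```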
This adds helpers for getting and setting the current image stream for a pool, separates the per-pool suffix '-base' from the global 'coreos' imagestream, and adds a per-node stream name constant for later.
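Roughly along these lines (constant and helper names here are illustrative, not necessarily the ones the commit adds):

```go
// Illustrative naming helpers: a global base stream shared by all pools, and a
// per-pool "-base" stream derived from the pool's name.
const (
	globalBaseImageStream = "coreos" // shared base image (rhel-coreos)
	imageStreamSuffixBase = "-base"  // per-pool base stream suffix
)

// baseImageStreamForPool returns the per-pool base stream name, e.g. "layered-base".
func baseImageStreamForPool(poolName string) string {
	return poolName + imageStreamSuffixBase
}
```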
Force-pushed from 0c7e0d2 to f5f87e7.
// TODO(jkyros): This is a race I usually win, but I won't always, and we need a better
// way to get this metadata in
if imagestream.Name == pool.Name+ctrlcommon.ImageStreamSuffixRenderedConfig {
	ctrl.cheatMachineConfigLabelIntoBuildConfig(imagestream, pool)
Yes, this is silly. This is me trying to sneak the dynamic labels onto the buildconfig before it triggers.
I'm injecting the image labels via the Dockerfile + args right now, but if a user runs oc start-build, say to retrigger after a build fails, the user's build won't have the args, which means it won't get the label, which means we won't know what machineconfig an image contains. (I know we don't need it anymore because our config is the image, but I really do want to be able to show it back somehow.)
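One possible way around that (just a sketch, not what this PR does): set the label on the BuildConfig's output image labels instead of passing it as a build arg, so an image started manually with oc start-build still carries it. Field names below are from openshift/api build/v1; the label key is illustrative.

```go
// Sketch: stamp the rendered-config name onto every output image via the
// BuildConfig itself, so manually triggered builds keep the label too.
bc.Spec.Output.ImageLabels = []buildv1.ImageLabel{
	{
		Name:  "machineconfiguration.openshift.io/renderedConfig", // illustrative key
		Value: pool.Spec.Configuration.Name,
	},
}
```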
// experimentalAddEntitlements grabs the cluster entitlement certificates out of the openshift-config-managed namespace and
// stuffs them into a machineconfig for our layered pool, so we can have entitled builds. This is a terrible practice, and
// we should probably just sync the secrets into our namespace so our builds can use them directly rather than expose them via machineconfig.
func (ctrl *Controller) experimentalAddEntitlements(pool *mcfgv1.MachineConfigPool) {
Talking with Colin, we will need to think about the RHEL9 ramifications of this -- because it will probably be wrong 😄
Beyond just the RHEL8 versus RHEL9 versus e.g. C9S issues, I think we also need to handle the case of disconnected installs where a user already has a local RHEL mirror. Us trying to overwrite redhat.repo would then just be broken.
I think we need to de-couple "automatically set up repos" from layering; the former needs to be a different opt-in mechanism.
	Name: image.Image,
}
// TODO(jkyros): Kind of cheating using this as metadata showback for the user until we figure out our "level triggering" strategy
pool.Spec.Configuration.Source = []corev1.ObjectReference{
Still talking with Colin -- we were discussing how nice it would be to have some sort of history on the pool -- something like "this update changed X" and being able to step back through those.
if desiredConfigName, ok := dn.node.Annotations[constants.DesiredMachineConfigAnnotationKey]; ok {
	// TODO(jkyros): This works for now, but we should probably add another annotation or at least
	// do something more elegant than string prefix parsing
	if strings.HasPrefix(desiredConfigName, "sha256:") {
Preference has been expressed that we have separate config/image annotations and we clean up the ones we aren't using when we switch.
I tried this with the most recent push, but it seems a little more invasive than I'd have liked due to some of the assumptions in the node controller's status (see status.go comment below)
Force-pushed from f5f87e7 to 3443c17.
// Make sure the imagestreams exist so we can populate them with our image builds
for _, imageStreamName := range ensureImageStreams {
	_, err := ctrl.ensureImageStreamForPool(pool, imageStreamName, pbr)
Something else we talked about was whether the resource creation/ensuring should be done by the operator instead of by the build controller. I put them here because they were "dynamic" per-pool and not created from any sort of manifest, but I wanted to make sure we think through that.
I have maybe some hypershift-y "lego brick" concerns if I need the operator to spawn my objects in order to do a build, but maybe that doesn't really matter.
}
cconfig, ok := node.Annotations[daemonconsts.CurrentMachineConfigAnnotationKey]
if !ok || cconfig == "" {
	return false

cconfig, hasConfig := node.Annotations[daemonconsts.CurrentMachineConfigAnnotationKey]
cimage, hasImage := node.Annotations[daemonconsts.CurrentImageConfigAnnotationKey]

// if we have an image config present
if hasImage && cimage != "" {
	return true
}
Things like this are why I was previously hesitant to have a completely separate set of annotations -- I didn't want our changes to be too invasive -- there are a bunch of places where we assume that the presence/absence of daemonconsts.CurrentMachineConfigAnnotationKey means something.
Yeah...fair enough. Hmm.
So definitely a bit more risk to a master merge with this. I think we can do it though, just needs some auditing. I'm also pretty confident our e2es cover this.
## This makes a bunch of imagestreams, what are they for?

|imagestream|intended purpose|
|---|---|
|`coreos`|The global default base coreos image (rhel-coreos)|
|`layered-base`|The base image stream our buildconfigs draw from (defaults to coreos:latest)|
|`layered-external-base`|External base image stream. If its :latest tag is populated, layered-base uses this instead of coreos:latest|
|`layered-mco-content`|The image built by our MCO dockerfile where ignition-apply runs|
One of the other conversations we also had around this was whether or not the "custom pools" should derive from the worker pool image or if they should build their own. So right now on the layering branch:
- each pool in "layered/image mode" (custom or not) gets its own set of imagestreams and buildconfigs.
This is okay because the render controller is merging the config for us -- it's handling "config derivation" on the machineconfig front for custom pools, but the image content is still being built 'from base'.
I'm just writing this here so we remember to think through whether "all custom pools inherit/derive from worker" is a meme that we want to carry forward into our new derived-image world, or if it is something we only did out of necessity because the pool needed the config in worker (base, templates, etc.) to live.
Could probably share this with render.go; calling it poolKind could make sense too?
/test unit
/approve
The build controller "takes over" for pools that are marked as layered. It will more or less serve the function of the render controller but for image based pools. The render controller currently retains the functionality of generating the rendered config, and will push that rendered config into an imagestream every time it gets updated.
Since we're going straight into the configuration now (the configuration being an Image -- we can do this because the MachineConfigPool spec.Configuration is an ObjectRef), we don't need multiple annotations for desiredConfig and desiredImage anymore. The desiredConfig annotation contains the reference for whatever the configuration is supposed to be, and the daemon will figure out how to apply it. This removes the 'halfway-step' desiredImage annotation behavior and allows the node_controller to assign the image sha256 itself as desiredConfig.
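In other words, roughly (a sketch of the behavior described here; note that later commits in this PR move to separate image/config annotations instead):

```go
// Sketch: pool.Spec.Configuration is an ObjectReference, so its Name can be
// either a rendered MachineConfig name or an image digest, and the node
// controller can copy it straight into the node's desired annotation.
desired := pool.Spec.Configuration.Name // "rendered-worker-<hash>" or "sha256:<digest>"
node.Annotations[daemonconsts.DesiredMachineConfigAnnotationKey] = desired
```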
Before this, we had separate annotations for current/desired config and current/desired image. We were getting to a particular machineconfig level "via an image": the MachineConfig level was what we were trying to reach, not the image. This changes the approach such that the image is now the desired configuration we are trying to reach, and that is what will be contained in desiredConfig for pools that are performing layered/image builds. This removes the old dual-annotation behavior and changes the ordering slightly in a few places, because we used to assume that desiredConfig was a MachineConfig (so we tried to retrieve it right out of the gate without checking what it really was) and the MCD would degrade if it couldn't retrieve it as a machineconfig.
The changes being made to the render/build controllers and the daemon have resulted in them losing or gaining informers/listers/clients in their creation method signatures. This, of course, breaks all of the tests. This commit fixes the tests by adjusting the imports and method signatures in the tests to match the new controller and daemon signatures.
Previously we had put in some less careful RBAC rules to let the build controller watch images and builds. This replaces those cluster roles with a namespaced role that only lets the MCO watch builds and images inside its own namespace. This corresponds with the controller's build and image informers being namespaced to the MCO namespace. This also lets the daemon retrieve images, as the daemon is being passed a sha256: hash (because that's how openshift images are named) and it needs to convert it to an actual image reference so rpm-ostree can apply it.
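The shape of that namespaced rule, sketched as a Go literal purely for illustration (the PR defines its RBAC in the MCO's manifests, and the exact resource/verb list may differ):

```go
// Illustrative only: a namespace-scoped Role covering build and image
// resources in the MCO's own namespace, rather than a ClusterRole.
// rbacv1 = "k8s.io/api/rbac/v1"
role := rbacv1.Role{
	ObjectMeta: metav1.ObjectMeta{
		Name:      "machine-config-controller-builds", // illustrative name
		Namespace: "openshift-machine-config-operator",
	},
	Rules: []rbacv1.PolicyRule{
		{
			APIGroups: []string{"build.openshift.io"},
			Resources: []string{"builds", "buildconfigs"},
			Verbs:     []string{"get", "list", "watch"},
		},
		{
			APIGroups: []string{"image.openshift.io"},
			Resources: []string{"imagestreams", "imagestreamtags"},
			Verbs:     []string{"get", "list", "watch"},
		},
	},
}
```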
So after we talked about it: while the string matching on the sha256: prefix did work, the desire was to actually specify the kind of the config, so we're doing that by having separate annotations. This also cleans the currentConfig/desiredConfig off the node when it's set to done. I *think* I addressed most of the consequences of that, but there is no way I didn't miss at least one of them somewhere, since the MCO is heavily dependent in a lot of places on the assumption that currentConfig will always be there (which is why I had been avoiding doing this). I also removed the imagelister while I was in there and switched to a direct client call.
The method signature of the daemon's clusterconnect method changed when I took the lister out, the test was just adjusted to accommodate this.
I'll squash these out if this is what we end up wanting to do, but this embraces the "separate annotation" way of assigning either an image or a machineconfig to a node. There are a lot of places in the node controller status calculation where it assumes that currentConfig is always there, so I tried to address those so that either the image annotations OR the machineconfig annotations would satisfy things like the 'managed' check and the 'done' check. This makes the node controller assign current/desired image annotations to nodes in pools where the config is an image, and the standard current/desired config annotations to nodes in standard pools.
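Conceptually it ends up something like this sketch (the desired-image key name is guessed by analogy with the current-image key quoted earlier, and poolIsLayered is a stand-in):

```go
// Sketch: assign whichever annotation pair matches what the pool's
// configuration actually is, and clean up the other pair.
cfg := pool.Spec.Configuration.Name
if poolIsLayered { // the configuration references an image
	node.Annotations[daemonconsts.DesiredImageConfigAnnotationKey] = cfg
	delete(node.Annotations, daemonconsts.DesiredMachineConfigAnnotationKey)
} else {
	node.Annotations[daemonconsts.DesiredMachineConfigAnnotationKey] = cfg
	delete(node.Annotations, daemonconsts.DesiredImageConfigAnnotationKey)
}
```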
There was an issue where a node would fail to run some pods after it rebased to an image and rebooted. Somehow during a rebase, a blank 'resolv.conf' gets added at /usr/etc/resolv.conf even though it is not present in any config or image. Since we were previously defaulting all the files using rsync for the ostree 3-way merge, we were unfortunately blanking out resolv.conf. That would have been okay, except that as part of that 'rsync' process we were also losing the selinux context for /etc/resolv.conf, which meant that NetworkManager couldn't rewrite it because it was no longer net_t. This sidesteps the problem for now by only copying files that have definitely changed between images, because we are already calculating that file list and have it available. Furthermore, we run restorecon just to make sure that should a config image change one of the files with a special context, it will get set back properly. This also moves the drain before the file changes -- we don't want the pods to be affected by any of the config changes (that is why we are draining, after all).
Previously when we were handling kargs, we were failing to apply them on the initial transition because the mcdcontent.json file containing them did not exist in the initial image. We'd rebase then degrade; then the next time through, it would notice we rebased and skip the section. Then the next image update would work because the file was already on disk. Also, the Cat function that was retrieving the kargs data was assuming that the sequence number was always going to be '0' on deployments, so depending on what had happened (live-apply) it would sometimes fail to find the MCDContent file on disk. This:
- Decouples the rebase from the kernel arg handling
- Uses the sequence number when reading files out of deployments so the files are properly found
- Allows us to fall back to using the on-disk currentConfig to retrieve the kernel args during our first machineconfig-to-image rebase (since we delete existing, then add, we need to know what existing was)
This was just for playing with layering. We currently explode extensions into lists of packages, but we do not support arbitrary non-extension packages. Until we figure out an elegant in-cluster way (if any) for a user to supply arbitrary packages, I just hacked this in so they can be specified in the extensions field.
This is a half-step that gets us out of having to have extra RBAC perms to let the daemon read secrets. For now, this just uses the machine-config-daemon's serviceaccount token which is mounted in /var/run/secrets. This is also apparently what openshift-controller-manager does to generate the dockercfg tokens, so having the operator inject them isn't really any lower risk. (You can reverse-engineer the sa token from the dockercfg secret). I don't know that I want to write the serviceaccount token to disk? I think I'd rather generate some other pull-only secret, but I don't know if there is a way to do that yet, so this is the best we have.
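For context, that token lives at the standard in-pod mount path, so the daemon can read it and build a pull credential for the in-cluster registry along these lines (a sketch only; the dockercfg shape is the standard one, the registry host is the usual in-cluster service, and the username is arbitrary):

```go
// Sketch: use the daemon's own serviceaccount token (mounted at the standard
// path) as a pull credential for the internal registry.
const saTokenPath = "/var/run/secrets/kubernetes.io/serviceaccount/token"

token, err := os.ReadFile(saTokenPath)
if err != nil {
	return err
}
auth := base64.StdEncoding.EncodeToString([]byte("serviceaccount:" + string(token)))
dockercfg := fmt.Sprintf(`{"auths":{"image-registry.openshift-image-registry.svc:5000":{"auth":%q}}}`,
	auth)
// ...dockercfg would then be written wherever the container tooling expects auth.
```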
Force-pushed from 3443c17 to b895594.
	return err
}
// Fix the selinux context just in case it had special context
err = runCmdSync("restorecon", "-Fv", diffFile)
I think we can actually just use cp -Z btw.
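i.e. roughly, assuming the same runCmdSync helper and that srcFile is the changed file from the new image (illustrative):

```go
// Sketch of the cp -Z suggestion: copy and set the destination's SELinux
// context to the policy default in one step, instead of a separate restorecon.
if err := runCmdSync("cp", "-Z", srcFile, diffFile); err != nil {
	return err
}
```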
cgwalters left a comment:
Love the detailed commit messages! Overall, there's a lot of change here and I didn't review line by line, but direction looks good.
I think we'll be reviewing bits of this more carefully when we're getting ready to merge back to main/master.
Awesome work here!
}

func setNodeAnnotations(client corev1client.NodeInterface, lister corev1lister.NodeLister, nodeName string, m map[string]string) (*corev1.Node, error) {
func setNodeAnnotations(client corev1client.NodeInterface, lister corev1lister.NodeLister, nodeName string, add, remove map[string]string) (*corev1.Node, error) {
Ah yes, we're starting to accumulate more merge conflicts...in this case with #3141
Not a blocker of course, just noting
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: cgwalters, jkyros. The full list of commands accepted by this bot can be found here. The pull request process is described here.
/retest-required
Please review the full test history for this PR and help us cut down flakes.
@jkyros: The following tests failed:
Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.
This is the build controller code so far. I will type better words later.
Still needs a bunch of cleanup.
Works like before, where you create the "layered" pool.
The difference from a user perspective is that desiredConfig becomes a sha256:a7c711e2f7ff0608cfdc601bc03f21b0e6cb825aa7b75bd30da53ee4760b48eb instead of a rendered config. Build status/failure detection is still in progress.
This correlates with MCO-228
Current Release Image: https://quay.io/jkyros/ocp-release:pr-3071-may-11