Merge layering into master #3072
Conversation
…ering-test-package adds e2e-layering test package
…layering-branch sync layering with master
Part of openshift/enhancements#1032. We'll add the new-format image into the payload alongside the old one until we can complete the transition. (There may also end up being a separate `rhel-coreos-extensions` image, for example, so this is just the start.) Note this PR is just laying groundwork; the new-format container will not be used by default.
Add `rhel-coreos` image in configmap and `oscontainer` in controllerconfig
This will keep layered and non-layered update logging consistent
Merge master into layering
To properly handle compression. Prep for using Butane.
Prep for using Butane APIs more directly as part of the layering work. The logic is also reworked a bit to generate a single Butane fragment which we convert to Ignition in one go, instead of repeatedly converting Butane into Ignition and relying on config merging. The only wrinkle is that today the templates have multiple files which are drop-ins for `crio.service`, and we need to group those together. (I think it would be cleaner to have them in a single file in the templates, but let's handle that separately for clarity.)
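For context, a minimal sketch of the single-fragment flow, assuming the vendored `github.com/coreos/butane/config` API; the fragment contents here are illustrative only:

```go
package main

import (
	"fmt"

	butane "github.com/coreos/butane/config"
	"github.com/coreos/butane/config/common"
)

func main() {
	// One Butane fragment, translated to Ignition JSON in a single call,
	// instead of converting many fragments and merging the results.
	fragment := []byte(`variant: fcos
version: 1.4.0
storage:
  files:
    - path: /etc/example
      contents:
        inline: hello layering
`)
	ign, rpt, err := butane.TranslateBytes(fragment, common.TranslateBytesOptions{})
	if err != nil {
		panic(err)
	}
	if len(rpt.Entries) > 0 {
		// Surface any validation warnings from the translation.
		fmt.Printf("%v\n", rpt)
	}
	fmt.Println(string(ign))
}
```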
We were testing this indirectly.
Update vendoring github.com/coreos/fcct → github.com/coreos/butane
Created an MCONamespace constant and used it in all *.go files except test/helpers/utils.go, which would otherwise create a cyclic import
Create MCONamespace constant
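As a rough sketch of what this amounts to (the declaring package is an assumption here):

```go
// A shared constant replacing the string literal previously scattered
// across packages; the declaring package name is illustrative.
package ctrlcommon

// MCONamespace is the namespace the MCO components run in.
const MCONamespace = "openshift-machine-config-operator"
```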
I don't know that we'll ultimately end up with imagestreams with these names, but for right now I think these are at least the ones we think we want, each with a purpose behind it.
Extends the clientbuilder to build OpenShift Image and Build clients, as our controllers will need to deal with those objects as part of the "build controller" for layering.
Adds a shared ImageInformer to our shared controller context so we can watch imagestreams in our controllers.
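A hedged sketch of what the clientbuilder extension amounts to: constructing Image and Build clientsets from the same `rest.Config` the other clients use (the helper name is illustrative):

```go
package clients

import (
	buildclientset "github.com/openshift/client-go/build/clientset/versioned"
	imageclientset "github.com/openshift/client-go/image/clientset/versioned"
	"k8s.io/client-go/rest"
)

// newLayeringClients builds the OpenShift Image and Build clientsets the
// build controller needs, from an existing rest.Config.
func newLayeringClients(cfg *rest.Config) (imageclientset.Interface, buildclientset.Interface, error) {
	imageClient, err := imageclientset.NewForConfig(cfg)
	if err != nil {
		return nil, nil, err
	}
	buildClient, err := buildclientset.NewForConfig(cfg)
	if err != nil {
		return nil, nil, err
	}
	return imageClient, buildClient, nil
}
```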
Allows using a label to designate a pool as a "layered" pool (a label-check sketch follows this list). Adds an imagestream informer to watch for imagestream updates and queue the pool again for image deployment. Updates tests with the new render controller signature. Layered pools will create resources (buildconfigs/imagestreams owned by the pool) to build in-cluster derived images. Layered pools:
- Ensure that a base CoreOS imagestream exists
- Ensure that pool-specific imagestreams exist
- Ensure that a pool-specific buildconfig exists
- Render machineconfig into images in imagestreams
- Derive images from the CoreOS imagestream, using the content from that rendered-config imagestream, via the aforementioned buildconfig
- Will be enqueued again (so the node controller can sync them) when their corresponding imagestream gets updated
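For illustration, a hypothetical sketch of the label gate; the actual label key and helper name in this PR may differ:

```go
package ctrlcommon

import (
	mcfgv1 "github.com/openshift/machine-config-operator/pkg/apis/machineconfiguration.openshift.io/v1"
)

// isLayeredPool reports whether a pool has been opted in to layering via a
// designating label. The label key here is an assumption, not the PR's
// actual identifier.
func isLayeredPool(pool *mcfgv1.MachineConfigPool) bool {
	_, ok := pool.Labels["machineconfiguration.openshift.io/layered"]
	return ok
}
```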
For a "layred" pool, this makes the node controller: - Wait for an image that is equivalent to the rendered config - Once that image arrives, assign it as an additional (along with desiredConfig) desiredImage annotation on the node The "done" signal for a node at this point is still when currentConfig == desiredConfig, it's just that how we're going to get there now is via the image rather than via applying the config directly. Also refactored setDesiredMachineConfigAnnotation to a generic function that sets an arbitrary annotation (because duplicating for image annotations made "verify" unhappy)
This adds some daemon constants for the current/desired image annotations, as well as extending our rpm-ostree "client bindings" to be able to:
- Execute a container rebase
- Parse more of the `rpm-ostree status --json` output so we can figure out whether we're booted into a deployment or not (for live-apply cases)
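For context, a minimal sketch of the status-parsing half, modeling only the well-known `deployments`/`booted` fields of the `rpm-ostree status --json` output; type and function names are illustrative:

```go
package daemon

import (
	"encoding/json"
	"fmt"
	"os/exec"
)

type rpmOstreeDeployment struct {
	Booted   bool   `json:"booted"`
	Checksum string `json:"checksum"`
}

type rpmOstreeStatus struct {
	Deployments []rpmOstreeDeployment `json:"deployments"`
}

// bootedDeployment shells out to rpm-ostree and returns the deployment we
// are currently booted into, if any.
func bootedDeployment() (*rpmOstreeDeployment, error) {
	out, err := exec.Command("rpm-ostree", "status", "--json").Output()
	if err != nil {
		return nil, err
	}
	var status rpmOstreeStatus
	if err := json.Unmarshal(out, &status); err != nil {
		return nil, err
	}
	for i := range status.Deployments {
		if status.Deployments[i].Booted {
			return &status.Deployments[i], nil
		}
	}
	return nil, fmt.Errorf("no booted deployment found")
}
```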
This gives RBAC permissions to the machine-config-controller service account to manipulate Builds/BuildRequests and ImageStreams in the machine-config-operator namespace, as well as to push images into the internal registry. It gives RBAC permissions to the machine-config-daemon service account to retrieve images from the internal registry. This also updates the operator's sync so that these new registry role files are generated and deployed properly with the MCO.
This creates a separate update flow for "layered" pools, so they are ignored by (not handled by) the standard update function. This does not affect any pool that is not layered. It also tells the config drift detector to ignore OSImageURL for layered pools, because it will not match; a sketch of that check follows.
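A hedged sketch of the drift-detector exemption described above; the function name and call site are assumptions:

```go
package daemon

import "fmt"

// checkOSImageURLDrift sketches the exemption: layered pools boot a derived
// image, so the booted image is expected to differ from the rendered
// config's OSImageURL and must not be flagged as drift.
func checkOSImageURLDrift(layered bool, bootedImage, configOSImageURL string) error {
	if layered {
		return nil
	}
	if bootedImage != configOSImageURL {
		return fmt.Errorf("expected OS image %q but booted %q", configOSImageURL, bootedImage)
	}
	return nil
}
```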
A lot of this is version bumps, but go-containerregistry (the thing I used to create my tiny machineconfig wrapper images without a docker build) additionally drags in the following dependencies that nothing else uses:
- github.com/spf13/viper/
- github.com/containerd/stargz-snapshotter/
- github.com/docker/cli/
- github.com/docker/docker/pkg/homedir/

This isn't terrible, so I'm leaving it for now since it works, but at some point this might get obviated by buildah or something.
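For reference, roughly how go-containerregistry can build such a wrapper image without a docker build, sketched with its crane helpers; the registry names and layer tarball path are placeholders:

```go
package main

import (
	"github.com/google/go-containerregistry/pkg/crane"
)

func main() {
	// Pull the base image, append a single tarball layer containing the
	// rendered machineconfig, and push the result.
	base, err := crane.Pull("registry.example.com/base:latest")
	if err != nil {
		panic(err)
	}
	img, err := crane.Append(base, "/tmp/machineconfig-layer.tar")
	if err != nil {
		panic(err)
	}
	if err := crane.Push(img, "registry.example.com/rendered-config:latest"); err != nil {
		panic(err)
	}
}
```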
…ntroller In-cluster build / "build controller" proof-of-concept
Use MCONamespace constant in getPullSecret()
Move log statement to UpdateTuningArgs
This is based on #3060 and then a
/test e2e-gcp-op
[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: cgwalters

The full list of commands accepted by this bot can be found here. The pull request process is described here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing `/approve` in a comment.
from:
  kind: DockerImage
  name: registry.svc.ci.openshift.org/openshift:machine-os-content
- name: rhel-coreos
Hmm, actually doing this bit blocks on https://issues.redhat.com/browse/ART-3883
/hold
jkyros left a comment:
Some of the cans I know I kicked:
- apiGroups: ["operator.openshift.io"]
  resources: ["etcds"]
  verbs: ["get", "list", "watch"]
- apiGroups: ["build.openshift.io"]
I put in a bunch of sloppy RBAC that we need to be more deliberate about.
Hmm, right I think we can actually make these bits a namespaced role, not a clusterrole. But not a blocker IMO.
apiextinformers.WithNamespace(targetNamespace), apiextinformers.WithTweakListOptions(assignFilterLabels))
configSharedInformer := configinformers.NewSharedInformerFactory(configClient, resyncPeriod()())
operatorSharedInformer := operatorinformers.NewSharedInformerFactory(operatorClient, resyncPeriod()())
imageSharedInformer := imageinformers.NewSharedInformerFactory(imageClient, resyncPeriod()())
Would like to think about whether we should filter this to our namespace; there's probably a ton of image traffic. (Although if we feature-gate this and never start the informer, it doesn't matter.)
That's a great example of a potential risk! Let's tighten that up indeed.
The number of watches and any unusual increases are being monitored so this is a good callout 👍
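A sketch of the suggested tightening, assuming the generated factory's standard options (here `imageinformers` aliases `github.com/openshift/client-go/image/informers/externalversions`, and `ctrlcommon.MCONamespace` follows the constant sketched earlier):

```go
// Scope the imagestream informer to the MCO namespace instead of
// watching imagestreams cluster-wide.
imageSharedInformer := imageinformers.NewSharedInformerFactoryWithOptions(
	imageClient, resyncPeriod()(),
	imageinformers.WithNamespace(ctrlcommon.MCONamespace),
)
```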
| glog.Infof("Pool %s is s special pool, rendering config to imagestream", pool.Name) | ||
| // jkyros: Just put this here for now since it's convenient | ||
| err = ctrl.experimentalRenderToImageStream(pool, generated) |
Are we cool with the "push to imagestream approach" or do we need to figure out a service endpoint and retrieve this from inside the build?
For the purposes of this PR, I'd like to just evaluate "risks to not-layered".
(It does add and version-bump some dependencies, but nothing that seemed overtly dangerous.)
@cgwalters: The following tests failed, say `/retest` to rerun all failed tests:
Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.
There's a more intensive series of tests from TRT that we should run this PR against. I'll figure out how to do that and follow up once I know more.
yuqi-zhang left a comment:
Did a general read. The changes seem pretty well gated behind a few annotations existing, etc.
A few notes:
- There's a decent amount of refactoring of our postConfigChange action code, which is a bit hard to read from the diff. I assume, though, that the idea is we don't actually change that flow, just how the functions are used, so we should be good as long as we can validate that.
- The changes above (and the refactors to the update functions) will likely have a minor effect on the hypershift code, but we should be able to consolidate. Just something to consider when/if we merge.
}
newformat, ok := cm.Data["baseOperatingSystemContainer"]
if !ok {
	return "", "", fmt.Errorf("Missing baseOperatingSystemContainer from configmap")
Is baseOperatingSystemContainer mandatory to have? On a non-layered system is it just empty (but the key exists?)
@cgwalters: PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
Just a note, I think we need to:
Issues go stale after 90d of inactivity. Mark the issue as fresh by commenting `/remove-lifecycle stale`. If this issue is safe to close now please do so with `/close`. /lifecycle stale
Stale issues rot after 30d of inactivity. Mark the issue as fresh by commenting `/remove-lifecycle rotten`. If this issue is safe to close now please do so with `/close`. /lifecycle rotten
Rotten issues close after 30d of inactivity. Reopen the issue by commenting `/reopen`. /close
@openshift-bot: Closed this PR. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
Let's keep this PR as a running update. Goal: evaluate specific risky areas of code that might block a merge to master.