
[draft] merge layering into master#3046

Closed
mkenigs wants to merge 57 commits into master from layering

Conversation

@mkenigs
Contributor

@mkenigs mkenigs commented Mar 30, 2022

I'm just curious how painful this would be/want to keep an eye on what code paths we're actually changing on master

cheesesashimi and others added 30 commits March 7, 2022 16:10
…t-package

adds e2e-layering test package
Part of openshift/enhancements#1032

We'll add the new-format image into the payload alongside the old
one until we can complete the transition.

(There may actually be a separate `rhel-coreos-extensions` image
 e.g. too, so this is just the start)

Note this PR is just laying groundwork; the new format container
will not be used by default.
Add `rhel-coreos` image in configmap and `oscontainer` in controllerconfig
This will keep layered and non-layered update logging consistent
To properly handle compression. Prep for using Butane.
Prep for using Butane APIs more directly as part of the layering
work.

The logic is also a bit reworked to generate a single Butane fragment
which we convert to Ignition in one go, instead of converting
butane into ignition repeatedly and using config merging.

There's only one wrinkle with doing this, which is that today in
the templates we have multiple files which are drop-ins for `crio.service`;
and we need to group those together.

(I think it would be cleaner to have them in a single file in the
 templates, but for clarity let's handle this)
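The grouping described above can be sketched as follows. This is a minimal stdlib-only illustration (function and path names are hypothetical, not the MCO's actual code) of collecting multiple template drop-in files under the single systemd unit they extend, so each unit appears once in the generated Butane fragment:

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// groupDropIns collects template fragments keyed by the systemd unit they
// extend, so each unit appears once in the generated Butane config with all
// of its drop-ins attached. Hypothetical helper for illustration only.
func groupDropIns(fragments map[string]string) map[string][]string {
	grouped := make(map[string][]string)
	for path, contents := range fragments {
		// Paths look like "crio.service.d/10-something.conf"; everything
		// before ".d/" names the unit being extended.
		if i := strings.Index(path, ".d/"); i >= 0 {
			unit := path[:i]
			grouped[unit] = append(grouped[unit], contents)
		}
	}
	for unit := range grouped {
		sort.Strings(grouped[unit]) // deterministic ordering
	}
	return grouped
}

func main() {
	fragments := map[string]string{
		"crio.service.d/10-mco-default.conf": "[Service]\nEnvironment=A=1",
		"crio.service.d/20-nodenet.conf":     "[Service]\nEnvironment=B=2",
	}
	fmt.Println(len(groupDropIns(fragments)["crio.service"]))
}
```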
We were testing this indirectly.
Update vendoring github.com/coreos/fcct → github.com/coreos/butane
Created MCONamespace constant and used in all *.go files except for
test/helpers/utils.go which would create a cyclic import
I don't know that we'll ultimately end up with imagestreams with these
names but for right now, I think these are at least the ones we think
we want that have purposes behind them.
Extends clientbuilder to build openshift Image and Build clients, as our
controllers will need to deal with those objects as part of the "build
controller" for layering.
Adds a shared ImageInformer to our shared controller context so we can watch imagestreams
in our controllers.
Allows using a label to designate a pool as a "layered" pool.

Adds an imagestream informer to watch for imagestream updates and
queue the pool again for image deployment.

Updates tests with new render controller signature.

Layered pools will create resources (buildconfigs/imagestreams owned by the pool) to build
in-cluster derived images.

Layered pools:
- Ensure that a base coreos imagestream exists
- Ensure that pool-specific imagestreams exist
- Ensure that pool-specific buildconfig exist
- Render machineconfig into images in imagestreams
- Derive images from the coreos imagestream using the content from the
rendered-config imagestream and the aforementioned buildconfig
- Will be enqueued again (so node controller can sync them) when their corresponding imagestream gets
updated
For a "layered" pool, this makes the node controller:
- Wait for an image that is equivalent to the rendered config
- Once that image arrives, assign it as an additional (along with desiredConfig) desiredImage annotation on the
node

The "done" signal for a node at this point is still when currentConfig ==
desiredConfig, it's just that how we're going to get there now is via
the image rather than via applying the config directly.

Also refactored setDesiredMachineConfigAnnotation to a generic function that sets an arbitrary annotation
(because duplicating for image annotations made "verify" unhappy)
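The generic annotation setter mentioned above can be sketched like this. The `desiredConfig` annotation key is the MCO's real one; the `desiredImage` key and the helper's name are assumptions for illustration, and the real code patches the Node object via client-go rather than mutating a plain map:

```go
package main

import "fmt"

// setNodeAnnotations applies an arbitrary set of annotations, instead of a
// setter hard-coded to desiredConfig. Sketch only; the MCO's version issues
// a patch against the Node object.
func setNodeAnnotations(annotations, updates map[string]string) map[string]string {
	if annotations == nil {
		annotations = make(map[string]string)
	}
	for k, v := range updates {
		annotations[k] = v
	}
	return annotations
}

func main() {
	// A layered pool sets both the desired config and the desired image
	// (the desiredImage key here is hypothetical).
	node := setNodeAnnotations(nil, map[string]string{
		"machineconfiguration.openshift.io/desiredConfig": "rendered-worker-abc",
		"machineconfiguration.openshift.io/desiredImage":  "image-registry.svc/mco/worker@sha256:abc",
	})
	fmt.Println(len(node))
}
```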
This adds some daemon constants for current/desired image annotations,
as well as extending our rpm-ostree "client bindings" to be able to:
- Execute a container rebase
- Parse more of the `rpm-ostree status --json` output so we can figure
out whether we're booted into a deployment or not (for live-apply
cases)
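Parsing the status output might look like the following stdlib-only sketch. The `booted` and `container-image-reference` field names follow rpm-ostree's JSON output, but this trimmed-down struct and the helper are illustrative, not the MCO's actual bindings:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Status mirrors the few fields of `rpm-ostree status --json` relevant here.
// Sketch only; the real bindings parse considerably more.
type Status struct {
	Deployments []struct {
		Booted         bool   `json:"booted"`
		ContainerImage string `json:"container-image-reference"`
	} `json:"deployments"`
}

// bootedDeployment reports whether any listed deployment is the booted one,
// and if so returns its container image reference.
func bootedDeployment(raw []byte) (bool, string, error) {
	var s Status
	if err := json.Unmarshal(raw, &s); err != nil {
		return false, "", err
	}
	for _, d := range s.Deployments {
		if d.Booted {
			return true, d.ContainerImage, nil
		}
	}
	return false, "", nil
}

func main() {
	raw := []byte(`{"deployments":[{"booted":true,"container-image-reference":"ostree-unverified-registry:example.com/os:latest"}]}`)
	ok, ref, _ := bootedDeployment(raw)
	fmt.Println(ok, ref)
}
```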
This gives rbac permissions to the machine-config-controller service
account to manipulate Builds/BuildRequests and Imagestreams in the
machine-config-operator namespace, as well as push images into the internal registry.

This gives rbac permissions to the machine-config-daemon serviceaccount
to retrieve images from the internal registry.

This also updates the operator's sync so that these new registry role files will be
generated and deployed properly with the mco.
This creates a separate update flow for "layered" pools, so they will
not be handled by the standard update function.

This will not impact any pool that is not layered.

This also tells the config drift detector to ignore OSImageURL for
layered pools because it will not match.
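The drift-detector change amounts to gating the OSImageURL comparison on the pool's layered status, roughly like this (hypothetical function and signature; the real detector compares many more fields):

```go
package main

import "fmt"

// checkDrift compares on-disk state against the rendered config, skipping the
// OSImageURL comparison for layered pools, where the node's image is derived
// in-cluster and will never match the config's value. Sketch only.
func checkDrift(expectedOSImageURL, actualOSImageURL string, layered bool) []string {
	var drifted []string
	if !layered && expectedOSImageURL != actualOSImageURL {
		drifted = append(drifted, "OSImageURL")
	}
	return drifted
}

func main() {
	// A layered pool reports no OSImageURL drift even when the values differ.
	fmt.Println(len(checkDrift(
		"quay.io/openshift/os@sha256:aaa",
		"image-registry.svc/mco/worker@sha256:bbb",
		true)))
}
```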
A lot of this is version bumps, but go-containerregistry (the thing I used to create my tiny machineconfig wrapper images without a docker build) additionally drags in the following dependencies that nothing else uses:
- github.com/spf13/viper/
- github.com/containerd/stargz-snapshotter/
- github.com/docker/cli/
- github.com/docker/docker/pkg/homedir/

This isn't terrible, so I'm leaving it for now since it works, but
at some point this might get obviated by buildah or something
In cluster build/"build controller" proof-of-concept
Use MCONamespace constant in getPullSecret()
Move log statement to UpdateTuningArgs
Layered updates will only need to perform post config change actions.

Comments about uncordoning the node and rebooting immediately were
dropped as they have been out of date since
d200340
@openshift-ci openshift-ci Bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Mar 30, 2022
@openshift-ci
Contributor

openshift-ci Bot commented Mar 30, 2022

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@openshift-ci openshift-ci Bot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Mar 30, 2022
@openshift-ci
Contributor

openshift-ci Bot commented Mar 30, 2022

@mkenigs: PR needs rebase.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci
Contributor

openshift-ci Bot commented Mar 30, 2022

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: mkenigs
To complete the pull request process, please assign yuqi-zhang after the PR has been reviewed.
You can assign the PR to them by writing /assign @yuqi-zhang in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@kikisdeliveryservice kikisdeliveryservice added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Mar 30, 2022
openshift-merge-robot and others added 19 commits March 31, 2022 15:57
Factor out functionality from update() that will be needed for layering
This is a refactor to allow reusing updateKernelArgs() for layering. It
is a minor improvement for master since it better protects
MachineConfigs from unintended modification by updateKernelArgs(), and
it simplifies test code which doesn't have to create dummy
MachineConfigs
updateKernelArgs: pass kargs instead of MCs
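Passing kargs directly keeps the diffing logic free of MachineConfig plumbing. A minimal sketch of such a helper (name and shape are assumptions; the real updateKernelArgs ultimately shells out to rpm-ostree):

```go
package main

import "fmt"

// kargsDelta computes which kernel arguments to remove and which to add when
// moving between two rendered configs. Taking plain string slices rather than
// whole MachineConfigs makes the helper reusable for layered updates and
// avoids mutating the configs themselves. Sketch for illustration.
func kargsDelta(oldKargs, newKargs []string) (remove, add []string) {
	oldSet := make(map[string]bool, len(oldKargs))
	for _, k := range oldKargs {
		oldSet[k] = true
	}
	newSet := make(map[string]bool, len(newKargs))
	for _, k := range newKargs {
		newSet[k] = true
		if !oldSet[k] {
			add = append(add, k)
		}
	}
	for _, k := range oldKargs {
		if !newSet[k] {
			remove = append(remove, k)
		}
	}
	return remove, add
}

func main() {
	remove, add := kargsDelta([]string{"nosmt"}, []string{"nosmt", "mitigations=auto"})
	fmt.Println(remove, add)
}
```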
Create machine_config_to_ignition
Add a comment to Rebase
Don't return changed since there should be an error if nothing changed
Remove logs about pivot and custom origin since we don't use pivot:// or
--custom-origin-url for layered rebases
Also add LiveReplaced field

A layered update will need to perform operations on the staged and
deployed Deployments, which will require variables of type Deployment
Factored some code out of experimentalUpdateLayeredConfig into a
function in daemon.go since eventually we'll want to write pull secret
on daemon startup. We don't handle layering there currently so for now
we'll continue to call the function in experimentalUpdateLayeredConfig

Also moved getPullSecret to daemon
Translate MachineConfig to Ignition and pass to mco-build-content build
Currently the Docker build leaves behind some files that trigger a
reboot when we should be applying live, so those files should be cleaned
up instead.
Set ImageOptimizationPolicy=ImageOptimizationSkipLayers since currently
ostree does not support whiteouts so file deletions won't work unless
layers are squashed
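In BuildConfig terms, that setting lives on the Docker build strategy. A sketch of the relevant fragment (resource names are illustrative); squashing layers works around ostree's current lack of whiteout support:

```yaml
apiVersion: build.openshift.io/v1
kind: BuildConfig
metadata:
  name: mco-build-content
  namespace: openshift-machine-config-operator
spec:
  strategy:
    type: Docker
    dockerStrategy:
      # Squash layers so file deletions in the Dockerfile take effect,
      # since ostree does not yet support whiteouts.
      imageOptimizationPolicy: SkipLayers
```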
Integrate functionality from update into layered update
There is an issue where, after some period of time (usually around 20
minutes of the cluster being up), requests to images start getting
forwarded directly to the cloud storage buckets where they reside.

Those endpoints present certificates that are signed by the cloud
provider (e.g. Amazon, Google, Microsoft) and not by our cluster
certificates. We did not previously load the system cert pool,
so those requests would fail.

This makes sure we also load our system cert pool so we can verify those
certificates and don't fail with x509 errors when we attempt to
communicate with the registry.
Load system cert pool for image push to accommodate direct cloud storage access
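The fix boils down to seeding the trust store from the system pool before adding the cluster's own CAs. A stdlib sketch (function name and the fallback behavior are assumptions, not the MCO's exact code):

```go
package main

import (
	"crypto/x509"
	"fmt"
)

// trustPool starts from the system certificate pool, so certificates signed
// by public CAs (e.g. the cloud provider's storage endpoints) verify, then
// layers the cluster's CA bundle on top. Sketch of the fix described above.
func trustPool(clusterCABundle []byte) (*x509.CertPool, error) {
	pool, err := x509.SystemCertPool()
	if err != nil {
		// Fall back to an empty pool rather than failing outright.
		pool = x509.NewCertPool()
	}
	if len(clusterCABundle) > 0 && !pool.AppendCertsFromPEM(clusterCABundle) {
		return nil, fmt.Errorf("no certificates parsed from cluster CA bundle")
	}
	return pool, nil
}

func main() {
	pool, err := trustPool(nil)
	fmt.Println(pool != nil, err == nil)
}
```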
@cgwalters
Member

Closing in favor of #3072

@cgwalters cgwalters closed this Apr 7, 2022