[4.11] Backports for RHCOS CI (--transient option & test entrypoint)#2963
[4.11] Backports for RHCOS CI (--transient option & test entrypoint)#2963cgwalters merged 15 commits intocoreos:rhcos-4.11from
Conversation
We go to a lot of effort to create various caches of builds, and this is very useful for local incremental development to avoid re-downloading RPMs and rewriting data. It is useless for CI builds that discard everything when they're done. Add a `cosa init --transient` flag that for now just disables `fsync()` on `tmp/repo`. There's more we can do here in the future, for example we need to propagate this too into `cache/repo-build` and `cache/pkgcache-repo`, and actually in general we don't need the pkgcache repo at all here.
To make it clearer that things worked.
Followup to the addition of `--transient` for CI flows. If we had coreos/rpm-ostree#3719 this would mostly be unnecessary, but today in CI and prod builds, ostree is invoking `fsync()` inside the supermin VM, and then qemu in the container is going to `fsync()` all changes down to the host system (overlay)fs. For transient flows, switch to `cache=unsafe` so we stop doing that which should greatly help speed.
So we test it, and so we also gain the speed benefits.
Hopefully will fix the `ci/prow/rhcos` context here.
Use a COSA specifc test entry point to focus on tests relevant for COSA.
|
Looks like CI here hit #2946 - let's backport that too? |
This is part of coreos/fedora-coreos-tracker#1151 Our generated disk images are largely just a "shell" around the egg of an ostree commit. There is almost nothing that lives in the disk image that isn't in the commit. (This is especially true now that a preparatory commit previous to this moved the *content* of our static `grub.cfg` into `image.json`) In the original coreos-assembler design I'd tried to cleanly separate builds of the ostree from disk image builds, but also support linking them together (with matching version numbers, etc.) The separate `image.yaml` was part of this. This...mostly worked. This change furthers that separation by having image builds input from *just the ostree commit*. Crucially we would no longer need the config git repository to perform an image build. And this in turn unlocks truly better separating ostree builds from disk image builds in the pipeline *and* supporting downstream tooling generating disk images from custom containers. One neat thing here is we will finally fix a longstanding issue where coreos-assembler fails when just the `image.yaml` changes: Closes: coreos#972
For now, we need to handle this. But I think we can switch back to hard requiring it later, which will happen naturally when we rework the build system to more cleanly separate ostree container builds and disk image builds. Closes: coreos#2885
If the cache was nuked, we should re-download everything even if nothing changed since the last build. Otherwise, it's impossible to do a `cosa build --force` afterwards.
Prep for future patch.
This inlines back `impl_rpmostree_compose` in the two places where it's called. Prep for future patch.
Prep for future patch.
Now that the variable is declared publicly, we can just add it to the list of rpm-ostree args as part of the compose helper. This has no functional change in the `cosa build` path. In the `cosa fetch` path, this will add the switch but rpm-ostree will ignore it since we're not actually writing a commit.
All the other overlays have the prefix `overlay/`. Do that here too for consistency.
Instead of having rpm-ostree effectively doing a `pull-local` under the hood over 9p, change things so we compose in a local repo, export the commit to an archive and only copy *that* over 9p. This should greatly help with pipelines hitting ENOMEM due to transferring many small files over 9p: openshift/os#594 An initial approach exported to OCI archive instead, but encapsulating and unencapsulating are more expensive operations. Unfortunately, we still need to unpack into `tmp/repo` given that many follow-up commands expect the commit to be available in `tmp/repo`. If we can drop that assumption (and get rid of `tmp/repo` entirely), we could refactor things further so that `compose.sh` creates the final chunked OCI artifact upfront. In local testing, this adds about 30s to `cosa build`. We still compress just once, and we get hardlinks pulling out of the tarball.
|
I've added
This is starting to look like a big backport. |
|
Tagging @jlebon as I'm not sure this is complete |
Maybe do the 9p changes as a separate backport? |
|
We could also force reset the branch to current main...a similar debate in coreos/rpm-ostree#3819
No objection from me, but it also seems to me we need the 9p fixes. CI here failed on: Hmm, though it seems like they succeeded in a retest? /retest |
|
I'm leaning toward a force reset too to the first commit that has all the changes that we need. |
I'm really gun shy about doing a force reset on a branch that should be in code freeze. We'd need to do incredibly thorough testing/analysis to ensure we don't introduce any unwanted changes. |
|
Yeah, let's get this merged now since we have green CI. We can choose to do a force push later too. |
Backports for 4.11:
--transientflag for ephemeral builds #2889pull-localover 9p during rpm-ostree compose #2946Revealed by: openshift/release#29949