
test openshift os in openshift#26912

Closed
cheesesashimi wants to merge 2 commits into openshift:master from cheesesashimi:zzlotnik/openshift-os-testing

Conversation

@cheesesashimi
Member

@cheesesashimi cheesesashimi commented Mar 10, 2022

This PR makes use of the new scripts found in openshift/os to build and test an RHCOS image. The new tests include an OS derivation test wherein the candidate RHCOS image is mutated in a test cluster, applied to a cluster node, and the cluster node rebooted to verify that it successfully boots into the new OS.
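In outline, the derivation test described above can be sketched as the following dry-run script. This is a hedged illustration, not the PR's actual test: the image name, node name, and Containerfile are placeholders, and the real logic lives in the openshift/os scripts.

```shell
#!/bin/bash
# Illustrative sketch of the OS derivation test flow; names are placeholders.
set -euo pipefail

DERIVED_IMAGE="${DERIVED_IMAGE:-quay.io/example/rhcos-derived:test}"
NODE="${NODE:-worker-0}"

# Dry-run: print the command sequence rather than executing it.
derivation_test() {
  # 1. Build a derived (mutated) image on top of the candidate RHCOS image.
  echo "podman build -t ${DERIVED_IMAGE} -f Containerfile.derived ."
  # 2. Rebase a cluster node onto the derived image.
  echo "oc debug node/${NODE} -- chroot /host rpm-ostree rebase --experimental ostree-unverified-registry:${DERIVED_IMAGE}"
  # 3. Reboot the node and verify it comes back up on the new OS.
  echo "oc debug node/${NODE} -- chroot /host systemctl reboot"
}

derivation_test
```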

@openshift-ci openshift-ci Bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Mar 10, 2022
@openshift-ci
Contributor

openshift-ci Bot commented Mar 10, 2022

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@openshift-ci openshift-ci Bot requested review from saqibali-2k and travier March 10, 2022 19:20
@openshift-ci openshift-ci Bot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Mar 16, 2022
@cheesesashimi cheesesashimi force-pushed the zzlotnik/openshift-os-testing branch from e64222b to 818701b on April 4, 2022 14:40
@openshift-ci openshift-ci Bot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Apr 4, 2022
@cheesesashimi cheesesashimi force-pushed the zzlotnik/openshift-os-testing branch 2 times, most recently from ae78aff to f0a265b on April 4, 2022 17:26
@cheesesashimi cheesesashimi force-pushed the zzlotnik/openshift-os-testing branch from b06c71d to 664d146 on April 6, 2022 20:50
@cheesesashimi cheesesashimi marked this pull request as ready for review April 11, 2022 18:43
@openshift-ci openshift-ci Bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Apr 11, 2022
@openshift-ci openshift-ci Bot requested review from bgilbert and cgwalters April 11, 2022 18:43
@cgwalters
Member

Neat! So I think ultimately we can probably extend this to support a full flow where the OS build is actually used for the initial update target, i.e. we override machine-os-content or equivalent too.

But testing one worker node is a very useful pattern.

@cgwalters
Member

/approve

@openshift-ci
Contributor

openshift-ci Bot commented Apr 11, 2022

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: cgwalters, cheesesashimi

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci Bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Apr 11, 2022
@cgwalters
Member

Hmm though, a lot of PRs today have near-zero chance to break cluster provisioning or layering.
I think right now our current "build-test-qemu" flow is a decent gate - this is making that CI check much more expensive and likely to flake (e.g. capacity issues).

WDYT about introducing e.g. /test e2e-node-layering as an optional opt-in context (and continue to execute it in a periodic).

Ultimately if we start using lockfiles for RHCOS - that's where pull request based gating would become much more useful, because that's where we'd be able to gate new kernel/systemd and really all the other packages.

(And if we had that, then we could drop the promotion job)

@travier
Member

travier commented Apr 12, 2022

Hmm though, a lot of PRs today have near-zero chance to break cluster provisioning or layering. I think right now our current "build-test-qemu" flow is a decent gate - this is making that CI check much more expensive and likely to flake (e.g. capacity issues).

WDYT about introducing e.g. /test e2e-node-layering as an optional opt-in context (and continue to execute it in a periodic).

That's my concern too. It would be preferable to have two different tests to decouple the failures. Can we have tests launch only if some other tests succeed in Prow?

@cheesesashimi
Member Author

WDYT about introducing e.g. /test e2e-node-layering as an optional opt-in context (and continue to execute it in a periodic).

I'm not opposed to decoupling the tests. However, it is a bit more problematic than one might think. To answer @travier's question: you can't have a top-level test execute conditionally based upon the outcome of another top-level test in OpenShift CI. A test can define multiple steps which execute sequentially (and only if the previous step succeeds), but that runs into the same problem we have now.

The second issue I can see with the two tests being decoupled is: Because we cannot use the standard OpenShift CI workflow (e.g., the payload is produced by the images stage, the test stages consume the payload, and the post-submit promotes the payload), both tests would have to build the image, run it through the relevant tests, then push the image to our registry namespace upon completion. While this isn't a problem for PR builds (because we decided not to produce artifacts during PR builds), it would become a problem for periodics. In that case, which image should be promoted to :latest (or the relevant branch tag)? Are RHCOS builds deterministic? What would happen if one of the test stages fails? Also, because we cannot use the OpenShift CI system as intended, the e2e-node-layering test produces an artifact regardless so that it can be ingested by the test cluster since, to my understanding, there is no way for me to push an arbitrary image to the ephemeral CI ImageStream.

This is my naiveté about cosa and RHCOS builds showing, and I don't know if such a thing is possible, but: It would be great if we could ingest the latest base RHCOS OCI image from our dev pipeline, apply the configs from openshift/os to it via an image build, which would then allow us to use the OpenShift CI system as intended. In fact I think this is exactly what's being experimented with for FCOS and OKD: openshift/okd-machine-os#299.

@travier
Member

travier commented Apr 13, 2022

I'm not up to speed about how Prow works and what we can do so this might not make sense.

Could we do the image build in the build stage and store the result directly in the container image? We could then pull it for testing with distinct tests:

  • kola (full)
  • kola (basic) + e2e
  • kola (basic) + layering test
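A split along these lines might look roughly like the following ci-operator config fragment. This is an illustrative sketch only: the test names, script paths, and image references are not the repository's actual configuration.

```yaml
# Hypothetical ci-operator sketch: the layering test is marked optional so it
# only reports when triggered with /test e2e-node-layering, while a periodic
# job can still run it unconditionally.
tests:
- as: build-test-qemu        # existing required gate: kola
  commands: ./ci/build-test-qemu.sh
  container:
    from: cosa
- as: e2e-node-layering      # opt-in context, also run as a periodic
  optional: true
  commands: ./ci/layering-test.sh
  container:
    from: cosa
```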

Similarly, I'm working on #27686 to make SCOS builds from the same branch.

@cheesesashimi
Member Author

cheesesashimi commented Apr 13, 2022

@travier Funny you ask that question! When I was doing early exploratory work, I didn't think that was possible. The issue that I was running into was somehow converting the OCI image archive that COSA produces into an image without doing something like $ skopeo copy. I was getting stuck on what I thought was an issue with the OpenShift image builder (it was appending a / onto the path for the OCI archive). On a hunch yesterday afternoon, I revisited my assumptions, and it turns out that it is possible to convert that OCI archive into your final image; you have to configure your image build to accept an input and then you have to provide an absolute path to where that input is, keeping in mind that the OpenShift image builder places it in /tmp/build/inputs/<input name>.

In the example highlighted above, I'm using skopeo to pull an image into an OCI archive and placing that into the build context so that I can then do FROM oci-archive:/absolute/path/to/oci/archive to create the final image from the on-disk OCI archive. It's a contrived example to determine if this is possible, since the source of the OCI archive isn't important. I was able to get $ cosa build to run in an image build yesterday, although I'll need to move the testing steps to test phases as you've described. I plan on opening a PR to openshift/os today to decouple the cosa build and kola test portions of the build and test script.
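The trick described above can be sketched as follows; the image name and input name are illustrative assumptions, not the PR's actual values.

```shell
#!/bin/bash
# Illustrative sketch: the OCI archive would first be produced outside the
# cluster build with something like
#   skopeo copy docker://quay.io/example/rhcos:latest oci-archive:./rhcos.ociarchive
# The OpenShift image builder places each declared build input under
# /tmp/build/inputs/<input name>, so a Containerfile can base itself on the
# on-disk archive by absolute path, e.g.:
#   FROM oci-archive:/tmp/build/inputs/cosa-out/rhcos.ociarchive
set -euo pipefail

# Resolve where a named build input's file will appear inside the builder.
build_input_path() {
  local input_name="$1" file="$2"
  echo "/tmp/build/inputs/${input_name}/${file}"
}

build_input_path cosa-out rhcos.ociarchive
```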

@openshift-ci
Contributor

openshift-ci Bot commented Apr 25, 2022

@cheesesashimi: all tests passed!

Full PR test history. Your PR dashboard.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@cheesesashimi
Member Author

This has been superseded by #27779

@cheesesashimi cheesesashimi deleted the zzlotnik/openshift-os-testing branch May 3, 2022 20:31