openshift/os: fix user issues on validate job #28790
Merged: openshift-merge-robot merged 1 commit into openshift:master from miabbott:openshift_user_validate on May 25, 2022
Changing this to `coreos_coreos-assembler_latest` would probably break because `src` points to what's defined under `.images.build_root`, and I'm not sure if the COSA image has a `./ci/validate.sh` script inside it.

Thinking about this and re-reading your description of this change, we may want to change this to `build-test-qemu-img` instead, since that image will be the result of https://github.com/openshift/os/blob/master/ci/Dockerfile, which is the COSA image plus the contents of the `openshift/os` repo. Doing that will include both the `./ci/validate.sh` and `./ci/set-openshift-user.sh` scripts.

I'm confused about this, since the `pj-rehearse` jobs appear to have passed. If you look at the history for the job on this PR, you can see some failures where, during some iterations, the scripts weren't found due to incorrect paths, etc. From what I can tell, it looks like the `openshift/os` repo is present (magic!) and the `ci/validate.sh` script is able to be executed successfully.

See the latest job, where I stuck some debug output as part of the job: https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/pr-logs/pull/openshift_release/28790/rehearse-28790-pull-ci-openshift-os-master-validate/1529165858037829632/artifacts/test/build-log.txt

Now I'm a little confused as well! But I think I understand what's going on here and can explain the source of my confusion (and hopefully make you less confused as well!):

- The `openshift/os` image is built from `ci/Dockerfile` and is what becomes `build-test-qemu-img`. We probably should make this the `.build_root` part of the CI config, but that's a separate concern and not relevant right now. The resulting image is what the COSA build scripts use to run. It has the scripts and the layering test binary from `openshift/os` layered on top of the `coreos-assembler` image. This is why I thought we should use that image to run `./ci/validate.sh`.
- The `src` image is `registry.ci.openshift.org/coreos/fcos-buildroot:testing-devel`, produced by the `coreos/fedora-coreos-config` repo. Interestingly, the `validate` step is the only part of the `openshift/os` CI config that uses this image. None of the other image builds or tests consume it directly.
- When the `validate` test runs, a bunch of init containers are run. Among those is a `cloneRefs` step (`$ curl -s 'https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/pr-logs/pull/openshift_release/28790/rehearse-28790-pull-ci-openshift-os-master-validate/1529165858037829632/artifacts/ci-operator-step-graph.json' | jq -r '.[1].manifests[0].spec.initContainers[].name'`) which clones `openshift/os` (and the `fedora-coreos-config` submodule).

So in a nutshell, what's happening in the `validate` step is we're taking `registry.ci.openshift.org/coreos/coreos-assembler:latest`, cloning `openshift/os` (and the `fedora-coreos-config` submodule) into it and running `./ci/validate.sh`. My confusion came from forgetting about the `cloneRefs` step and thinking that since `coreos-assembler:latest` doesn't have the `./ci/validate.sh` script present, it would fail.

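For anyone who would rather inspect a downloaded copy of the step graph without `jq`, here is a minimal Python sketch of the same lookup as the filter `.[1].manifests[0].spec.initContainers[].name`. The `SAMPLE` data is a hypothetical, heavily trimmed stand-in and not the real contents of `ci-operator-step-graph.json`. The helper name `init_container_names` and the second container name are made up for illustration.

```python
import json


def init_container_names(step_graph, step_index=1, manifest_index=0):
    """Return the init container names of one pod manifest in a step graph.

    Mirrors the jq filter `.[1].manifests[0].spec.initContainers[].name`;
    the default indexes are the ones used in that filter.
    """
    pod = step_graph[step_index]["manifests"][manifest_index]
    return [container["name"] for container in pod["spec"]["initContainers"]]


# Hypothetical stand-in for the step graph artifact (assumption: only the
# keys touched by the jq filter are modelled here).
SAMPLE = json.loads("""
[
  {"manifests": []},
  {"manifests": [
    {"spec": {"initContainers": [
      {"name": "cloneRefs"},
      {"name": "hypothetical-other-init"}
    ]}}
  ]}
]
""")

if __name__ == "__main__":
    # Print one name per line, like `jq -r` does.
    print("\n".join(init_container_names(SAMPLE)))
```

To run it against the real artifact, load the downloaded file with `json.load` and pass the result in place of `SAMPLE`.
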
Thanks for the detailed information, Zack! It certainly helps improve my understanding of the flow of the jobs.

Should we drop the use of `fcos-buildroot` as part of this PR? Would we need to reconfigure `build_root` to point to another image?

Is the `cloneRefs` step logged in any of the test artifacts?

If you think the PR is good to go as is, please drop an `/approve` if you can.

We could configure it to point to `ci/Dockerfile` and build that. We'd then need to replace `build-test-qemu-img` with `src`. There's no rush in doing that, so if we want to do that, let's handle that as a separate PR.

Unfortunately it's not. From what I can tell, the container used purposely does not create logs (although I don't know why). You might be able to catch it while the job is running if you click through to the console for the CI namespace. The only place I was really able to find it was in the job test steps artifact, and even then I had to know that it was an init container.

Will do.