openshift/os: fix user issues on validate job #28790
openshift-merge-robot merged 1 commit into openshift:master
Conversation
(force-pushed 2b01bcc to 97dcf63)
(force-pushed 97dcf63 to 4ca23e4)
/retest

/test ci/rehearse/openshift/os/c9s/validate
@miabbott: The specified target(s) for
The following commands are available to trigger optional jobs:
Use
In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
/retest

/test pj-rehearse

2 similar comments

/test pj-rehearse

/test pj-rehearse
(force-pushed 52a1452 to df64658)
/test pj-rehearse
(force-pushed df64658 to 50f3327)
/test pj-rehearse
(force-pushed 50f3327 to ddace0f)
/test pj-rehearse
(force-pushed 9454f59 to 844fb2d)
The `validate` job started to fail with:
```
fatal: unsafe repository ('/go/src/github.com/openshift/os' is owned by someone else)
To add an exception for this directory, call:
git config --global --add safe.directory /go/src/github.com/openshift/os
```
Applying this suggestion as part of the `ci/validate.sh` script
(see openshift/os#802) fails with the error:
```
+ git config --global --add safe.directory /go/src/github.com/openshift/os
error: could not lock config file //.gitconfig: Permission denied
```
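The double slash in `//.gitconfig` suggests `$HOME` resolves to empty or `/` inside the pod, so `git` tries to write its global config at the filesystem root. A minimal sketch of a workaround (hypothetical; not the literal contents of any script in this repo) is to point `HOME` at a writable directory before adding the exception:

```shell
# Sketch: give git a writable location for its global config, then add
# the safe.directory exception from the error message above.
export HOME="$(mktemp -d)"
git config --global --add safe.directory /go/src/github.com/openshift/os
# Confirm the exception landed in $HOME/.gitconfig:
git config --global --get safe.directory
```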
I suspect this is related to how the random user ID is configured in
OCP pods, similar to what is described in openshift/os#781, so I tried
using the `ci/set-openshift-user.sh` script as part of the `validate`
job.
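For context, OpenShift runs pods with a random high UID that usually has no `/etc/passwd` entry, which is what scripts like `ci/set-openshift-user.sh` typically compensate for. The following is a hedged sketch of that general pattern only, not the script's actual contents; the temp file stands in for `/etc/passwd` so the sketch runs without root:

```shell
# If the current UID has no passwd entry, synthesize one so tools that
# look up the user (git, ssh, etc.) behave. A temp file stands in for
# /etc/passwd here; a real container entrypoint would append to the
# real file (or use nss_wrapper).
uid="$(id -u)"
passwd_file="$(mktemp)"
if ! getent passwd "$uid" > /dev/null 2>&1; then
  echo "ci-user:x:${uid}:0:CI user:/tmp:/bin/bash" >> "$passwd_file"
fi
export HOME=/tmp
echo "uid=${uid} HOME=${HOME}"
```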
Through trial and error, I found that using the `fcos-buildroot`
container based on F36 would not work with this change and had to
switch to using the `cosa:latest` container. Going further down the
rabbit hole, I found that I didn't need to use the
`ci/set-openshift-user.sh` script at all and just the `cosa:latest`
container was enough to get the `validate` job to pass. I don't claim
to fully understand why this is the case, but it does effectively
unblock the `validate` job.
(force-pushed 844fb2d to c497602)
cc: @cheesesashimi Wanted to make you aware of this change and see if the error I encountered rang any bells.
```diff
      ./ci/validate.sh
    container:
-     from: src
+     from: coreos_coreos-assembler_latest
```
Changing this to `coreos_coreos-assembler_latest` would probably break because `src` points to what's defined under `.images.build_root`, and I'm not sure if the COSA image has a `./ci/validate.sh` script inside it.
Thinking about this and re-reading your description of this change, we may want to change this to `build-test-qemu-img` instead, since that image will be the result of https://github.com/openshift/os/blob/master/ci/Dockerfile, which is the COSA image plus the contents of the openshift/os repo. Doing that will include both the `./ci/validate.sh` and `./ci/set-openshift-user.sh` scripts.
> Changing this to `coreos_coreos-assembler_latest` would probably break because `src` points to what's defined under `.images.build_root` and I'm not sure if the COSA image has a `./ci/validate.sh` script inside it.
I'm confused about this, since the pj-rehearse jobs appear to have passed. If you look at the history for the job on this PR, you can see some failures where, during some iterations, the scripts weren't found due to incorrect paths, etc. From what I can tell, the openshift/os repo is present (magic!) and the `ci/validate.sh` script executes successfully.
See the latest job where I stuck some debug output as part of the job - https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/pr-logs/pull/openshift_release/28790/rehearse-28790-pull-ci-openshift-os-master-validate/1529165858037829632/artifacts/test/build-log.txt
Now I'm a little confused as well! But I think I have an understanding of what's going on here and can explain the source of my confusion (and hopefully make you less confused as well!):
- The Dockerfile for `openshift/os` is in `ci/Dockerfile` and is what becomes `build-test-qemu-img`. We probably should make this the `.build_root` part of the CI config, but that's a separate concern and not relevant right now. The resulting image is what the COSA build scripts use to run. It has the scripts and the layering test binary from `openshift/os` layered on top of the `coreos-assembler` image. This is why I thought we should use that image to run `./ci/validate.sh`.
- The `src` image is `registry.ci.openshift.org/coreos/fcos-buildroot:testing-devel`, produced by the `coreos/fedora-coreos-config` repo. Interestingly, the `validate` step is the only part of the `openshift/os` CI config that uses this image. None of the other image builds or tests directly consume or use this image.
- The "magic" is that before the `validate` test runs, a bunch of init containers are run. Amongst those is a `cloneRefs` step (`$ curl -s 'https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/pr-logs/pull/openshift_release/28790/rehearse-28790-pull-ci-openshift-os-master-validate/1529165858037829632/artifacts/ci-operator-step-graph.json' | jq -r '.[1].manifests[0].spec.initContainers[].name'`) which clones `openshift/os` (and the `fedora-coreos-config` submodule).
So in a nutshell, what's happening in the `validate` step is we're taking `registry.ci.openshift.org/coreos/coreos-assembler:latest`, cloning `openshift/os` (and the `fedora-coreos-config` submodule) into it, and running `./ci/validate.sh`. My confusion came from forgetting about the `cloneRefs` step and thinking that since `coreos-assembler:latest` doesn't have the `./ci/validate.sh` script present, it would fail.
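The `jq` query quoted above can be tried against any step-graph JSON. Here is a self-contained illustration using a made-up miniature of the artifact (the structure is assumed from the query itself, not copied from the real file):

```shell
# Fabricated stand-in for a ci-operator step graph: only the shape that
# the jq path '.[1].manifests[0].spec.initContainers[].name' needs.
cat > /tmp/step-graph.json <<'EOF'
[
  {"manifests": []},
  {"manifests": [{"spec": {"initContainers": [
    {"name": "clonerefs"},
    {"name": "initupload"}
  ]}}]}
]
EOF
# Prints each init container name on its own line.
jq -r '.[1].manifests[0].spec.initContainers[].name' /tmp/step-graph.json
```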
Thanks for the detailed information, Zack! It certainly helps improve my understanding of the flow of the jobs.
> Interestingly, the `validate` step is the only part of the `openshift/os` CI config that uses this image. None of the other image builds or tests directly consume or use this image.
Should we drop the use of the `fcos-buildroot` image as part of this PR? Would we need to reconfigure `build_root` to point to another image?
Is the `cloneRefs` step logged in any of the test artifacts?
If you think the PR is good to go as is, please drop an /approve if you can.
> Should we drop the use of the `fcos-buildroot` image as part of this PR? Would we need to reconfigure `build_root` to point to another image?
We could configure it to point to `ci/Dockerfile` and build that. We'd then need to replace `build-test-qemu-img` with `src`. There's no rush in doing that, so if we want to, let's handle it as a separate PR.
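If that follow-up happens, the ci-operator config change might look roughly like this (a hypothetical sketch; the key names follow ci-operator's `build_root.project_image` convention and should be checked against the actual config):

```yaml
# Hypothetical: build the build root from the repo's own ci/Dockerfile
# instead of pulling the external fcos-buildroot image.
build_root:
  project_image:
    dockerfile_path: ci/Dockerfile
```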
> Is the `cloneRefs` step logged in any of the test artifacts?
Unfortunately, it's not. From what I can tell, the container used purposely does not create logs (although I don't know why). You might be able to catch it while the job is running if you click through to the console for the CI namespace. The only place I was really able to find it was in the job's test-steps artifact, and even then, I had to know that it was an init container.
> If you think the PR is good to go as is, please drop an /approve if you can.
Will do.
(force-pushed bc94f3c to c497602)
/approve
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: cheesesashimi, miabbott
The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
@miabbott: all tests passed! Full PR test history. Your PR dashboard.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.
@miabbott: Updated the following 3 configmaps:
In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.