Skip to content

openshift/os: fix user issues on validate job#28790

Merged
openshift-merge-robot merged 1 commit intoopenshift:masterfrom
miabbott:openshift_user_validate
May 25, 2022
Merged

openshift/os: fix user issues on validate job#28790
openshift-merge-robot merged 1 commit intoopenshift:masterfrom
miabbott:openshift_user_validate

Conversation

@miabbott
Copy link
Copy Markdown
Member

@miabbott miabbott commented May 23, 2022

The validate job started to fail with:

fatal: unsafe repository ('/go/src/github.com/openshift/os' is owned by someone else)
To add an exception for this directory, call:
   
       git config --global --add safe.directory /go/src/github.com/openshift/os

Applying this suggestion as part of the ci/validate.sh script
(see openshift/os#802) fails with the error:

+ git config --global --add safe.directory /go/src/github.com/openshift/os
error: could not lock config file //.gitconfig: Permission denied

I suspect this is related to how the random user ID is configured in
OCP pods, similar to what is described in openshift/os#781, so I tried
using the ci/set-openshift-user.sh script as part of the validate
job.

Through trial and error, I found that using the fcos-buildroot
container based on F36 would not work with this change and had to
switch to using the cosa:latest container. Going further down the
rabbit hole, I found that I didn't need to use the
ci/set-openshift-user.sh script at all and just the cosa:latest
container was enough to get the validate job to pass. I don't claim
to fully understand why this is the case, but it does effectively
unblock the validate job.

@openshift-ci openshift-ci Bot requested review from cgwalters and dustymabe May 23, 2022 17:35
@openshift-ci openshift-ci Bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label May 23, 2022
@miabbott miabbott force-pushed the openshift_user_validate branch from 2b01bcc to 97dcf63 Compare May 23, 2022 17:38
@miabbott miabbott marked this pull request as draft May 23, 2022 17:50
@openshift-ci openshift-ci Bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label May 23, 2022
@miabbott miabbott force-pushed the openshift_user_validate branch from 97dcf63 to 4ca23e4 Compare May 23, 2022 18:03
@miabbott
Copy link
Copy Markdown
Member Author

/retest

@miabbott
Copy link
Copy Markdown
Member Author

/test ci/rehearse/openshift/os/c9s/validate

@openshift-ci
Copy link
Copy Markdown
Contributor

openshift-ci Bot commented May 23, 2022

@miabbott: The specified target(s) for /test were not found.
The following commands are available to trigger required jobs:

  • /test app-ci-config-dry
  • /test boskos-config
  • /test boskos-config-generation
  • /test build-clusters
  • /test build01-dry
  • /test build02-dry
  • /test build03-dry
  • /test build04-dry
  • /test build05-dry
  • /test ci-operator-config
  • /test ci-operator-config-metadata
  • /test ci-operator-registry
  • /test ci-secret-bootstrap-config-validation
  • /test ci-testgrid-allow-list
  • /test config
  • /test core-valid
  • /test correctly-sharded-config
  • /test deprecate-templates
  • /test generated-config
  • /test generated-dashboards
  • /test hive-dry
  • /test openshift-image-mirror-mappings
  • /test ordered-prow-config
  • /test owners
  • /test pj-rehearse-blocking
  • /test prow-config
  • /test prow-config-filenames
  • /test prow-config-semantics
  • /test pylint
  • /test release-config
  • /test release-controller-config
  • /test secret-generator-config-valid
  • /test services-valid
  • /test step-registry-metadata
  • /test step-registry-shellcheck
  • /test sync-rover-groups
  • /test yamllint

The following commands are available to trigger optional jobs:

  • /test arm01-dry
  • /test pj-rehearse
  • /test pj-rehearse-max
  • /test pj-rehearse-more
  • /test vsphere-dry

Use /test all to run the following jobs that were automatically triggered:

  • pull-ci-openshift-release-master-build-clusters
  • pull-ci-openshift-release-master-ci-operator-config
  • pull-ci-openshift-release-master-ci-operator-config-metadata
  • pull-ci-openshift-release-master-ci-operator-registry
  • pull-ci-openshift-release-master-core-valid
  • pull-ci-openshift-release-master-correctly-sharded-config
  • pull-ci-openshift-release-master-deprecate-templates
  • pull-ci-openshift-release-master-generated-config
  • pull-ci-openshift-release-master-ordered-prow-config
  • pull-ci-openshift-release-master-owners
  • pull-ci-openshift-release-master-pj-rehearse
  • pull-ci-openshift-release-master-prow-config-filenames
  • pull-ci-openshift-release-openshift-image-mirror-mappings
  • pull-ci-openshift-release-yamllint
Details

In response to this:

/test ci/rehearse/openshift/os/c9s/validate

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@miabbott
Copy link
Copy Markdown
Member Author

/retest

@miabbott
Copy link
Copy Markdown
Member Author

/test pj-rehearse

2 similar comments
@miabbott
Copy link
Copy Markdown
Member Author

/test pj-rehearse

@miabbott
Copy link
Copy Markdown
Member Author

/test pj-rehearse

@miabbott miabbott force-pushed the openshift_user_validate branch from 52a1452 to df64658 Compare May 23, 2022 21:14
@miabbott
Copy link
Copy Markdown
Member Author

/test pj-rehearse

@miabbott miabbott force-pushed the openshift_user_validate branch from df64658 to 50f3327 Compare May 24, 2022 12:53
@miabbott
Copy link
Copy Markdown
Member Author

/test pj-rehearse

@miabbott miabbott force-pushed the openshift_user_validate branch from 50f3327 to ddace0f Compare May 24, 2022 15:36
@miabbott
Copy link
Copy Markdown
Member Author

/test pj-rehearse

@miabbott miabbott marked this pull request as ready for review May 24, 2022 16:04
@openshift-ci openshift-ci Bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label May 24, 2022
@openshift-ci openshift-ci Bot requested a review from prestist May 24, 2022 16:05
@miabbott miabbott force-pushed the openshift_user_validate branch from 9454f59 to 844fb2d Compare May 24, 2022 16:05
The `validate` job started to fail with:

```
fatal: unsafe repository ('/go/src/github.com/openshift/os' is owned by someone else)
To add an exception for this directory, call:

	git config --global --add safe.directory /go/src/github.com/openshift/os
```

Applying this suggestion as part of the `ci/validate.sh` script
(see openshift/os#802) fails with the error:

```
+ git config --global --add safe.directory /go/src/github.com/openshift/os
error: could not lock config file //.gitconfig: Permission denied
```

I suspect this is related to how the random user ID is configured in
OCP pods, similar to what is described in openshift/os#781, so I tried
using the `ci/set-openshift-user.sh` script as part of the `validate`
job.

Through trial and error, I found that using the `fcos-buildroot`
container based on F36 would not work with this change and had to
switch to using the `cosa:latest` container. Going further down the
rabbit hole, I found that I didn't need to use the
`ci/set-openshift-user.sh` script at all and just the `cosa:latest`
container was enough to get the `validate` job to pass. I don't claim
to fully understand why this is the case, but it does effectively
unblock the `validate` job.
@miabbott miabbott force-pushed the openshift_user_validate branch from 844fb2d to c497602 Compare May 24, 2022 16:16
@miabbott
Copy link
Copy Markdown
Member Author

cc: @cheesesashimi

Wanted to make you aware of this change and see if the error I encountered rang any bells

./ci/validate.sh
container:
from: src
from: coreos_coreos-assembler_latest
Copy link
Copy Markdown
Member

@cheesesashimi cheesesashimi May 24, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changing this to coreos_coreos-assembler_latest would probably break because src points to what's defined under .images.build_root and I'm not sure if the COSA image has a ./ci/validate.sh script inside it.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thinking about this and re-reading your description of this change, we may want to change this to build-test-qemu-img instead since that image will be the result of https://github.com/openshift/os/blob/master/ci/Dockerfile, which is the COSA image and the contents of the openshift/os repo. Doing that will include both the ./ci/vaildate.sh and ./ci/set-openshift-user.sh scripts.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changing this to coreos_coreos-assembler_latest would probably break because src points to what's defined under .images.build_root and I'm not sure if the COSA image has a ./ci/validate.sh script inside it.

I'm confused about this, since the pj-rehearse jobs appear to have passed. If you look at the history for the job on this PR, you can see some failures where during some iterations the scripts weren't found due to incorrect paths, etc. From what I can tell it looks like the openshift/os repo is present (magic!) and the ci/validate.sh script is able to be executed successfully.

See the latest job where I stuck some debug output as part of the job - https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/pr-logs/pull/openshift_release/28790/rehearse-28790-pull-ci-openshift-os-master-validate/1529165858037829632/artifacts/test/build-log.txt

Copy link
Copy Markdown
Member

@cheesesashimi cheesesashimi May 24, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now I'm a little confused as well! But I think I have an understanding of what's going on here and can explain the source of my confusion (and hopefully make you less confused as well!):

  • The Dockerfile for openshift/os is in ci/Dockerfile and is what becomes build-test-qemu-img. We probably should make this the .build_root part of the CI config, but that's a separate concern and not relevant right now. The resulting image is what the COSA build scripts use to run. It has the scripts and the layering test binary from openshift/os layered on top of the coreos-assembler image. This is why I thought we should use that image to run ./ci/validate.sh.
  • The src image is registry.ci.openshift.org/coreos/fcos-buildroot:testing-devel, produced by the coreos/fedora-coreos-config repo. Interestingly, the validate step is the only part of the openshift/os CI config that uses this image. None of the other image builds or tests directly consume or use this image.
  • The "magic" is that before the validate test runs, a bunch of init containers are run. Amongst those is a cloneRefs step ($ curl -s 'https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/pr-logs/pull/openshift_release/28790/rehearse-28790-pull-ci-openshift-os-master-validate/1529165858037829632/artifacts/ci-operator-step-graph.json' | jq -r '.[1].manifests[0].spec.initContainers[].name') which clones openshift/os (and the fedora-coreos-config submodule).

So in a nutshell, what's happening in the validate step is we're taking registry.ci.openshift.org/coreos/coreos-assembler:latest, cloning openshift/os (and the fedora-coreos-config submodule) into it and running ./ci/validate.sh. My confusion came from forgetting about the cloneRefs step and thinking that since coreos-assembler:latest doesn't have the ./ci/validate.sh script present that it would fail.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the detailed information, Zack! It certainly helps improve my understanding of the flow of the jobs.

Interestingly, the validate step is the only part of the openshift/os CI config that uses this image. None of the other image builds or tests directly consume or use this image.

Should we drop the use of the fcos-buildroot as part of this PR? Would we need to reconfigure build_root to point to another image?

Is the cloneRefs step logged in any of the test artifacts?

If you think the PR is good to go as is, please drop an /approve if you can.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we drop the use of the fcos-buildroot as part of this PR? Would we need to reconfigure build_root to point to another image?

We could configure it to point to ci/Dockerfile and build that. We'd then need to replace build-test-qemu-img with src. There's no rush in doing that, so if we want to do that, let's handle that as a separate PR.

Is the cloneRefs step logged in any of the test artifacts?

Unfortunately it's not. From what I can tell, the container used purposely does not create logs (although I don't know why). You might be able to catch it while the job is running if you click through to the console for the CI namespace. The only place I was really able to find it was in the job test steps artifact and even then, I had to know that it was an init container.

If you think the PR is good to go as is, please drop an /approve if you can.

Will do.

@miabbott miabbott force-pushed the openshift_user_validate branch from bc94f3c to c497602 Compare May 24, 2022 23:21
@cheesesashimi
Copy link
Copy Markdown
Member

/approve
/lgtm

@openshift-ci openshift-ci Bot added the lgtm Indicates that a PR is ready to be merged. label May 25, 2022
@openshift-ci
Copy link
Copy Markdown
Contributor

openshift-ci Bot commented May 25, 2022

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: cheesesashimi, miabbott

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci
Copy link
Copy Markdown
Contributor

openshift-ci Bot commented May 25, 2022

@miabbott: all tests passed!

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@openshift-merge-robot openshift-merge-robot merged commit 02ba5c0 into openshift:master May 25, 2022
@openshift-ci
Copy link
Copy Markdown
Contributor

openshift-ci Bot commented May 25, 2022

@miabbott: Updated the following 3 configmaps:

  • ci-operator-misc-configs configmap in namespace ci at cluster app.ci using the following files:
    • key openshift-os-c9s.yaml using file ci-operator/config/openshift/os/openshift-os-c9s.yaml
  • ci-operator-master-configs configmap in namespace ci at cluster app.ci using the following files:
    • key openshift-os-master.yaml using file ci-operator/config/openshift/os/openshift-os-master.yaml
    • key openshift-os-master__periodic.yaml using file ci-operator/config/openshift/os/openshift-os-master__periodic.yaml
  • job-config-misc configmap in namespace ci at cluster app.ci using the following files:
    • key openshift-os-c9s-presubmits.yaml using file ci-operator/jobs/openshift/os/openshift-os-c9s-presubmits.yaml
Details

In response to this:

The validate job started to fail with:

fatal: unsafe repository ('/go/src/github.com/openshift/os' is owned by someone else)
To add an exception for this directory, call:
   
       git config --global --add safe.directory /go/src/github.com/openshift/os

Applying this suggestion as part of the ci/validate.sh script
(see openshift/os#802) fails with the error:

+ git config --global --add safe.directory /go/src/github.com/openshift/os
error: could not lock config file //.gitconfig: Permission denied

I suspect this is related to how the random user ID is configured in
OCP pods, similar to what is described in openshift/os#781, so I tried
using the ci/set-openshift-user.sh script as part of the validate
job.

Through trial and error, I found that using the fcos-buildroot
container based on F36 would not work with this change and had to
switch to using the cosa:latest container. Going further down the
rabbit hole, I found that I didn't need to use the
ci/set-openshift-user.sh script at all and just the cosa:latest
container was enough to get the validate job to pass. I don't claim
to fully understand why this is the case, but it does effectively
unblock the validate job.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants