OCPBUGS-11124, OCPBUGS-11411: Remove systemd-pcrphase dependency on network #3704

Closed
mkowalski wants to merge 1 commit into openshift:master from mkowalski:pcrphase

@mkowalski
Contributor

This PR removes the After= setting from the definition of systemd-pcrphase. At present that ordering makes it impossible to SSH into a node on which nodeip-configuration or configure-ovs fails for any reason.
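As a rough illustration of what dropping After= from a unit definition amounts to, the Python sketch below filters After= lines out of the [Unit] section of a unit file's text. The unit text and the helper name strip_after are made up for this example; they are not the contents of this PR or of the shipped systemd-pcrphase unit. The sketch works on a full copy of the unit because, per the systemd.unit documentation, dependency settings such as After= cannot be reset to an empty list from a drop-in.

```python
def strip_after(unit_text: str) -> str:
    """Return unit_text with every After= line removed from the [Unit] section."""
    kept = []
    section = None
    for line in unit_text.splitlines():
        stripped = line.strip()
        # Track which section we are in, e.g. "[Unit]" or "[Service]".
        if stripped.startswith("[") and stripped.endswith("]"):
            section = stripped
        # Drop ordering lines only inside [Unit]; keep everything else as-is.
        if section == "[Unit]" and stripped.startswith("After="):
            continue
        kept.append(line)
    return "\n".join(kept) + "\n"


# Hypothetical unit text, for illustration only (not the real unit).
EXAMPLE_UNIT = """\
[Unit]
Description=Example phase service
After=remote-fs.target
Before=systemd-user-sessions.service

[Service]
Type=oneshot
ExecStart=/bin/true
"""

if __name__ == "__main__":
    print(strip_after(EXAMPLE_UNIT), end="")
```

Only the ordering line disappears; the Before= line and the [Service] section pass through unchanged.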

The self-healing functionality of configure-ovs creates a scenario in which network-online.target has not yet been reached, but we already want to access the node for debugging purposes.

At the same time, systemd-pcrphase by default blocks user sessions and depends on remote-fs, which creates a deadlock. To remedy this, we remove the dependency on remote-fs here. This is justified because OpenShift nodes are not meant to use remote home directories.
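The blocking chain described here can be modelled as a small ordering graph. The Python sketch below is only an illustration: the edges are a simplified assumption drawn from this description (user sessions wait for systemd-pcrphase, which waits for remote-fs, which waits for network-online.target), not a dump of a real node's unit files, and waits_on is a hypothetical helper name.

```python
# Assumed, simplified "ordered after" edges: unit -> units it waits for.
ORDERED_AFTER = {
    "systemd-user-sessions.service": ["systemd-pcrphase.service"],
    "systemd-pcrphase.service": ["remote-fs.target"],
    "remote-fs.target": ["network-online.target"],
    "network-online.target": [],
}


def waits_on(unit, graph):
    """Return every unit that `unit` is transitively ordered after."""
    seen = set()
    stack = list(graph.get(unit, []))
    while stack:
        dep = stack.pop()
        if dep not in seen:
            seen.add(dep)
            stack.extend(graph.get(dep, []))
    return seen


if __name__ == "__main__":
    # With the assumed edges, user sessions end up waiting for the network.
    before = waits_on("systemd-user-sessions.service", ORDERED_AFTER)
    print("network-online.target" in before)  # True

    # Model the change: systemd-pcrphase no longer orders itself after anything.
    patched = dict(ORDERED_AFTER)
    patched["systemd-pcrphase.service"] = []
    after = waits_on("systemd-user-sessions.service", patched)
    print("network-online.target" in after)  # False
```

In this toy model, clearing the systemd-pcrphase ordering is enough to detach user sessions from network-online.target, which is the effect the description is after.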

Fixes: OCPBUGS-11124
Fixes: OCPBUGS-11411

@openshift-ci-robot added the jira/severity-critical, jira/valid-reference, and jira/invalid-bug labels on May 9, 2023
@openshift-ci-robot
Contributor

@mkowalski: This pull request references Jira Issue OCPBUGS-11411, which is invalid:

  • expected the bug to target the "4.14.0" version, but no target version was set

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.

The bug has been updated to refer to the pull request using the external bug tracker.

@openshift-ci Bot added the do-not-merge/work-in-progress label (indicates that a PR should not merge because it is a work in progress) on May 9, 2023
@openshift-ci
Contributor

openshift-ci Bot commented May 9, 2023

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@mkowalski
Contributor Author

/test all
/cc @cybertron
/cc @cgwalters
/cc @zaneb

@openshift-ci Bot requested review from cgwalters, cybertron and zaneb on May 9, 2023 17:11
@openshift-ci
Contributor

openshift-ci Bot commented May 9, 2023

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: mkowalski
Once this PR has been reviewed and has the lgtm label, please assign sinnykumari for approval. For more information see the Kubernetes Code Review Process.

@mkowalski
Contributor Author

/test e2e-metal-ipi

@cgwalters
Member

IMO this is also a workaround for an OS-level bug, so we could do this directly as a PR to https://github.com/openshift/os too

@mkowalski
Contributor Author

IMO this is also a workaround for an OS-level bug, so we could do this directly as a PR to https://github.com/openshift/os too

Sounds fair, let's see if I manage to get it in via openshift/os#1279

@mkowalski marked this pull request as ready for review on May 10, 2023 15:36
@openshift-ci Bot removed the do-not-merge/work-in-progress label (indicates that a PR should not merge because it is a work in progress) on May 10, 2023
@openshift-ci Bot requested review from jkyros and yuqi-zhang on May 10, 2023 15:38
@openshift-ci
Contributor

openshift-ci Bot commented May 10, 2023

@mkowalski: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/e2e-alibabacloud-ovn | 35bb67b | link | false | /test e2e-alibabacloud-ovn |
| ci/prow/e2e-gcp-op | 35bb67b | link | true | /test e2e-gcp-op |
| ci/prow/e2e-aws-ovn-upgrade | 35bb67b | link | true | /test e2e-aws-ovn-upgrade |
| ci/prow/okd-scos-e2e-gcp-ovn-upgrade | 35bb67b | link | false | /test okd-scos-e2e-gcp-ovn-upgrade |

@mkowalski
Contributor Author

/close

Let's fix it inside o/os

@openshift-ci Bot closed this on May 11, 2023
@openshift-ci
Contributor

openshift-ci Bot commented May 11, 2023

@mkowalski: Closed this PR.

@openshift-ci-robot
Contributor

@mkowalski: This pull request references Jira Issue OCPBUGS-11411. The bug has been updated to no longer refer to the pull request using the external bug tracker.

Labels

  • jira/invalid-bug: Indicates that a referenced Jira bug is invalid for the branch this PR is targeting.
  • jira/severity-critical: Referenced Jira bug's severity is critical for the branch this PR is targeting.
  • jira/valid-reference: Indicates that this PR references a valid Jira ticket of any type.

3 participants