Skip to content

OCPBUGS-11124, OCPBUGS-11411: overlay: Inject pcrphase service definition#1279

Closed
mkowalski wants to merge 1 commit intoopenshift:masterfrom
mkowalski:pcrphase
Closed

OCPBUGS-11124, OCPBUGS-11411: overlay: Inject pcrphase service definition#1279
mkowalski wants to merge 1 commit intoopenshift:masterfrom
mkowalski:pcrphase

Conversation

@mkowalski
Copy link
Copy Markdown

@mkowalski mkowalski commented May 10, 2023

This PR removes After= section from the definition of
systemd-pcrphase. It is because currently it blocks possibility to SSH
into the node which for any reason has nodeip-configuration or
configure-ovs not succeeding.

The self-healing functionality of the latter creates a scenario in which
network-online.targed is not yet reached but we already want to access
the node for debugging purposes.

At the same time as by default systemd-pcrphase blocks user sessions and
depends on remote-fs, this creates a deadlock. In order to remediate
this situation, we are removing dependency on remote-fs here. It is
justified as OpenShift nodes are not meant to use remote home
directories.

We are also modifying dependencies of systemd-user-sessions.service so
that we are explicitly saying that we do not need network in order to
allow access to the node.

Fixes: OCPBUGS-11124
Fixes: OCPBUGS-11411

@openshift-ci-robot openshift-ci-robot added the jira/severity-critical Referenced Jira bug's severity is critical for the branch this PR is targeting. label May 10, 2023
@openshift-ci openshift-ci Bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label May 10, 2023
@openshift-ci-robot openshift-ci-robot added jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. labels May 10, 2023
@openshift-ci
Copy link
Copy Markdown
Contributor

openshift-ci Bot commented May 10, 2023

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@openshift-ci-robot
Copy link
Copy Markdown

@mkowalski: This pull request references Jira Issue OCPBUGS-11411, which is valid. The bug has been moved to the POST state.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (4.14.0) matches configured target version for branch (4.14.0)
  • bug is in the state ASSIGNED, which is one of the valid states (NEW, ASSIGNED, POST)

Requesting review from QA contact:
/cc @mike-nguyen

The bug has been updated to refer to the pull request using the external bug tracker.

Details

In response to this:

This PR removes After= section from the definition of systemd-pcrphase. It is because currently it blocks possibility to SSH into the node which for any reason has nodeip-configuration or configure-ovs not succeeding.

The self-healing functionality of the latter creates a scenario in which network-online.targed is not yet reached but we already want to access the node for debugging purposes.

At the same time as by default systemd-pcrphase blocks user sessions and depends on remote-fs, this creates a deadlock. In order to remediate this situation, we are removing dependency on remote-fs here. It is justified as OpenShift nodes are not meant to use remote home directories.

Fixes: OCPBUGS-11124
Fixes: OCPBUGS-11411

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci-robot openshift-ci-robot added the bugzilla/valid-bug Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting. label May 10, 2023
@openshift-ci openshift-ci Bot requested a review from mike-nguyen May 10, 2023 10:56
@mkowalski
Copy link
Copy Markdown
Author

/test all
/cc @cybertron
/cc @cgwalters

@openshift-ci openshift-ci Bot requested review from cgwalters and cybertron May 10, 2023 10:56
@openshift-ci
Copy link
Copy Markdown
Contributor

openshift-ci Bot commented May 10, 2023

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: mkowalski
Once this PR has been reviewed and has the lgtm label, please assign adam0brien for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@mkowalski mkowalski marked this pull request as ready for review May 10, 2023 11:08
@openshift-ci openshift-ci Bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label May 10, 2023
@openshift-ci openshift-ci Bot requested review from jlebon and jmarrero May 10, 2023 11:09
@jlebon
Copy link
Copy Markdown
Member

jlebon commented May 10, 2023

OK, I have read some of the background on this. ISTM like we should just mask all the systemd-pcrphase stuff for now instead. Reading systemd-pcrphase.service(8), it and other PCR-related services are only used when booting a UKI via systemd-stub. UKIs are on the roadmap eventually, so we may have to undo this masking at some point, but for now there's no point in trying keep it working if it will need to be adjusted further. I doubt we'll be the only ones wanting to e.g. be able to log into the host to debug things before networking is up.

@mkowalski
Copy link
Copy Markdown
Author

I am not sure if I'm missing something, but we can't do lots of operations on this service (i.e. I cannot mask it as well as we couldn't simply disable it). Maybe it's related to the fact that this is some systemd-originated service

[root@master-0 kubelet.service.d]# systemctl mask systemd-pcrphase.service
Failed to mask unit: File /etc/systemd/system/systemd-pcrphase.service already exists.

@cgwalters
Copy link
Copy Markdown
Member

Sorry I know we've debugged this in multiple places in those bugs, but are you really, 100% sure that this will fix the problem?

This PR removes After= section from the definition of systemd-pcrphase.

One thing I notice here is that indeed we have

[root@cosa-devsh ~]# rpm-ostree status
State: idle
Deployments:
● 4ab367a7d23cc53de1b3c5ddd0e1939c0458d912d3eb69df6d33afa0f6c34659
                  Version: 413.92.202303281804-0 (2023-03-28T18:07:15Z)
[root@cosa-devsh ~]# grep After= /usr/lib/systemd/system/systemd-pcrphase.service
After=remote-fs.target remote-cryptsetup.target
[root@cosa-devsh ~]#

but also:

[root@cosa-devsh ~]# grep After= /usr/lib/systemd/system/systemd-user-sessions.service
After=remote-fs.target nss-user-lookup.target network.target home.mount
[root@cosa-devsh ~]#

So systemd-pcrphase.service is just pulling in remote-cryptsetup.target in addition.

I am still not convinced that I was wrong over here https://issues.redhat.com/browse/OCPBUGS-11124?focusedId=22056799&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-22056799

@jlebon
Copy link
Copy Markdown
Member

jlebon commented May 10, 2023

I am not sure if I'm missing something, but we can't do lots of operations on this service (i.e. I cannot mask it as well as we couldn't simply disable it). Maybe it's related to the fact that this is some systemd-originated service

[root@master-0 kubelet.service.d]# systemctl mask systemd-pcrphase.service
Failed to mask unit: File /etc/systemd/system/systemd-pcrphase.service already exists.

To address this part: is this with a custom RHCOS containing this PR? It worked for me locally on the latest RHCOS 4.13 build. If we go this way, I can provide more information about how we'd do this at compose time.

If you want to test whether it fixes the issue, you can also write an Ignition config/MC to mask the unit.

@mkowalski
Copy link
Copy Markdown
Author

Sorry I know we've debugged this in multiple places in those bugs, but are you really, 100% sure that this will fix the problem?

This PR removes After= section from the definition of systemd-pcrphase.

One thing I notice here is that indeed we have

[...]

[root@cosa-devsh ~]# grep After= /usr/lib/systemd/system/systemd-user-sessions.service
After=remote-fs.target nss-user-lookup.target network.target home.mount
[root@cosa-devsh ~]#

So systemd-pcrphase.service is just pulling in remote-cryptsetup.target in addition.

I am still not convinced that I was wrong over here https://issues.redhat.com/browse/OCPBUGS-11124?focusedId=22056799&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-22056799

The more I look at this the more confused I am, but I am now looking at RHCOS 411.86.202302021552-0 and what I see is following

[Unit]
Description=Permit User Sessions
Documentation=man:systemd-user-sessions.service(8)
After=remote-fs.target nss-user-lookup.target network.target home.mount

[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=/usr/lib/systemd/systemd-user-sessions start
ExecStop=/usr/lib/systemd/systemd-user-sessions stop

So we have OCP 4.11 which is not affected by this problem but systemd-user-sessions.service also has After=remote-fs.target. This is the reason why we never pointed fingers directly at remote-fs.

But the more I think about it, the more I feel that maybe we should simply edit systemd-user-sessions.service so that instead of After=remote-fs.target nss-user-lookup.target network.target home.mount it gets only After=nss-user-lookup.target network.target home.mount

My rationale overall is as follows - between RHEL8 and RHEL9 we didn't change definitions of already existing stuff. But we did introduce pcrphase service that has some dependencies. By removing those dependencies, we almost go back to the previous state.

Note that pcrphase service pulls After=remote-fs.target remote-cryptsetup.target and I am removing both of them. Even if somewhere along the way we took remote-fs as our target when we should have focused on remote-cryptsetup, pcrphase now becomes dependency-less

@mkowalski
Copy link
Copy Markdown
Author

mkowalski commented May 11, 2023

/hold

The following combination of changes gives me ability to login at any time (i.e. when nodeip-configuration is still running)

[core@worker-1 ~]$ cat /etc/systemd/system/systemd-user-sessions.service | grep -E "After=|Before="
After=nss-user-lookup.target home.mount

and

core@worker-1 ~]$ cat /etc/systemd/system/systemd-pcrphase.service | grep -E "After=|Before="
Before=systemd-user-sessions.service

This looks relatively elegant to me with the only disadvantage being that we cannot simply drop-in replacement for dependencies but need to copy-paste the whole unit.

Of course instead of copy-pasta of systemd-pcrphase.service we could mask it. Together with modifying deps of user-sessions.service this looks like something easy to understand.

@openshift-ci openshift-ci Bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label May 11, 2023
…tion

This PR removes `After=` section from the definition of
systemd-pcrphase. It is because currently it blocks possibility to SSH
into the node which for any reason has nodeip-configuration or
configure-ovs not succeeding.

The self-healing functionality of the latter creates a scenario in which
network-online.targed is not yet reached but we already want to access
the node for debugging purposes.

At the same time as by default systemd-pcrphase blocks user sessions and
depends on remote-fs, this creates a deadlock. In order to remediate
this situation, we are removing dependency on remote-fs here. It is
justified as OpenShift nodes are not meant to use remote home
directories.

We are also modifying dependencies of systemd-user-sessions.service so
that we are explicitly saying that we do not need network in order to
allow access to the node.

Fixes: OCPBUGS-11124
Fixes: OCPBUGS-11411
@openshift-ci-robot
Copy link
Copy Markdown

@mkowalski: This pull request references Jira Issue OCPBUGS-11411, which is valid.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (4.14.0) matches configured target version for branch (4.14.0)
  • bug is in the state POST, which is one of the valid states (NEW, ASSIGNED, POST)

Requesting review from QA contact:
/cc @mike-nguyen

Details

In response to this:

This PR removes After= section from the definition of
systemd-pcrphase. It is because currently it blocks possibility to SSH
into the node which for any reason has nodeip-configuration or
configure-ovs not succeeding.

The self-healing functionality of the latter creates a scenario in which
network-online.targed is not yet reached but we already want to access
the node for debugging purposes.

At the same time as by default systemd-pcrphase blocks user sessions and
depends on remote-fs, this creates a deadlock. In order to remediate
this situation, we are removing dependency on remote-fs here. It is
justified as OpenShift nodes are not meant to use remote home
directories.

We are also modifying dependencies of systemd-user-sessions.service so
that we are explicitly saying that we do not need network in order to
allow access to the node.

Fixes: OCPBUGS-11124
Fixes: OCPBUGS-11411

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci
Copy link
Copy Markdown
Contributor

openshift-ci Bot commented May 11, 2023

@mkowalski: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/scos-9-build-test-qemu a6a51c5 link true /test scos-9-build-test-qemu

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@travier
Copy link
Copy Markdown
Member

travier commented May 11, 2023

I agree with Colin and I think that the problem is somewhere else. If masking this unit really unblocks us then I'll say OK but this still needs more investigation.


You don't need to copy/paste an entire unit to change After= dependencies. You can write a drop-in in foo.service.d/override.conf with:

...

Edit: See comment below.

@mkowalski
Copy link
Copy Markdown
Author

You don't need to copy/paste an entire unit to change After= dependencies. You can write a drop-in in foo.service.d/override.conf with:

That's not true. As per upstream documentation https://github.com/systemd/systemd/blob/250e9fadbcc0ca90e697d7efb40855b054ed3b8f/man/systemd.unit.xml#L1786 you cannot set empty After= in a drop-in

@travier
Copy link
Copy Markdown
Member

travier commented May 11, 2023

That's not true. As per upstream documentation systemd/systemd@250e9fa/man/systemd.unit.xml#L1786 you cannot set empty After= in a drop-in

Wow, nice catch. I did not know that. Thanks and sorry.

@travier
Copy link
Copy Markdown
Member

travier commented May 11, 2023

This unfortunately makes this whole fix less desirable for us as we'll have to keep those files in sync with the systemd ones.

If we go this route, a sed line in a post-process script would be more maintainable.

@mkowalski
Copy link
Copy Markdown
Author

If we go this route, a sed line in a post-process script would be more maintainable.

Agreed, that make sense. Can we just agree here whether the pcrphase itself should be masked or copy-pasted with removed dependencies?

We have user-sessions.service for which there is only one way of dealing with the issue (modifying After=) but for pcrphase there is more than one

@travier
Copy link
Copy Markdown
Member

travier commented May 15, 2023

Shouldn't masking the unit also work in this case?

@jlebon
Copy link
Copy Markdown
Member

jlebon commented May 15, 2023

One thing that would really help drill down into the exact issue is to try to reproduce this outside of OCP on a single node RHCOS. E.g. have a Butane config which creates a systemd service similar in effect to the systemd units concerned here (e.g. with the same dependencies, but the actual ExecStart can just be sleep 5m for example) and then see it block logins on 4.13, but not on 4.12.

I briefly tried this exercise, and did get it to block logins on 4.13, but it exhibited the same behaviour on 4.12, so clearly I wasn't capturing all the subtleties.

@cgwalters
Copy link
Copy Markdown
Member

Totally agree about isolating this. @jlebon can you post your butane config?

@jlebon
Copy link
Copy Markdown
Member

jlebon commented May 19, 2023

Here's what I've been playing with

variant: fcos
version: 1.4.0
systemd:
  units:
    # override the builtin openvswitch.service
    - name: openvswitch.service
      enabled: true
      contents: |
        [Unit]
        Description=Fake Open vSwitch
        # These match openvswitch.service
        Before=network.target network.service
        After=network-pre.target ovsdb-server.service ovs-vswitchd.service
        PartOf=network.target
        Requires=ovsdb-server.service
        Requires=ovs-vswitchd.service

        [Service]
        Type=oneshot
        RemainAfterExit=yes
        ExecStart=true

        [Install]
        WantedBy=multi-user.target
    - name: nodeip-configuration.service
      enabled: true
      contents: |
        [Unit]
        Description=Fake nodeip-configuration
        # These match the nodeip-configuration.service template
        Wants=NetworkManager-wait-online.service crio-wipe.service
        After=NetworkManager-wait-online.service ignition-firstboot-complete.service crio-wipe.service
        Before=kubelet.service crio.service ovs-configuration.service

        [Service]
        Type=oneshot
        RemainAfterExit=yes
        ExecStart=sleep infinity

        [Install]
        WantedBy=multi-user.target
    - name: ovs-configuration.service
      enabled: true
      contents: |
        [Unit]
        Description=Fake ovs-configuration
        # These match the ovs-configuration.service template
        Requires=openvswitch.service
        Wants=NetworkManager-wait-online.service
        After=NetworkManager-wait-online.service openvswitch.service network.service nodeip-configuration.service
        Before=network-online.target kubelet.service crio.service node-valid-hostname.service

        [Service]
        Type=oneshot
        RemainAfterExit=yes
        ExecStart=sleep infinity

        [Install]
        WantedBy=multi-user.target

So nodeip-configuration.service and ovs-configuration are set up to indefinitely hang. But testing on both the latest 4.12 and 4.13 releases, I get the same result: the serial console comes up after a short delay (maybe 5s?). Once in, I can tell that systemd-user-sessions.service and sshd.service are up and running.

So likely there's some other detail I'm missing in this mock test that actually makes it hang on 4.13 but not 4.12.

@mkowalski
Copy link
Copy Markdown
Author

mkowalski commented May 22, 2023

So nodeip-configuration.service and ovs-configuration are set up to indefinitely hang. But testing on both the latest 4.12 and 4.13 releases, I get the same result: the serial console comes up after a short delay (maybe 5s?). Once in, I can tell that systemd-user-sessions.service and sshd.service are up and running.

I may need to digest the details, but this is not the behaviour we see in OCP 4.13; what I do see when setting nodeip-configuration.service to hang indefinitely is that I do not have the serial console usable (so no matter if I had root password or not, I couldn't login anyway)

Not sure what's missing from the butane config, but I don't understand how possibly we could get a console with nodeip-configuration.service that hangs.

In fact, if we consider only serial console, 4.12 and 4.13 are the same. The difference is that with 4.13 when the console is stuck, I cannot SSH to the node. In 4.12 I can (so no console login, but SSH is possible)

2023-05-22-123621_1062x931_scrot

@jlebon
Copy link
Copy Markdown
Member

jlebon commented May 22, 2023

That's interesting. I'd like to dig into this. I'm mostly offline today but will take a look at it more deeply tomorrow.

@jlebon
Copy link
Copy Markdown
Member

jlebon commented May 23, 2023

OK, I think this is superseded by #1294.

@cgwalters cgwalters closed this May 23, 2023
@openshift-ci-robot
Copy link
Copy Markdown

@mkowalski: This pull request references Jira Issue OCPBUGS-11411. The bug has been updated to no longer refer to the pull request using the external bug tracker.

Details

In response to this:

This PR removes After= section from the definition of
systemd-pcrphase. It is because currently it blocks possibility to SSH
into the node which for any reason has nodeip-configuration or
configure-ovs not succeeding.

The self-healing functionality of the latter creates a scenario in which
network-online.targed is not yet reached but we already want to access
the node for debugging purposes.

At the same time as by default systemd-pcrphase blocks user sessions and
depends on remote-fs, this creates a deadlock. In order to remediate
this situation, we are removing dependency on remote-fs here. It is
justified as OpenShift nodes are not meant to use remote home
directories.

We are also modifying dependencies of systemd-user-sessions.service so
that we are explicitly saying that we do not need network in order to
allow access to the node.

Fixes: OCPBUGS-11124
Fixes: OCPBUGS-11411

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@mkowalski mkowalski deleted the pcrphase branch May 24, 2023 11:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bugzilla/valid-bug Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting. do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. jira/severity-critical Referenced Jira bug's severity is critical for the branch this PR is targeting. jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants