overlay: disable iscsi.service by default #1294
openshift-merge-robot merged 1 commit into openshift:master
|
@mkowalski Can you verify that the real-life situation in OCPBUGS-11124 is indeed fixed if you also add a MachineConfig/Ignition fragment to disable the unit, like this one?

    {
      "ignition": {
        "version": "3.3.0"
      },
      "systemd": {
        "units": [
          {
            "enabled": false,
            "name": "iscsi.service"
          }
        ]
      }
    }
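For anyone who wants to try this on a cluster, the fragment above would normally be wrapped in a MachineConfig. The following is a minimal sketch in Python that builds such a manifest and prints it as JSON (which `oc apply -f` also accepts). The manifest name `99-worker-disable-iscsi` and the `worker` role are hypothetical choices for illustration, and the Ignition version is copied from the fragment above; the version accepted by a given cluster may differ.

```python
import json


def disable_unit_fragment(unit, ignition_version="3.3.0"):
    """Build an Ignition fragment that disables a single systemd unit."""
    return {
        "ignition": {"version": ignition_version},
        "systemd": {"units": [{"enabled": False, "name": unit}]},
    }


def machine_config(name, role, ignition):
    """Wrap an Ignition fragment in a MachineConfig manifest.

    `name` and `role` are illustrative; pick values that match your pools.
    """
    return {
        "apiVersion": "machineconfiguration.openshift.io/v1",
        "kind": "MachineConfig",
        "metadata": {
            "name": name,
            "labels": {"machineconfiguration.openshift.io/role": role},
        },
        "spec": {"config": ignition},
    }


# Hypothetical manifest name and role, used only for this example.
manifest = machine_config(
    "99-worker-disable-iscsi",
    "worker",
    disable_unit_fragment("iscsi.service"),
)
print(json.dumps(manifest, indent=2))
```

Generating the manifest from the same fragment keeps the unit name in one place, so the test on a real node and the MachineConfig cannot drift apart.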
@@ -1 +1,7 @@
disable iscsid.socket
Arg sorry, it was meant as a workaround to be investigated later and then I forgot.
I think I disabled it as it was an additionally enabled unit and that made our kola test fail.
Ack. I'll remove it in a follow-up PR and if anything breaks, at least we'll be able to add some context to it. :)
Sorry, I hadn't seen your second comment here. If that's the case, I think that was fixed by coreos/coreos-assembler#3275.
My bad, we disable `iscsi.service` here, not the iscsid one.
We can drop this here or in another PR. Up to you.
Yeah, I'll drop it in a follow-up.
|
It's possible (though IMO unlikely) some customers are configuring host-level OS-initiated iSCSI mounts and relying on |
|
Nice investigation! That makes total sense. |
I can confirm that disabling |
|
/retest |
|
/lgtm |
|
/hold Revision a6c9055 was retested 3 times: holding |
|
/retest |
This test was originally added in RHCOS[1], but it's equally valid to run in FCOS. That would have caught the original issue when it was still present in Fedora.
[1]: openshift/os#1294
|
The test added in this PR is being upstreamed in coreos/fedora-coreos-config#2437. Once the FCOS submodule is updated to include it, we can remove our copy here. |
|
This requires coreos/coreos-assembler#3487. |
`iscsi.service` has `Before=remote-fs-pre.target` *and* `After=network-online.target`. This forces `remote-fs-pre.target` to block on `network-online.target` and hence in OCP, on `ovs-configuration.service` (which has `Before=network-online.target`). So this transitively makes `systemd-user-sessions.service` block on `network-online.target`.

This was an issue in Fedora as well and was discussed in a devel thread[1]. `iscsi.service` was subsequently reworked[2][3] so that it was only activated if iSCSI was actually used by the system.

On RHEL 8, `iscsi.service` and co. were directly enabled by RPM scriptlets rather than using presets. In RHCOS, we explicitly make presets canonical[4] so we shipped with `iscsi.service` disabled by default. On RHEL 9, the units were fixed to use presets[5]. This is why we started seeing this issue after moving to RHEL 9.

So all we need in theory is to have the Fedora patch backported to RHEL 9. However, since we don't really need the functionality from `iscsi.service` by default in RHCOS, we can fast-track its (re-)disablement and not wait for the `iscsi-starter.service` workaround.

Note that `iscsi.service` is only used to bring up iSCSI sessions marked for autostart in `/var/lib/iscsi/nodes` and is separate from `iscsid.service`, which is what actually manages the iSCSI connections. In OpenShift, we rely on the latter only (e.g. configured iSCSI PVCs are done by the kubelet directly calling out to `iscsiadm`). It's also separate from iSCSI devices that use host bus adapters, which are transparent to RHCOS/OCP.

Fixes: https://issues.redhat.com/browse/OCPBUGS-11124

[1]: https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org/thread/HACVEJ3FMOIM2TOENOVH5CPOUNR7NCMS
[2]: https://src.fedoraproject.org/rpms/iscsi-initiator-utils/c/1e689cd0c6667eca838c85975a1b7a070209e5ad
[3]: https://src.fedoraproject.org/rpms/fedora-release/pull-request/246
[4]: https://github.com/coreos/fedora-coreos-config/blob/1553518214088a89d6a2360a6fcdddbd3915628a/manifests/ignition-and-ostree.yaml#L35-L44
[5]: https://bugzilla.redhat.com/show_bug.cgi?id=1930458
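The ordering chain described in the commit message can be sketched as a small dependency graph. This is only an illustrative Python model, not how systemd computes its transactions: it treats every listed unit as part of the boot transaction and ignores everything except `Before=`/`After=`. The orderings for `iscsi.service` and `ovs-configuration.service` come from the text above; the two links `remote-fs.target` after `remote-fs-pre.target` and `systemd-user-sessions.service` after `remote-fs.target` are assumptions based on the stock upstream systemd units and are not stated in this PR.

```python
from collections import defaultdict

# Ordering directives per unit. The last two entries are assumptions
# taken from upstream systemd unit files, not from this PR.
UNITS = {
    "iscsi.service": {
        "Before": ["remote-fs-pre.target"],
        "After": ["network-online.target"],
    },
    "ovs-configuration.service": {"Before": ["network-online.target"]},
    "remote-fs.target": {"After": ["remote-fs-pre.target"]},
    "systemd-user-sessions.service": {"After": ["remote-fs.target"]},
}


def build_graph(units):
    """Return a map of unit -> set of units it must wait for."""
    waits_for = defaultdict(set)
    for name, directives in units.items():
        for other in directives.get("After", []):
            waits_for[name].add(other)
        for other in directives.get("Before", []):
            # "name Before=other" means other waits for name.
            waits_for[other].add(name)
    return waits_for


def blockers(waits_for, unit):
    """Return every unit that `unit` transitively waits for."""
    seen = set()
    stack = [unit]
    while stack:
        current = stack.pop()
        for dep in waits_for.get(current, ()):
            if dep not in seen:
                seen.add(dep)
                stack.append(dep)
    return seen


with_iscsi = blockers(build_graph(UNITS), "systemd-user-sessions.service")

# Model the fix: iscsi.service is disabled, so it is not part of the boot.
without = {name: d for name, d in UNITS.items() if name != "iscsi.service"}
without_iscsi = blockers(build_graph(without), "systemd-user-sessions.service")

print(sorted(with_iscsi))
print(sorted(without_iscsi))
```

In this model, `ovs-configuration.service` and `network-online.target` appear among the blockers of `systemd-user-sessions.service` only while `iscsi.service` is present, which is the coupling that disabling the unit removes.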
|
@jlebon: all tests passed! |
|
This one just needs another /lgtm now! (No changes, just restamped to tickle a full CI rerun after coreos/coreos-assembler#3487.) |
|
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: c4rt0, cgwalters, jlebon, travier |
|
/cherry-pick release-4.13 |
|
@mkowalski: new pull request created: #1301 |
|
Thanks a lot for this, @jlebon! Given that we need a backport to 4.13, should coreos/coreos-assembler#3487 get backported to |
|
It will need an f-c-c backport and bump as well |
This test was originally added in RHCOS[[1]], but it's equally valid to run in FCOS. That would have caught the original issue when it was still present in Fedora. [1]: openshift/os#1294 (cherry picked from commit 5f322cb)