This bug was originally filed in Launchpad as LP: #1913354
Launchpad details
affected_projects = []
assignee = None
assignee_name = None
date_closed = None
date_created = 2021-01-26T23:35:15.552578+00:00
date_fix_committed = None
date_fix_released = None
id = 1913354
importance = medium
is_complete = False
lp_url = https://bugs.launchpad.net/cloud-init/+bug/1913354
milestone = None
owner = hggdh2
owner_name = C de-Avillez
private = False
status = triaged
submitter = hggdh2
submitter_name = C de-Avillez
tags = []
duplicates = []
Launchpad user C de-Avillez(hggdh2) wrote on 2021-01-26T23:35:15.552578+00:00
Environment: Azure, RHEL 7.8 and 7.9, OEL 7.8 and 7.9.
On OEL 7.8, the installed cloud-init package is cloud-init-18.5-6.el7.x86_64.
On both OEL and RHEL 7.x (certainly 7.8 and 7.9), if there is an NFS mount in /etc/fstab (it is unknown whether this also applies to NFSv4), boot may not complete. The result is a hang, and the system cannot be reached via SSH or a serial console login.
Everything points to a deadlock between the startup of the rpc-statd and rpc-statd-notify services and cloud-init.service.
This happens because rpc-statd and rpc-statd-notify declare the following dependencies:
rpc-statd.service:
[Unit]
Description=NFS status monitor for NFSv2/3 locking.
DefaultDependencies=no
Conflicts=umount.target
Requires=nss-lookup.target rpcbind.socket
Wants=network-online.target # <---
After=network-online.target nss-lookup.target rpcbind.socket # <---
PartOf=nfs-utils.service
Wants=nfs-config.service
After=nfs-config.service
[Service]
Environment=RPC_STATD_NO_NOTIFY=1
EnvironmentFile=-/run/sysconfig/nfs-utils
Type=forking
PIDFile=/var/run/rpc.statd.pid
ExecStart=/usr/sbin/rpc.statd $STATDARGS
rpc-statd-notify.service:
[Unit]
Description=Notify NFS peers of a restart
DefaultDependencies=no
Wants=network-online.target # <---
After=local-fs.target network-online.target nss-lookup.target # <---
# Do not start up in HA environments
ConditionPathExists=!/var/lib/nfs/statd/sm.ha
# if we run an nfs server, it needs to be running before we
# tell clients that it has restarted.
After=nfs-server.service
PartOf=nfs-utils.service
Wants=nfs-config.service
After=nfs-config.service
[Service]
EnvironmentFile=-/run/sysconfig/nfs-utils
Type=forking
ExecStart=-/usr/sbin/sm-notify $SMNOTIFYARGS
while cloud-init.service declares:
[Unit]
Description=Initial cloud-init job (metadata service crawler)
Wants=cloud-init-local.service
Wants=sshd-keygen.service
Wants=sshd.service
After=cloud-init-local.service
After=NetworkManager.service network.service
Before=network-online.target # <---
Before=sshd-keygen.service
Before=sshd.service
Before=systemd-user-sessions.service
ConditionPathExists=!/etc/cloud/cloud-init.disabled
ConditionKernelCommandLine=!cloud-init=disabled
[Service]
Type=oneshot
ExecStart=/usr/bin/cloud-init init
RemainAfterExit=yes
TimeoutSec=0
# Output needs to appear in instance console output
StandardOutput=journal+console
[Install]
WantedBy=cloud-init.target
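Ordering settings such as After= and Wants= may be repeated within a unit, and systemd merges all of the values; the rpc-statd-notify excerpt above spreads its After= list over three separate lines. The minimal Python sketch below collects those lists from an excerpt. It is not systemd's parser: it only handles the simple KEY=value lines used here, and it strips the "# <---" markers, which are annotations added for this report (systemd itself does not support trailing comments).

```python
from collections import defaultdict

# [Unit] excerpt of rpc-statd-notify.service as quoted in this report.
RPC_STATD_NOTIFY = """\
[Unit]
Description=Notify NFS peers of a restart
DefaultDependencies=no
Wants=network-online.target # <---
After=local-fs.target network-online.target nss-lookup.target # <---
# Do not start up in HA environments
ConditionPathExists=!/var/lib/nfs/statd/sm.ha
# if we run an nfs server, it needs to be running before we
# tell clients that it has restarted.
After=nfs-server.service
PartOf=nfs-utils.service
Wants=nfs-config.service
After=nfs-config.service
"""

# Space-separated list settings whose repeated occurrences accumulate.
LIST_KEYS = {"Wants", "Requires", "After", "Before"}


def parse_unit_lists(text: str) -> dict[str, list[str]]:
    """Collect list-valued settings, merging values from repeated keys."""
    result: dict[str, list[str]] = defaultdict(list)
    for raw in text.splitlines():
        line = raw.split(" #", 1)[0].strip()  # drop the report's "# <---" markers
        if not line or line.startswith(("#", ";", "[")):
            continue  # blank line, full-line comment, or section header
        key, sep, value = line.partition("=")
        if sep and key in LIST_KEYS:
            result[key].extend(value.split())
    return dict(result)


if __name__ == "__main__":
    deps = parse_unit_lists(RPC_STATD_NOTIFY)
    print("After:", deps["After"])
    print("Wants:", deps["Wants"])
```

All three After= lines end up in one list, so network-online.target sits alongside nfs-server.service and nfs-config.service as an ordering dependency of the notify service.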
So cloud-init.service is ordered before network-online.target, while the rpc-statd* services are ordered after it.
CX (the customer) has demonstrated this to my satisfaction.
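Before=/After= settings form a directed graph, and boot can only hang on ordering if that graph has a cycle. The two orderings above are consistent on their own; for them to deadlock, something must also lead from rpc-statd back to cloud-init.service, a link the excerpts do not show. The sketch below uses Kahn's topological sort to find a start order or report a cycle. The two "closing" edges (an NFS mount node ordered after rpc-statd, and cloud-init.service waiting for that mount) are hypothetical assumptions for illustration, not taken from the report.

```python
from collections import deque

# Edge (A, B) means "A must finish starting before B starts".
# Taken from the unit excerpts in this report:
EDGES = [
    ("cloud-init.service", "network-online.target"),        # Before=network-online.target
    ("network-online.target", "rpc-statd.service"),         # After=network-online.target
    ("network-online.target", "rpc-statd-notify.service"),  # After=network-online.target
]

# HYPOTHETICAL closing edges, NOT from the report: assume the fstab NFS mount
# is ordered after rpc-statd and cloud-init.service (transitively) waits for it.
HYPOTHETICAL_EDGES = [
    ("rpc-statd.service", "nfs-mount (hypothetical)"),
    ("nfs-mount (hypothetical)", "cloud-init.service"),
]

# The first option listed below: rpc-statd* ordered Before=network-online.target.
OPTION_1_EDGES = [
    ("cloud-init.service", "network-online.target"),
    ("rpc-statd.service", "network-online.target"),
    ("rpc-statd-notify.service", "network-online.target"),
]


def find_start_order(edges):
    """Kahn's algorithm: return a valid start order, or None on a cycle."""
    successors = {}
    indegree = {}
    for before, after in edges:
        successors.setdefault(before, []).append(after)
        successors.setdefault(after, [])
        indegree.setdefault(before, 0)
        indegree[after] = indegree.get(after, 0) + 1
    # Start with every unit that has nothing ordered before it.
    ready = deque(sorted(u for u, d in indegree.items() if d == 0))
    order = []
    while ready:
        unit = ready.popleft()
        order.append(unit)
        for nxt in successors[unit]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                ready.append(nxt)
    # Units never reaching in-degree 0 sit on (or behind) a cycle.
    return order if len(order) == len(indegree) else None


if __name__ == "__main__":
    print(find_start_order(EDGES))
    print(find_start_order(EDGES + HYPOTHETICAL_EDGES))  # None: ordering cycle
    print(find_start_order(OPTION_1_EDGES + HYPOTHETICAL_EDGES))
```

Under these assumptions the report's edges alone yield a valid order, adding the hypothetical edges produces a cycle, and reordering rpc-statd* before network-online.target breaks that cycle again.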
I see a few possible paths here:
- CX could change rpc-statd.service and rpc-statd-notify.service so that they state:
    Before=network-online.target
    #Wants=network-online.target
    #After=network-online.target
- CX could change cloud-init.service so that it states:
    Wants=network-online.target
    After=network-online.target
    #Before=network-online.target
- CX could remove the NFS mount from /etc/fstab and define it as a systemd .mount unit instead.
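For the third option, one detail is easy to get wrong: systemd requires a .mount unit to be named after its mount point, escaped the way `systemd-escape --path --suffix=mount` does it (for example, /mnt/data becomes mnt-data.mount), and an admin-defined unit would normally live under /etc/systemd/system/. The Python sketch below approximates that escaping rule; the mount points are hypothetical examples, and on a real host running systemd-escape itself is the safer choice.

```python
import string

# Characters that systemd leaves unescaped when turning a path into a unit name.
_SAFE = set(string.ascii_letters + string.digits + ":_.")


def escape_path(path: str) -> str:
    """Approximate `systemd-escape --path`: map a mount point to the name
    stem that its .mount unit must use."""
    # Drop empty and "." components, i.e. duplicate, leading and trailing slashes.
    parts = [p for p in path.split("/") if p not in ("", ".")]
    if ".." in parts:
        raise ValueError(f"not a normalized path: {path!r}")  # systemd-escape refuses these too
    if not parts:
        return "-"  # the root directory "/" escapes to "-"
    out = []
    for i, byte in enumerate("/".join(parts).encode("utf-8")):
        ch = chr(byte)
        if ch == "/":
            out.append("-")  # path separators become dashes
        elif ch in _SAFE and not (i == 0 and ch == "."):
            out.append(ch)  # safe character kept as-is (a leading "." is not)
        else:
            out.append(f"\\x{byte:02x}")  # anything else, e.g. "-" becomes "\x2d"
    return "".join(out)


def mount_unit_name(mount_point: str) -> str:
    """File name of the .mount unit that controls `mount_point`."""
    return escape_path(mount_point) + ".mount"


if __name__ == "__main__":
    for mp in ("/mnt/data", "/mnt/nfs-share", "/srv//exports/"):
        print(mp, "->", mount_unit_name(mp))
```

Note that a literal dash in a directory name must be escaped, which is why an NFS mount on /mnt/nfs-share needs a unit named mnt-nfs\x2dshare.mount rather than mnt-nfs-share.mount.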
CX opted for the first option above and now sees no boot issues.
There is a Red Hat bug about this, https://bugzilla.redhat.com/show_bug.cgi?id=1858930, but it was closed WONTFIX because support for RHEL 7 had ended :-(. I also searched Bugzilla and Launchpad for related bugs on RHEL 7 and 8, but did not find any.