Handle error if SSH service no present.#1422
Conversation
blackboxsw
left a comment
There was a problem hiding this comment.
Thanks for this PR @aciba90.
I don't think we want to spend time calling into update_ssh_config on a system that doesn't have the necessary service to run ssh.
This suggestion below may be more work than it's worth, I look forward to your thoughts either way.
Ideally I think we want a pre-flight check at the beginning of handle_ssh_pwauth function if the ssh_svcname in question does not exist for restarting. If not present, exit early with a LOG.warning(f"Ignoring config 'ssh_pwauth: {pw_auth}'. Service {service} is not installed")
This PR is not straight-forward for this situation because we can't easily invoke subp.which for the service we are trying to restart because we are talking to the distro.manage_service layer which has significantly different implementations on each distro for systemd vs. non-systemd init systems.
I wonder if we might be able to add a "status" command to distro.manage_service to report whether or not the service actually exists. We could then try/except around a call to distro.manage_service("status", service).
| try: | ||
| distro.manage_service("restart", service) | ||
| except subp.ProcessExecutionError as e: | ||
| LOG.warning("Failed to restart the SSH deamon. %s: %s", service, e) |
There was a problem hiding this comment.
Generally let's try to avoid the words "Failed" and "error" in warning level logs as they are confusing when you grep in /var/log/cloud-init.log. If we stick with this approach, please replace Failed
| LOG.warning("Failed to restart the SSH deamon. %s: %s", service, e) | |
| LOG.warning(f"Wrote '{cfg_name} {cfg_val}' to {DEF_SSHD_CFG}. Unable to restart the SSH deamon {service}: {e}") |
There was a problem hiding this comment.
Thanks for the suggestion @blackboxsw.
I wonder if we might be able to add a "status" command to
distro.manage_serviceto report whether or not the service actually exists. We could then try/except around a call todistro.manage_service("status", service).
I was thinking exactly about that. Let me rework this PR.
6a6faba to
75c3583
Compare
aciba90
left a comment
There was a problem hiding this comment.
@blackboxsw this one is ready to be reviewed again. I have implemented the status command and the pre-flight check as suggested.
Thanks for you comments.
| except subp.ProcessExecutionError as e: | ||
| LOG.warning( | ||
| ( | ||
| "Ignoring config 'ssh_pwauth: %s'. SSH deamon service '%s' is " |
There was a problem hiding this comment.
Ideally I think we want a pre-flight check at the beginning of handle_ssh_pwauth function if the ssh_svcname in question does not exist for restarting. If not present, exit early with a LOG.warning(f"Ignoring config 'ssh_pwauth: {pw_auth}'. Service {service} is not installed")
I though that it was more generic to say that the service is not running, as a non 0 exit from systemctl status service no always does mean that the service is not installed.
There was a problem hiding this comment.
I think we can differentiate our messages here based on e.exit_code in most cases.
On any distro with systemctl (most):
if e.exit_code == 4: the named service (so we can report "is not installed" in our warning)
elif e.exit_code == 3 then named sevice is installed but in stopped state
In this case, we should write out the config changes, but emit a warning that cloud-init has written the ssh config but will not restart the service because it is stopped.
else:
return with the warning, gnoring config 'ssh_pwauth: %s'. SSH deamon service '%s' "is not available"There was a problem hiding this comment.
I think we can either differentiate our messages here based on e.exit_code in most cases or try to instrument a "is-enabled" action for manage_service that would
Checking man systemctl we can look to the following error codes
├──────┼─────────────────────┼─────────────────────┤
│0 │ "program is running │ unit is active │
│ │ or service is OK" │ │
├──────┼─────────────────────┼─────────────────────┤
│1 │ "program is dead │ unit not failed │
│ │ and /var/run pid │ (used by is-failed) │
│ │ file exists" │ │
├──────┼─────────────────────┼─────────────────────┤
│2 │ "program is dead │ unused │
│ │ and /var/lock lock │ │
│ │ file exists" │ │
├──────┼─────────────────────┼─────────────────────┤
│3 │ "program is not │ unit is not active │
│ │ running" │ │
├──────┼─────────────────────┼─────────────────────┤
│4 │ "program or service │ no such unit │
│ │ status is unknown" │ │
└──────┴─────────────────────┴─────────────────────┘
On any distro with systemctl (most):
if e.exit_code == 4:
e can report "is not installed" in our warning and return
elif e.exit_code == 3 then named sevice is installed but in disabled state
we should still write out the config changes, but emit a warning that cloud-init has written the ssh config but will not restart the service because it is disabled.
else:
exit with the warning, gnoring config 'ssh_pwauth: %s'. SSH deamon service '%s' "is not available"There was a problem hiding this comment.
I have verified these exit_codes align on opensuse, rockylinux and ubuntu.
blackboxsw
left a comment
There was a problem hiding this comment.
I think adding the "status" operation to manage_services is helpful, we could more specifically call systemctl is-enabled but then we'd have to provide a different option any distro which doesn't support "is-enabled` command. I think it is probably cleaner at this iteration to properly look at exit codes:
if exit_code == 0: write config and restart
exit_code == 3: write_config but warn about not restarting ssh because service is in disabled state
exit_code == 4: warn sshd is not installed and return
other exit code: warn "sshd is unavailable due to {e}", to provide the exception message and return
here's a mockup diff of what I'm alluding to if this makes sense. I'm open to discussion and or disagreement here if you think this suggestion doesn't make sense.
diff --git a/cloudinit/config/cc_set_passwords.py b/cloudinit/config/cc_set_passwords.py
index d952ba4d7..927d0bf7b 100644
--- a/cloudinit/config/cc_set_passwords.py
+++ b/cloudinit/config/cc_set_passwords.py
@@ -88,23 +88,25 @@ def handle_ssh_pwauth(pw_auth, distro: Distro):
@return: None"""
service = distro.get_option("ssh_svcname", "ssh")
+ restart_ssh = True
try:
distro.manage_service("status", service)
except subp.ProcessExecutionError as e:
- LOG.warning(
- (
- "Ignoring config 'ssh_pwauth: %s'. SSH deamon service '%s' is "
- "not running."
- ),
- pw_auth,
- service,
- )
- LOG.debug(
- "SSH deamon service '%s' is not running: %s",
- service,
- e,
- )
- return
+ if e.exit_code == 3:
+ # Continue to write ssh but don't invoke restart
+ restart_ssh = False
+ elif e.exit_code == 4:
+ LOG.warning(
+ f"Ignoring config 'ssh_pwauth: {pw_auth}'."
+ f" SSH daemon '{service}' is not installed."
+ )
+ return
+ else:
+ LOG.warning(
+ f"Ignoring config 'ssh_pwauth: {pw_auth}'."
+ f" SSH daemon '{service}' is unavailable. Error: {e}."
+ )
+ return
cfg_name = "PasswordAuthentication"
@@ -131,7 +133,10 @@ def handle_ssh_pwauth(pw_auth, distro: Distro):
LOG.debug("No need to restart SSH service, %s not updated.", cfg_name)
return
- distro.manage_service("restart", service)
+ if restart_ssh:
+ distro.manage_service("restart", service)
+ else:
+ LOG.debug("Not restarting SSH service: service is stopped.")
LOG.debug("Restarted the SSH daemon.")| except subp.ProcessExecutionError as e: | ||
| LOG.warning( | ||
| ( | ||
| "Ignoring config 'ssh_pwauth: %s'. SSH deamon service '%s' is " |
There was a problem hiding this comment.
| "Ignoring config 'ssh_pwauth: %s'. SSH deamon service '%s' is " | |
| "Ignoring config 'ssh_pwauth: %s'. SSH daemon service '%s' is " |
| service, | ||
| ) | ||
| LOG.debug( | ||
| "SSH deamon service '%s' is not running: %s", |
There was a problem hiding this comment.
| "SSH deamon service '%s' is not running: %s", | |
| "SSH daemon service '%s' is not running: %s", |
| except subp.ProcessExecutionError as e: | ||
| LOG.warning( | ||
| ( | ||
| "Ignoring config 'ssh_pwauth: %s'. SSH deamon service '%s' is " |
There was a problem hiding this comment.
I think we can differentiate our messages here based on e.exit_code in most cases.
On any distro with systemctl (most):
if e.exit_code == 4: the named service (so we can report "is not installed" in our warning)
elif e.exit_code == 3 then named sevice is installed but in stopped state
In this case, we should write out the config changes, but emit a warning that cloud-init has written the ssh config but will not restart the service because it is stopped.
else:
return with the warning, gnoring config 'ssh_pwauth: %s'. SSH deamon service '%s' "is not available"| except subp.ProcessExecutionError as e: | ||
| LOG.warning( | ||
| ( | ||
| "Ignoring config 'ssh_pwauth: %s'. SSH deamon service '%s' is " |
There was a problem hiding this comment.
I think we can either differentiate our messages here based on e.exit_code in most cases or try to instrument a "is-enabled" action for manage_service that would
Checking man systemctl we can look to the following error codes
├──────┼─────────────────────┼─────────────────────┤
│0 │ "program is running │ unit is active │
│ │ or service is OK" │ │
├──────┼─────────────────────┼─────────────────────┤
│1 │ "program is dead │ unit not failed │
│ │ and /var/run pid │ (used by is-failed) │
│ │ file exists" │ │
├──────┼─────────────────────┼─────────────────────┤
│2 │ "program is dead │ unused │
│ │ and /var/lock lock │ │
│ │ file exists" │ │
├──────┼─────────────────────┼─────────────────────┤
│3 │ "program is not │ unit is not active │
│ │ running" │ │
├──────┼─────────────────────┼─────────────────────┤
│4 │ "program or service │ no such unit │
│ │ status is unknown" │ │
└──────┴─────────────────────┴─────────────────────┘
On any distro with systemctl (most):
if e.exit_code == 4:
e can report "is not installed" in our warning and return
elif e.exit_code == 3 then named sevice is installed but in disabled state
we should still write out the config changes, but emit a warning that cloud-init has written the ssh config but will not restart the service because it is disabled.
else:
exit with the warning, gnoring config 'ssh_pwauth: %s'. SSH deamon service '%s' "is not available"| except subp.ProcessExecutionError as e: | ||
| LOG.warning( | ||
| ( | ||
| "Ignoring config 'ssh_pwauth: %s'. SSH deamon service '%s' is " |
There was a problem hiding this comment.
I have verified these exit_codes align on opensuse, rockylinux and ubuntu.
- Hanlde the case when user-data provides ssh_pwauth and the open-ssh service is not available - Add openssh-server as suggested package for debian-based distros
27b8a96 to
9bdf932
Compare
|
Thanks for the suggestion @blackboxsw, I should have reacted to error_codes. Those error codes are part of a Reference specification, therefore I though it is more likely that non-compliant init system adopt it than that they implement an sysvinit init systems seem to be also complaint with the specification, for example: Fedora. Ready to be reviewed. |
blackboxsw
left a comment
There was a problem hiding this comment.
Thank you for the supplemental work here @aciba90 and unittest coverage. I ran through various exit_code scenarios on a live environment and this behavior performs as expected.
@aciba90 please also provide pull requests to add Suggests: openssh-server to debian/control against upstream ubuntu/bionic, ubuntu/focal, ubuntu/jammy and ubuntu/devel branches.
|
I know this has already merged, but I think we need a systemd check around the error checking and handle non-systemd in a more generic way. @aciba90 , you point to the LSB, but the LSB hasn't been updated in 7 years and has never really been respected by many distros. I don't think we can say "do it LSB way or your distro is wrong". There's a few other sections on init scripts, and systemd breaks most of them. E.g., on alpine, calling What do you think? |
If non-systemd systems / services do not follow a convention we could either:
My gut decision would be to go with 1), as 2) sound complex to maintain. What do you think?
The only problematic thing here is that if @TheRealFalcon, do you think it makes sense to do a deeper investigation about exit codes? |
|
To me option 1 makes the most sense with additional support for systemd. |
Thanks pointing out this situation @TheRealFalcon. PR created: #1450 |
Proposed Commit Message
Additional Context
Test Steps
Expect status Done and a warning in the logs instead of the error.
Warning:
Failed to restart the SSH deamon. ...Checklist: