Skip to content

Handle error if SSH service no present.#1422

Merged
blackboxsw merged 5 commits into
canonical:mainfrom
aciba90:SC-968/better_warnings_instead_of_tracebacks_when_missing_openssh-server_in_minimal_images
May 10, 2022
Merged

Handle error if SSH service no present.#1422
blackboxsw merged 5 commits into
canonical:mainfrom
aciba90:SC-968/better_warnings_instead_of_tracebacks_when_missing_openssh-server_in_minimal_images

Conversation

@aciba90
Copy link
Copy Markdown
Contributor

@aciba90 aciba90 commented May 2, 2022

Proposed Commit Message

Add pre-fight check to verify if SSH is running

- In negative case then perform an early return with a warning and no 
  modifications
- Handle an exception when user-data provides 'ssh_pwauth: True'
  and the open-ssh service is not available
- Add openssh-server as suggested package for debian-based distros

SC-968
LP: #1969526

Additional Context

Test Steps

cat > ssh_needed.yaml << EOF
#cloud-config
ssh_pwauth: true
EOF

lxc launch ubuntu-daily:focal f1 -c user.user-data="$(cat ssh_needed.yaml)"
lxc exec f1 -- cloud-init status --wait
lxc exec f1 apt remove opensssh-server
# reset PasswordAuthentication off again so that cloud-init wants to reset ssh server
lxc exec f1 -- sed -i 's/PasswordAuthentication yes/PasswordAuthentication no/' /etc/ssh/sshd_config
# force clean reboot so cloud-init reruns
lxc exec f1 -- cloud-init clean --logs --reboot
lxc exec f1 -- cloud-init status --wait --long

Expect status Done and a warning in the logs instead of the error.
Warning: Failed to restart the SSH deamon. ...

Checklist:

  • My code follows the process laid out in the documentation
  • I have updated or added any unit tests accordingly
  • I have updated or added any documentation accordingly

Copy link
Copy Markdown
Collaborator

@blackboxsw blackboxsw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this PR @aciba90.

I don't think we want to spend time calling into update_ssh_config on a system that doesn't have the necessary service to run ssh.
This suggestion below may be more work than it's worth, I look forward to your thoughts either way.

Ideally I think we want a pre-flight check at the beginning of handle_ssh_pwauth function if the ssh_svcname in question does not exist for restarting. If not present, exit early with a LOG.warning(f"Ignoring config 'ssh_pwauth: {pw_auth}'. Service {service} is not installed")

This PR is not straight-forward for this situation because we can't easily invoke subp.which for the service we are trying to restart because we are talking to the distro.manage_service layer which has significantly different implementations on each distro for systemd vs. non-systemd init systems.

I wonder if we might be able to add a "status" command to distro.manage_service to report whether or not the service actually exists. We could then try/except around a call to distro.manage_service("status", service).

Comment thread cloudinit/config/cc_set_passwords.py Outdated
try:
distro.manage_service("restart", service)
except subp.ProcessExecutionError as e:
LOG.warning("Failed to restart the SSH deamon. %s: %s", service, e)
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally let's try to avoid the words "Failed" and "error" in warning level logs as they are confusing when you grep in /var/log/cloud-init.log. If we stick with this approach, please replace Failed

Suggested change
LOG.warning("Failed to restart the SSH deamon. %s: %s", service, e)
LOG.warning(f"Wrote '{cfg_name} {cfg_val}' to {DEF_SSHD_CFG}. Unable to restart the SSH deamon {service}: {e}")

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the suggestion @blackboxsw.

I wonder if we might be able to add a "status" command to distro.manage_service to report whether or not the service actually exists. We could then try/except around a call to distro.manage_service("status", service).

I was thinking exactly about that. Let me rework this PR.

@aciba90 aciba90 marked this pull request as draft May 4, 2022 06:10
@aciba90 aciba90 force-pushed the SC-968/better_warnings_instead_of_tracebacks_when_missing_openssh-server_in_minimal_images branch 2 times, most recently from 6a6faba to 75c3583 Compare May 6, 2022 09:23
Copy link
Copy Markdown
Contributor Author

@aciba90 aciba90 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@blackboxsw this one is ready to be reviewed again. I have implemented the status command and the pre-flight check as suggested.

Thanks for you comments.

Comment thread cloudinit/config/cc_set_passwords.py Outdated
except subp.ProcessExecutionError as e:
LOG.warning(
(
"Ignoring config 'ssh_pwauth: %s'. SSH deamon service '%s' is "
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideally I think we want a pre-flight check at the beginning of handle_ssh_pwauth function if the ssh_svcname in question does not exist for restarting. If not present, exit early with a LOG.warning(f"Ignoring config 'ssh_pwauth: {pw_auth}'. Service {service} is not installed")

I though that it was more generic to say that the service is not running, as a non 0 exit from systemctl status service no always does mean that the service is not installed.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can differentiate our messages here based on e.exit_code in most cases.

On any distro with systemctl (most):

    if e.exit_code == 4: the named service  (so we can report "is not installed" in our warning)
    elif e.exit_code == 3 then named sevice is installed but in stopped state 
         In this case, we should write out the config changes, but emit a warning that cloud-init has written the ssh config but will not restart the service because it is stopped.
    else:
         return with the warning, gnoring config 'ssh_pwauth: %s'. SSH deamon service '%s' "is not available"

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can either differentiate our messages here based on e.exit_code in most cases or try to instrument a "is-enabled" action for manage_service that would

Checking man systemctl we can look to the following error codes


       ├──────┼─────────────────────┼─────────────────────┤
       │0     │ "program is running │ unit is active      │
       │      │ or service is OK"   │                     │
       ├──────┼─────────────────────┼─────────────────────┤
       │1     │ "program is dead    │ unit not failed     │
       │      │ and /var/run pid    │ (used by is-failed) │
       │      │ file exists"        │                     │
       ├──────┼─────────────────────┼─────────────────────┤
       │2     │ "program is dead    │ unused              │
       │      │ and /var/lock lock  │                     │
       │      │ file exists"        │                     │
       ├──────┼─────────────────────┼─────────────────────┤
       │3     │ "program is not     │ unit is not active  │
       │      │ running"            │                     │
       ├──────┼─────────────────────┼─────────────────────┤
       │4     │ "program or service │ no such unit        │
       │      │ status is unknown"  │                     │
       └──────┴─────────────────────┴─────────────────────┘
On any distro with systemctl (most):
    if e.exit_code == 4: 
           e can report "is not installed" in our warning and return
    elif e.exit_code == 3 then named sevice is installed but in disabled state 
         we should still write out the config changes, but emit a warning that cloud-init has written the ssh config but will not restart the service because it is disabled.
    else:
         exit with the warning, gnoring config 'ssh_pwauth: %s'. SSH deamon service '%s' "is not available"

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have verified these exit_codes align on opensuse, rockylinux and ubuntu.

@aciba90 aciba90 marked this pull request as ready for review May 6, 2022 09:57
Copy link
Copy Markdown
Collaborator

@blackboxsw blackboxsw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think adding the "status" operation to manage_services is helpful, we could more specifically call systemctl is-enabled but then we'd have to provide a different option any distro which doesn't support "is-enabled` command. I think it is probably cleaner at this iteration to properly look at exit codes:
if exit_code == 0: write config and restart
exit_code == 3: write_config but warn about not restarting ssh because service is in disabled state
exit_code == 4: warn sshd is not installed and return
other exit code: warn "sshd is unavailable due to {e}", to provide the exception message and return

here's a mockup diff of what I'm alluding to if this makes sense. I'm open to discussion and or disagreement here if you think this suggestion doesn't make sense.

diff --git a/cloudinit/config/cc_set_passwords.py b/cloudinit/config/cc_set_passwords.py
index d952ba4d7..927d0bf7b 100644
--- a/cloudinit/config/cc_set_passwords.py
+++ b/cloudinit/config/cc_set_passwords.py
@@ -88,23 +88,25 @@ def handle_ssh_pwauth(pw_auth, distro: Distro):
 
     @return: None"""
     service = distro.get_option("ssh_svcname", "ssh")
+    restart_ssh = True
     try:
         distro.manage_service("status", service)
     except subp.ProcessExecutionError as e:
-        LOG.warning(
-            (
-                "Ignoring config 'ssh_pwauth: %s'. SSH deamon service '%s' is "
-                "not running."
-            ),
-            pw_auth,
-            service,
-        )
-        LOG.debug(
-            "SSH deamon service '%s' is not running: %s",
-            service,
-            e,
-        )
-        return
+        if e.exit_code == 3:
+            # Continue to write ssh but don't invoke restart
+            restart_ssh = False
+        elif e.exit_code == 4:
+            LOG.warning(
+                f"Ignoring config 'ssh_pwauth: {pw_auth}'."
+                f" SSH daemon '{service}' is not installed."
+            )
+            return
+        else:
+            LOG.warning(
+                f"Ignoring config 'ssh_pwauth: {pw_auth}'."
+                f" SSH daemon '{service}' is unavailable. Error: {e}."
+            )
+            return
 
     cfg_name = "PasswordAuthentication"
 
@@ -131,7 +133,10 @@ def handle_ssh_pwauth(pw_auth, distro: Distro):
         LOG.debug("No need to restart SSH service, %s not updated.", cfg_name)
         return
 
-    distro.manage_service("restart", service)
+    if restart_ssh:
+        distro.manage_service("restart", service)
+    else:
+        LOG.debug("Not restarting SSH service: service is stopped.")
     LOG.debug("Restarted the SSH daemon.")

Comment thread cloudinit/config/cc_set_passwords.py Outdated
except subp.ProcessExecutionError as e:
LOG.warning(
(
"Ignoring config 'ssh_pwauth: %s'. SSH deamon service '%s' is "
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"Ignoring config 'ssh_pwauth: %s'. SSH deamon service '%s' is "
"Ignoring config 'ssh_pwauth: %s'. SSH daemon service '%s' is "

Comment thread cloudinit/config/cc_set_passwords.py Outdated
service,
)
LOG.debug(
"SSH deamon service '%s' is not running: %s",
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"SSH deamon service '%s' is not running: %s",
"SSH daemon service '%s' is not running: %s",

Comment thread cloudinit/config/cc_set_passwords.py Outdated
except subp.ProcessExecutionError as e:
LOG.warning(
(
"Ignoring config 'ssh_pwauth: %s'. SSH deamon service '%s' is "
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can differentiate our messages here based on e.exit_code in most cases.

On any distro with systemctl (most):

    if e.exit_code == 4: the named service  (so we can report "is not installed" in our warning)
    elif e.exit_code == 3 then named sevice is installed but in stopped state 
         In this case, we should write out the config changes, but emit a warning that cloud-init has written the ssh config but will not restart the service because it is stopped.
    else:
         return with the warning, gnoring config 'ssh_pwauth: %s'. SSH deamon service '%s' "is not available"

Comment thread cloudinit/config/cc_set_passwords.py Outdated
except subp.ProcessExecutionError as e:
LOG.warning(
(
"Ignoring config 'ssh_pwauth: %s'. SSH deamon service '%s' is "
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can either differentiate our messages here based on e.exit_code in most cases or try to instrument a "is-enabled" action for manage_service that would

Checking man systemctl we can look to the following error codes


       ├──────┼─────────────────────┼─────────────────────┤
       │0     │ "program is running │ unit is active      │
       │      │ or service is OK"   │                     │
       ├──────┼─────────────────────┼─────────────────────┤
       │1     │ "program is dead    │ unit not failed     │
       │      │ and /var/run pid    │ (used by is-failed) │
       │      │ file exists"        │                     │
       ├──────┼─────────────────────┼─────────────────────┤
       │2     │ "program is dead    │ unused              │
       │      │ and /var/lock lock  │                     │
       │      │ file exists"        │                     │
       ├──────┼─────────────────────┼─────────────────────┤
       │3     │ "program is not     │ unit is not active  │
       │      │ running"            │                     │
       ├──────┼─────────────────────┼─────────────────────┤
       │4     │ "program or service │ no such unit        │
       │      │ status is unknown"  │                     │
       └──────┴─────────────────────┴─────────────────────┘
On any distro with systemctl (most):
    if e.exit_code == 4: 
           e can report "is not installed" in our warning and return
    elif e.exit_code == 3 then named sevice is installed but in disabled state 
         we should still write out the config changes, but emit a warning that cloud-init has written the ssh config but will not restart the service because it is disabled.
    else:
         exit with the warning, gnoring config 'ssh_pwauth: %s'. SSH deamon service '%s' "is not available"

Comment thread cloudinit/config/cc_set_passwords.py Outdated
except subp.ProcessExecutionError as e:
LOG.warning(
(
"Ignoring config 'ssh_pwauth: %s'. SSH deamon service '%s' is "
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have verified these exit_codes align on opensuse, rockylinux and ubuntu.

aciba90 and others added 5 commits May 10, 2022 13:35
- Hanlde the case when user-data provides ssh_pwauth
and the open-ssh service is not available
- Add openssh-server as suggested package for debian-based distros
@aciba90 aciba90 force-pushed the SC-968/better_warnings_instead_of_tracebacks_when_missing_openssh-server_in_minimal_images branch from 27b8a96 to 9bdf932 Compare May 10, 2022 11:37
@aciba90
Copy link
Copy Markdown
Contributor Author

aciba90 commented May 10, 2022

Thanks for the suggestion @blackboxsw, I should have reacted to error_codes.

Those error codes are part of a Reference specification, therefore I though it is more likely that non-compliant init system adopt it than that they implement an is-enabled command.

sysvinit init systems seem to be also complaint with the specification, for example: Fedora.

Ready to be reviewed.

Copy link
Copy Markdown
Collaborator

@blackboxsw blackboxsw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the supplemental work here @aciba90 and unittest coverage. I ran through various exit_code scenarios on a live environment and this behavior performs as expected.

@aciba90 please also provide pull requests to add Suggests: openssh-server to debian/control against upstream ubuntu/bionic, ubuntu/focal, ubuntu/jammy and ubuntu/devel branches.

@blackboxsw blackboxsw merged commit 5054ffe into canonical:main May 10, 2022
@aciba90 aciba90 deleted the SC-968/better_warnings_instead_of_tracebacks_when_missing_openssh-server_in_minimal_images branch May 10, 2022 20:06
@TheRealFalcon
Copy link
Copy Markdown
Contributor

I know this has already merged, but I think we need a systemd check around the error checking and handle non-systemd in a more generic way. @aciba90 , you point to the LSB, but the LSB hasn't been updated in 7 years and has never really been respected by many distros. I don't think we can say "do it LSB way or your distro is wrong". There's a few other sections on init scripts, and systemd breaks most of them.

E.g., on alpine, calling service sshd status returns 1 if not installed, 3 if stopped, and 32 if crashed.

What do you think?

@aciba90
Copy link
Copy Markdown
Contributor Author

aciba90 commented May 12, 2022

I know this has already merged, but I think we need a systemd check around the error checking and handle non-systemd in a more generic way. @aciba90 , you point to the LSB, but the LSB hasn't been updated in 7 years and has never really been respected by many distros. I don't think we can say "do it LSB way or your distro is wrong". There's a few other sections on init scripts, and systemd breaks most of them.

If non-systemd systems / services do not follow a convention we could either:

  1. Only distinguish between 0 and non 0 exit codes. Apply the config and restart only if 0.
  2. Concrete implementations for non-complaint exit_codes for every non-complaint (init system, ssh service implementation, distro). As for example in System V, each init script implements its status logic.

My gut decision would be to go with 1), as 2) sound complex to maintain. What do you think?

E.g., on alpine, calling service sshd status returns 1 if not installed, 3 if stopped, and 32 if crashed.

The only problematic thing here is that if exit_code == 1 we say service is not available which is technically correct. I do not see a problem here.

@TheRealFalcon, do you think it makes sense to do a deeper investigation about exit codes?

@TheRealFalcon
Copy link
Copy Markdown
Contributor

To me option 1 makes the most sense with additional support for systemd.

@aciba90
Copy link
Copy Markdown
Contributor Author

aciba90 commented May 12, 2022

To me option 1 makes the most sense with additional support for systemd.

Thanks pointing out this situation @TheRealFalcon. PR created: #1450

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants