declare dependency on ssh-import-id (SC-864)#1334
Conversation
|
More context: It looks like the LXD image |
|
Update: It looks like the same is true for |
|
Looking over Given that the image you ran across didn't have openssh-server installed either, I think it's fair to expect cloud-init to not work in this scenario where a dependency is mssing and I also don't want to push ssh-import-id (and openssh-server) onto every image cloud-init is installed on as there are many use cases where ssh_import_id is not provided in user-data and there also may be use-cases where openssh-server is undesired on a slim image. Free free to disagree with these points,
2a: think we could be a bit more proactive in this scenario to inform users better of the underlying dependency issue if not sub.which('ssh-import-id'):
log.warning(f"Command not found: ssh-import-id. Unable to import user keys {ids}")
returnAlso add docs to cc_ssh_import_id module that this will only run on systems with ssh-import-id and openssh-server installed 2b: If ssh-import-id doesn't exist we could document that we will install that package and openssh-server if ssh_import_ids are provided in user-data. Option 2a is easy and clears up expectations instead of just logging an ugly traceback about the failure.
What do folks think? |
This is surprising to me. How else do you connect to your instance? I get that there are use cases where all non-essential services are removed and only the ports open necessary for your app, but aren't those usually heavily customized images anyway?
Our prior art looks to be ubuntu/debian modules exclusively, so installing dependencies in those modules is somewhat safer. Technically, our ssh-import-id module is debian/ubuntu specific too, but there's no reason it has to be. I'd much prefer we go the route of 2a. We could have a docs note in modules that they require an additional dependency depending on your distro, and to add that dependency to the install packages section of the user data. Is there a reason we can't move the install packages module to be earlier in our boot cycle? Some modules are already invoking it earlier than when we run it. |
It's worth noting that openssh-server recommends ssh-import-id, it doesn't depend on it. The functionality of ssh-import-id is something akin to: This is an auxiliary to openssh-server, not a core requirement. I'd argue that similarly cloud-init should "recommend" ssh-import-id. Per https://www.debian.org/doc/debian-policy/ch-relationships.html: If we aren't going to declare it a full dependency, this sounds right to me. On the other hand, is 1.5MB (the dependency of openssh-server + ssh-import-id) of disk even a concern in cloud deployments these days? I'm fine with saying "no we value the space right now", but I figure it's worth asking - this industry changes fast. If we'd rather not deal with the dependencies, maybe we just
I was unaware we had modules that installed packages at runtime. Agreed, that sounds fragile. I'd prefer we stay away from 2b. Currently, I think 2a + Recommends makes the most sense, but also I'm no packaging expert so feel free to tell me why I'm wrong. |
|
2b: If ssh-import-id doesn't exist we could document that we will install
that package and openssh-server if ssh_import_ids are provided in user-data.
I'm rather strictly against silently adding any kind of potential open door
to any system.
Gladly the discussion has kind of settled on variations of 2a.
|
|
This is an auxiliary to openssh-server, not a core requirement. I'd argue that similarly cloud-init should "recommend" ssh-import-id
I was first thinking the same, but let us consider the possible scenarios:
a) system has no ssh-server and thereby also not the ssh-import-id that it
recommends.
Cloud-init now brings ssh-import-id via this new dependencies ...
People think it will work (positive log message) but it won't
We consume space on disk for nothing.
b) system has ssh-server, ssh-import-id was installed as recommends -
cloud-init works fine
c) system has ssh-server but someone took explicit measures to remove or
not install it in the first place
I'd be more tempted to call this a "Suggests" dependency on cloud-init
combined with Chads suggestion of
1) harden cloud-init to report well about the problem
2) mention in the documentation that ssh-import-id and an ssh server will
need to be present
2bonus) if there is a way to ensure via cloud init config that they are
present for the unlikely case they are not, mention that as an example
there. This allows users/admins to make a choice instead of forcing it onto
them.
Finally, yes every byte counts for image size considerations.
It is more often the summary of "that bit won't hurt" than a single big
chunk.
P.S. I have no deciding voice here, but wanted to add my POV.
|
fd2fc08 to
e356062
Compare
eb682cf to
04abd5d
Compare
|
@cpaelzer @TheRealFalcon @blackboxsw - Thanks for the input on this. I think it's ready for review. |
TheRealFalcon
left a comment
There was a problem hiding this comment.
One nit inline but approve either way
43d01d9 to
1dd6da3
Compare
|
@blackboxsw - Any reason we can't just walk the config checking for keys matching |
|
|
||
|
|
||
| # vi: ts=4 expandtab | ||
| def is_key_in_nested_dict(config: dict, search_key: str) -> bool: |
There was a problem hiding this comment.
This looks reasonable. It's probably a non-issue, but here are the costs of this generic function:
- false positives if there are
ssh_import_idkeys under paths besides "users", and "user". (This implies invalid user-data schema anyway) - this function does more work that a specific and targeted function which only looks for the supported ssh_import_id paths. (user-data is generally small for systems and a small recursive loop across cloud-config dict really doesn't cost too much)
The only alternative I could think of here would be only looking at the places where we support this key with something like the following:
if "ssh_import_id" in config:
return True
if "user" in config and isinstance(config["user"], dict):
if "ssh_import_id" in config["user"]:
return True
if "users" in config:
if isinstance(config["users"], dict):
for user, user_config:
if "ssh_import_id" in user_config:
return True
if isinstance(config["users"], list):
for user_config in config["users"]:
if "ssh_import_id" in user_config:
return True
return FalseWhen I run timeit over the following I see the generic function is twice as long (though it's a tiny cost per run amount)
# {"ssh_import_id": "xxxx", "packages": ["sl"]}
specific 1.0499926080001387
general recursive 1.904686696000681
# d2 = {"user": {"ssh_import_id": "yep"}, "packages": ["sl"]}
specific 2.0969378739991953
general recursive 4.546597008999925
# d3 = {"users": [{"ssh_import_id": "yep"}], "packages": ["sl"]}
specific 3.2126418679999915
general recursive 6.143558423999821
There was a problem hiding this comment.
My junker timeit test
#!/usr/bin/python3
def is_ssh_import_id_set(config, key="ssh_import_id"):
if key in config:
return True
if "user" in config and isinstance(config["user"], dict):
if key in config["user"]:
return True
if "users" in config:
if isinstance(config["users"], dict):
for _username, user_config in config["users"].items():
if key in user_config:
return True
if isinstance(config["users"], list):
for user_config in config["users"]:
if key in user_config:
return True
return False
def is_key_in_nested_dict(config: dict, search_key: str) -> bool:
"""Search for key nested in config.
Note: A dict embedded in a list of lists will not be found walked - but in
this case we don't need it.
"""
for config_key in config.keys():
if search_key == config_key:
return True
if isinstance(config[config_key], dict):
return is_key_in_nested_dict(config[config_key], search_key)
if isinstance(config[config_key], list):
# this code could probably be generalized to walking the whole
# config by iterating lists in search of dictionaries
for item in config[config_key]:
if isinstance(item, dict):
return is_key_in_nested_dict(item, search_key)
return False
import timeit
d1 = {"ssh_import_id": "yep", "packages": ["sl"]}
d2 = {"user": {"ssh_import_id": "yep"}, "packages": ["sl"]}
d3 = {"users": [{"ssh_import_id": "yep"}], "packages": ["sl"]}
print("specific ", timeit.timeit('is_ssh_import_id_set(d3, "ssh_import_id")', globals=globals(), number=10000000))
print("general recursive ", timeit.timeit('is_key_in_nested_dict(d3, "ssh_import_id")', globals=globals(), number=10000000))ef40baa to
bec04bf
Compare
Co-authored-by: ubuntu-server-builder <blackboxsw+builder@gmail.com>
blackboxsw
left a comment
There was a problem hiding this comment.
LGTM. Assuming CI passes, should be good here.
Context:
We have a module that uses ssh-import-id. Users of ssh-import-id will want this package installed.