feat(wsl): Reuse metadata as Landscape installation_request_id#6200
Conversation
Such that if we have the relevant landscape config, we amend it with the instance as as installation_request_id immediately.
We only supported loading from the user dir. Pro has no usage for metadata so far. Now we got one: passing the instance ID assigned from Landscape.
For consistency.
In preparation for injecting metadata written by the pro agent
And validating installation_request_id is assigned when landscape config exists.
|
@blackboxsw , pinging you since you have all the context of the spec. |
…loud-init (#1211) We discussed the details of this implementation in the specification WS042. The gist of it is: - The Landscape server wants to track the completion of a child instance installation via a new field added to the client configuration - The field is called `installation_request_id` and the client sends that value as part of the registration message - The server expects that value to match the `requestID` of the installation command sent to the UP4W agent. - That's a unique, per instance, value, only assigned to instances installed upon a Landscape server request, thus won't fit into agent.yaml. - It has the same semantics of a machine ID or cloud-init instance ID. - Thus, we agreed to make the WSL datasource assign `metadata.instance-id` to that value if a Landscape client config exists but without the `installation_request_id` field Cloud-init PR canonical/cloud-init#6200 handles the changes needed in the datasource itself. Implementation details may change, but we agreed in the spec that reusing the `metadata.instance-id` for this purpose just makes sense, so I believe we can proceed with the agent side in the meanwhile. --- UDENG-6772
blackboxsw
left a comment
There was a problem hiding this comment.
Apologies for the delay @CarlosNihelton thanks for your continued work on WSL features for cloud-init. I've logged a few inline comments to hopefully add to clarity of the code and adhere to best-practices for __init__ and static strings.
I think one comment and question about whether we expect ['landscape']['client'] in all config provided by Pro and whether we should raise a ValueError and explanatory message if we receive a config_dict which doesn't contain such keys.
|
|
||
| # If we don't have a landscape client installation_request_id | ||
| # but have valid instance-id in the metadata, let's reuse it: | ||
| self.config_dict["landscape"]["client"][ |
There was a problem hiding this comment.
Is it possible that we get config_dict without "landscape" or "client" key defined? If it is, we may need to check and set the empty dict to start:
if LANDSCAPE_CFG_KEY not in self.config_dict:
self.config_dict[LANDSCAPE_CFG_KEY] = {}
if LANDSCAPE_CLIENT_CFG_KEY not in self.config[LANDSCAPE_CFG_KEY][LANDSCAPE_CLIENT_CFG_KEY] :
self.config_dict[LANDSCAPE_CFG_KEY][LANDSCAPE_CLIENT_CFG] = {}
There was a problem hiding this comment.
It's possible that we only get a pro token and no landscape config via the pro agent user data. I think your refactoring suggested below fixes this and other issues with the __init__ method, thanks for pointing that out.
| if self.config_dict is None: | ||
| return | ||
| if not instance_id or instance_id == DEFAULT_INSTANCE_ID: | ||
| return | ||
| try: | ||
| if ( | ||
| self.config_dict["landscape"]["client"].get( | ||
| "installation_request_id" | ||
| ) | ||
| is not None | ||
| ): | ||
| return | ||
|
|
||
| # If we don't have a landscape client installation_request_id | ||
| # but have valid instance-id in the metadata, let's reuse it: | ||
| self.config_dict["landscape"]["client"][ | ||
| "installation_request_id" | ||
| ] = instance_id | ||
| except KeyError: | ||
| # client config doesn't exist, don't bother | ||
| pass | ||
|
|
||
| self.raw = "#cloud-config\n%s" % yaml.dump(self.config_dict).strip() |
There was a problem hiding this comment.
While generally, I'm onboard with early exit/returns in functions/methods to avoid nested indents, __init__ is a special case where we shouldn't early return.
-
Adding early empty returns inside
__init__methods are generally considered to be a bad practice as__init__method is intended to fully initialize instances. Future developers could come into this method and not see such early returns added logic may not be triggered due to an unexpected early exit. -
If we use hard-coded strings like "landscape" and "client" we should define global vars for those LANDSCAPE_CFG_KEY and LANDSCAPE_CLIENT_CFG_KEY which better advertizes our config key expectations.
-
no need to try/except KeyError if we instead
dict.get(, {})defaults to an empty dict when the key doesn't exist. Useself.config_dict.get(LANDSCAPE_CFG_KEY, {}).get("LANDSCAPE_CFG_CLIENT_KEY, {})instead.
For clarity, let's ensure we scope any initialization operations nested within the conditional truthy value we expect before we initialize.
Maybe something like this instead
| if self.config_dict is None: | |
| return | |
| if not instance_id or instance_id == DEFAULT_INSTANCE_ID: | |
| return | |
| try: | |
| if ( | |
| self.config_dict["landscape"]["client"].get( | |
| "installation_request_id" | |
| ) | |
| is not None | |
| ): | |
| return | |
| # If we don't have a landscape client installation_request_id | |
| # but have valid instance-id in the metadata, let's reuse it: | |
| self.config_dict["landscape"]["client"][ | |
| "installation_request_id" | |
| ] = instance_id | |
| except KeyError: | |
| # client config doesn't exist, don't bother | |
| pass | |
| self.raw = "#cloud-config\n%s" % yaml.dump(self.config_dict).strip() | |
| if "text/cloud-config" == type_from_starts_with(self.raw): | |
| self.config_dict = util.load_yaml(self.raw) | |
| if ( | |
| self.config_dict # Valid non-empty config | |
| and instance_id | |
| and instance_id != DEFAULT_INSTANCE_ID # Non-default instance-id | |
| and self.config_dict.get(LANDSCAPE_CFG_KEY, {}) | |
| .get(LANDSCAPE_CLIENT_CFG_KEY, {}) | |
| .get(LANDSCAPE_INSTALLATION_REQ_ID) | |
| is None # No landscape installation_request_id | |
| ): | |
| # We don't have a landscape client installation_request_id | |
| # but have valid instance-id in the metadata, let's reuse it: | |
| self.config_dict["landscape"]["client"][ | |
| LANDSCAPE_INSTALLATION_REQ_ID | |
| ] = instance_id | |
| self.raw = ( | |
| "#cloud-config\n%s" % yaml.dump(self.config_dict).strip() | |
| ) |
__init__ is a special case where we shouldn't early return.
1. Adding early empty returns inside __init__ methods are generally
considered to be a bad practice as __init__ method is intended to
fully initialize instances. Future developers could come into
this method and not see such early returns added logic may not be
triggered due to an unexpected early exit.
2. If we use hard-coded strings like "landscape" and "client" we
should define global vars for those LANDSCAPE_CFG_KEY and
LANDSCAPE_CLIENT_CFG_KEY which better advertizes our config key
expectations.
3. no need to try/except KeyError if we instead dict.get(, {})
defaults to an empty dict when the key doesn't exist. Use
self.config_dict.get(LANDSCAPE_CFG_KEY,
{}).get("LANDSCAPE_CFG_CLIENT_KEY, {}) instead.
For clarity, let's ensure we scope any initialization operations nested
within the conditional truthy value we expect before we initialize.
- load_text_file returns a dict, so should be the default arg. - we may have other fallbacks for the metadata value, so let's just log the file path if we don't find data in it.
No need for the extra local var candidates as we end up looping through the values twice to construct non None items and then to re-iterate through those items. Let's also now report if we found no candidates anywhere.
| req_id = self.config_dict[LANDSCAPE_CFG_KEY][ | ||
| LANDSCAPE_CLIENT_CFG_KEY | ||
| ].setdefault(LANDSCAPE_INSTALLATION_REQ_ID, instance_id) | ||
| if req_id == instance_id: | ||
| self.raw = ( | ||
| "#cloud-config\n%s" % yaml.dump(self.config_dict).strip() | ||
| ) |
There was a problem hiding this comment.
@CarlosNihelton: In the related spec WS024, it was mentioned that only landscape-client version 2.11 and later support this config key. Prior versions of landscape-client will break when landscape-config is called. Since landscape-client v 2.11 is only available on plucky and later, we may want to prevent writing this landscape::client::installation_request_id if we know we are on a 2.11 version of landscape-client or less. This changeset though may only be able to be performed once we are sure landscape-client is installed in the environment cloudinit.config.cc_landscape.py.
Without this version-aware gate, WSL releases earlier than plucky could generate errors once cc_landscape calls lansdscape-config with an unknown config key.
There was a problem hiding this comment.
Very good point. While the Landscape team is preparing themselves for SRU'ing that feature, we need to add some version checks to prevent failures. I'm adding a quick-to-revert check with a TODO item for when the necessary support lands on the interesting Ubuntu releases.
There was a problem hiding this comment.
Thanks @CarlosNihelton. Are we going to add that version check to this PR or somewhere else? Given that this changeset will SRU back to Jammy++ as part of 25.2 how do we ensure we don't break WSL-related image streams on Jammy++?
There was a problem hiding this comment.
There was a problem hiding this comment.
Thanks @CarlosNihelton +1 on this approach if there is no ability on the command-line to interact with landscape-client or landscape-config to confirm whether or not the config key installation_request_id is supported.
|
Hello! Thank you for this proposed change to cloud-init. This pull request is now marked as stale as it has not seen any activity in 14 days. If no activity occurs within the next 7 days, this pull request will automatically close. If you are waiting for code review and you are seeing this message, apologies! Please reply, tagging TheRealFalcon, and he will ensure that someone takes a look soon. (If the pull request is closed and you would like to continue working on it, please do tag TheRealFalcon to reopen it.) |
|
Hello! Thank you for this proposed change to cloud-init. This pull request is now marked as stale as it has not seen any activity in 14 days. If no activity occurs within the next 7 days, this pull request will automatically close. If you are waiting for code review and you are seeing this message, apologies! Please reply, tagging TheRealFalcon, and he will ensure that someone takes a look soon. (If the pull request is closed and you would like to continue working on it, please do tag TheRealFalcon to reopen it.) |
landscape-config doesn't support the installation_request_id before Ubuntu 25.04 as of Jun-2025, so we cannot use it yet. That's expected to change in the near future, as the Landscape team is SRU'ing that and other features to older releases. TODO: remove this check when the SRUs are done.
afd4902 to
565f8b7
Compare
| or the user cloud-int dir based on the instance name | ||
| """ | ||
| metadata = {"instance-id": DEFAULT_INSTANCE_ID} | ||
| pro_dir = cloud_init_data_dir(user_home / ".ubuntupro") |
There was a problem hiding this comment.
Should this actually be the following to align with docs and ds-identify behavior?
| pro_dir = cloud_init_data_dir(user_home / ".ubuntupro") | |
| pro_dir = cloud_init_data_dir(user_home / ".ubuntupro/.cloud-init") |
There was a problem hiding this comment.
Joining the cloud-init subdirectory happens inside the cloud_init_data_dir function here.
|
I think there is an inconsistency in this PR as far as the |
| if self.distro.name == "ubuntu": | ||
| agent_data, user_data = load_ubuntu_pro_data(user_home) | ||
| _, version_id, _ = util.get_linux_distro() | ||
| if version_id < "25.04": |
There was a problem hiding this comment.
This then hardcodes cloud-init to avoid using IID for an extended period of time after landscape-client/landscape-config eventually completes an SRU and someone notices changes it in upstream cloud-init and awaits a quarterly SRU of cloud-init back to the expected Ubuntu series. If there is no other way for us to call out to landscape to determine if this feature exists or is supported, I'm ok with this approach. But if there is a way we can subprocess out to try running landcape-config or landscape-client to perform a preflight check that'd be better and it would decouple cloud-init SRUs from the eventual support in landscape once it SRUs.
There was a problem hiding this comment.
There is always some assumption being hardcoded. For example, we could instead:
- Hardcode the landscape-client version by checking against the parsed result of
apt-cache policy landscape-clientor similar (results then may change when apt cache is updated) - Hardcode the assumption that the landscape-client is already installed when the datasource runs (currently true as we ship landscape-client by default, but remember that the landscape module is capable of triggering the installation and the packages module may upgrade it) and then check the package version via
landscape-config --versionorlandscape-client --version(which are nicer to parse in the end)
The version comparison needs to know which version started supporting that feature for each Ubuntu LTS, which we don't know for sure at this point, so it's likely that once landscape-client SRUs are done we need to come back to this anyway.
So I'm taking the easiest-to-revert approach.
There was a problem hiding this comment.
@CarlosNihelton I was hoping landscape would have some external indicator that reported whether the config key installation_request_id was supported.. But, since it's undocumented, hard-coding becomes the only option here. Works for me. no need to hardcode specific versions of landscape-client or config because that doesn't truly indicate either whether the support for installation_request_id actually got released there.
many thanks!
There was a problem hiding this comment.
In poking on questing we can use lansdcape-config --help which does actually document installation-request-id which should suffice as a feature check. So instead of if version_id < "25.04": we can do the following:
out, _ = subp.subp(["landscape-config", "--help"])
if "--installation-request-id" in out:
There was a problem hiding this comment.
Thanks for pushing the boundaries of this. The help does indeed contain the information we need.
In this particular case we're looking only for installation_request_id. Tests were updated to make sure we don't blow up due unexpected use of subp.
blackboxsw
left a comment
There was a problem hiding this comment.
Good work on the back and forth here. I'm happy with the decoupling of cloud-init from landscape-config SRUs. Thanks again @CarlosNihelton !
…ical#6200) Recently the Landscape server started tracking the status of commands delivered to clients that run on parent instances, notably the Ubuntu Pro for WSL agent, by assigning request IDs which the pro agent refers to when replying with the command completion status. For the case of the Install command, which results in a new WSL instance being created on the parent Windows machine and registered with Landscape, the server also expects the same value to be part of the landscape-client registration message. For that we need the `installation_request_id` to be added to the client.conf file. We found that synergic to the concept of `metadata.instance-id`, thus this refactoring in the WSL datasource aims to allow: - the pro agent to write a per-instance metadata file - the WSL data source to use the `metadata.instance-id` as `installation_request_id` - if the landscape.client config exists - if doesn't contain that field. Details were discussed in the specification document WS042 - Handle Landscape request_id in command Install via cloud-init
Checklist:
tests/unittests/cloudinit/example.pyshould be tested bytests/unittests/test_example.pytox -e py3tox -e doc.Proposed commit message
Merge type
Fixes: UDENG-6773