Skip to content

feat(wsl): Reuse metadata as Landscape installation_request_id#6200

Merged
blackboxsw merged 14 commits into
canonical:mainfrom
cnihelton:wsl-request-id
Jul 3, 2025
Merged

feat(wsl): Reuse metadata as Landscape installation_request_id#6200
blackboxsw merged 14 commits into
canonical:mainfrom
cnihelton:wsl-request-id

Conversation

@cnihelton
Copy link
Copy Markdown

@cnihelton cnihelton commented May 2, 2025

Checklist:

  • I have signed the CLA: https://ubuntu.com/legal/contributors
  • I have included a comprehensive commit message using the guide below
  • I have added unit tests to cover the new behavior under tests/unittests/
    • Test files should map to source files i.e. a source file cloudinit/example.py should be tested by tests/unittests/test_example.py
    • Run unit tests with tox -e py3
  • I have kept the change small, avoiding unnecessary whitespace or non-functional changes.
  • I have added a reference to issues that this PR relates to in the PR message (Refs integration: do not LXD bind mount /etc/cloud/cloud.cfg.d #1234, Fixes integration: do not LXD bind mount /etc/cloud/cloud.cfg.d #1234)
  • I have updated the documentation with the changed behavior.
    • If the change doesn't change the user interface and is trivial, this step may be skipped.
    • Cloud-config documentation is generated from the jsonschema.
    • Generate docs with tox -e doc.

Proposed commit message

feat(wsl): Reuse metadata as Landscape installation_request_id

Recently the Landscape server started tracking the status of commands
delivered to clients that run on parent instances, notably the Ubuntu
Pro for WSL agent, by assigning request IDs which the pro agent refers
to when replying with the command completion status. For the case of
the Install command, which results in a new WSL instance being created
on the parent Windows machine and registered with Landscape, the
server also expects the same value to be part of the landscape-client
registration message. For that we need the `installation_request_id`
to be added to the client.conf file. We found that synergic to the concept
of `metadata.instance-id`, thus this refactoring in the WSL datasource aims
to allow:
- the pro agent to write a per-instance metadata file
- the WSL data source to use the `metadata.instance-id` as `installation_request_id`
   - if the landscape.client config exists
   - if doesn't contain that field.

Details were discussed in the specification document
"WS042 - Handle Landscape request_id in command Install via cloud-init"

Merge type

  • Squash merge using "Proposed Commit Message"
  • Rebase and merge unique commits. Requires commit messages per-commit each referencing the pull request number (#<PR_NUM>)

Fixes: UDENG-6773

Such that if we have the relevant landscape config, we
amend it with the instance as as installation_request_id
immediately.
We only supported loading from the user dir.
Pro has no usage for metadata so far.
Now we got one: passing the instance ID assigned from Landscape.
In preparation for injecting metadata written by the pro agent
And validating installation_request_id is assigned when
landscape config exists.
@github-actions github-actions Bot added the documentation This Pull Request changes documentation label May 2, 2025
@cnihelton cnihelton marked this pull request as ready for review May 2, 2025 14:53
@cnihelton
Copy link
Copy Markdown
Author

@blackboxsw , pinging you since you have all the context of the spec.

@blackboxsw blackboxsw self-assigned this May 5, 2025
cnihelton pushed a commit to canonical/ubuntu-pro-for-wsl that referenced this pull request May 7, 2025
…loud-init (#1211)

We discussed the details of this implementation in the specification
WS042. The gist of it is:
- The Landscape server wants to track the completion of a child instance
installation via a new field added to the client configuration
- The field is called `installation_request_id` and the client sends
that value as part of the registration message
- The server expects that value to match the `requestID` of the
installation command sent to the UP4W agent.
- That's a unique, per instance, value, only assigned to instances
installed upon a Landscape server request, thus won't fit into
agent.yaml.
- It has the same semantics of a machine ID or cloud-init instance ID.
- Thus, we agreed to make the WSL datasource assign
`metadata.instance-id` to that value if a Landscape client config exists
but without the `installation_request_id` field

Cloud-init PR canonical/cloud-init#6200 handles
the changes needed in the datasource itself. Implementation details may
change, but we agreed in the spec that reusing the
`metadata.instance-id` for this purpose just makes sense, so I believe
we can proceed with the agent side in the meanwhile.

---
UDENG-6772
Copy link
Copy Markdown
Collaborator

@blackboxsw blackboxsw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apologies for the delay @CarlosNihelton thanks for your continued work on WSL features for cloud-init. I've logged a few inline comments to hopefully add to clarity of the code and adhere to best-practices for __init__ and static strings.

I think one comment and question about whether we expect ['landscape']['client'] in all config provided by Pro and whether we should raise a ValueError and explanatory message if we receive a config_dict which doesn't contain such keys.

Comment thread cloudinit/sources/DataSourceWSL.py Outdated

# If we don't have a landscape client installation_request_id
# but have valid instance-id in the metadata, let's reuse it:
self.config_dict["landscape"]["client"][
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible that we get config_dict without "landscape" or "client" key defined? If it is, we may need to check and set the empty dict to start:

if LANDSCAPE_CFG_KEY not in self.config_dict:
    self.config_dict[LANDSCAPE_CFG_KEY] = {}
if LANDSCAPE_CLIENT_CFG_KEY not in self.config[LANDSCAPE_CFG_KEY][LANDSCAPE_CLIENT_CFG_KEY] :
    self.config_dict[LANDSCAPE_CFG_KEY][LANDSCAPE_CLIENT_CFG] = {}

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's possible that we only get a pro token and no landscape config via the pro agent user data. I think your refactoring suggested below fixes this and other issues with the __init__ method, thanks for pointing that out.

Comment thread cloudinit/sources/DataSourceWSL.py Outdated
Comment on lines +164 to +186
if self.config_dict is None:
return
if not instance_id or instance_id == DEFAULT_INSTANCE_ID:
return
try:
if (
self.config_dict["landscape"]["client"].get(
"installation_request_id"
)
is not None
):
return

# If we don't have a landscape client installation_request_id
# but have valid instance-id in the metadata, let's reuse it:
self.config_dict["landscape"]["client"][
"installation_request_id"
] = instance_id
except KeyError:
# client config doesn't exist, don't bother
pass

self.raw = "#cloud-config\n%s" % yaml.dump(self.config_dict).strip()
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While generally, I'm onboard with early exit/returns in functions/methods to avoid nested indents, __init__ is a special case where we shouldn't early return.

  1. Adding early empty returns inside __init__ methods are generally considered to be a bad practice as __init__ method is intended to fully initialize instances. Future developers could come into this method and not see such early returns added logic may not be triggered due to an unexpected early exit.

  2. If we use hard-coded strings like "landscape" and "client" we should define global vars for those LANDSCAPE_CFG_KEY and LANDSCAPE_CLIENT_CFG_KEY which better advertizes our config key expectations.

  3. no need to try/except KeyError if we instead dict.get(, {}) defaults to an empty dict when the key doesn't exist. Use self.config_dict.get(LANDSCAPE_CFG_KEY, {}).get("LANDSCAPE_CFG_CLIENT_KEY, {}) instead.

For clarity, let's ensure we scope any initialization operations nested within the conditional truthy value we expect before we initialize.

Maybe something like this instead

Suggested change
if self.config_dict is None:
return
if not instance_id or instance_id == DEFAULT_INSTANCE_ID:
return
try:
if (
self.config_dict["landscape"]["client"].get(
"installation_request_id"
)
is not None
):
return
# If we don't have a landscape client installation_request_id
# but have valid instance-id in the metadata, let's reuse it:
self.config_dict["landscape"]["client"][
"installation_request_id"
] = instance_id
except KeyError:
# client config doesn't exist, don't bother
pass
self.raw = "#cloud-config\n%s" % yaml.dump(self.config_dict).strip()
if "text/cloud-config" == type_from_starts_with(self.raw):
self.config_dict = util.load_yaml(self.raw)
if (
self.config_dict # Valid non-empty config
and instance_id
and instance_id != DEFAULT_INSTANCE_ID # Non-default instance-id
and self.config_dict.get(LANDSCAPE_CFG_KEY, {})
.get(LANDSCAPE_CLIENT_CFG_KEY, {})
.get(LANDSCAPE_INSTALLATION_REQ_ID)
is None # No landscape installation_request_id
):
# We don't have a landscape client installation_request_id
# but have valid instance-id in the metadata, let's reuse it:
self.config_dict["landscape"]["client"][
LANDSCAPE_INSTALLATION_REQ_ID
] = instance_id
self.raw = (
"#cloud-config\n%s" % yaml.dump(self.config_dict).strip()
)

Comment thread cloudinit/sources/DataSourceWSL.py
Comment thread cloudinit/sources/DataSourceWSL.py Outdated
Comment thread cloudinit/sources/DataSourceWSL.py
Comment thread cloudinit/sources/DataSourceWSL.py Outdated
 __init__ is a special case where we shouldn't early return.

    1. Adding early empty returns inside __init__ methods are generally
       considered to be a bad practice as __init__ method is intended to
       fully initialize instances. Future developers could come into
       this method and not see such early returns added logic may not be
       triggered due to an unexpected early exit.

    2. If we use hard-coded strings like "landscape" and "client" we
       should define global vars for those LANDSCAPE_CFG_KEY and
       LANDSCAPE_CLIENT_CFG_KEY which better advertizes our config key
       expectations.

    3. no need to try/except KeyError if we instead dict.get(, {})
       defaults to an empty dict when the key doesn't exist. Use
       self.config_dict.get(LANDSCAPE_CFG_KEY,
       {}).get("LANDSCAPE_CFG_CLIENT_KEY, {}) instead.

For clarity, let's ensure we scope any initialization operations nested
within the conditional truthy value we expect before we initialize.
- load_text_file returns a dict, so should be the default arg.
- we may have other fallbacks for the metadata value, so let's just log
  the file path if we don't find data in it.
No need for the extra local var candidates as we end up looping through
the values twice to construct non None items and then to re-iterate
through those items.

Let's also now report if we found no candidates anywhere.
@cnihelton cnihelton requested a review from blackboxsw May 15, 2025 14:12
Comment on lines +177 to +183
req_id = self.config_dict[LANDSCAPE_CFG_KEY][
LANDSCAPE_CLIENT_CFG_KEY
].setdefault(LANDSCAPE_INSTALLATION_REQ_ID, instance_id)
if req_id == instance_id:
self.raw = (
"#cloud-config\n%s" % yaml.dump(self.config_dict).strip()
)
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@CarlosNihelton: In the related spec WS024, it was mentioned that only landscape-client version 2.11 and later support this config key. Prior versions of landscape-client will break when landscape-config is called. Since landscape-client v 2.11 is only available on plucky and later, we may want to prevent writing this landscape::client::installation_request_id if we know we are on a 2.11 version of landscape-client or less. This changeset though may only be able to be performed once we are sure landscape-client is installed in the environment cloudinit.config.cc_landscape.py.

Without this version-aware gate, WSL releases earlier than plucky could generate errors once cc_landscape calls lansdscape-config with an unknown config key.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very good point. While the Landscape team is preparing themselves for SRU'ing that feature, we need to add some version checks to prevent failures. I'm adding a quick-to-revert check with a TODO item for when the necessary support lands on the interesting Ubuntu releases.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @CarlosNihelton. Are we going to add that version check to this PR or somewhere else? Given that this changeset will SRU back to Jammy++ as part of 25.2 how do we ensure we don't break WSL-related image streams on Jammy++?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added via 565f8b7

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @CarlosNihelton +1 on this approach if there is no ability on the command-line to interact with landscape-client or landscape-config to confirm whether or not the config key installation_request_id is supported.

@github-actions
Copy link
Copy Markdown

Hello! Thank you for this proposed change to cloud-init. This pull request is now marked as stale as it has not seen any activity in 14 days. If no activity occurs within the next 7 days, this pull request will automatically close.

If you are waiting for code review and you are seeing this message, apologies! Please reply, tagging TheRealFalcon, and he will ensure that someone takes a look soon.

(If the pull request is closed and you would like to continue working on it, please do tag TheRealFalcon to reopen it.)

@github-actions github-actions Bot added the stale-pr Pull request is stale; will be auto-closed soon label May 31, 2025
@TheRealFalcon TheRealFalcon removed the stale-pr Pull request is stale; will be auto-closed soon label Jun 4, 2025
@github-actions
Copy link
Copy Markdown

Hello! Thank you for this proposed change to cloud-init. This pull request is now marked as stale as it has not seen any activity in 14 days. If no activity occurs within the next 7 days, this pull request will automatically close.

If you are waiting for code review and you are seeing this message, apologies! Please reply, tagging TheRealFalcon, and he will ensure that someone takes a look soon.

(If the pull request is closed and you would like to continue working on it, please do tag TheRealFalcon to reopen it.)

@github-actions github-actions Bot added the stale-pr Pull request is stale; will be auto-closed soon label Jun 19, 2025
 landscape-config doesn't support the installation_request_id
before Ubuntu 25.04 as of Jun-2025, so we cannot use it yet.
That's expected to change in the near future, as the Landscape
 team is SRU'ing that and other features to older releases.

TODO: remove this check when the SRUs are done.
@cnihelton cnihelton requested a review from blackboxsw June 20, 2025 14:09
@github-actions github-actions Bot removed the stale-pr Pull request is stale; will be auto-closed soon label Jun 21, 2025
Comment thread cloudinit/sources/DataSourceWSL.py
or the user cloud-int dir based on the instance name
"""
metadata = {"instance-id": DEFAULT_INSTANCE_ID}
pro_dir = cloud_init_data_dir(user_home / ".ubuntupro")
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this actually be the following to align with docs and ds-identify behavior?

Suggested change
pro_dir = cloud_init_data_dir(user_home / ".ubuntupro")
pro_dir = cloud_init_data_dir(user_home / ".ubuntupro/.cloud-init")

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Joining the cloud-init subdirectory happens inside the cloud_init_data_dir function here.

@blackboxsw
Copy link
Copy Markdown
Collaborator

I think there is an inconsistency in this PR as far as the .ubuntupro/.cloud-init subdirectory where ds-identify and docs expect Pro/Landscape user-data versus what the python datasource is checking now.

@cnihelton cnihelton requested a review from blackboxsw June 24, 2025 18:53
Comment thread cloudinit/sources/DataSourceWSL.py Outdated
if self.distro.name == "ubuntu":
agent_data, user_data = load_ubuntu_pro_data(user_home)
_, version_id, _ = util.get_linux_distro()
if version_id < "25.04":
Copy link
Copy Markdown
Collaborator

@blackboxsw blackboxsw Jul 2, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This then hardcodes cloud-init to avoid using IID for an extended period of time after landscape-client/landscape-config eventually completes an SRU and someone notices changes it in upstream cloud-init and awaits a quarterly SRU of cloud-init back to the expected Ubuntu series. If there is no other way for us to call out to landscape to determine if this feature exists or is supported, I'm ok with this approach. But if there is a way we can subprocess out to try running landcape-config or landscape-client to perform a preflight check that'd be better and it would decouple cloud-init SRUs from the eventual support in landscape once it SRUs.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is always some assumption being hardcoded. For example, we could instead:

  • Hardcode the landscape-client version by checking against the parsed result of apt-cache policy landscape-client or similar (results then may change when apt cache is updated)
  • Hardcode the assumption that the landscape-client is already installed when the datasource runs (currently true as we ship landscape-client by default, but remember that the landscape module is capable of triggering the installation and the packages module may upgrade it) and then check the package version via landscape-config --version or landscape-client --version (which are nicer to parse in the end)

The version comparison needs to know which version started supporting that feature for each Ubuntu LTS, which we don't know for sure at this point, so it's likely that once landscape-client SRUs are done we need to come back to this anyway.

So I'm taking the easiest-to-revert approach.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@CarlosNihelton I was hoping landscape would have some external indicator that reported whether the config key installation_request_id was supported.. But, since it's undocumented, hard-coding becomes the only option here. Works for me. no need to hardcode specific versions of landscape-client or config because that doesn't truly indicate either whether the support for installation_request_id actually got released there.
many thanks!

Copy link
Copy Markdown
Collaborator

@blackboxsw blackboxsw Jul 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In poking on questing we can use lansdcape-config --help which does actually document installation-request-id which should suffice as a feature check. So instead of if version_id < "25.04": we can do the following:

out, _ = subp.subp(["landscape-config", "--help"])
if "--installation-request-id" in out:

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for pushing the boundaries of this. The help does indeed contain the information we need.

In this particular case we're looking only for installation_request_id.
Tests were updated to make sure we don't blow up due unexpected use of
subp.
@cnihelton cnihelton requested a review from blackboxsw July 3, 2025 19:51
Copy link
Copy Markdown
Collaborator

@blackboxsw blackboxsw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good work on the back and forth here. I'm happy with the decoupling of cloud-init from landscape-config SRUs. Thanks again @CarlosNihelton !

@blackboxsw blackboxsw merged commit ab2cc28 into canonical:main Jul 3, 2025
22 checks passed
@cnihelton cnihelton deleted the wsl-request-id branch July 4, 2025 12:52
DarkPhily pushed a commit to hetznercloud/cloud-init that referenced this pull request Sep 2, 2025
…ical#6200)

Recently the Landscape server started tracking the status of commands
delivered to clients that run on parent instances, notably the Ubuntu
Pro for WSL agent, by assigning request IDs which the pro agent refers
to when replying with the command completion status. For the case of
the Install command, which results in a new WSL instance being created
on the parent Windows machine and registered with Landscape, the
server also expects the same value to be part of the landscape-client
registration message. For that we need the `installation_request_id`
to be added to the client.conf file. We found that synergic to the concept
of `metadata.instance-id`, thus this refactoring in the WSL datasource aims
to allow:
- the pro agent to write a per-instance metadata file
- the WSL data source to use the `metadata.instance-id` as `installation_request_id`
   - if the landscape.client config exists
   - if doesn't contain that field.

Details were discussed in the specification document
WS042 - Handle Landscape request_id in command Install via cloud-init
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

documentation This Pull Request changes documentation

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants