Skip to content

Add LXD datasource#1040

Merged
TheRealFalcon merged 2 commits into
canonical:mainfrom
blackboxsw:lxd-ds
Nov 1, 2021
Merged

Add LXD datasource#1040
TheRealFalcon merged 2 commits into
canonical:mainfrom
blackboxsw:lxd-ds

Conversation

@blackboxsw
Copy link
Copy Markdown
Collaborator

@blackboxsw blackboxsw commented Sep 29, 2021

Add DataSourceLXD which knows how to talk to the dev-lxd socket to obtain all instance metadata API: https://linuxcontainers.org/lxd/docs/master/dev-lxd.

This first branch is to get deliver feature parity with the existing NoCloud datasource which is currently used to intialize LXC instances on first boot.

Introduce a SocketConnectionPool and LXDSocketAdapter to support performing HTTP GETs on the following routes which are surfaced by the LXD host to all containers:

These 4 routes minimally replace the static content provided in the following nocloud-net seed files:
/var/lib/cloud/nocloud-net/{meta-data,vendor-data,user-data,network-config}

The intent of this branch is to set a foundation for LXD socket communication that will allow us to build network hot-plug features by eventually consuming LXD's websocket upgrade route 1.0/events to react to network, meta-data and user-data config changes over time.

In the event that no custom network-config is provided, default to the same network-config definition provided by LXD to the NoCloud network-config seed file.

version: 1
config:
- type:physical
  name: eth0        # or one of enp5s0, enp0s5 or enc9
        dhcp4: true

Supplemental features above NoCloud datasource:

  • surface all custom instance data config keys via cloud-init query ds which aids in discoverability of features/tags/labels as well as conditional #cloud-config jinja templates operations based on custom config options.
  • TBD: better cloud-init query support for dot-delimited keys `cloud-init query ds[1.0].config.

Proposed Commit Message

Test Steps:

git checkout upstream/ubuntu/bionic debian
dch --newversion 24.0 --distribution bionic
# Add LXD entries via this patch against debian/cloud-init.templates to support dpkg-reconfigure setting LXD as a viable datasource
https://paste.ubuntu.com/p/Cdbz6wRcDf/plain/
patch -p1 < <patch from pastebin> 
git add debian
git commit -am 'local packaging test for LXD'
build-package
sbuild --dist=bionic --arch=amd64 --arch-all ../out/cloud-init*dsc
lxc launch ubuntu-daily:bionic dev-b
lxc file push cloud-init*deb dev-b/
lxc exec dev-b bash
$ apt install /cloud-init*deb
$ sudo dpkg-reconfigure cloud-init # select LXD and NoCloud Datasources
$ sudo cloud-init clean --logs --reboot 
$ sudo cloud-init status --long
$ sudo cloud-init query ds
$ sudo cloud-init query --format "{{platform}} {{subplatform}}"

Checklist:

  • My code follows the process laid out in the documentation
  • I have updated or added any unit tests accordingly
  • I have updated or added any documentation accordingly

@blackboxsw blackboxsw force-pushed the lxd-ds branch 3 times, most recently from 513a745 to a5770a0 Compare September 29, 2021 03:20
@TheRealFalcon
Copy link
Copy Markdown
Contributor

Thanks @blackboxsw . This is looking really good!

Some observations:
If you specify user.meta-data when launching the instance, it appends the data that you've sent to its own metadata, rather than overwriting it. While playing around, I threw in an extra log to log all of the crawled metadata and got this:

2021-09-30 14:43:57,538 - DataSourceLXD.py[DEBUG]: {'meta-data': '#cloud-config\ninstance-id: me\nlocal-hostname: me\ninstance-id: i-87018aed\nlocal-hostname: myhost.internal', '1.0': {'config': {'user.user-data': "#cloud-config\nruncmd:\n - echo 'hi' > /root/hi", 'user.meta-data': 'instance-id: i-87018aed\nlocal-hostname: myhost.internal', 'user.vendor-data': "#cloud-config\nbootcmd:\n - echo 'hi from vendor' > /var/tmp/vendor"}}, 'user-data': "#cloud-config\nruncmd:\n - echo 'hi' > /root/hi", 'vendor-data': "#cloud-config\nbootcmd:\n - echo 'hi from vendor' > /var/tmp/vendor"}

This isn't a problem, but in cloud-init query we only report the user metadata. It might be confusing not seeing the pre-existing metadata that we don't overwrite.

cloud-init query -a includes:

 "ds": {
  "1.0": {
   "config": {
    "user.meta_data": "instance-id: i-87018aed\nlocal-hostname: myhost.internal",
    "user.user_data": "#cloud-config\nruncmd:\n - echo 'hi' > /root/hi",
    "user.vendor_data": "#cloud-config\nbootcmd:\n - echo 'hi from vendor' > /var/tmp/vendor"
   }
  }

regardless of the user. Vendordata and userdata should be redacted from the non-root user.

Regardling the security.devlxd, I'm not sure there's much more we can do. If it's not set, we'll use the LXD datasource. If it is set, we'll use NoCloud. The big problem comes if somebody toggles the flag after the container has already started. I can't really see a nice way of handling this. We have no mechanism to dynamically switch datasources, and I don't think we should. After first boot, we're not likely to see metadata changes anyway, so even if the socket gets shut off, I don't think it's necessarily catastrophic for the instance. I think we just need to document that setting the flag to true will force us to use the NoCloud datasource, and toggling the flag for an already running instance can lead to unexpected results.

@TheRealFalcon
Copy link
Copy Markdown
Contributor

Hmmm, I also just noticed this for each subsequent boot:

2021-09-30 15:13:40,527 - stages.py[DEBUG]: cache invalid in datasource: DataSourceLXD
2021-09-30 15:13:40,527 - handlers.py[DEBUG]: finish: init-local/check-cache: SUCCESS: cache invalid in datasource: DataSourceLXD
2021-09-30 15:13:40,527 - util.py[DEBUG]: Attempting to remove /var/lib/cloud/instance

@github-actions
Copy link
Copy Markdown

Hello! Thank you for this proposed change to cloud-init. This pull request is now marked as stale as it has not seen any activity in 14 days. If no activity occurs within the next 7 days, this pull request will automatically close.

If you are waiting for code review and you are seeing this message, apologies! Please reply, tagging mitechie, and he will ensure that someone takes a look soon.

(If the pull request is closed and you would like to continue working on it, please do tag mitechie to reopen it.)

@github-actions github-actions Bot added the stale-pr Pull request is stale; will be auto-closed soon label Oct 15, 2021
@blackboxsw blackboxsw force-pushed the lxd-ds branch 4 times, most recently from d60c452 to 915aed4 Compare October 22, 2021 20:31
@blackboxsw blackboxsw removed the stale-pr Pull request is stale; will be auto-closed soon label Oct 23, 2021
Comment thread tests/integration_tests/clouds.py
@blackboxsw blackboxsw force-pushed the lxd-ds branch 3 times, most recently from d051c47 to 0cd95a8 Compare October 25, 2021 21:08
Copy link
Copy Markdown
Contributor

@TheRealFalcon TheRealFalcon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks really good! I left some comments, but they're all very minor.

Comment thread tools/ds-identify Outdated
# this has to match the builtin list in cloud-init, it is what will
# be searched if there is no setting found in config.
DI_DSLIST_DEFAULT="MAAS ConfigDrive NoCloud AltCloud Azure Bigstep \
DI_DSLIST_DEFAULT="MAAS ConfigDrive LXD NoCloud AltCloud Azure Bigstep \
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So < Jammy, we'll need quilt patches to put LXD after NoCloud?

Copy link
Copy Markdown
Collaborator Author

@blackboxsw blackboxsw Oct 30, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the record, the order in ds-identify doesn't really matter as it is discovered based on the order configuration file on disk in the datasource_list config option in /etc/cloud/cloud.cfg or /etc/cloud/cloud.cfg.d/90_dpkg.cfg (included in cloud images), or cloudinit/settings.py or kernel_cmdline in the function read_datasource_list

you can see the ds-identify sources this order in /run/cloud-init/ds-identify.log even if LXD is listed before NoCloud in ./tools/ds-identify as I had here... you'll still see datasource_list sourced.

/etc/cloud/cloud.cfg.d/90_dpkg.cfg set datasource_list: [ NoCloud, ConfigDrive, LXD, OpenNebula, DigitalOcean, Azure, AltCloud, OVF, MAAS, GCE, OpenStack, CloudSigma, SmartOS, Bigstep, Scaleway, AliYun, Ec2, CloudStack, Hetzner, IBMCloud, Oracle, Exoscale, RbxCloud, UpCloud, VMware, Vultr, None ]
DSLIST=NoCloud ConfigDrive LXD OpenNebula DigitalOcean Azure AltCloud OVF MAAS GCE OpenStack CloudSigma SmartOS Bigstep Scaleway AliYun Ec2 CloudStack Hetzner IBMCloud Oracle Exoscale RbxCloud UpCloud VMware Vultr None
check for 'NoCloud' returned found
check for 'LXD' returned found
Found 2 datasources found=all: NoCloud LXD

That ordering found by ds-identify sets the ordering of python datasources discovery:

root@dev-bb:~# grep LXD /var/log/cloud-init.log 
2021-10-30 02:34:33,682 - __init__.py[DEBUG]: Looking for data source in: ['NoCloud', 'LXD', 'None'], via packages ['', 'cloudinit.sources'] that matches dependencies ['FILESYSTEM']

All said though, it's best if we keep this ordering aligned. Because in absence of other defaults, eventually we'd fallback to ds-itentify ordering in the DI_DSLIST_DEFAULT.

Comment thread cloudinit/sources/DataSourceLXD.py
Comment thread doc/rtd/topics/datasources/lxd.rst Outdated
Comment thread doc/rtd/topics/datasources/lxd.rst Outdated
Comment thread doc/rtd/topics/datasources/lxd.rst Outdated
Comment thread cloudinit/sources/DataSourceLXD.py Outdated
Comment thread cloudinit/sources/DataSourceLXD.py
Comment thread cloudinit/sources/DataSourceLXD.py Outdated
func=read_metadata)
self.metadata = self._crawled_metadata.get("meta-data", {})
user_metadata = self._crawled_metadata.get(
"config", {}).get("user.meta-data", {})
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know it's not likely, but if config is null in the metadata definition, then it will be None here, and the .get after it will fail.

Copy link
Copy Markdown
Collaborator Author

@blackboxsw blackboxsw Oct 25, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, reasonable. better to be defensive and schema warn than to fall over with an ugly trace
Well this was wrong to begin with as read_metadata actually returns the config key nested under the API version as in {"1.0": {"config": {...}}} That said, I've changed this to check of LXD_SOCKET_API_VERSION exists from read_metadata. When it does the "config" key is guaranteed to be a dict.

response = read_metadata(metadata_only=True)
md = response.get("meta-data", {})
if not isinstance(md, dict):
md = util.load_yaml(md)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can return None which would traceback on the following line.


dsname = 'LXD'

_network_config = sources.UNSET
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not something we need to change now, but I'd like to get away from setting things to sources.UNSET. Mypy raises hell all over this module because we're setting it to a string here, but the "actual" type after it's been set is a dict...and same with the crawled_metadata line two lines below.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 let's look generally at better mypy cleanup on the 2nd pass here.

@blackboxsw blackboxsw changed the title WIP Add LXD datasource Add LXD datasource Oct 25, 2021
@blackboxsw
Copy link
Copy Markdown
Collaborator Author

Thanks @blackboxsw . This is looking really good!

Some observations: If you specify user.meta-data when launching the instance, it appends the data that you've sent to its own metadata, rather than overwriting it.

Thanks I added a ds["1,0"]["meta-data"] key too so we can see both from cloud-init query. LXD team might change how this is surfaced to LXD datasource mid-cycle so surfacing the raw content we see is probably the best plan forward. If that content changes to cloud-init.(network-config|vendor-data|user-data) keys we will surface those as well.

regardless of the user. Vendordata and userdata should be redacted from the non-root user.
Ahh right. need to make that change still in this PR.

Regardling the security.devlxd, I'm not sure there's much more we can do. If it's not set, we'll use the LXD datasource. If it is set, we'll use NoCloud.
I have documented that devlxd: true config is required. But I think this DS still needs to handle failure case across boot where devlxd is turned off. It still attempts to load the DS from cache and query instance data directly. If the socket file no longer exists (maybe due to a container being sandboxed for security triage or something) we would need to react with a warning in this case.

We have no mechanism to dynamically switch datasources, and I don't think we should. After first boot, we're not likely to see metadata changes anyway, so even if the socket gets shut off, I don't think it's necessarily catastrophic for the instance. I think we just need to document that setting the flag to true will force us to use the NoCloud datasource, and toggling the flag for an already running instance can lead to unexpected results.

+1

Copy link
Copy Markdown
Contributor

@TheRealFalcon TheRealFalcon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also test comments

Comment thread cloudinit/sources/tests/test_lxd.py Outdated
assert True is lxd_ds._get_data()
assert LXD_V1_METADATA == lxd_ds._crawled_metadata
# network-config is loaded as YAML
if "network-config" in LXD_V1_METADATA:
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Were you planning on parametrizing LXD_V1_METADATA? If not, these if statements seem pointless since it'll only ever be one value.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Originally, yes. I think it's probably worth me parametrizing this test. will tackle tomorrow.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

strike that dropping parametrization for this pass. I'll add a followup branch with intrumented VM support.

Comment thread tests/integration_tests/datasources/test_lxd_discovery.py Outdated
@TheRealFalcon
Copy link
Copy Markdown
Contributor

@blackboxsw blackboxsw force-pushed the lxd-ds branch 3 times, most recently from 27d7ddf to 74b1e83 Compare October 30, 2021 03:16
@blackboxsw
Copy link
Copy Markdown
Collaborator Author

Should LXD be added here? https://github.com/canonical/cloud-init/blob/main/cloudinit/settings.py#L21

yes it should and should have the same ordering we want LXD after NoCloud & ConfigDrive.

Copy link
Copy Markdown
Contributor

@TheRealFalcon TheRealFalcon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's still a couple of comments from my previous review that are unaddressed. I think the None checks and pep8 import stuff are minor enough we can do without, but I think we need to fix the wrong link in the docs (or tell me if I'm misunderstanding it).

Comment thread doc/rtd/topics/datasources/lxd.rst Outdated

- user.meta-data: YAML metadata which will be appended to base meta-data
- user.vendor-data: YAML which overrides any meta-data values
:ref:`network_config_v2` format
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you see this one?

holmanb added a commit to holmanb/cloud-init that referenced this pull request Nov 1, 2021
holmanb added a commit to holmanb/cloud-init that referenced this pull request Nov 1, 2021
holmanb added a commit to holmanb/cloud-init that referenced this pull request Nov 1, 2021
holmanb added a commit to holmanb/cloud-init that referenced this pull request Nov 1, 2021
Copy link
Copy Markdown
Contributor

@TheRealFalcon TheRealFalcon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@TheRealFalcon TheRealFalcon merged commit 7737653 into canonical:main Nov 1, 2021
holmanb added a commit to holmanb/cloud-init that referenced this pull request Nov 1, 2021
holmanb added a commit to holmanb/cloud-init that referenced this pull request Nov 1, 2021
holmanb added a commit to holmanb/cloud-init that referenced this pull request Nov 1, 2021
TheRealFalcon pushed a commit that referenced this pull request Nov 1, 2021
TheRealFalcon pushed a commit to TheRealFalcon/cloud-init that referenced this pull request Nov 4, 2021
TheRealFalcon pushed a commit to TheRealFalcon/cloud-init that referenced this pull request Nov 4, 2021
TheRealFalcon pushed a commit to TheRealFalcon/cloud-init that referenced this pull request Nov 4, 2021
TheRealFalcon pushed a commit to TheRealFalcon/cloud-init that referenced this pull request Nov 4, 2021
holmanb added a commit to holmanb/cloud-init that referenced this pull request Feb 7, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants