
Cloudstack doc fix #707

Merged: OddBloke merged 3 commits into canonical:master from olivierlemasle:cloudstack-doc-fix on Dec 16, 2020

@olivierlemasle
Contributor

Proposed Commit Message

Fix CloudStack documentation

In CloudStack configuration, datasource_list should be a
top-level object, not nested in the datasource configuration.

Additional Context

See OpenNebula documentation for example.

Test Steps

Manually: datasource list was not detected with configuration copied from online documentation. It is detected when the configuration is changed.

Checklist:

  • My code follows the process laid out in the documentation
  • I have updated or added any unit tests accordingly
  • I have updated or added any documentation accordingly

(doc only)

Collaborator

@OddBloke OddBloke left a comment

Hi Olivier, thanks for taking the time to improve our docs! We really appreciate it.

Reading this through, I'm not sure that updating this to match OpenNebula's docs is the right change. Over there, we say:

This example cloud-init configuration (cloud.cfg) enables OpenNebula datasource only in ‘net’ mode.

Whereas here we say:

An example configuration with the default values is provided below:

The default value of datasource_list is not just CloudStack, however; it's a full list of all datasources (https://github.com/canonical/cloud-init/blob/master/cloudinit/settings.py#L21).

What do you think about either rewording the preceding sentence, or dropping these lines entirely (instead of dedenting them)?

@olivierlemasle
Contributor Author

@OddBloke You're right, it's not the default values.

However, this datasource_list might be required for CloudStack. I'm quite new to cloud-init on CloudStack so I may be wrong, but here is the result of my tests.

Environment:

  • CloudStack, using the VMware hypervisor
  • Fedora 33 VM, with cloud-init from distribution (version 19.4)

1st case: no configuration

No configuration (except what cloud-init's package in Fedora provides). In particular, no datasource_list provided.

When I run a VM, cloud-init does not run, and /run/cloud-init/ds-identify.log contains:

[up 6.68s] ds-identify 
policy loaded: mode=search report=false found=all maybe=all notfound=disabled
no datasource_list found, using default: MAAS ConfigDrive NoCloud AltCloud Azure Bigstep CloudSigma CloudStack DigitalOcean AliYun Ec2 GCE OpenNebula OpenStack OVF SmartOS Scaleway Hetzner IBMCloud Oracle Exoscale RbxCloud
DMI_PRODUCT_NAME=VMware Virtual Platform
DMI_SYS_VENDOR=VMware, Inc.
DMI_PRODUCT_SERIAL=VMware-42 1d f7 ea 34 48 5b 62-b8 6c 26 fa db f7 20 12
DMI_PRODUCT_UUID=eaf71d42-4834-625b-b86c-26fadbf72012
PID_1_PRODUCT_NAME=unavailable
DMI_CHASSIS_ASSET_TAG=No Asset Tag
FS_LABELS=
ISO9660_DEVS=
KERNEL_CMDLINE=BOOT_IMAGE=(hd0,msdos1)/vmlinuz-5.9.11-200.fc33.x86_64 root=/dev/mapper/fedora_fedora-root ro rd.lvm.lv=fedora_fedora/root rhgb quiet
VIRT=vmware
UNAME_KERNEL_NAME=Linux
UNAME_KERNEL_RELEASE=5.9.11-200.fc33.x86_64
UNAME_KERNEL_VERSION=#1 SMP Tue Nov 24 18:18:01 UTC 2020
UNAME_MACHINE=x86_64
UNAME_NODENAME=localhost.localdomain
UNAME_OPERATING_SYSTEM=GNU/Linux
DSNAME=
DSLIST=MAAS ConfigDrive NoCloud AltCloud Azure Bigstep CloudSigma CloudStack DigitalOcean AliYun Ec2 GCE OpenNebula OpenStack OVF SmartOS Scaleway Hetzner IBMCloud Oracle Exoscale RbxCloud
MODE=search
ON_FOUND=all
ON_MAYBE=all
ON_NOTFOUND=disabled
pid=586 ppid=562
is_container=false
is_ds_enabled(IBMCloud) = true.
ec2 platform is 'Unknown'.
is_ds_enabled(IBMCloud) = true.
Running on vmware but rpctool query returned 1: No value found
No ds found [mode=search, notfound=disabled]. Disabled cloud-init [1]
[up 6.90s] returning 1

Running sudo cloud-init init on that VM gives:

Cloud-init v. 19.4 running 'init' at Thu, 03 Dec 2020 19:50:51 +0000. Up 128.62 seconds.
ci-info: +++++++++++++++++++++++++++++++++++Net device info++++++++++++++++++++++++++++++++++++
 ... (removed) ...
2020-12-03 19:50:51,943 - util.py[WARNING]: Failed accessing user data.
2020-12-03 19:51:44,102 - url_helper.py[WARNING]: Calling 'http://169.254.169.254/2009-04-04/meta-data/instance-id' failed [50/120s]: request error [HTTPConnectionPool(host='169.254.169.254', port=80): Max retries exceeded with url: /2009-04-04/meta-data/instance-id (Caused by ConnectTimeoutError(<urllib3.connection.HTTPConnection object at 0x7f6d63bd0a30>, 'Connection to 169.254.169.254 timed out. (connect timeout=50.0)'))]
2020-12-03 19:52:35,154 - url_helper.py[WARNING]: Calling 'http://169.254.169.254/2009-04-04/meta-data/instance-id' failed [101/120s]: request error [HTTPConnectionPool(host='169.254.169.254', port=80): Max retries exceeded with url: /2009-04-04/meta-data/instance-id (Caused by ConnectTimeoutError(<urllib3.connection.HTTPConnection object at 0x7f6d63bd0eb0>, 'Connection to 169.254.169.254 timed out. (connect timeout=50.0)'))]
2020-12-03 19:52:53,165 - url_helper.py[WARNING]: Calling 'http://169.254.169.254/2009-04-04/meta-data/instance-id' failed [119/120s]: request error [HTTPConnectionPool(host='169.254.169.254', port=80): Max retries exceeded with url: /2009-04-04/meta-data/instance-id (Caused by ConnectTimeoutError(<urllib3.connection.HTTPConnection object at 0x7f6d63bf18b0>, 'Connection to 169.254.169.254 timed out. (connect timeout=17.0)'))]
2020-12-03 19:52:54,167 - DataSourceEc2.py[CRITICAL]: Giving up on md from ['http://169.254.169.254/2009-04-04/meta-data/instance-id'] after 120 seconds
2020-12-03 19:52:54,250 - util.py[WARNING]: Failed to fetch password from virtual router 10.0.26.1
2020-12-03 19:52:54,692 - warnings.py[WARNING]: **************************************************************************
# A new feature in cloud-init identified possible datasources for        #
# this system as:                                                        #
#   []                                                                   #
# However, the datasource used was: CloudStack                           #
#                                                                        #
# In the future, cloud-init will only attempt to use datasources that    #
# are identified or specifically configured.                             #
# For more information see                                               #
#   https://bugs.launchpad.net/bugs/1669675                              #
#                                                                        #
# If you are seeing this message, please file a bug against              #
# cloud-init at                                                          #
#    https://bugs.launchpad.net/cloud-init/+filebug?field.tags=dsid      #
# Make sure to include the cloud provider your instance is               #
# running on.                                                            #
#                                                                        #
# After you have filed a bug, you can disable this warning by launching  #
# your instance with the cloud-config below, or putting that content     #
# into /etc/cloud/cloud.cfg.d/99-warnings.cfg                            #
#                                                                        #
# #cloud-config                                                          #
# warnings:                                                              #
#   dsid_missing_source: off                                             #
**************************************************************************
 ... (removed) ...

2nd case: with datasource_list configuration

I create a second template with a file /etc/cloud/cloud.cfg.d/10_cloudstack.cfg:

datasource_list: [ CloudStack, None]

When starting a new VM from that template, cloud-init works perfectly (creates user with ssh key pair provided by CloudStack).
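A minimal sketch of creating that drop-in file from a shell follows. The function name `write_cloudstack_dropin` is hypothetical, and the example writes into a scratch directory rather than the real `/etc/cloud/cloud.cfg.d`, which would need root.

```shell
#!/bin/sh
# Hypothetical helper: write the drop-in file into the directory given as $1.
write_cloudstack_dropin() {
    mkdir -p "$1" || return 1
    # The flow-style list is written on a single line, as in case 2.
    cat > "$1/10_cloudstack.cfg" <<'EOF'
datasource_list: [ CloudStack, None]
EOF
}

# Demonstrated against a scratch directory; on a real image the target
# would be /etc/cloud/cloud.cfg.d.
scratch=$(mktemp -d)
write_cloudstack_dropin "$scratch"
cat "$scratch/10_cloudstack.cfg"
rm -rf "$scratch"
```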

/run/cloud-init/ds-identify.log contains:

[up 6.74s] ds-identify 
policy loaded: mode=search report=false found=all maybe=all notfound=disabled
/etc/cloud/cloud.cfg.d/10_cloudstack.cfg set datasource_list: [ CloudStack, None]
DMI_PRODUCT_NAME=VMware Virtual Platform
DMI_SYS_VENDOR=VMware, Inc.
DMI_PRODUCT_SERIAL=VMware-42 1d e3 46 83 96 3b 79-6a bb c7 84 9a 52 4f 26
DMI_PRODUCT_UUID=46e31d42-9683-793b-6abb-c7849a524f26
PID_1_PRODUCT_NAME=unavailable
DMI_CHASSIS_ASSET_TAG=No Asset Tag
FS_LABELS=
ISO9660_DEVS=
KERNEL_CMDLINE=BOOT_IMAGE=(hd0,msdos1)/vmlinuz-5.9.11-200.fc33.x86_64 root=/dev/mapper/fedora_fedora-root ro rd.lvm.lv=fedora_fedora/root rhgb quiet
VIRT=vmware
UNAME_KERNEL_NAME=Linux
UNAME_KERNEL_RELEASE=5.9.11-200.fc33.x86_64
UNAME_KERNEL_VERSION=#1 SMP Tue Nov 24 18:18:01 UTC 2020
UNAME_MACHINE=x86_64
UNAME_NODENAME=localhost.localdomain
UNAME_OPERATING_SYSTEM=GNU/Linux
DSNAME=
DSLIST=CloudStack None
MODE=search
ON_FOUND=all
ON_MAYBE=all
ON_NOTFOUND=disabled
pid=587 ppid=563
is_container=false
single entry in datasource_list (CloudStack None) use that.
[up 6.98s] returning 0

3rd case

This is quite strange but I reproduced it multiple times: if I use

datasource_list:
  - CloudStack
  - None

instead of datasource_list: [ CloudStack, None] like in case 2, the datasource list is not recognized, and everything behaves as in case 1. This is also the case if I omit None, like in the configuration in this PR (when I created the PR, I had actually assumed that both syntaxes were equivalent). Is this a bug or did I wrongly assume that the configuration is in YAML?

@OddBloke
Collaborator

OddBloke commented Dec 3, 2020

1st case: no configuration

DMI_PRODUCT_NAME=VMware Virtual Platform

This is where we're tripping up. ds-identify uses, amongst other things, DMI_PRODUCT_NAME to detect whether or not we're running on a given cloud without requiring networking (because this determination happens at systemd generator time). Specifically, this is the only thing that will positively identify CloudStack:

dscheck_CloudStack() {
    is_container && return ${DS_NOT_FOUND}
    dmi_product_name_matches "CloudStack*" && return $DS_FOUND
    return $DS_NOT_FOUND
}

So the fact that your platform is identifying itself as "VMware Virtual Platform" means that we don't detect it as CloudStack.
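As a minimal sketch of why the match fails, the glob comparison can be reproduced with a plain `case` statement. The helper `product_name_matches` is a hypothetical stand-in for ds-identify's `dmi_product_name_matches`: it takes the product name as an argument instead of reading it from DMI. The second product name is only an illustrative value that happens to start with "CloudStack".

```shell
#!/bin/sh
# Hypothetical stand-in for dmi_product_name_matches:
# $1 is the product name, $2 is a glob pattern.
product_name_matches() {
    # The pattern is deliberately unquoted so that it acts as a glob.
    case "$1" in
        $2) return 0 ;;
    esac
    return 1
}

# Report whether a given product name would be identified as CloudStack.
check_platform() {
    if product_name_matches "$1" "CloudStack*"; then
        echo "found"
    else
        echo "not found"
    fi
}

check_platform "VMware Virtual Platform"     # prints "not found"
check_platform "CloudStack KVM Hypervisor"   # prints "found"
```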

ON_FOUND=all
ON_MAYBE=all
ON_NOTFOUND=disabled

These lines indicate that if we don't find a datasource at all, then cloud-init should be disabled.

Running on vmware but rpctool query returned 1: No value found
No ds found [mode=search, notfound=disabled]. Disabled cloud-init [1]
[up 6.90s] returning 1

And we can see that ds-identify also determines that the OVF datasource (used for VMware) is not applicable: there are no applicable datasources, so cloud-init disables itself.

Running sudo cloud-init init on that VM gives:

What you're seeing here is, in the absence of the output of ds-identify which indicates which exact datasource to use, cloud-init attempting every datasource in order, until one works. As we now have networking, we don't have to use DMI, so we do detect CloudStack correctly (and then prompt you to file a bug for the mismatch between ds-identify behaviour and cloud-init behaviour).

2nd case: with datasource_list configuration

single entry in datasource_list (CloudStack None) use that.

This is the key: as you configured cloud-init with only one datasource, ds-identify trusts you so does not perform its own detection:

cloud-init/tools/ds-identify

Lines 1655 to 1660 in 6c4e87b

# if there is only a single entry in $DI_DSLIST
if [ $# -eq 1 ] || [ $# -eq 2 -a "$2" = "None" ] ; then
    debug 1 "single entry in datasource_list ($DI_DSLIST) use that."
    found "$@"
    return
fi
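The effect of that test can be tried in isolation with the sketch below. The function names `is_single_entry` and `describe` are hypothetical, and the condition is spelled with `&&` rather than `-a`, which should behave the same for these inputs.

```shell
#!/bin/sh
# Simplified re-creation of the quoted test: the positional parameters
# hold the entries of datasource_list.
is_single_entry() {
    if [ $# -eq 1 ]; then return 0; fi
    if [ $# -eq 2 ] && [ "$2" = "None" ]; then return 0; fi
    return 1
}

# Say whether the configured list would be trusted without detection.
describe() {
    if is_single_entry "$@"; then
        echo "trusted: $1"
    else
        echo "detection needed"
    fi
}

describe CloudStack None         # prints "trusted: CloudStack"
describe CloudStack              # prints "trusted: CloudStack"
describe CloudStack OpenNebula   # prints "detection needed"
```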

3rd case

This is quite strange but I reproduced it multiple times: if I use

datasource_list:
  - CloudStack
  - None

instead of datasource_list: [ CloudStack, None] like in case 2, the datasource list is not recognized, and everything behaves as in case 1. This is also the case if I omit None, like in the configuration in this PR (when I created the PR, I had actually assumed that both syntaxes were equivalent). Is this a bug or did I wrongly assume that the configuration is in YAML?

The configuration is YAML, but ds-identify is implemented in shell (for a variety of reasons), and so only supports the array spelling: see https://github.com/canonical/cloud-init/blob/master/tools/ds-identify#L568 for the implementation. So this is both expected behaviour and quite strange. :)
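To illustrate why a line-oriented shell parser sees one spelling and not the other, here is a rough sketch. It is not ds-identify's actual code: `parse_datasource_list` is a hypothetical function that only recognises a `datasource_list: [ ... ]` entry written on a single line.

```shell
#!/bin/sh
# Hypothetical line-oriented parser: reads config text on stdin and prints
# the entries of a "datasource_list: [ a, b ]" line. Returns 1 if no such
# single-line entry exists.
parse_datasource_list() {
    while IFS= read -r line; do
        case "$line" in
            "datasource_list:"*"["*"]"*)
                value=${line#*\[}      # drop everything up to the "["
                value=${value%%\]*}    # drop the "]" and anything after it
                # Turn commas into spaces; word splitting trims the rest.
                set -- $(printf '%s' "$value" | tr ',' ' ')
                echo "$*"
                return 0
                ;;
        esac
    done
    return 1
}

# Flow-style list on one line: recognised.
printf 'datasource_list: [ CloudStack, None]\n' | parse_datasource_list

# Block-style list spread over several lines: no line matches.
printf 'datasource_list:\n  - CloudStack\n  - None\n' | parse_datasource_list \
    || echo "no list found"
```

A full YAML parser would treat both inputs as the same list; a pattern match on individual lines cannot, which is consistent with the behaviour reported in case 3.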

@OddBloke
Collaborator

OddBloke commented Dec 3, 2020

So, to summarise: if you can configure VMware to present a different DMI product name, then that will address the issue you're seeing. Otherwise, you'll need to instruct cloud-init on which datasource to use: you can write a file as you've found, or provide ci.ds=... on the kernel command line.
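For the kernel command line route, the sketch below shows how a `ci.ds=` token could be picked out of a command-line string. The function `ds_from_cmdline` is hypothetical and only illustrates the idea; it is not taken from ds-identify, and the sample command lines are shortened versions of the one in the logs above.

```shell
#!/bin/sh
# Hypothetical helper: print the value of the first ci.ds= token in the
# kernel command line passed as $1, or return 1 if there is none.
ds_from_cmdline() {
    # $1 is left unquoted so that it splits into whitespace-separated tokens.
    for tok in $1; do
        case "$tok" in
            ci.ds=*)
                echo "${tok#ci.ds=}"
                return 0
                ;;
        esac
    done
    return 1
}

ds_from_cmdline "root=/dev/mapper/fedora_fedora-root ro quiet ci.ds=CloudStack"
ds_from_cmdline "root=/dev/mapper/fedora_fedora-root ro quiet" \
    || echo "no datasource on command line"
```

On a running system the argument would typically be the contents of `/proc/cmdline`.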

@olivierlemasle
Contributor Author

Thank you very much for your very detailed and intelligible explanation! 👍

The configuration is YAML, but ds-identify is implemented in shell (for a variety of reasons), and so only supports the array spelling: see https://github.com/canonical/cloud-init/blob/master/tools/ds-identify#L568 for the implementation. So this is both expected behaviour and quite strange. :)

So the CloudStack documentation here was wrong in two ways:

  • datasource_list is a top-level key
  • AND this list syntax is not supported 😄

I agree that it may be best to simply drop these lines in documentation.

In CloudStack configuration, datasource_list should be
a top-level object, not nested in the datasource
configuration. We remove it from here because it is
not a default value.
Collaborator

@OddBloke OddBloke left a comment


Thanks!

@OddBloke OddBloke merged commit e5f7459 into canonical:master Dec 16, 2020
@olivierlemasle olivierlemasle deleted the cloudstack-doc-fix branch December 16, 2020 20:14