Azure: Support for VMs without ephemeral resource disks #716

Closed
johnsonshi wants to merge 3 commits into canonical:master from johnsonshi:decrease-azure-ephemeral-resource-disk-wait-time

Conversation

@johnsonshi
Contributor

@johnsonshi johnsonshi commented Dec 8, 2020

Proposed Commit Message

Azure: Support for VMs without ephemeral resource disks.

Changes:

  • Only merge in default Azure cloud ephemeral disk configs
    during DataSourceAzure._get_data() if the ephemeral disk
    exists.
  • DataSourceAzure.address_ephemeral_resize() (which is
    invoked in DataSourceAzure.activate()) should only set up
    the ephemeral disk if the disk exists.

Azure VMs may or may not come with ephemeral resource disks
depending on the VM SKU. For VM SKUs that come with
ephemeral resource disks, the Azure platform guarantees that
the ephemeral resource disk is attached to the VM before
the VM is booted. For VM SKUs that do not come with
ephemeral resource disks, cloud-init currently attempts
to wait and set up a non-existent ephemeral resource
disk, which wastes boot time. It also causes disk setup
modules to fail (due to non-existent references to the
ephemeral resource disk).

udevadm settle is invoked by cloud-init very early in boot,
before DataSourceAzure's _get_data() and activate() methods.

Within DataSourceAzure's _get_data() and activate() methods,
the ephemeral resource disk path should exist if the
VM SKU comes with an ephemeral resource disk.
The ephemeral resource disk path should not exist if the
VM SKU does not come with an ephemeral resource disk.

LP: #1901011

Additional Context

Problem

For Azure VMs, cloud-init's DataSourceAzure.py formats and addresses ephemeral disk resizing. It does this for all Azure VM SKUs. See code and code.

The code currently waits up to 120 seconds for the ephemeral disk symlink /dev/disk/cloud/azure_resource to appear before either proceeding or giving up.

For new Azure VM SKUs (such as Dv4, Dsv4, Ev4, Esv4) that do not come with ephemeral resource disks, cloud-init would wait up to 120 seconds before giving up. See LP: #1901011.

For these new Azure VM SKUs without ephemeral resource disks, the disk_setup module would also fail later in cloud-init because "builtin Azure ephemeral disk configs" are merged into DataSourceAzure metadata. These builtin configs reference the non-existent ephemeral disk, which causes the module to fail.
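
For illustration, the bounded wait described above can be sketched as a simple poll. This is a minimal sketch, not cloud-init's actual code: `wait_for_path` and its parameters are hypothetical names introduced here, and the clock and sleep functions are injectable only so the behavior can be exercised without really waiting.

```python
import os
import time


def wait_for_path(path, maxwait, naplen=0.5, exists=os.path.exists,
                  sleep=time.sleep, clock=time.monotonic):
    """Poll until `path` exists or `maxwait` seconds have elapsed.

    Hypothetical helper for illustration only.
    Returns True if the path appeared, False if the wait timed out.
    """
    deadline = clock() + maxwait
    while True:
        if exists(path):
            return True
        if clock() >= deadline:
            # The path never showed up; the caller has paid the full maxwait.
            return False
        sleep(naplen)
```

On a VM SKU without a resource disk, a call such as `wait_for_path("/dev/disk/cloud/azure_resource", maxwait=120)` can only return after the full timeout, which is the boot delay reported in LP: #1901011.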

Why this approach was chosen

As of today, the Azure Instance Metadata Service (Azure IMDS) does not expose VM instance metadata indicating whether an ephemeral resource disk exists for the VM or not.

The Azure host also guarantees that the ephemeral resource disk is attached to the VM before it is booted during VM deployment.

Additionally, the ephemeral resource disk symlink (/dev/disk/cloud/azure_resource) that cloud-init waits for is actually created by a udev rule that comes with cloud-init. Additional relevant code, code, and code.

Because:

  • the Azure host guarantees that resource disks are attached for VM SKUs that have resource disks,
  • the symlinks are created by udev rules (as soon as the kernel detects the disk and sends the event to udev), and
  • udevadm settle is invoked very early in boot by cloud-init (before DataSourceAzure runs),

the ephemeral resource disk symlink is guaranteed to exist by the time DataSourceAzure runs.
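
Under that reasoning, a single existence check is enough to decide whether the ephemeral disk defaults should be merged. The following is a minimal sketch of the idea, not the code in this branch: `build_config` and `EPHEMERAL_DISK_DEFAULTS` are hypothetical names, and the default values shown are illustrative placeholders.

```python
import os

RESOURCE_DISK_PATH = "/dev/disk/cloud/azure_resource"

# Illustrative stand-in for the built-in ephemeral disk config (assumed values).
EPHEMERAL_DISK_DEFAULTS = {
    "disk_setup": {
        "ephemeral0": {"table_type": "gpt", "layout": [100], "overwrite": True},
    },
    "fs_setup": [{"filesystem": "ext4", "device": "ephemeral0.1"}],
}


def build_config(base_cfg, devpath=RESOURCE_DISK_PATH, exists=os.path.exists):
    """Return a copy of base_cfg, adding ephemeral disk defaults only if the disk exists."""
    cfg = dict(base_cfg)
    if exists(devpath):
        # Disk is present: fill in defaults without overriding existing keys.
        for key, value in EPHEMERAL_DISK_DEFAULTS.items():
            cfg.setdefault(key, value)
    # Disk is absent: leave the config free of references to it, so disk
    # setup modules have nothing non-existent to act on.
    return cfg
```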

Test Steps

No regression for VM SKUs with ephemeral resource disk

  • Created a custom image with this branch's cloud-init installed.
  • Deployed a Standard_DS1_V2 VM (has ephemeral resource disk) from this custom image.
  • Cloud-init logs here: https://paste.ubuntu.com/p/qdxkfwYKRD/
2021-02-03 23:23:31,913 - __init__.py[DEBUG]: Found unstable nic names: ['eth0']; calling udevadm settle
2021-02-03 23:23:31,914 - subp.py[DEBUG]: Running command ['udevadm', 'settle'] with allowed return codes [0] (shell=False, capture=True)
2021-02-03 23:23:31,933 - util.py[DEBUG]: Waiting for udev events to settle took 0.019 seconds
...
2021-02-03 23:23:32,823 - handlers.py[DEBUG]: finish: azure-ds/crawl_metadata: SUCCESS: crawl_metadata
2021-02-03 23:23:32,823 - util.py[DEBUG]: Crawl of metadata service took 1.117 seconds
...
2021-02-03 23:23:32,824 - azure.py[DEBUG]: Ephemeral resource disk '/dev/disk/cloud/azure_resource' exists. Merging default Azure cloud ephemeral disk configs.
...
2021-02-03 23:23:38,563 - handlers.py[DEBUG]: start: azure-ds/activate: activate
2021-02-03 23:23:38,563 - handlers.py[DEBUG]: start: azure-ds/address_ephemeral_resize: address_ephemeral_resize
2021-02-03 23:23:38,564 - azure.py[DEBUG]: Ephemeral resource disk '/dev/disk/cloud/azure_resource' exists.
...
2021-02-03 23:23:38,894 - handlers.py[DEBUG]: finish: azure-ds/address_ephemeral_resize: SUCCESS: address_ephemeral_resize

Fix for VM SKUs without ephemeral resource disk

  • Created a custom image with this branch's cloud-init installed.
  • Deployed a Standard_D2s_v4 VM (no ephemeral resource disk) from this custom image.
  • Cloud-init logs here: https://paste.ubuntu.com/p/GmCqBNTQrG/
2021-02-03 23:20:53,217 - __init__.py[DEBUG]: Found unstable nic names: ['eth0']; calling udevadm settle
2021-02-03 23:20:53,217 - subp.py[DEBUG]: Running command ['udevadm', 'settle'] with allowed return codes [0] (shell=False, capture=True)
2021-02-03 23:20:53,235 - util.py[DEBUG]: Waiting for udev events to settle took 0.018 seconds
...
2021-02-03 23:20:54,388 - handlers.py[DEBUG]: finish: azure-ds/crawl_metadata: SUCCESS: crawl_metadata
2021-02-03 23:20:54,389 - util.py[DEBUG]: Crawl of metadata service took 1.363 seconds
...
2021-02-03 23:20:54,389 - azure.py[DEBUG]: Ephemeral resource disk '/dev/disk/cloud/azure_resource' does not exist. Not merging default Azure cloud ephemeral disk configs.
...
2021-02-03 23:20:59,846 - handlers.py[DEBUG]: start: azure-ds/activate: activate
2021-02-03 23:20:59,846 - handlers.py[DEBUG]: start: azure-ds/address_ephemeral_resize: address_ephemeral_resize
2021-02-03 23:20:59,847 - azure.py[DEBUG]: Ephemeral resource disk '/dev/disk/cloud/azure_resource' does not exist.
2021-02-03 23:20:59,847 - handlers.py[DEBUG]: finish: azure-ds/address_ephemeral_resize: SUCCESS: address_ephemeral_resize

Checklist:

  • My code follows the process laid out in the documentation
  • I have updated or added any unit tests accordingly
  • I have updated or added any documentation accordingly

@johnsonshi johnsonshi changed the title from "Decrease Azure ephemeral disk wait time" to "Azure: Support for VMs without ephemeral resource disks" Dec 8, 2020

 @azure_ds_telemetry_reporter
-def address_ephemeral_resize(devpath=RESOURCE_DISK_PATH, maxwait=120,
+def address_ephemeral_resize(devpath=RESOURCE_DISK_PATH, maxwait=5,
Collaborator

Is there no way to know whether a VM is one that does not have ephemeral disks?

You mention specific types:

"Dv4, Dsv4, Ev4, Esv4"

Is the instance-type available in metadata? If that's available, then one could look up the maxwait value based on instance-type.

Collaborator

Is there no way to know whether a VM is one that does not have ephemeral disks?

You mention specific types:

"Dv4, Dsv4, Ev4, Esv4"

Is the instance-type available in metadata? If that's available, then one could look up the maxwait value based on instance-type.

And if there is not a way to know ... can you fix the platform?

Contributor Author

@johnsonshi johnsonshi Dec 8, 2020

As of today, the Azure Instance Metadata Service (Azure IMDS) does not expose VM instance metadata indicating whether an ephemeral resource disk exists for the VM or not.

Unfortunately, IMDS support for exposing ephemeral resource disk presence/absence won't be around for quite some time/in the next few months. In the intervening time, Linux VMs deployed without the disks are suffering from a 2-minute delay with cloud-init.

I opened this draft PR ahead of time to communicate our plans:

  1. Decrease wait time for ephemeral disk.
    ** Optional: If the ephemeral disk doesn't come up, then delete the built-in Azure DS cloud-config that references setting up the ephemeral disk. This prevents disk_setup and fs_setup from throwing RuntimeErrors due to referencing non-existent ephemeral disks.
  2. Once IMDS supports exposing this info, we'll have a more graceful approach.

Collaborator

As of today, the Azure Instance Metadata Service (Azure IMDS) does not expose VM instance metadata indicating whether an ephemeral resource disk exists for the VM or not.

Unfortunately, IMDS support for exposing ephemeral resource disk presence/absence won't be around for quite some time/in the next few months. In the intervening time, Linux VMs deployed without the disks are suffering from a 2-minute delay with cloud-init.

I was asking if the instance type is available; IIUC, there are new instance types which are without the ephemeral disk;
The instance metadata has[1]:

"vmSize": "Standard_A3"

Can't we set the timeout value low if the vmSize is of the types without the disk?

[1] https://docs.microsoft.com/en-us/azure/virtual-machines/linux/instance-metadata-service
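
A vmSize-based lookup along these lines could be sketched as follows. This is only a sketch under stated assumptions: `maxwait_for_vm_size` and the pattern are hypothetical, and the pattern merely encodes the Dv4/Dsv4/Ev4/Esv4 families named in this PR, so it would need maintenance as new sizes appear.

```python
import re

# Assumption: sizes such as Standard_D2s_v4 or Standard_E4_v4 have no
# resource disk. This list is illustrative, not authoritative.
DISKLESS_VM_SIZE = re.compile(r"^Standard_[DE]\d+s?_v4$", re.IGNORECASE)


def maxwait_for_vm_size(vm_size, default=120):
    """Return the ephemeral disk wait (seconds) for a given IMDS vmSize.

    Hypothetical helper: 0 for sizes assumed to lack a resource disk,
    otherwise the existing default.
    """
    if vm_size and DISKLESS_VM_SIZE.match(vm_size):
        return 0
    return default
```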

Contributor Author

We can't take a dependency on the list of VM Sizes since the list of VM Sizes without ephemeral disks will grow as time passes. This design also won't be resilient if Azure ever exposes an option (to users) to deploy VMs with/without ephemeral disks.

Collaborator

Shouldn't IMDS indicate whether one is attached or not in the metadata? I understand that's not directly under your control. But the platform is in the position where it knows whether one was attached or not and it should expose that to the instance such that cloud-init can Do The Right Thing(tm).

If we drop the timeout to something lower there are a class of users with Ephemeral disk which will get errors in the log about not waiting long enough for the disk to show up. Keeping it as it is means new instance-types without them have this long timeout but that does not regress other instance types. Including the vmSize check in cloud-init seems like a reasonable compromise, and it can be updated in cloud-init.

Looking at the function address_ephemeral_resize; in the case where we don't wait long enough there are some paths that will break, for example this:

https://bugs.launchpad.net/ubuntu/+source/cloud-init/+bug/1611074

Not waiting would mean cloud-init would fail to resize/reformat the ephemeral disk

Would it be reasonable to pre-populate those instances with user-data (ideally vendor-data) via the UI or a CLI template that would indicate that no ephemeral disk is attached? This could also be useful on instances which do have an ephemeral disk but whose users don't want it configured during first boot (saving the boot time spent on partitioning and formatting).

#cloud-config 
datasource:
   Azure:
      ephemeral_disk:
         enabled: true|false

user-data is processed before cloudinit/cmd/main.py calls the ds.activate() method, which is what triggers the ephemeral resize. DataSourceAzure.activate() can then skip the call to address_ephemeral_resize() altogether.
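
As a rough sketch of how such a flag might be read once the cloud-config above has been parsed into a dict: `ephemeral_disk_enabled` is a hypothetical helper and the `ephemeral_disk.enabled` key is the proposal from this comment, not an existing cloud-init option.

```python
def ephemeral_disk_enabled(cfg, default=True):
    """Read the proposed datasource.Azure.ephemeral_disk.enabled flag.

    Hypothetical helper: falls back to `default` when any level of the
    config is missing or empty.
    """
    ds_cfg = (cfg.get("datasource") or {}).get("Azure") or {}
    disk_cfg = ds_cfg.get("ephemeral_disk") or {}
    return bool(disk_cfg.get("enabled", default))
```

An activate() implementation could then call address_ephemeral_resize() only when this returns True.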

Contributor Author

@johnsonshi johnsonshi Dec 11, 2020

@raharper//@anhvoms I definitely agree with you that IMDS/the platform should expose information on whether there is an ephemeral resource disk attached or not. Unfortunately, this platform-side change can't happen and be rolled out until >6 months from now. This means that for >half a year until platform changes are made, Linux VMs deployed on Azure will regress performance by 2 minutes.

There are reasons why decreasing the wait time for ephemeral disks won't regress current existing VM SKUs:

  1. For VM SKUs with ephemeral disks, the Azure platform guarantees that the ephemeral disk is attached before a VM is booted.

  2. In the past few years, we've never seen an instance where cloud-init had to wait for ephemeral disks to come up, as the disk was already attached (guaranteed by the platform) and the disk symlink was already created (created by udev rules).

  3. As mentioned in my PR description above, I've performed deployment tests across a variety of Linux images on Azure. The intent was to test whether any distros or images had delays in creating the disk symlink.

grep "Azure ephemeral disk: All files appeared after" /var/log/cloud-init.log
util.py[DEBUG]: Azure ephemeral disk: All files appeared after 0 seconds: ['/dev/disk/cloud/azure_resource']
The statement above is true and tested for the following images:
** RedHat:RHEL:8.2:latest
** SUSE:sles-15-sp2:gen2:latest
** Canonical:0001-com-ubuntu-server-focal:20_04-lts:latest
** Canonical:0001-com-ubuntu-server-focal:20_04-lts-gen2:latest
** Canonical:UbuntuServer:18.04-LTS:latest
** Canonical:UbuntuServer:18_04-lts-gen2:latest
** RedHat:RHEL:7-LVM:7.9.2020111202
** RedHat:RHEL:7lvm-gen2:7.9.2020111205
** RedHat:RHEL:7.8:7.8.2020111309
** RedHat:RHEL:79-gen2:7.9.2020111302
** RedHat:RHEL:7_9:7.9.2020111301
** RedHat:RHEL:8-LVM:8.3.2020111909
** RedHat:RHEL:8-lvm-gen2:8.3.2020111910

  4. Because (1) the platform guarantees that the disk is attached before booting and (2) udev rules are loaded very early in the boot stage (from my understanding, as soon as systemd-udevd is loaded, which is very early in boot and way before cloud-init is loaded by systemd), there is almost no chance of this change regressing existing VMs.

Ultimately, this change (1) prevents a 2-minute performance delay for all Linux VMs without ephemeral disks on Azure (which is a pretty significant perf penalty for a wide range of users) and (2) does not really regress something that hasn't happened before/has a very low chance of ever happening (very small/non-existent users affected).

Collaborator

1. For VM SKUs with ephemeral disks, the Azure platform guarantees that the ephemeral disk is attached before a VM is booted.

I apologize for not reading your preamble in this PR more closely. If this is what the platform is guaranteeing then it seems reasonable to remove the timeout completely. If the disk is not there waiting 5 seconds isn't going to make it show up (rather something more invasive like running a blkid command to probe the storage layer would likely be needed).

Contributor Author

No worries and thanks @raharper! This info definitely needs to be exposed through IMDS (as a 5-second wait is still a significant penalty especially for mass VM scale outs). After platform support is there (in a few months), we can programmatically decide whether to even wait for a disk or not.

@OddBloke
Collaborator

OddBloke commented Jan 4, 2021

This looks to me like it ended up in a place where we can land it. Do folks agree?

@raharper
Collaborator

raharper commented Jan 5, 2021

This looks to me like it ended up in a place where we can land it. Do folks agree?

I'm +1 on this PR.

@johnsonshi
Contributor Author

This looks to me like it ended up in a place where we can land it. Do folks agree?

Right now, all Azure ephemeral-disk-less VMs face a few issues:

  1. Waiting 120 seconds for an ephemeral disk that will never appear. This PR reduces the wait to a few secs only. As discussed above, this approach won't cause regressions. For VM SKUs that have ephemeral disks, the Azure platform guarantees ephemeral disks to be attached to the VM before it is started.
  2. The Azure default datasource config references the ephemeral disk (see code). The default configs are merged in _get_data() (see code). (which is the location when we wait for the disk and know if the disk appears or not).
  3. By the time cloud-init reaches activate() -> address_ephemeral_resize(), the default data, metadata, and userdata would have already been merged. Take note _get_data() happens before activate() -> address_ephemeral_resize().
  4. Due to current IMDS limitations, we can only know that the VM doesn't have an ephemeral disk when it gives up waiting for the disk in address_ephemeral_resize()... after various data has already been merged.
  5. When cloud-init later reaches cc_disk_setup, the disk_setup module fails due to referencing the non-existent ephemeral disk (caused by the default datasource config).

I'm restricting the scope of this PR to fixing (1) only. The other issue (not merging the default datasource config referencing the ephemeral disk) cannot be cleanly fixed as of now, so I'll be fixing it later.

  • Rationale: the only fix that is possible right now (without IMDS ephemeral disk info) is to modify the already-merged datasource once we reach address_ephemeral_resize() (after cloud-init realizes the disk isn't there).
  • Deleting YAML references to the ephemeral disk after the data (including userdata) have been merged isn't a clean solution.

Alternatively, we can wait for the ephemeral disk for 1-5 secs within _get_data() before merging the default config. That way, we'll know whether cloud-init needs to exclude the ephemeral disk default configs. @OddBloke // @raharper // @anhvoms Thoughts?

In the future, when Azure IMDS presents information on the absence/presence of the ephemeral disk, the plan is:

  • Within _get_data(), before the various datasources are merged, see if the IMDS metadata says we have an ephemeral disk or not. Depending on the info, include/do not include the ephemeral disk configs when the data is merged.

@lutzwillek-tomtom

We asked Microsoft (Azure support case 120112525000371) whether they are aware that the Azure Instance Metadata Service does not expose any information on whether an ephemeral resource disk is available for the VM, and of the problems this causes for the new SKU types without an ephemeral disk. We also asked whether there are any plans to change the IMDS behavior in the future. We got the answer that there are no plans or roadmaps in place to change the current behavior. Based on this reply, my expectation would be that Azure IMDS will not present any information on the absence/presence of the ephemeral disk in the foreseeable future.
Therefore I would like to suggest, even if it is unclean, to consider adding a delay of up to 5 seconds before the default config is merged in future.

However, the biggest operational issue at the moment is indeed the 120 second boot delay, caused by waiting for a disk that never appears. So the proposal of @johnsonshi to limit the scope of this PR to fix (1) only seems perfectly fine.

@smoser
Collaborator

smoser commented Jan 5, 2021

I think the 5 second wait is just silly. Either drop it or leave it as is.

got the answer that there are no plans or roadmaps in place to change the current behavior. Based on this reply, my expectation would be that Azure IMDS will not present any information on the absence/presence of the ephemeral disk in the foreseeable future.
Therefore I would like to suggest, even if it is unclean, to consider adding a delay of up to 5 seconds before the default config is merged in future.

I really don't understand how you got from "IMDS won't present any information" to "let's wait 5 seconds".

I see the following 3 options:

  1. leave the 120 second wait. This is 100% backwards compatible. New types without the disk have 120 second wait and WARN in the logs.
  2. change to 5 second wait. This is some middle option, which does:
    • delay new types by 5 seconds.
    • changes failure hit on old types from a WARN to a DEBUG (when disk wasn't there for some reason)
  3. drop the wait entirely.
    • new types will not have delay
    • As you've said, old types should not ever have a problem

I'd change option 3 to do a udevadm settle if the expected link wasn't present and then check again and then go on, but I'd be fine to drop it entirely and just go on.

Basically I don't buy the argument for 5 second delay... If that delay fixes 90% (or 99% or 99.9%) of the cases, all you did was make them harder to reproduce and find a real solution than option 3. The half-way solution just doesn't really help.
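
The variant of option 3 mentioned above (check, settle once, check again, then move on) could look roughly like this. It is a sketch only: `resource_disk_present` and `udevadm_settle` are hypothetical helper names, and the settle step is injectable so the logic can be exercised without udev.

```python
import os
import subprocess


def udevadm_settle():
    """Block until the udev event queue is empty."""
    subprocess.run(["udevadm", "settle"], check=True)


def resource_disk_present(devpath, exists=os.path.exists, settle=udevadm_settle):
    """Check for the resource disk link without any timed wait.

    If the link is missing, settle udev once and check one more time;
    the second result is final.
    """
    if exists(devpath):
        return True
    settle()
    return exists(devpath)
```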

@github-actions

Hello! Thank you for this proposed change to cloud-init. This pull request is now marked as stale as it has not seen any activity in 14 days. If no activity occurs within the next 7 days, this pull request will automatically close.

If you are waiting for code review and you are seeing this message, apologies! Please reply, tagging mitechie, and he will ensure that someone takes a look soon.

(If the pull request is closed, please do feel free to reopen it if you wish to continue working on it.)

@github-actions github-actions Bot added the stale-pr Pull request is stale; will be auto-closed soon label Jan 20, 2021
@raharper
Collaborator

raharper commented Jan 20, 2021

Alternatively, we can wait for the ephemeral disk for 1-5 secs within _get_data() before merging the default config. That way, we'll know whether cloud-init needs to exclude the ephemeral disk default configs. @OddBloke // @raharper // @anhvoms Thoughts?

I don't think waiting helps anything; either it's there or it isn't and 5 seconds won't matter. Drop the wait entirely.

@johnsonshi I suggest rebasing this on master, dropping the wait entirely and then mark this ready for review.

Labels

stale-pr Pull request is stale; will be auto-closed soon

5 participants