Azure: Support for VMs without ephemeral resource disks #716
 @azure_ds_telemetry_reporter
-def address_ephemeral_resize(devpath=RESOURCE_DISK_PATH, maxwait=120,
+def address_ephemeral_resize(devpath=RESOURCE_DISK_PATH, maxwait=5,
Is there no way to know whether a VM is one that does not have ephemeral disks?
You mention specific types:
"Dv4, Dsv4, Ev4, Esv4"
Is the instance-type available in metadata? If that's available, then one could look up the maxwait value based on instance-type.
> Is there no way to know whether a VM is one that does not have ephemeral disks?
> You mention specific types:
> "Dv4, Dsv4, Ev4, Esv4"
> Is the instance-type available in metadata? If that's available, then one could look up the maxwait value based on instance-type.
And if there is not a way to know ... can you fix the platform?
As of today, the Azure Instance Metadata Service (Azure IMDS) does not expose VM instance metadata indicating whether an ephemeral resource disk exists for the VM or not.
Unfortunately, IMDS support for exposing ephemeral resource disk presence/absence won't be around for quite some time/in the next few months. In the intervening time, Linux VMs deployed without the disks are suffering from a 2-minute delay with cloud-init.
I opened this draft PR ahead of time to communicate our plans:
- Decrease wait time for ephemeral disk.
** Optional: If the ephemeral disk doesn't come up, then delete the built-in Azure DS cloud-config that references setting up the ephemeral disk. This prevents `disk_setup` and `fs_setup` from throwing RuntimeErrors due to referencing non-existent ephemeral disks.
- Once IMDS supports exposing this info, we'll have a more graceful approach.
> As of today, the Azure Instance Metadata Service (Azure IMDS) does not expose VM instance metadata indicating whether an ephemeral resource disk exists for the VM or not.
> Unfortunately, IMDS support for exposing ephemeral resource disk presence/absence won't be around for quite some time/in the next few months. In the intervening time, Linux VMs deployed without the disks are suffering from a 2-minute delay with cloud-init.
I was asking if the instance type is available; IIUC, there are new instance types which are without the ephemeral disk;
The instance metadata has [1]:
  "vmSize": "Standard_A3"
Can't we set the timeout value low if the vmSize is of the types without the disk?
[1] https://docs.microsoft.com/en-us/azure/virtual-machines/linux/instance-metadata-service
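A minimal sketch of what a vmSize-based lookup like this could look like. The regular expression, the constant names, and `maxwait_for_vm_size` are all hypothetical and only cover the families named in this thread (Dv4, Dsv4, Ev4, Esv4); this is not an authoritative list of sizes without a resource disk, and any such list would need to be maintained as new sizes appear.

```python
import re

# Hypothetical pattern for the SKU families named in this thread
# (Dv4, Dsv4, Ev4, Esv4). Assumption for illustration only; it is not
# an authoritative list of sizes that lack an ephemeral resource disk.
DISKLESS_SIZE_RE = re.compile(r"^Standard_[DE]\d+s?_v4$", re.IGNORECASE)

DEFAULT_MAXWAIT = 120  # seconds, the wait used before this PR
DISKLESS_MAXWAIT = 0   # seconds, nothing to wait for


def maxwait_for_vm_size(vm_size):
    """Pick the ephemeral-disk wait from an IMDS vmSize string."""
    if vm_size and DISKLESS_SIZE_RE.match(vm_size):
        return DISKLESS_MAXWAIT
    # Unknown or missing sizes keep the long, conservative wait.
    return DEFAULT_MAXWAIT
```

Unknown sizes deliberately fall back to the long wait, so a stale pattern costs boot time rather than correctness.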
We can't take a dependency on the list of VM Sizes since the list of VM Sizes without ephemeral disks will grow as time passes. This design also won't be resilient if Azure ever exposes an option (to users) to deploy VMs with/without ephemeral disks.
Shouldn't IMDS indicate whether one is attached or not in the metadata? I understand that's not directly under your control. But the platform is in the position where it knows whether one was attached or not and it should expose that to the instance such that cloud-init can Do The Right Thing(tm).
If we drop the timeout to something lower, there is a class of users with ephemeral disks who will get errors in the log about not waiting long enough for the disk to show up. Keeping it as it is means new instance types without them have this long timeout, but that does not regress other instance types. Including the vmSize check in cloud-init seems like a reasonable compromise, and it can be updated in cloud-init.
Looking at the function address_ephemeral_resize; in the case where we don't wait long enough there are some paths that will break, for example this:
https://bugs.launchpad.net/ubuntu/+source/cloud-init/+bug/1611074
Not waiting would mean cloud-init would fail to resize/reformat the ephemeral disk.
Would it be reasonable to pre-populate those instances with user-data (ideally vendor-data) via the UI or a template in the CLI that'd indicate no ephemeral disk is attached? This could also be useful on instances which do have an ephemeral disk but whose users don't want it configured during first boot (saving boot time spent on partitioning and formatting).
#cloud-config
datasource:
  Azure:
    ephemeral_disk:
      enabled: true|false
user-data is processed before cloud-init/cmd/main.py calls the ds.activate() method, which is what triggers the ephemeral resize. DataSourceAzure.activate() can then skip the call to address_ephemeral_resize() altogether.
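A rough sketch of how that skip could be wired up, assuming the config has already been parsed into a dict. The `ephemeral_disk.enabled` key is only the proposal above, not an existing cloud-init option, and `SketchDataSource` is a hypothetical stand-in rather than the real DataSourceAzure class.

```python
def ephemeral_disk_enabled(cfg, default=True):
    """Read the proposed datasource.Azure.ephemeral_disk.enabled flag.

    The key is hypothetical (it is the proposal in this comment).
    Missing or malformed values fall back to `default`.
    """
    node = cfg or {}
    for key in ("datasource", "Azure", "ephemeral_disk"):
        node = node.get(key) if isinstance(node, dict) else None
        if node is None:
            return default
    if not isinstance(node, dict):
        return default
    value = node.get("enabled", default)
    return value if isinstance(value, bool) else default


class SketchDataSource:
    """Stand-in for DataSourceAzure, for illustration only."""

    def __init__(self, cfg):
        self.cfg = cfg
        self.resized = False

    def address_ephemeral_resize(self):
        # The real method waits for and reformats the resource disk.
        self.resized = True

    def activate(self):
        # Skip the resize entirely when the flag says no disk is wanted.
        if ephemeral_disk_enabled(self.cfg):
            self.address_ephemeral_resize()
```

Defaulting to enabled keeps today's behaviour for instances that supply no such key.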
@raharper / @anhvoms I definitely agree with you that IMDS/the platform should expose information on whether there is an ephemeral resource disk attached or not. Unfortunately, this platform-side change can't happen and be rolled out until more than 6 months from now. This means that for more than half a year, until platform changes are made, Linux VMs deployed on Azure without ephemeral disks will suffer a 2-minute boot delay.
There are reasons why decreasing the wait time for ephemeral disks won't regress current existing VM SKUs:
- For VM SKUs with ephemeral disks, the Azure platform guarantees that the ephemeral disk is attached before a VM is booted.
- In the past few years, we've never seen an instance where cloud-init had to wait for ephemeral disks to come up, as the disk was already attached (guaranteed by the platform) and the disk symlink was already created (created by udev rules).
- As mentioned in my PR description above, I've performed deployment tests across a variety of Linux images on Azure. The intent was to test whether any distros or images had delays in creating the disk symlink.
grep "Azure ephemeral disk: All files appeared after" /var/log/cloud-init.log
util.py[DEBUG]: Azure ephemeral disk: All files appeared after 0 seconds: ['/dev/disk/cloud/azure_resource']
The statement above is true and tested for the following images:
** RedHat:RHEL:8.2:latest
** SUSE:sles-15-sp2:gen2:latest
** Canonical:0001-com-ubuntu-server-focal:20_04-lts:latest
** Canonical:0001-com-ubuntu-server-focal:20_04-lts-gen2:latest
** Canonical:UbuntuServer:18.04-LTS:latest
** Canonical:UbuntuServer:18_04-lts-gen2:latest
** RedHat:RHEL:7-LVM:7.9.2020111202
** RedHat:RHEL:7lvm-gen2:7.9.2020111205
** RedHat:RHEL:7.8:7.8.2020111309
** RedHat:RHEL:79-gen2:7.9.2020111302
** RedHat:RHEL:7_9:7.9.2020111301
** RedHat:RHEL:8-LVM:8.3.2020111909
** RedHat:RHEL:8-lvm-gen2:8.3.2020111910
- Because (1) the platform guarantees the disk to be attached before booting and (2) udev rules are loaded very early in the boot stage (from my understanding, as soon as `systemd-udevd` is loaded, which is very early in boot and way before `cloud-init` is loaded by `systemd`), there will be almost 0 chance for this change to regress existing VMs.
Ultimately, this change (1) prevents a 2-minute performance delay for all Linux VMs without ephemeral disks on Azure (a significant perf penalty for a wide range of users) and (2) does not regress existing behavior in practice, since a late-appearing disk has not been observed before and has a very low chance of ever happening (very few users, if any, would be affected).
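For anyone who wants to repeat the log check on other images, here is a small sketch that pulls the reported wait out of cloud-init.log text. It assumes the message format shown in the grep output above; `ephemeral_wait_seconds` is a hypothetical helper name, not part of cloud-init.

```python
import re

# Matches the debug message shown in the grep output above.
APPEARED_RE = re.compile(
    r"Azure ephemeral disk: All files appeared after (\d+) seconds"
)


def ephemeral_wait_seconds(log_text):
    """Return every reported wait (in seconds) found in the log text."""
    return [int(m.group(1)) for m in APPEARED_RE.finditer(log_text)]
```

Passing the contents of /var/log/cloud-init.log gives one entry per matching line; an empty list means the message never appeared.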
> 1. For VM SKUs with ephemeral disks, the Azure platform guarantees that the ephemeral disk is attached before a VM is booted.
I apologize for not reading your preamble in this PR more closely. If this is what the platform is guaranteeing, then it seems reasonable to remove the timeout completely. If the disk is not there, waiting 5 seconds isn't going to make it show up (rather, something more invasive, like running a blkid command to probe the storage layer, would likely be needed).
No worries and thanks @raharper! This info definitely needs to be exposed through IMDS (as a 5-second wait is still a significant penalty especially for mass VM scale outs). After platform support is there (in a few months), we can programmatically decide whether to even wait for a disk or not.
This looks to me like it ended up in a place where we can land it. Do folks agree?
I'm +1 on this PR.
Right now, all Azure ephemeral-disk-less VMs face a few issues:
I'm restricting the scope of this PR to fixing (1) only. The other issue (not merging the default datasource config referencing the ephemeral disk) cannot be cleanly fixed as of now, so I'll be fixing it later.
Alternatively, we can wait for the ephemeral disk for 1-5 secs within
In the future, when Azure IMDS presents information on the absence/presence of the ephemeral disk, the plan is:
We asked Microsoft (Azure support case 120112525000371) whether they are aware that the Azure Instance Metadata Service does not expose any information on whether an ephemeral resource disk is available for the VM, and of the problems this causes for the new SKU types without an ephemeral disk. We also asked whether there are any plans to change the IMDS behavior in the future. The answer was that there are no plans or roadmaps in place to change the current behavior. Based on this reply, my expectation is that Azure IMDS will not present any information on the absence/presence of the ephemeral disk in the foreseeable future. However, the biggest operational issue at the moment is indeed the 120-second boot delay caused by waiting for a disk that never appears. So @johnsonshi's proposal to limit the scope of this PR to fixing (1) only seems perfectly fine.
I think the 5 second wait is just silly. Either drop it or leave it as is.
I really don't understand how you got from "IMDS won't present any information" to "let's wait 5 seconds". I see the following 3 options:
I'd change option 3 to do a
Basically, I don't buy the argument for a 5 second delay. If that delay fixes 90% (or 99% or 99.9%) of the cases, all you did was make the remaining cases harder to reproduce, and a real solution harder to find, than with option 3. The half-way solution just doesn't really help.
I don't think waiting helps anything; either it's there or it isn't, and 5 seconds won't matter. Drop the wait entirely. @johnsonshi I suggest rebasing this on master, dropping the wait entirely, and then marking this ready for review.
Proposed Commit Message
Azure: Support for VMs without ephemeral resource disks.
Changes:
- during `DataSourceAzure._get_data()` if the ephemeral disk exists.
- `DataSourceAzure.address_ephemeral_resize()` (which is invoked in `DataSourceAzure.activate()`) should only set up the ephemeral disk if the disk exists.
Azure VMs may or may not come with ephemeral resource disks
depending on the VM SKU. For VM SKUs that come with
ephemeral resource disks, the Azure platform guarantees that
the ephemeral resource disk is attached to the VM before
the VM is booted. For VM SKUs that do not come with
ephemeral resource disks, cloud-init currently attempts
to wait and set up a non-existent ephemeral resource
disk, which wastes boot time. It also causes disk setup
modules to fail (due to non-existent references to the
ephemeral resource disk).
`udevadm settle` is invoked by cloud-init very early in boot.
`udevadm settle` is invoked very early, before
`DataSourceAzure`'s `_get_data()` and `activate()` methods.

Within `DataSourceAzure`'s `_get_data()` and `activate()` methods,
the ephemeral resource disk path should exist if the
VM SKU comes with an ephemeral resource disk.
The ephemeral resource disk path should not exist if the
VM SKU does not come with an ephemeral resource disk.
LP: #1901011
Additional Context
Problem
For Azure VMs, cloud-init's `DataSourceAzure.py` formats and addresses ephemeral disk resizing. It does this for all Azure VM SKUs. See code and code.

The code right now waits up to 120 seconds for the ephemeral disk to appear before either proceeding or giving up. It waits up to 120 secs for the symlink `/dev/disk/cloud/azure_resource` to appear.

For new Azure VM SKUs (such as `Dv4`, `Dsv4`, `Ev4`, `Esv4`) that do not come with ephemeral resource disks, cloud-init would wait up to 120 seconds before giving up. See LP: #1901011.

For these new Azure VM SKUs without ephemeral resource disks, the `disk_setup` module would also fail later in cloud-init because "builtin Azure ephemeral disk configs" are merged into DataSourceAzure metadata. These builtin configs reference the non-existent ephemeral disk, which causes the module to fail.

Why this approach was chosen
As of today, the Azure Instance Metadata Service (Azure IMDS) does not expose VM instance metadata indicating whether an ephemeral resource disk exists for the VM or not.
The Azure host also guarantees that the ephemeral resource disk is attached to the VM before it is booted during VM deployment.
Additionally, the ephemeral resource disk symlink (`/dev/disk/cloud/azure_resource`) that cloud-init waits for is actually created by a udev rule that comes with cloud-init. Additional relevant code, code, and code.

Because:
- `udev` rules (created as soon as the kernel detects the disk and sends the event to `udev`),
- `udevadm settle` is invoked very early in boot by cloud-init (before DataSourceAzure runs),

it is guaranteed that the ephemeral resource disk symlink exists by the time DataSourceAzure runs.
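Under that guarantee, the wait can be replaced by a plain existence check. The following is a minimal sketch of the idea, not the code in this PR: `should_set_up_ephemeral_disk` and `merge_builtin_disk_config` are hypothetical helper names, and the shallow dict merge is a simplification of how cloud-init merges configs.

```python
import os

# Symlink created by the udev rule that ships with cloud-init.
RESOURCE_DISK_PATH = "/dev/disk/cloud/azure_resource"


def should_set_up_ephemeral_disk(devpath=RESOURCE_DISK_PATH):
    """True only if the resource disk symlink resolves right now (no wait)."""
    return os.path.exists(devpath)


def merge_builtin_disk_config(cfg, builtin_disk_cfg, devpath=RESOURCE_DISK_PATH):
    """Merge the builtin ephemeral-disk config only when the disk exists.

    Hypothetical helper; uses a shallow merge where user-provided keys win.
    """
    if not os.path.exists(devpath):
        # No disk: leave out config that references a missing device.
        return dict(cfg)
    merged = dict(builtin_disk_cfg)
    merged.update(cfg)
    return merged
```

Because `os.path.exists` follows symlinks, a dangling symlink is treated the same as a missing disk.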
Test Steps
No regression for VM SKUs with ephemeral resource disk
`Standard_DS1_V2` VM (has ephemeral resource disk) from this custom image.
Fix for VM SKUs without ephemeral resource disk
`Standard_D2s_v4` VM (no ephemeral resource disk) from this custom image.
Checklist: