This bug was originally filed in Launchpad as LP: #1877491
Launchpad details
affected_projects = []
assignee = mruffell
assignee_name = Matthew Ruffell
date_closed = 2020-08-25T19:31:30.780654+00:00
date_created = 2020-05-08T01:56:19.295729+00:00
date_fix_committed = 2020-06-09T21:58:19.767584+00:00
date_fix_released = 2020-08-25T19:31:30.780654+00:00
id = 1877491
importance = undecided
is_complete = True
lp_url = https://bugs.launchpad.net/cloud-init/+bug/1877491
milestone = None
owner = mruffell
owner_name = Matthew Ruffell
private = False
status = fix_released
submitter = mruffell
submitter_name = Matthew Ruffell
tags = ['sts']
duplicates = []
Launchpad user Matthew Ruffell(mruffell) wrote on 2020-05-08T01:56:19.295729+00:00
Currently, we populate the debconf database variable grub-pc/install_devices by checking whether a device from a hardcoded list [1] of device paths is present:
- /dev/sda
- /dev/vda
- /dev/xvda
- /dev/sda1
- /dev/vda1
- /dev/xvda1
[1] https://github.com/canonical/cloud-init/blob/master/cloudinit/config/cc_grub_dpkg.py
While this is a simple, elegant solution, the hardcoded list does not match real-world conditions, where grub can be installed to a disk that is not on this list.
The primary example is any cloud which uses NVMe storage, such as AWS c5 instances.
/dev/nvme0n1 is not on the above list, so in this case the module falls back to a hardcoded /dev/sda value for grub-pc/install_devices.
The problem is that the grub postinstall script [2] checks whether the device named in grub-pc/install_devices exists, and if it doesn't, shows the user an interactive dpkg prompt where they must select the disk to install grub to. See the screenshot [3].
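The behaviour described above can be sketched roughly as follows. This is a minimal illustration, not the code in cc_grub_dpkg.py: the names CANDIDATES, FALLBACK and pick_install_device are hypothetical, and the candidate order is assumed to match the list in this report.

```python
import os

# Device paths from the hardcoded list in this report (order assumed).
CANDIDATES = [
    "/dev/sda", "/dev/vda", "/dev/xvda",
    "/dev/sda1", "/dev/vda1", "/dev/xvda1",
]
FALLBACK = "/dev/sda"


def pick_install_device(exists=os.path.exists):
    """Return the first candidate that exists, else the hardcoded fallback."""
    for dev in CANDIDATES:
        if exists(dev):
            return dev
    return FALLBACK
```

On an instance where only /dev/nvme0n1 and /dev/nvme1n1 exist, none of the candidates match, so the sketch returns /dev/sda, a device that is not present.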
[2] https://paste.ubuntu.com/p/5FChJxbk5K/
[3] https://launchpadlibrarian.net/478771797/Screenshot%20from%202020-04-14%2014-39-11.png
This breaks scripts that don't set DEBIAN_FRONTEND=noninteractive, as they hang waiting for the user to input a choice.
I propose that we modify the cc_grub_dpkg module to be more robust at selecting the correct disk grub is installed to.
Why not simply add an extra device to the hardcoded list?
Let's take NVMe storage as an example again. On a c5d.large instance I spun up just now, lsblk returns:
$ lsblk
NAME        MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
nvme0n1     259:0    0 46.6G  0 disk
nvme1n1     259:1    0    8G  0 disk
└─nvme1n1p1 259:2    0    8G  0 part /
We cannot hardcode /dev/nvme0n1, as the NVMe naming conventions are not stable in the kernel: on some boots the 8G disk will be /dev/nvme0n1, and on others it will be /dev/nvme1n1.
Instead, I propose that we determine which disk grub has been installed to by following the grub2 debian/postinst.in script, and implementing the algorithm behind the usable_partitions(), device_to_id() and available_ids() functions [4].
[4] https://paste.ubuntu.com/p/vKFNSwNyhP/
This uses grub-probe to find the root disk where the /boot directory is located, and then turns the disk name into a /dev/disk/by-id/ value.
This is robust to unstable kernel device naming conventions.
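A minimal sketch of that idea, assuming grub-probe is installed and that a by-id symlink can be matched by resolving it to the same real path as the disk. The helper names probe_boot_disk and device_to_id are illustrative (the latter borrows its name from the postinst function but is not a port of it), and picking the first matching link alphabetically is an assumption, not necessarily what the postinst script does.

```python
import os
import subprocess


def probe_boot_disk(path="/boot"):
    """Ask grub-probe which disk backs `path` (usually needs root)."""
    result = subprocess.run(
        ["grub-probe", "--target=disk", path],
        check=True, capture_output=True, text=True,
    )
    return result.stdout.strip()


def device_to_id(device, by_id_dir="/dev/disk/by-id"):
    """Return a by-id symlink that resolves to `device`, else `device`."""
    target = os.path.realpath(device)
    try:
        names = sorted(os.listdir(by_id_dir))
    except FileNotFoundError:
        # No by-id directory at all: keep the plain device path.
        return device
    for name in names:
        link = os.path.join(by_id_dir, name)
        if os.path.realpath(link) == target:
            return link
    return device
```

Calling device_to_id(probe_boot_disk()) would then give the value to store in grub-pc/install_devices. When no by-id link points at the disk, the sketch falls back to the plain device path.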
On Nitro, this returns:
/dev/disk/by-id/nvme-Amazon_Elastic_Block_Store_vol0179fff411dd211f0
On Xen, this returns:
/dev/xvda
On a typical QEMU/KVM machine, this returns:
/dev/vda
On my personal desktop computer, this returns:
/dev/disk/by-id/ata-WDC_WD5000AAKX-00PWEA0_WD-WMAYP3497618
I have tested this on AWS, on Xen, Nitro, on KVM, with BIOS and EFI based instances, in LXC, and on bare metal with a BIOS based MAAS machine.
All give the correct results in my testing.
TESTING:
You can fetch grub-pc/install_devices with:
$ echo get grub-pc/install_devices | sudo debconf-communicate grub-pc
Reset with:
$ echo reset grub-pc/install_devices | sudo debconf-communicate grub-pc
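For scripted checks, the same query can be driven from Python. This is a sketch under the assumption that debconf-communicate replies with a numeric status code followed by the value (for example "0 /dev/sda"); the function names are hypothetical, and get_install_devices needs root and a system with debconf.

```python
import subprocess


def parse_debconf_reply(line):
    """Split a reply such as '0 /dev/sda' into (status_code, value)."""
    code, _, value = line.strip().partition(" ")
    return int(code), value


def get_install_devices():
    """Run the 'get' command from the TESTING notes and parse the reply."""
    result = subprocess.run(
        ["debconf-communicate", "grub-pc"],
        input="get grub-pc/install_devices\n",
        check=True, capture_output=True, text=True,
    )
    return parse_debconf_reply(result.stdout)
```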