VMs produced by this playbook ship with GRUB_DISABLE_LINUX_UUID=true in /etc/default/grub (inherited from the Ubuntu cloud image). That setting tells grub-mkconfig to emit root=/dev/vda1 on the kernel command line instead of root=UUID=<...>. When a second VIRTIO_DISK is later attached to the VM and the VM is rebooted, the new disk takes PCI slot 0, so the kernel names it vda and the OS disk becomes vdb. The initramfs then hangs waiting for a /dev/vda1 that doesn't exist.
This is the initramfs-level twin of the issue I filed as #7 (HC bootDevices empty by default). The two are independent layers of the same disaster:
- BIOS-level (#7): empty bootDevices + two virtio disks → BIOS gives up → "No bootable device".
- Initramfs-level (this issue): even after you fix the BIOS path via bootDevices, grub still loads with root=/dev/vda1 → kernel boots → initramfs can't find root because the orphan disk reordered virtio device naming → boot hangs.

Both have to be fixed for a worker VM with attached non-OS disks to survive a reboot.
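To check which form a given VM is using, it helps to classify the root= argument of its kernel command line. The sketch below is a hypothetical POSIX sh helper (cmdline_root_kind is not part of the playbook or of any existing tool). It takes the command line as an argument instead of reading /proc/cmdline itself, so it can be tried on sample strings, and it keeps the last root= token if several are present.

```shell
# Hypothetical helper: print how a kernel command line refers to the root fs.
# Output is one of: uuid, partuuid, label, devpath, other, none.
cmdline_root_kind() {
  local root
  # Split on blanks (no globbing), keep the value of the last root= token.
  root=$(printf '%s\n' "$1" | tr -s ' \t' '\n\n' | sed -n 's/^root=//p' | tail -n 1)
  case $root in
    '')         echo none ;;
    UUID=*)     echo uuid ;;      # survives vda/vdb reordering
    PARTUUID=*) echo partuuid ;;  # also independent of device naming
    LABEL=*)    echo label ;;
    /dev/*)     echo devpath ;;   # e.g. /dev/vda1: breaks when a new disk takes vda
    *)          echo other ;;
  esac
}
```

On an affected VM, cmdline_root_kind "$(cat /proc/cmdline)" should print devpath; after the grub change and a reboot it should print uuid.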
Reproducer
```shell
# On any VM from this playbook (HC 9.6.x):
grep '^GRUB_DISABLE_LINUX_UUID=' /etc/default/grub
# → GRUB_DISABLE_LINUX_UUID=true
cat /proc/cmdline
# → BOOT_IMAGE=/boot/vmlinuz-... root=/dev/vda1 ro nomodeset ...
# /etc/fstab is already correct (LABEL=cloudimg-rootfs); only grub is the weak link.
# Verified on an HC 9.6.25 cluster: 8 VMs, all 8 had the bad setting.
```
After attaching a second VIRTIO_DISK and rebooting, the boot log gets as far as "Btrfs loaded, crc32c=crc32c-intel, zoned=yes, fsverity=yes" and then hangs: the initramfs is waiting for a /dev/vda1 that doesn't exist.
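An unanchored grep for the variable name also matches a commented-out line, so it can't confirm that a fix took effect. The hypothetical grub_uuid_disabled helper below is a stricter sketch: it ignores commented lines, strips quotes and inline comments, and lets the last active assignment win across the files it is given, in the order given. Also checking /etc/default/grub.d/*.cfg assumes the Debian/Ubuntu grub-mkconfig, which sources those drop-ins after /etc/default/grub.

```shell
# Hypothetical helper: succeed (exit 0) if GRUB_DISABLE_LINUX_UUID=true is
# still in effect after reading the given files in order.
grub_uuid_disabled() {
  local f value
  # Collect active (uncommented) assignments from every readable file;
  # unreadable paths, such as an unmatched grub.d/*.cfg glob, are skipped.
  value=$(for f in "$@"; do
            [ -r "$f" ] && sed -n 's/^[[:space:]]*GRUB_DISABLE_LINUX_UUID=//p' "$f"
          done | tail -n 1)
  value=${value%%#*}                               # drop an inline comment
  value=$(printf '%s' "$value" | tr -d "\"' \t")   # drop quotes and blanks
  [ "$value" = true ]
}
```

For example: grub_uuid_disabled /etc/default/grub /etc/default/grub.d/*.cfg && echo "grub will emit a device-path root=". A glob that matches nothing is passed through literally and simply skipped.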
Suggested fix in the playbook
Add tasks that run once at provision time, after the VM has booted but before any non-OS disks can be attached:
```yaml
- name: Use UUID-based root for resilience against extra virtio disk attachments
  ansible.builtin.lineinfile:
    path: /etc/default/grub
    # Match both the original and the already-commented line so reruns are no-ops.
    regexp: '^#?GRUB_DISABLE_LINUX_UUID='
    line: '#GRUB_DISABLE_LINUX_UUID=true  # disabled: vda reorder hazard'
  register: grub_uuid

- name: Regenerate grub.cfg with root=UUID=
  ansible.builtin.command: update-grub
  when: grub_uuid.changed
```
Or, more cleanly, do it in cloud-init's runcmd:
```yaml
runcmd:
- sed -i 's/^GRUB_DISABLE_LINUX_UUID=true/#GRUB_DISABLE_LINUX_UUID=true/' /etc/default/grub
- update-grub
```
Workaround for existing deployments
Per host:
```shell
sudo sed -i.bak 's/^GRUB_DISABLE_LINUX_UUID=true/#GRUB_DISABLE_LINUX_UUID=true/' /etc/default/grub
sudo update-grub
# Verify: the first kernel line should now contain root=UUID=<your-rootfs-uuid>
grep -m1 -E '^[[:space:]]*linux[[:space:]]' /boot/grub/grub.cfg
```
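The verification step can also be tightened by comparing the root= argument on the kernel line with the UUID of the filesystem actually mounted at /. The hypothetical grubcfg_root_ok helper below sketches this. It assumes the first linux line in grub.cfg belongs to the default menu entry, which holds for the stock layout unless GRUB_DEFAULT selects another entry.

```shell
# Hypothetical helper: check that the default kernel line in a grub.cfg boots
# with root=UUID=..., optionally matching an expected filesystem UUID.
grubcfg_root_ok() {
  local cfg expected root
  cfg=$1
  expected=${2:-}
  # Assumes the first "linux" command belongs to the default menu entry.
  root=$(grep -m 1 -E '^[[:space:]]*linux[[:space:]]' "$cfg" |
         tr -s ' \t' '\n\n' | sed -n 's/^root=//p' | tail -n 1)
  case $root in
    UUID=*) ;;
    *) echo "not UUID-based: root=${root:-<missing>}" >&2; return 1 ;;
  esac
  if [ -n "$expected" ] && [ "${root#UUID=}" != "$expected" ]; then
    echo "UUID mismatch: grub.cfg has ${root#UUID=}, rootfs is $expected" >&2
    return 1
  fi
  echo "ok: root=$root"
}
```

For example: grubcfg_root_ok /boot/grub/grub.cfg "$(findmnt -n -o UUID /)". Keep in mind that /proc/cmdline only reflects the change after the next reboot.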
I just applied this to all 8 VMs on the dd-k3s reference cluster, and it works cleanly on the Ubuntu 22.04 cloud image: grub.cfg now contains root=UUID=ff9efcec-... instead of root=/dev/vda1. The cluster is now protected against the virtio-reorder boot hang at both layers (BIOS bootDevices already patched via #7's workaround, plus this grub change).
Why this matters together with #7
bootDevices (BIOS layer) and the UUID root (initramfs layer) are both needed. Fixing only one leaves the VM vulnerable:
- Only bootDevices fixed: the BIOS finds the OS disk and grub loads, but the kernel still gets root=/dev/vda1, so the initramfs hangs once the extra disk takes vda.
- Only the UUID root fixed: with bootDevices empty and two virtio disks, the BIOS still gives up with "No bootable device" before grub ever runs.

Both fixes should land together in the playbook for a complete defense.
Suggested labels
bug, priority:high (same "silent until reboot" hazard as #7).