Summary
Worker and control-plane VMs provisioned by this playbook are created without an explicit bootDevices field on the HC VirDomain. They boot fine on first power-on because HC's BIOS falls back automatically to the only attached VIRTIO_DISK. However, once a second VIRTIO_DISK is attached to the VM (by any HC API consumer: manual disk add, Terraform provider, dynamic-storage workflows, etc.) AND the VM is rebooted, the BIOS gives up with Boot failed: not a bootable disk / No bootable device. The VM is then unbootable until bootDevices is set manually, via the HC UI or the REST API.
This was reproduced on an HC 9.6.25.224460 cluster: all 8 VMs (3 servers + 4 agents + 1 nfs-server) provisioned by this playbook ~728 days ago had bootDevices = []. A second VIRTIO_DISK was attached to one agent VM, and on the next STOP/START it bricked.
The root cause is an HC platform bug (a separate issue has been filed internally with the HC platform team). The workaround belongs here regardless: the playbook is the source of these VMs and is in the best position to set bootDevices correctly at provision time. That covers every existing and future deployment without waiting for the HC platform fix to land.
Suggested fix
When provisioning a VM via the HC API, include the primary disk UUID in the request's bootDevices list, or PATCH the VM right after disk attach. Something like:
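# Run after the OS disk exists, so that os_disk_uuid is known.
- name: Set explicit boot order on the VM
  ansible.builtin.uri:
    url: "https://{{ hc_host }}/rest/v1/VirDomain/{{ vm_uuid }}"
    method: PATCH
    user: "{{ hc_user }}"
    password: "{{ hc_pass }}"
    force_basic_auth: true
    validate_certs: false  # skips TLS verification; enable it if the cluster has a trusted certificate
    body_format: json
    body:
      bootDevices:
        - "{{ os_disk_uuid }}"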
Or, if the scale_computing.hypercore Ansible collection's vm module exposes the field, set it there at create time.
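A rough sketch of the collection-based route, for comparison. The scale_computing.hypercore collection ships a vm_boot_devices module; the parameter names used here (vm_name, items, type, disk_slot, state) are recalled from its documentation and should be checked against the installed collection version. vm_name and os_disk_slot are hypothetical playbook variables, and cluster authentication is omitted.

```yaml
# Sketch only: verify the module parameters against the installed collection version.
- name: Pin the OS disk as the only boot device
  scale_computing.hypercore.vm_boot_devices:
    vm_name: "{{ vm_name }}"             # hypothetical variable: VM name as known to HC
    items:
      - type: virtio_disk
        disk_slot: "{{ os_disk_slot }}"  # hypothetical variable: slot of the OS disk
    state: set                           # assumed to replace the boot list rather than append to it
```

Addressing the disk by slot rather than by UUID avoids a lookup step, but it only works if the playbook knows which slot the OS disk was created in.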
Workaround for existing deployments
curl -sk -u '<user>:<password>' -X PATCH "https://<hc-host>/rest/v1/VirDomain/<vm-uuid>" \
  -H 'Content-Type: application/json' \
  -d '{"bootDevices":["<os-disk-uuid>"]}'
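For more than a handful of VMs, the same PATCH can be driven from Ansible. The sketch below reuses the request from the suggested fix and loops over boot_fixups, a hypothetical hand-verified list of VM UUID / OS disk UUID pairs that would be defined in vars or inventory; hc_host, hc_user, and hc_pass are the same variables as above.

```yaml
# Sketch only. boot_fixups is a hypothetical variable, for example:
#   boot_fixups:
#     - vm_uuid: "<vm-uuid>"
#       os_disk_uuid: "<os-disk-uuid>"
- name: Set explicit boot order on already-provisioned VMs
  ansible.builtin.uri:
    url: "https://{{ hc_host }}/rest/v1/VirDomain/{{ item.vm_uuid }}"
    method: PATCH
    user: "{{ hc_user }}"
    password: "{{ hc_pass }}"
    force_basic_auth: true
    validate_certs: false
    body_format: json
    body:
      bootDevices:
        - "{{ item.os_disk_uuid }}"
  loop: "{{ boot_fixups }}"
  loop_control:
    label: "{{ item.vm_uuid }}"
```

The mapping is kept explicit on purpose: in the evidence below, the OS disk sits in slot 1 while the later-attached virtio disk sits in slot 0, so picking "the first VIRTIO_DISK" automatically could select the wrong disk.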
Evidence
A worker VM before manual fix: bootDevices=[]
blockDevs:
  f8b7960b  VIRTIO_DISK  100GB  slot=1  <- OS disk
  16e1edfa  IDE_CDROM    1MB    slot=0  <- cloud-init
  93969048  VIRTIO_DISK  1GB    slot=0  <- second virtio (any source: manual add, dynamic storage, etc.)
After STOP+START: BIOS → "Boot failed: not a bootable disk / No bootable device"
All other VMs in the cluster: bootDevices=[]. They boot fine today because they still
have only one VIRTIO_DISK, but are vulnerable to the same brick the moment a second
virtio disk is attached AND the VM is rebooted.
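The cluster-wide state described above can be checked with a read-only audit along these lines. It is a minimal sketch that assumes GET /rest/v1/VirDomain returns a JSON array of VM objects, each carrying uuid and bootDevices fields; only the per-VM endpoint and the bootDevices field are confirmed by this report, so the list endpoint and the uuid field name are assumptions.

```yaml
# Sketch only: the list endpoint and the uuid field name are assumptions.
- name: Fetch all VMs from the HC REST API
  ansible.builtin.uri:
    url: "https://{{ hc_host }}/rest/v1/VirDomain"
    method: GET
    user: "{{ hc_user }}"
    password: "{{ hc_pass }}"
    force_basic_auth: true
    validate_certs: false
    return_content: true
  register: hc_vms

- name: Report VMs whose bootDevices list is empty
  ansible.builtin.debug:
    msg: "{{ hc_vms.json | selectattr('bootDevices', 'equalto', []) | map(attribute='uuid') | list }}"
```

Every UUID it prints is a VM that would hit the failure after its next disk attach and reboot.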
Suggested labels
bug, priority:high (silent + recovery-blocking failure mode)