feat(api): skip DPU-tied steps for zero-DPU hosts #1980
…u_hosts When allow_zero_dpu_hosts is true, zero-DPU hosts skip: is_bios_setup verification, machine_setup (short-circuited before the Redfish call), is_boot_order_setup, and set_host_boot_order's SetBootOrder arm (returns Done — no DPU-first boot ordering to configure). The toggle is surfaced in the local nico-core values (default false). Signed-off-by: s3rj1k <evasive.gyron@gmail.com>
a4fb21b to bb43f98
/ok to test bb43f98
`set_boot_order_dpu_first` is DPU-targeting and returns `NotSupported` on vendors without a custom impl. For hosts with no DPU under `allow_zero_dpu_hosts`, PATCH BootSourceOverride directly via `boot_first` — try UefiHttp first, fall back to Pxe for BMCs that don't accept UefiHttp. The downstream SetBootOrder substates already handle `jid = None`, so the host still progresses through reboot + verify. Signed-off-by: s3rj1k <evasive.gyron@gmail.com>
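The fallback described in this commit message can be sketched roughly as follows. This is a minimal illustration, not the PR's code: `BootTarget`, `BootError`, the `BootOverride` trait, `set_network_boot_override`, and `FakeBmc` are hypothetical stand-ins, and the synchronous `boot_first` signature is an assumption rather than the real client's. It also assumes the fallback triggers only when the BMC rejects the target, not on every error.

```rust
/// Hypothetical stand-ins for the Redfish client types; the names echo the
/// commit message, but the signatures are assumptions.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum BootTarget {
    UefiHttp,
    Pxe,
}

#[derive(Debug, PartialEq, Eq)]
pub enum BootError {
    /// The BMC rejected the requested override target.
    Unsupported(BootTarget),
    /// Any other failure (transport, auth, and so on).
    Other(String),
}

pub trait BootOverride {
    /// PATCH BootSourceOverride so that the next boot uses `target`.
    fn boot_first(&mut self, target: BootTarget) -> Result<(), BootError>;
}

/// Try UefiHttp first; fall back to Pxe only when the BMC rejects UefiHttp.
/// Returns the target that was accepted.
pub fn set_network_boot_override<C: BootOverride>(
    client: &mut C,
) -> Result<BootTarget, BootError> {
    match client.boot_first(BootTarget::UefiHttp) {
        Ok(()) => Ok(BootTarget::UefiHttp),
        Err(BootError::Unsupported(_)) => {
            client.boot_first(BootTarget::Pxe)?;
            Ok(BootTarget::Pxe)
        }
        // Transport or auth failures are not retried with a different target.
        Err(other) => Err(other),
    }
}

/// Test double: a BMC that accepts only the targets listed in `accepted`
/// and records every call it receives.
pub struct FakeBmc {
    pub accepted: Vec<BootTarget>,
    pub calls: Vec<BootTarget>,
}

impl BootOverride for FakeBmc {
    fn boot_first(&mut self, target: BootTarget) -> Result<(), BootError> {
        self.calls.push(target);
        if self.accepted.contains(&target) {
            Ok(())
        } else {
            Err(BootError::Unsupported(target))
        }
    }
}
```

Returning the accepted target lets the caller log which override was applied before moving on to the reboot and verify substates.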
954bd43 to 9840809
/ok to test 9840809
I'm not sure we want to do this. These steps are still important even without DPUs: we want boot order setup so that we can boot to the scout image, which is still a thing even in the zero-DPU world. Plus there are all the other things that machine_setup does.
    tracing::info!(
        "Skipping machine_setup: zero-DPU host (allow_zero_dpu_hosts=true); BIOS profile is DPU-tied."
    );
    return Ok(None);
Hmm, I'm not sure we want to skip the setup just because there are zero DPUs. machine_setup does other things like enabling virtualization, clearing the TPM, setting up the serial console, etc.
libredfish only sends a NoDpu error for Dell systems today: https://github.com/NVIDIA/libredfish/blob/dd2152ac5642c5256b893e396647e159003d0071/src/dell.rs#L363 ... and even then it still ends up applying the rest of the config (although it doesn't return a job ID, which is bad).
libredfish only tries to detect the DPU so it can determine which interface MAC address to configure for network boot (the DPU becomes the boot device). So the correct fix is to send it the non-DPU primary NIC's MAC address as part of the setup call (see my TODO line above, which we never got around to fixing).
For now though, skipping the setup call altogether if there are zero DPUs is not what we want.
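The alternative suggested here, passing the primary non-DPU NIC's MAC to the setup call instead of skipping setup, might look roughly like the sketch below. The `Nic` struct and `boot_interface_mac` function are hypothetical names for illustration only; neither libredfish's actual setup signature nor the project's host model is shown in this thread.

```rust
/// Hypothetical, simplified view of one host network interface.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Nic {
    pub mac: String,
    pub is_dpu: bool,
    pub is_primary: bool,
}

/// Pick the MAC that the setup call should configure for network boot:
/// a DPU interface when one exists, otherwise the primary non-DPU NIC.
/// Returns `None` when neither is present.
pub fn boot_interface_mac(nics: &[Nic]) -> Option<&str> {
    nics.iter()
        .find(|n| n.is_dpu)
        .or_else(|| nics.iter().find(|n| !n.is_dpu && n.is_primary))
        .map(|n| n.mac.as_str())
}
```

With a selection like this, a zero-DPU host would still go through the full setup call, so the virtualization, TPM, and serial console settings would continue to be applied.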
> machine_setup does other things like enabling virtualization, clearing the TPM, setting up the serial console, etc.

Not all machines even support this kind of configuration. If no_dpu_flag is not enough, I suggest introducing another flag that explicitly disables all this configuration magic and assumes that the operator will be in charge of setting up the BIOS config.
Indeed, this sounds more like a BIOS or BMC profile-related setting and feature than anything directly related to DPUs.
The logic here is simple: if there is no DPU on the server, there is little point in configuring NIC boot device ordering (at this point we don't care whether the server boots from a specific NIC, only that it boots from some NIC) or the other BIOS enforcements. I do agree that this is more related to server settings; it might be worth having another dedicated flag for this.
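To make the two positions in this thread concrete, here is a small sketch contrasting the PR's gating with the dedicated-flag idea. Only `allow_zero_dpu_hosts` comes from the PR; `operator_managed_bios`, `HostSetupConfig`, and both function names are hypothetical and chosen purely for illustration.

```rust
/// Hypothetical config. `allow_zero_dpu_hosts` is the toggle from the PR;
/// `operator_managed_bios` is an illustrative name for the proposed
/// dedicated flag.
pub struct HostSetupConfig {
    pub allow_zero_dpu_hosts: bool,
    pub operator_managed_bios: bool,
}

/// Gating as described in the PR: skip the BIOS and boot order steps for
/// zero-DPU hosts when the toggle is on.
pub fn skip_setup_per_pr(cfg: &HostSetupConfig, dpu_count: usize) -> bool {
    cfg.allow_zero_dpu_hosts && dpu_count == 0
}

/// Gating with a dedicated flag: the DPU count no longer matters, and only
/// an explicit operator opt-out skips the BIOS and boot order steps.
pub fn skip_setup_with_dedicated_flag(cfg: &HostSetupConfig, _dpu_count: usize) -> bool {
    cfg.operator_managed_bios
}
```

Under the second variant, a zero-DPU host with `operator_managed_bios` left at false would still get machine_setup, which matches the reviewers' concern that those steps matter regardless of DPU count.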
Description
When allow_zero_dpu_hosts is true, zero-DPU hosts skip: is_bios_setup verification, machine_setup (short-circuited before the Redfish call), is_boot_order_setup, and set_host_boot_order's SetBootOrder arm (returns Done — no DPU-first boot ordering to configure). The toggle is surfaced in the local nico-core values (default false).