OCPEDGE-2489: fix: pick cluster VM IP from DHCP lease by hostname by jaypoulz · Pull Request #61 · openshift-eng/two-node-toolbox

jaypoulz · 2026-03-30T19:41:24Z

libvirt net-dhcp-leases can show multiple rows per MAC (anonymous DUID vs hostname). Prefer the lease whose Hostname matches the VM (e.g. master-0 for ostest_master_0), then any lease with a hostname, then any lease.

Add roles/common/files/resolve_vm_ip.sh and call it from update-cluster-inventory; use bracketed IPv6 for ssh/ssh-keyscan.

This is UNTESTED on IPv4

Made-with: Cursor

libvirt net-dhcp-leases can show multiple rows per MAC (anonymous DUID vs hostname). Prefer the lease whose Hostname matches the VM (e.g. master-0 for ostest_master_0), then any lease with a hostname, then any lease. Add roles/common/files/resolve_vm_ip.sh and call it from update-cluster-inventory; use bracketed IPv6 for ssh/ssh-keyscan. Made-with: Cursor

openshift-ci-robot · 2026-03-30T19:41:28Z

@jaypoulz: This pull request explicitly references no jira issue.

Details

In response to this:

libvirt net-dhcp-leases can show multiple rows per MAC (anonymous DUID vs hostname). Prefer the lease whose Hostname matches the VM (e.g. master-0 for ostest_master_0), then any lease with a hostname, then any lease.

Add roles/common/files/resolve_vm_ip.sh and call it from update-cluster-inventory; use bracketed IPv6 for ssh/ssh-keyscan.

Made-with: Cursor

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

openshift-ci · 2026-03-30T19:41:32Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jaypoulz

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details

Needs approval from an approver in each of these files:

~~OWNERS~~ [jaypoulz]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

jaypoulz · 2026-03-30T19:41:34Z

/hold

coderabbitai · 2026-03-30T19:41:44Z

Walkthrough

Adds a new Bash script to resolve VM IPs by querying libvirt interfaces and DHCP leases (with optional hostname preference) and updates an Ansible task to call this script instead of inline discovery; also makes SSH host formatting IPv6-safe and increases SSH connect timeout.

Changes

Cohort / File(s)	Summary
VM IP Resolution Script `deploy/openshift-clusters/roles/common/files/resolve_vm_ip.sh`	New executable script: enumerates VM interfaces via `virsh domiflist`, queries `virsh net-dhcp-leases` per network for MAC-to-IP mappings, prefers leases matching a derived hostname pattern, falls back to first lease for a MAC, strips CIDR, prints the resolved IP, exits 0 on success or 1 on failure.
Ansible Inventory Update Task `deploy/openshift-clusters/roles/common/tasks/update-cluster-inventory.yml`	Replaced inline VM IP discovery with per-VM invocation of `resolve_vm_ip.sh`; defaulted empty script output to `''` before trimming; compute `SSH_HOST` that wraps IPv6 addresses in brackets for `ssh-keyscan` and `ssh`; increased SSH connect timeout from 10s to 15s.

Sequence Diagram(s)

sequenceDiagram
    participant Ansible
    participant resolve_vm_ip.sh
    participant libvirt as virsh/libvirt
    participant DHCP as net-dhcp-leases
    participant VM

    Ansible->>resolve_vm_ip.sh: invoke with VM_NAME
    resolve_vm_ip.sh->>libvirt: virsh domiflist VM_NAME (get interfaces/MACs)
    resolve_vm_ip.sh->>DHCP: virsh net-dhcp-leases <network> (per network)
    DHCP-->>resolve_vm_ip.sh: lease rows (IP/CIDR, MAC, hostname)
    resolve_vm_ip.sh-->>Ansible: resolved IP (CIDR stripped)
    Ansible->>Ansible: compute SSH_HOST (wrap IPv6 in brackets)
    Ansible->>VM: ssh-keyscan / ssh using SSH_HOST (timeout 15s)

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

openshift-ci-robot · 2026-03-30T19:50:23Z

@jaypoulz: This pull request references OCPEDGE-2489 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the bug to target the "4.22.0" version, but no target version was set.

Details

In response to this:

libvirt net-dhcp-leases can show multiple rows per MAC (anonymous DUID vs hostname). Prefer the lease whose Hostname matches the VM (e.g. master-0 for ostest_master_0), then any lease with a hostname, then any lease.

Add roles/common/files/resolve_vm_ip.sh and call it from update-cluster-inventory; use bracketed IPv6 for ssh/ssh-keyscan.

This is UNTESTED on IPv4

Made-with: Cursor

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

coderabbitai

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

deploy/openshift-clusters/roles/common/tasks/update-cluster-inventory.yml (1)

20-25: ⚠️ Potential issue | 🟠 Major

Guard the inventory rewrite when IP resolution returns nothing.

Line 25 suppresses every resolver failure. Combined with the filter on Line 34, a bad resolve_vm_ip.sh run turns into an empty parsed_vm_entries, and this block later rewrites [cluster_vms] from that empty list. That can silently wipe the existing VM inventory. Please fail or skip the rewrite when no addresses were resolved.

Suggested guard

     - name: Get VM IP addresses from DHCP leases (hostname preferred)
       script: resolve_vm_ip.sh {{ item | quote }}
       register: vm_ips
       loop: "{{ cluster_vms.stdout_lines }}"
       changed_when: false
       failed_when: false
+
+    - name: Abort when no VM IPs were resolved
+      fail:
+        msg: >-
+          Discovered {{ cluster_vms.stdout_lines | length }} cluster VM(s), but
+          none of their IPs could be resolved from libvirt DHCP leases.
+      when: >-
+        (vm_ips.results | map(attribute='stdout') | map('trim') |
+         reject('equalto', '') | list | length) == 0

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@deploy/openshift-clusters/roles/common/tasks/update-cluster-inventory.yml`
around lines 20 - 25, The current task running resolve_vm_ip.sh (script:
resolve_vm_ip.sh ...) registers results to vm_ips but suppresses failures, which
can lead to parsed_vm_entries being empty and rewrite the [cluster_vms]
inventory from nothing; update the play to guard the rewrite by checking
vm_ips.results (or the computed parsed_vm_entries) and skip/fail the inventory
update when no IPs were resolved—e.g., after the script task add a conditional
task that inspects vm_ips.results to ensure at least one non-empty address (or
set failed_when: true on the script when all results are empty), and only allow
the subsequent block that rewrites [cluster_vms] (the rewrite task that uses
parsed_vm_entries) to run when that check passes.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@deploy/openshift-clusters/roles/common/files/resolve_vm_ip.sh`:
- Line 4: The grep -i "$MAC" pipelines currently cause the script to exit under
set -euo pipefail when a match fails; locate the command substitutions that pipe
into grep -i "$MAC" (the ones that populate the IP lookup fallbacks using the
MAC variable) and replace those grep pipelines with an awk match against the MAC
(case-insensitive) so the pipeline returns 0 even when no match is found and
fallbacks continue; ensure awk extracts the same fields as the original pipeline
(preserve the same capture/field selection) and use tolower comparison or
IGNORECASE in awk to implement case-insensitive matching.

---

Outside diff comments:
In `@deploy/openshift-clusters/roles/common/tasks/update-cluster-inventory.yml`:
- Around line 20-25: The current task running resolve_vm_ip.sh (script:
resolve_vm_ip.sh ...) registers results to vm_ips but suppresses failures, which
can lead to parsed_vm_entries being empty and rewrite the [cluster_vms]
inventory from nothing; update the play to guard the rewrite by checking
vm_ips.results (or the computed parsed_vm_entries) and skip/fail the inventory
update when no IPs were resolved—e.g., after the script task add a conditional
task that inspects vm_ips.results to ensure at least one non-empty address (or
set failed_when: true on the script when all results are empty), and only allow
the subsequent block that rewrites [cluster_vms] (the rewrite task that uses
parsed_vm_entries) to run when that check passes.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 2eaf1dbb-74e9-4e5e-af69-922b609907bd

📥 Commits

Reviewing files that changed from the base of the PR and between c2c0209 and 7f70935.

📒 Files selected for processing (2)

deploy/openshift-clusters/roles/common/files/resolve_vm_ip.sh
deploy/openshift-clusters/roles/common/tasks/update-cluster-inventory.yml

coderabbitai · 2026-03-30T19:50:41Z

deploy/openshift-clusters/roles/common/files/resolve_vm_ip.sh

+#!/usr/bin/env bash
+# Resolve VM IP from virsh net-dhcp-leases. The same MAC can appear twice (anonymous
+# DUID vs hostname); prefer the row whose Hostname matches this VM (e.g. master-0).
+set -euo pipefail


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash set -euo pipefail tmp="$(mktemp)" cat >"$tmp" <<'EOF' #!/usr/bin/env bash set -euo pipefail LEASES=$'2026-03-30 00:00:00 aa:bb:cc:dd:ee:ff ipv4 192.0.2.10/24 host-0 duid' MAC='11:22:33:44:55:66' IP=$(printf '%s\n' "$LEASES" | grep -i "$MAC" | awk '{ print $5; exit }') echo "survived: $IP" EOF if bash "$tmp"; then echo "unexpected: command substitution survived" rm -f "$tmp" exit 1 fi rm -f "$tmp" rg -n -C2 'set -euo pipefail|grep -i "\$MAC"' deploy/openshift-clusters/roles/common/files/resolve_vm_ip.sh

Repository: openshift-eng/two-node-toolbox

Length of output: 782

🏁 Script executed:

cat -n deploy/openshift-clusters/roles/common/files/resolve_vm_ip.sh

Repository: openshift-eng/two-node-toolbox

Length of output: 2096

The grep pipelines can exit the helper before the fallbacks run.

With set -euo pipefail on Line 4, a grep -i "$MAC" miss on Lines 22, 30, or 38 returns 1 and aborts the whole script from inside the command substitution. This prevents fallback attempts and stops the interface loop before all MACs are tried. Match the MAC in awk instead of piping through grep.

Suggested fix

if [[ -n "$EXPECTED_HOSTNAME" ]]; then - IP=$(echo "$LEASES" | grep -i "$MAC" | awk -v host="$EXPECTED_HOSTNAME" '$6 == host { + IP=$(awk -v mac="$MAC" -v host="$EXPECTED_HOSTNAME" 'tolower($3) == mac && $6 == host { ip = $5 sub(/\/.*/, "", ip) print ip exit - }') + }' <<< "$LEASES") fi if [[ -z "$IP" ]]; then - IP=$(echo "$LEASES" | grep -i "$MAC" | awk '$6 != "-" { + IP=$(awk -v mac="$MAC" 'tolower($3) == mac && $6 != "-" { ip = $5 sub(/\/.*/, "", ip) print ip exit - }') + }' <<< "$LEASES") fi if [[ -z "$IP" ]]; then - IP=$(echo "$LEASES" | grep -i "$MAC" | awk '{ + IP=$(awk -v mac="$MAC" 'tolower($3) == mac { ip = $5 sub(/\/.*/, "", ip) print ip exit - }') + }' <<< "$LEASES") fi

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@deploy/openshift-clusters/roles/common/files/resolve_vm_ip.sh` at line 4, The grep -i "$MAC" pipelines currently cause the script to exit under set -euo pipefail when a match fails; locate the command substitutions that pipe into grep -i "$MAC" (the ones that populate the IP lookup fallbacks using the MAC variable) and replace those grep pipelines with an awk match against the MAC (case-insensitive) so the pipeline returns 0 even when no match is found and fallbacks continue; ensure awk extracts the same fields as the original pipeline (preserve the same capture/field selection) and use tolower comparison or IGNORECASE in awk to implement case-insensitive matching.

Some loop results from the VM IP resolver may not include stdout when a single script invocation fails, which previously caused a hard template failure. Default stdout to an empty string so the play skips that entry. Made-with: Cursor

coderabbitai

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

deploy/openshift-clusters/roles/common/tasks/update-cluster-inventory.yml (1)
20-25: ⚠️ Potential issue | 🟠 Major

Don't silently rewrite cluster_vms to zero hosts on resolver failure.

deploy/openshift-clusters/roles/common/files/resolve_vm_ip.sh exits non-zero with empty stdout on its no-result paths, and this block now turns those cases into “skip this VM”. If the resolver breaks globally, the later inventory rewrite still removes the existing [cluster_vms] section and writes back an empty group even though virsh list found running VMs. Please fail loudly when VMs were discovered but zero IPs were resolved.

Also applies to: 27-38
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@deploy/openshift-clusters/roles/common/tasks/update-cluster-inventory.yml`
around lines 20 - 25, The current task "Get VM IP addresses from DHCP leases
(hostname preferred)" runs resolve_vm_ip.sh with failed_when: false which hides
resolver failures and lets the inventory be rewritten empty; change the flow so
that you still collect per-VM results in vm_ips but explicitly fail if virsh
found VMs (cluster_vms.stdout_lines is non-empty) and none of the vm_ips.results
contain a usable IP. Concretely: keep the script task but remove or limit the
global failed_when:false behavior, then add a follow-up assertion/fail task that
examines vm_ips.results and fails with a clear message when
cluster_vms.stdout_lines | length > 0 and vm_ips.results |
selectattr('stdout','truthy') | list | length == 0 (apply same check for the
other similar block referenced). Ensure the failure path triggers before the
inventory rewrite.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Outside diff comments:
In `@deploy/openshift-clusters/roles/common/tasks/update-cluster-inventory.yml`:
- Around line 20-25: The current task "Get VM IP addresses from DHCP leases
(hostname preferred)" runs resolve_vm_ip.sh with failed_when: false which hides
resolver failures and lets the inventory be rewritten empty; change the flow so
that you still collect per-VM results in vm_ips but explicitly fail if virsh
found VMs (cluster_vms.stdout_lines is non-empty) and none of the vm_ips.results
contain a usable IP. Concretely: keep the script task but remove or limit the
global failed_when:false behavior, then add a follow-up assertion/fail task that
examines vm_ips.results and fails with a clear message when
cluster_vms.stdout_lines | length > 0 and vm_ips.results |
selectattr('stdout','truthy') | list | length == 0 (apply same check for the
other similar block referenced). Ensure the failure path triggers before the
inventory rewrite.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 7991a4bc-ac01-4a44-b04f-5780bdc69110

📥 Commits

Reviewing files that changed from the base of the PR and between 7f70935 and e31d6ca.

📒 Files selected for processing (1)

deploy/openshift-clusters/roles/common/tasks/update-cluster-inventory.yml

openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Mar 30, 2026

openshift-ci bot requested review from fonta-rh and jerpeter1 March 30, 2026 19:41

openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Mar 30, 2026

openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Mar 30, 2026

jaypoulz changed the title ~~NO-JIRA: fix: pick cluster VM IP from DHCP lease by hostname~~ OCPEDGE-2489: fix: pick cluster VM IP from DHCP lease by hostname Mar 30, 2026

coderabbitai bot reviewed Mar 30, 2026

View reviewed changes

coderabbitai bot reviewed Mar 31, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

OCPEDGE-2489: fix: pick cluster VM IP from DHCP lease by hostname#61

OCPEDGE-2489: fix: pick cluster VM IP from DHCP lease by hostname#61
jaypoulz wants to merge 2 commits intoopenshift-eng:mainfrom
jaypoulz:fix-dhcp-lease-hostname

jaypoulz commented Mar 30, 2026 •

edited

Loading

Uh oh!

openshift-ci-robot commented Mar 30, 2026

Uh oh!

openshift-ci bot commented Mar 30, 2026

Uh oh!

jaypoulz commented Mar 30, 2026

Uh oh!

coderabbitai bot commented Mar 30, 2026 •

edited

Loading

Uh oh!

openshift-ci-robot commented Mar 30, 2026 •

edited by openshift-ci bot

Loading

Uh oh!

coderabbitai bot left a comment

Uh oh!

coderabbitai bot Mar 30, 2026

Uh oh!

coderabbitai bot left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

jaypoulz commented Mar 30, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

openshift-ci-robot commented Mar 30, 2026

Uh oh!

openshift-ci bot commented Mar 30, 2026

Uh oh!

jaypoulz commented Mar 30, 2026

Uh oh!

coderabbitai bot commented Mar 30, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Uh oh!

openshift-ci-robot commented Mar 30, 2026 • edited by openshift-ci bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Mar 30, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

jaypoulz commented Mar 30, 2026 •

edited

Loading

coderabbitai bot commented Mar 30, 2026 •

edited

Loading

openshift-ci-robot commented Mar 30, 2026 •

edited by openshift-ci bot

Loading