[nvidia_unstable-11.9] Backport util: fix max socket calculation #12

Merged
NathanChenNVIDIA merged 2 commits into nvidia_unstable-11.9 from socket_count_fix_unstable
Feb 10, 2026

@NathanChenNVIDIA
Collaborator

On some systems (e.g. GB200), physical_package_id values are not contiguous or zero-based. Instead of 0..N, they may contain large arbitrary identifiers (e.g. 256123234). The previous implementation assumed a 0..N range and used the maximum ID value directly.

This caused:

    excessive memory allocation
    extremely large loop bounds
    OOM / DoS scenarios
    unnecessary CPU time consumption

The new implementation computes the socket count as the number of unique package IDs present on the node, rather than relying on the maximum numeric value.
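
The difference between the two calculations can be illustrated with a minimal sketch. This is not the libvirt code (which is written in C); the function names and sample IDs below are hypothetical, and the old behavior is modeled here as "maximum ID plus one", which is an assumption about how the 0..N range was sized.

```python
def socket_count_from_max_id(package_ids):
    """Old-style sizing (assumed): treat IDs as 0..N, so count = max ID + 1."""
    return max(package_ids) + 1 if package_ids else 0


def socket_count_from_unique_ids(package_ids):
    """New-style sizing: count the distinct package IDs, whatever their values."""
    return len(set(package_ids))


# Hypothetical per-CPU package IDs: one dense layout, one sparse layout.
dense_ids = [0, 0, 1, 1]
sparse_ids = [256123234, 256123234, 256123235, 256123235]

if __name__ == "__main__":
    # Dense IDs: both approaches agree on 2 sockets.
    print(socket_count_from_max_id(dense_ids), socket_count_from_unique_ids(dense_ids))
    # Sparse IDs: the max-based bound is in the hundreds of millions,
    # while the unique count is still 2.
    print(socket_count_from_max_id(sparse_ids), socket_count_from_unique_ids(sparse_ids))
```

With the sparse IDs, any allocation or loop sized by the max-based value scales with the magnitude of the identifier rather than with the number of sockets, which is consistent with the memory and CPU symptoms listed above.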

[Testing]
Launching a VM via KubeVirt + libvirt no longer hits OOM. Previously, virt-launcher queried the socket IDs, and the large physical_package_id values caused the VM to continually allocate large amounts of memory.
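
To check whether a host reports sparse package IDs, one option is to read them from sysfs. The sketch below assumes the standard Linux layout `/sys/devices/system/cpu/cpuN/topology/physical_package_id`; the helper name `read_package_ids` is hypothetical, and it scans the whole host rather than a single NUMA node, so it is a diagnostic aid and not a reproduction of the patched libvirt logic.

```python
import glob
import os


def read_package_ids(cpu_root="/sys/devices/system/cpu"):
    """Return the sorted distinct physical_package_id values under cpu_root."""
    pattern = os.path.join(cpu_root, "cpu[0-9]*", "topology", "physical_package_id")
    unique_ids = set()
    for path in glob.glob(pattern):
        try:
            with open(path) as f:
                unique_ids.add(int(f.read().strip()))
        except (OSError, ValueError):
            # Unreadable or malformed entries are skipped in this sketch.
            continue
    return sorted(unique_ids)


if __name__ == "__main__":
    ids = read_package_ids()
    print("package IDs:", ids)
    print("socket count (unique IDs):", len(ids))
```

If the largest ID printed is far greater than the number of IDs, the host has the kind of non-contiguous numbering described in this PR.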

alex2e78 and others added 2 commits February 10, 2026 19:31
This patch changes how the maximum socket count is calculated.

On some systems (e.g. GB200), physical_package_id values are not
contiguous or zero-based. Instead of 0..N, they may contain large
arbitrary identifiers (e.g. 256123234). The previous implementation
assumed a 0..N range and used the maximum ID value directly.

This caused:
    excessive memory allocation
    extremely large loop bounds
    OOM / DoS scenarios
    unnecessary CPU time consumption

The new implementation computes the socket count as the number of unique
package IDs present on the node, rather than relying on the maximum numeric
value.

Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Tested-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Alexandr Semenikhin <alexandr2e78@gmail.com>
(cherry picked from commit a64367115015df58e0d82635a40d76df56144c60 https://github.com/libvirt/libvirt/commits/)
Link: https://lists.libvirt.org/archives/list/devel@lists.libvirt.org/thread/COIBU2IGVLC36Q3FLXDL3W7U7WIFVPPJ/
Signed-off-by: Nathan Chen <nathanc@nvidia.com>
Signed-off-by: Nathan Chen <nathanc@nvidia.com>
Collaborator

@nvmochs nvmochs left a comment

Confirmed this is a clean pick and matches upstream.

Acked-by: Matthew R. Ochs <mochs@nvidia.com>

Collaborator

@ianm-nv ianm-nv left a comment

LGTM
Acked-by: Ian May <ianm@nvidia.com>

@NathanChenNVIDIA NathanChenNVIDIA merged commit 5eea017 into nvidia_unstable-11.9 Feb 10, 2026