docs: add DPF setup guide for NICo integration #1999
Adds a deployment manual for installing and configuring DPF when it is used as the DPU provisioning backend for NICo. The guide is structured as a supplement to the upstream DPF documentation and covers only the NICo-specific deltas: prerequisites (operator namespace, NGC image-pull and Argo CD Helm repository Secrets, cert-manager approval policy and RBAC), DPF operator chart parameter overrides, and the post-install cluster state (orchestrator RBAC, DPFOperatorConfig, DPUCluster, VIP LoadBalancer Service/Endpoints) that must be in place before NICo starts. The guide is intentionally agnostic to the underlying Kubernetes implementation and assumes a working cluster is already available. Wires the new page into the docs navigation under Getting Started > Installation Options.
I noticed that this is a setup guide for DPF. Could you also update the following document to provide a high-level overview of DPF-managed DPU installation and reprovisioning? This information can be included in a follow-up PR as well. https://github.com/NVIDIA/infra-controller/blob/main/docs/dpu-management/dpu-lifecycle-management.md
Thanks, @krish-nvidia. I will take care of it in the next PR.
  resources:
  - dpuservices
  - dpuservicechains
  verbs: ["get", "list"]
Do you actually get the DPUServiceChain and DPUService resources?
Claude used the API declarations in the kube.rs file. These APIs were added in milestone 1, during the SDK implementation. It is possible that a few of them are no longer used. I have to check the SDK implementation against the other callers.
I am planning to use DPUServices to check the overall status of the environment. I am working on an admin-cli dpf verify command that will verify some basic details of the environment. I will raise a PR soon; it is still a work in progress.
  verbs: ["get", "list", "patch"]
- apiGroups: ["operator.dpu.nvidia.com"]
  resources: ["dpfoperatorconfigs"]
  verbs: ["get", "patch"]
Some verbs were kept for debugging purposes. I will check later and remove them. I hope this is not a blocker for you.
It's not. It will just make us more comfortable if we limit this down to what is really required, as this will indicate that we did the right thing with the integration and nothing slipped. I'm adding comments on the things that look suspicious to me.
We patch DPFOperatorConfig to remove the BFCFG template, which was added in milestone 1. Right now we just set it to None unconditionally, as we don't support milestone 1.
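To illustrate why "get" and "patch" are enough for this operation, here is a minimal sketch of how setting a field to None can remove it. It assumes the patch is sent as a JSON merge patch (RFC 7386), which the thread does not confirm, and the field names `bfcfgTemplate` and `otherSetting` are hypothetical placeholders, not taken from the DPF API.

```python
import copy


def json_merge_patch(target, patch):
    """Apply an RFC 7386 JSON merge patch and return a new document."""
    # A non-object patch replaces the target wholesale.
    if not isinstance(patch, dict):
        return copy.deepcopy(patch)
    # A non-object target is discarded and rebuilt from the patch.
    result = copy.deepcopy(target) if isinstance(target, dict) else {}
    for key, value in patch.items():
        if value is None:
            # null in a merge patch means "delete this key".
            result.pop(key, None)
        else:
            result[key] = json_merge_patch(result.get(key), value)
    return result


# Hypothetical DPFOperatorConfig fragment; field names are placeholders.
current = {
    "spec": {
        "provisioningController": {
            "bfcfgTemplate": "milestone-1-template",
            "otherSetting": "keep-me",
        }
    }
}

# Setting the field to None removes it and leaves sibling fields untouched.
patch = {"spec": {"provisioningController": {"bfcfgTemplate": None}}}
patched = json_merge_patch(current, patch)
```

Because only the named key is removed, a patch of this shape does not need "create" or "delete" on the resource itself.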
- apiGroups: ["provisioning.dpu.nvidia.com"]
  resources: ["dpuclusters"]
  verbs: ["get", "list"]
What type of get/list do you do for the DPUCluster?
- apiGroups: ["svc.dpu.nvidia.com"]
  resources: ["dpuservices", "dpuservicechains"]
  verbs: ["get", "list", "create", "patch", "delete"]
This looks suspicious. Why do you need those?
@vasrem The AI picked the verbs from the DPF SDK implementation. Some verbs are implemented there but not used. I won't update the verbs in this PR because I would also have to remove them from the code. I will open a new PR to remove the unneeded verbs. For details of NICo's operations, you can have a look at the design document.
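As a rough aid for that follow-up, the trimming could be sketched as a set difference between granted and exercised verbs. The `granted` map below is copied from the rules quoted in this thread; the `used` map is purely hypothetical and would have to come from an audit of the actual SDK callers.

```python
def unused_verbs(granted, used):
    """Return {resource: sorted verbs} that RBAC grants but callers never use."""
    extra = {}
    for resource, verbs in granted.items():
        leftover = set(verbs) - set(used.get(resource, ()))
        if leftover:
            extra[resource] = sorted(leftover)
    return extra


# Verbs granted by the rules quoted in the review.
granted = {
    "dpuservices": ["get", "list", "create", "patch", "delete"],
    "dpuservicechains": ["get", "list", "create", "patch", "delete"],
    "dpfoperatorconfigs": ["get", "patch"],
    "dpuclusters": ["get", "list"],
}

# Hypothetical usage data; NOT taken from the NICo code base.
used = {
    "dpuservices": ["get", "list"],
    "dpfoperatorconfigs": ["get", "patch"],
    "dpuclusters": ["get"],
}

report = unused_verbs(granted, used)
for resource, verbs in sorted(report.items()):
    print(f"{resource}: candidates for removal -> {', '.join(verbs)}")
```

A resource that never appears in `used` is reported with all of its verbs, which flags the whole rule as a candidate for removal.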
ceb6869 to b5786c3
@@ -53,6 +53,9 @@ navigation:
  path: getting-started/installation-options/reference-install.md
- page: Day 0 IP and Network Configuration
  path: getting-started/installation-options/day0-ip-network-config.md
- page: DPF Setup for NICo Integration
I would recommend putting this doc in the "Provisioning (Day 0 Operations)" section, maybe after "Ingesting Hosts (REST API)"?