docs: add DPF setup guide for NICo integration #1999

Open
abvarshney-nv wants to merge 2 commits into NVIDIA:main from abvarshney-nv:doc_dpf

Conversation

@abvarshney-nv
Contributor

Description

Adds a deployment manual for installing and configuring DPF when it is used as the DPU provisioning backend for NICo. The guide is structured as a supplement to the upstream DPF documentation and covers only the NICo-specific deltas: prerequisites (operator namespace, NGC image-pull and Argo CD Helm repository Secrets, cert-manager approval policy and RBAC), DPF operator chart parameter overrides, and the post-install cluster state (orchestrator RBAC, DPFOperatorConfig, DPUCluster, VIP LoadBalancer Service/Endpoints) that must be in place before NICo starts.

Wires the new page into the docs navigation under
Getting Started > Installation Options.

Type of Change

  • Add - New feature or capability
  • Change - Changes in existing functionality
  • Fix - Bug fixes
  • Remove - Removed features or deprecated functionality
  • Internal - Internal changes (refactoring, tests, docs, etc.)

Related Issues (Optional)

Breaking Changes

  • This PR contains breaking changes

Testing

  • Unit tests added/updated
  • Integration tests added/updated
  • Manual testing performed
  • No testing required (docs, internal refactor, etc.)

Additional Notes

Adds a deployment manual for installing and configuring DPF when it
is used as the DPU provisioning backend for NICo. The guide is
structured as a supplement to the upstream DPF documentation and
covers only the NICo-specific deltas: prerequisites (operator
namespace, NGC image-pull and Argo CD Helm repository Secrets,
cert-manager approval policy and RBAC), DPF operator chart parameter
overrides, and the post-install cluster state (orchestrator RBAC,
DPFOperatorConfig, DPUCluster, VIP LoadBalancer Service/Endpoints)
that must be in place before NICo starts. The guide is intentionally
agnostic to the underlying Kubernetes implementation and assumes a
working cluster is already available.

Wires the new page into the docs navigation under
Getting Started > Installation Options.
@krish-nvidia
Contributor

I noticed that this is a setup guide for DPF. Could you also update the following document to provide a high-level overview of DPF-managed DPU installation and reprovisioning? This information can be included in a follow-up PR as well.

https://github.com/NVIDIA/infra-controller/blob/main/docs/dpu-management/dpu-lifecycle-management.md

@abvarshney-nv
Contributor Author

abvarshney-nv commented May 29, 2026

> I noticed that this is a setup guide for DPF. Could you also update the following document to provide a high-level overview of DPF-managed DPU installation and reprovisioning? This information can be included in a follow-up PR as well.
>
> https://github.com/NVIDIA/infra-controller/blob/main/docs/dpu-management/dpu-lifecycle-management.md

Thanks, @krish-nvidia. I will take care of it in the next PR.

Comment thread docs/manuals/dpf.md Outdated
Comment on lines +346 to +349
resources:
- dpuservices
- dpuservicechains
verbs: ["get", "list"]
Member

Do you actually get DPUServiceChain and DPUService objects?

Contributor Author

Claude took these from the API declarations in the kube.rs file. These APIs were added in milestone 1 during the SDK implementation, so it is possible that a few of them are no longer used. I have to check the SDK implementation against its callers.

Contributor Author

I am planning to use DPUServices to check the status of the overall environment. I am working on an admin-cli dpf verify command that will verify some basic details of the environment. It is still a work in progress; I will raise a PR soon.
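
As an illustration of the kind of read-only check described here, the following is a minimal sketch of a readiness scan over listed DPUService objects. It assumes the objects expose standard Kubernetes-style `status.conditions` with a `Ready` condition, which this thread does not confirm. The function name `not_ready` and the sample data are hypothetical and are not part of the admin-cli or the DPF SDK.

```python
def not_ready(objects):
    """Return the names of objects whose Ready condition is missing or not "True"."""
    failing = []
    for obj in objects:
        conditions = obj.get("status", {}).get("conditions", [])
        ready = any(
            c.get("type") == "Ready" and c.get("status") == "True"
            for c in conditions
        )
        if not ready:
            failing.append(obj["metadata"]["name"])
    return failing


# Hypothetical sample, shaped like the items of a Kubernetes list response.
sample = [
    {"metadata": {"name": "svc-a"},
     "status": {"conditions": [{"type": "Ready", "status": "True"}]}},
    {"metadata": {"name": "svc-b"},
     "status": {"conditions": [{"type": "Ready", "status": "False"}]}},
    {"metadata": {"name": "svc-c"}},  # no status reported yet
]

failing = not_ready(sample)
print(failing)
```

A check like this needs only the `get` and `list` verbs on `dpuservices`, which matches the narrower rule quoted at the top of this thread.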

Comment thread docs/manuals/dpf.md
verbs: ["get", "list", "patch"]
- apiGroups: ["operator.dpu.nvidia.com"]
resources: ["dpfoperatorconfigs"]
verbs: ["get", "patch"]
Member

Do you patch DPFOperatorConfig?

Contributor Author

Some verbs were kept for debugging purposes. I will check later and remove them. I hope this is not a blocker from your side.

Member

It's not. It would just make us more comfortable to limit this to what is really required, since that would indicate we did the right thing with the integration and nothing slipped through. I'm adding comments on the things I find suspicious.

Contributor Author

We patch DPFOperatorConfig to remove the BFCFG template that was added in milestone 1. Right now we just set it to None unconditionally, since we don't support milestone 1.
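
One way to picture this patch is as a JSON merge patch (RFC 7396), where a `null` value removes a key and leaves sibling fields untouched. The sketch below implements that merge rule locally to show the effect. The field path `spec.provisioningController.bfCFGTemplateConfigMap` and the sample values are assumptions made for illustration; the thread does not state which field NICo clears or which patch type it uses.

```python
import copy


def apply_merge_patch(target, patch):
    """Apply a JSON merge patch (RFC 7396): None removes a key, dicts merge recursively."""
    if not isinstance(patch, dict):
        # A non-object patch replaces the target wholesale.
        return copy.deepcopy(patch)
    result = copy.deepcopy(target) if isinstance(target, dict) else {}
    for key, value in patch.items():
        if value is None:
            result.pop(key, None)
        else:
            result[key] = apply_merge_patch(result.get(key), value)
    return result


# Assumed shape of the relevant part of a DPFOperatorConfig (illustrative only).
config = {
    "spec": {
        "provisioningController": {
            "bfbPVCName": "bfb-pvc",
            "bfCFGTemplateConfigMap": "custom-bfcfg",
        }
    }
}

# Patch body that clears only the template reference.
patch = {"spec": {"provisioningController": {"bfCFGTemplateConfigMap": None}}}

patched = apply_merge_patch(config, patch)
print(patched)
```

An operation of this shape requires the `get` and `patch` verbs on `dpfoperatorconfigs` and nothing broader.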

Comment thread docs/manuals/dpf.md
Comment on lines +334 to +336
- apiGroups: ["provisioning.dpu.nvidia.com"]
resources: ["dpuclusters"]
verbs: ["get", "list"]
Member

What kind of get/list operations do you perform on the DPUCluster?

Comment thread docs/manuals/dpf.md
Comment on lines +340 to +342
- apiGroups: ["svc.dpu.nvidia.com"]
resources: ["dpuservices", "dpuservicechains"]
verbs: ["get", "list", "create", "patch", "delete"]
Member

This looks suspicious. Why do you need those?

@abvarshney-nv
Contributor Author

@vasrem The AI picked the verbs from the DPF SDK implementation. Some verbs are implemented there but not used. I won't update the verbs in this PR because I would also have to remove them from the code. I will work on a new PR where I will remove the unneeded verbs. For details of NICo's operations, you can have a look at the design document.
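
The cleanup described in this comment amounts to a diff between the permissions a role grants and the calls the code actually makes. The following is a minimal sketch of such an audit. The granted rules are the ones quoted in the review threads on this PR; the `observed_calls` set is hypothetical and stands in for whatever a real inspection of the SDK callers would find. The helper names `expand_rules` and `unused_permissions` are made up for this example.

```python
def expand_rules(rules):
    """Flatten RBAC rules into a set of (apiGroup, resource, verb) triples."""
    granted = set()
    for rule in rules:
        for group in rule["apiGroups"]:
            for resource in rule["resources"]:
                for verb in rule["verbs"]:
                    granted.add((group, resource, verb))
    return granted


def unused_permissions(rules, used):
    """Return the granted triples that never appear in the observed usage set."""
    return sorted(expand_rules(rules) - set(used))


# Rules as quoted in the review threads.
granted_rules = [
    {"apiGroups": ["svc.dpu.nvidia.com"],
     "resources": ["dpuservices", "dpuservicechains"],
     "verbs": ["get", "list", "create", "patch", "delete"]},
    {"apiGroups": ["operator.dpu.nvidia.com"],
     "resources": ["dpfoperatorconfigs"],
     "verbs": ["get", "patch"]},
]

# Hypothetical usage set, for illustration only.
observed_calls = {
    ("svc.dpu.nvidia.com", "dpuservices", "get"),
    ("svc.dpu.nvidia.com", "dpuservices", "list"),
    ("operator.dpu.nvidia.com", "dpfoperatorconfigs", "get"),
    ("operator.dpu.nvidia.com", "dpfoperatorconfigs", "patch"),
}

unused = unused_permissions(granted_rules, observed_calls)
for group, resource, verb in unused:
    print(f"unused: {verb} on {resource}.{group}")
```

Anything the audit reports as unused is a candidate for removal from both the documented role and the SDK, which is the follow-up work this comment defers to a separate PR.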

abvarshney-nv force-pushed the doc_dpf branch 4 times, most recently from ceb6869 to b5786c3 on June 1, 2026 17:40
Comment thread docs/index.yml
@@ -53,6 +53,9 @@ navigation:
path: getting-started/installation-options/reference-install.md
- page: Day 0 IP and Network Configuration
path: getting-started/installation-options/day0-ip-network-config.md
- page: DPF Setup for NICo Integration
Contributor

I would recommend putting this doc in the "Provisioning (Day 0 Operations)" section, maybe after "Ingesting Hosts (REST API)"?

5 participants