cuD-PDLP by Bubullzz · Pull Request #1391 · NVIDIA/cuopt

Bubullzz · 2026-06-04T15:24:15Z

Implemented METIS-partitioned multi-GPU PDLP.

On 8 NVLink-connected B200 GPUs: 2x to 6x speedup over cuOpt PDLP and 0.5x to 4x speedup over D-PDLP. My implementation scales better on larger instances.
I will follow up soon with the memory footprint gains.

closes #891

Bubullzz · 2026-06-17T15:15:45Z

/ok to test 56d5580

Bubullzz · 2026-06-17T15:15:51Z

@coderabbitai review

coderabbitai · 2026-06-17T15:15:58Z

✅ Action performed

Review finished.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

coderabbitai

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

cpp/src/pdlp/distributed_pdlp/multi_gpu_engine.hpp (1)
151-181: ⚠️ Potential issue | 🟠 Major

Add error checking to all NCCL calls.

The ncclGroupStart, ncclGroupEnd, ncclSend, ncclRecv, and ncclAllReduce calls do not check return codes. NCCL errors (e.g., network failures, rank mismatches) will silently corrupt distributed execution or cause hangs. Wrap these calls with error checking similar to how RAFT_CUDA_TRY is used for CUDA errors throughout the codebase. This applies to all occurrences in multi_gpu_engine.hpp (lines 151–181, 218–248, 260–267) and distributed_algorithms.cu.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@cpp/src/pdlp/distributed_pdlp/multi_gpu_engine.hpp` around lines 151 - 181,
The NCCL calls ncclGroupStart, ncclSend, ncclRecv, and ncclGroupEnd in the
distributed variable synchronization loops do not check return codes, which can
cause silent errors or hangs. Wrap each of these NCCL function calls with error
checking (similar to the RAFT_CUDA_TRY pattern used elsewhere in the codebase)
to capture and handle potential NCCL errors such as network failures or rank
mismatches. Apply this error checking consistently to all NCCL calls in the
affected regions and ensure that errors are properly reported before continuing
with execution.
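
A minimal sketch of the kind of wrapper this comment asks for, under stated assumptions. The names `sketch_result_t`, `sketch_error_string`, `sketch_check`, `SKETCH_NCCL_TRY`, and `fake_all_reduce` are hypothetical stand-ins so the example compiles without NCCL; they are not the macro the PR ended up adding. With the real library, the status type would be `ncclResult_t`, the success value `ncclSuccess`, and the message would come from `ncclGetErrorString`.

```cpp
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical stand-ins so the sketch compiles without NCCL installed.
// With the real library: ncclResult_t, ncclSuccess, ncclGetErrorString.
enum sketch_result_t { sketch_success = 0, sketch_system_error = 2 };

inline char const* sketch_error_string(sketch_result_t status)
{
  return status == sketch_success ? "no error" : "unhandled system error";
}

// Throws with call-site context when the status is not the success value.
inline void sketch_check(sketch_result_t status, char const* expr, char const* file, int line)
{
  if (status == sketch_success) { return; }
  std::ostringstream msg;
  msg << "NCCL-style call failed: " << expr << " -> " << sketch_error_string(status) << " at "
      << file << ":" << line;
  throw std::runtime_error(msg.str());
}

// Hypothetical macro name; wraps a call and records where it was made.
#define SKETCH_NCCL_TRY(call) sketch_check((call), #call, __FILE__, __LINE__)

// Fake collective used only to exercise the macro.
inline sketch_result_t fake_all_reduce(bool fail)
{
  return fail ? sketch_system_error : sketch_success;
}

// Returns true when the wrapped call succeeded, false when the wrapper threw.
inline bool try_reduce(bool fail)
{
  try {
    SKETCH_NCCL_TRY(fake_all_reduce(fail));
    return true;
  } catch (std::runtime_error const&) {
    return false;
  }
}
```

Wrapping every call the same way, including the group start and end calls, means a failure surfaces at the call that produced it rather than as a later hang.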

🧹 Nitpick comments (2)

cpp/src/pdlp/distributed_pdlp/partitioner.hpp (1)
9-9: 💤 Low value

Unused <string> include.

This header uses char const* for the context parameter in validate_partition, not std::string. The <string> include can be removed per IWYU.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@cpp/src/pdlp/distributed_pdlp/partitioner.hpp` at line 9, The `#include
<string>` header on line 9 is unused because the `validate_partition` function
uses `char const*` for its context parameter instead of `std::string`. Remove
this include statement to follow IWYU (Include What You Use) principles and
reduce unnecessary dependencies.

cpp/src/pdlp/pdlp.cu (1)
2996-2999: ⚡ Quick win

Consider moving distributed average-restart validation to construction time.

This runtime check correctly guards against using average restart in distributed mode (since unscaled_primal_avg_solution_ and unscaled_dual_avg_solution_ remain zero-sized from the shape-0 placeholder delegation). However, failing deep in the solve loop provides a poor user experience.

Add a precondition check in the distributed constructor (around line 400) to reject the incompatible configuration early:
cuopt_expects(settings.hyper_params.never_restart_to_average,
              error_type_t::ValidationError,
              "Distributed PDLP requires never_restart_to_average = true");
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@cpp/src/pdlp/pdlp.cu` around lines 2996 - 2999, The validation check for
distributed PDLP's average restart constraint is currently performed during the
solve loop (around the cuopt_expects call with never_restart_to_average
message), which is too late for good user experience. Move this validation to
the distributed constructor (around line 400) to catch the incompatible
configuration early at construction time. Add a cuopt_expects precondition that
checks settings.hyper_params.never_restart_to_average is true and uses
error_type_t::ValidationError instead of RuntimeError. This allows users to
discover the configuration requirement immediately rather than failing deep in
the solve process.
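
To illustrate the suggestion, here is a small sketch of rejecting the configuration in the constructor rather than in the solve loop. All types here (`hyper_params_sketch`, `settings_sketch`, `validation_error_sketch`, `distributed_solver_sketch`) are hypothetical stand-ins; the real cuOpt solver, its settings type, and the `cuopt_expects` macro are not reproduced, and a plain `throw` takes the macro's place.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical stand-ins for the solver settings; only the field named in the
// review comment is modeled.
struct hyper_params_sketch {
  bool never_restart_to_average = false;
};
struct settings_sketch {
  hyper_params_sketch hyper_params;
};

// Hypothetical error type playing the role of a validation error.
struct validation_error_sketch : std::invalid_argument {
  using std::invalid_argument::invalid_argument;
};

class distributed_solver_sketch {
 public:
  // The incompatible configuration is rejected here, before any solve work.
  explicit distributed_solver_sketch(settings_sketch const& settings) : settings_(settings)
  {
    if (!settings_.hyper_params.never_restart_to_average) {
      throw validation_error_sketch("Distributed PDLP requires never_restart_to_average = true");
    }
  }

  bool uses_average_restart() const { return !settings_.hyper_params.never_restart_to_average; }

 private:
  settings_sketch settings_;
};

// Returns true when construction succeeds with the given flag value.
inline bool can_construct(bool never_restart_to_average)
{
  settings_sketch settings;
  settings.hyper_params.never_restart_to_average = never_restart_to_average;
  try {
    distributed_solver_sketch solver(settings);
    return !solver.uses_average_restart();
  } catch (validation_error_sketch const&) {
    return false;
  }
}
```

With the check in the constructor, a caller learns about the requirement before any partition is loaded or any iteration runs.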

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 67a6c70f-e5d7-4fb3-bb09-3e515a5f65f2

📥 Commits

Reviewing files that changed from the base of the PR and between c42f770 and 56d5580.

📒 Files selected for processing (22)

cpp/CMakeLists.txt
cpp/cuopt_cli.cpp
cpp/include/cuopt/linear_programming/pdlp/solver_settings.hpp
cpp/src/pdlp/CMakeLists.txt
cpp/src/pdlp/distributed_pdlp/distributed_algorithms.cu
cpp/src/pdlp/distributed_pdlp/multi_gpu_engine.cu
cpp/src/pdlp/distributed_pdlp/multi_gpu_engine.hpp
cpp/src/pdlp/distributed_pdlp/partition_loader.cu
cpp/src/pdlp/distributed_pdlp/partitioner.cpp
cpp/src/pdlp/distributed_pdlp/partitioner.hpp
cpp/src/pdlp/distributed_pdlp/rank_data.hpp
cpp/src/pdlp/distributed_pdlp/shard.cu
cpp/src/pdlp/distributed_pdlp/shard.hpp
cpp/src/pdlp/initial_scaling_strategy/initial_scaling.cu
cpp/src/pdlp/pdhg.cu
cpp/src/pdlp/pdhg.hpp
cpp/src/pdlp/pdlp.cu
cpp/src/pdlp/pdlp.cuh
cpp/src/pdlp/saddle_point.cu
cpp/src/pdlp/solve.cuh
cpp/src/pdlp/termination_strategy/convergence_information.cu
cpp/src/pdlp/termination_strategy/convergence_information.hpp

💤 Files with no reviewable changes (1)

cpp/src/pdlp/saddle_point.cu

🚧 Files skipped from review as they are similar to previous changes (15)

cpp/src/pdlp/CMakeLists.txt
cpp/src/pdlp/distributed_pdlp/rank_data.hpp
cpp/CMakeLists.txt
cpp/src/pdlp/pdlp.cuh
cpp/src/pdlp/termination_strategy/convergence_information.hpp
cpp/src/pdlp/solve.cuh
cpp/include/cuopt/linear_programming/pdlp/solver_settings.hpp
cpp/src/pdlp/distributed_pdlp/shard.hpp
cpp/src/pdlp/distributed_pdlp/multi_gpu_engine.cu
cpp/cuopt_cli.cpp
cpp/src/pdlp/distributed_pdlp/partition_loader.cu
cpp/src/pdlp/distributed_pdlp/partitioner.cpp
cpp/src/pdlp/distributed_pdlp/shard.cu
cpp/src/pdlp/distributed_pdlp/distributed_algorithms.cu
cpp/src/pdlp/termination_strategy/convergence_information.cu

Bubullzz · 2026-06-17T17:20:28Z

/ok to test e2a36ab

Bubullzz · 2026-06-18T11:03:39Z

/ok to test 9c1e345

Bubullzz · 2026-06-18T13:17:24Z

/ok to test 34196b1

Bubullzz added 30 commits May 7, 2026 15:07

first commit !! added multi_gpu_partition file to solver settings

1e0bd53

slowly skeletonning

978d17b

better shard.cuh

dd0c0ef

wip

2037eca

added a bit of skeleton. Forward declared pdlp_solver in shard.hpp, t…

0f62eff

…he cycle seems to be fixed, cuopt compiles

still wip but going well

d89c85a

cursor broke everything grrr

5534ff0

partition loader now partition loads

dd935c5

big advancements ayo ! We can soon start working on imlementing the s…

09eb20b

…olver !!!

added pre loop setup need to manage boxing

b5ebfd2

+ style too

added distributed transform

0965a60

added semicolon and existing runtime error enum

d4d1cab

added } and fixed cuot_expects in partition loader

6659dd9

small bug fixes

b2ed271

a version that compiles #heheha 😎😎😎😎

50d16ce

removed use of engine:transaform

359d9f4

added multi-gpu SpMV #heheha

910a49a

transformed a transform. it compiles hehe

76c0b3f

updated take step for distributed. compiles but doesnt run. will chec…

5ec7138

…k on main

Merge branch 'main' into cuD-PDLP

1f02afd

support spmvop on multi-gpu

de19f38

compile ready

0030a6c

can run now

172ebc2

passing all tests, good merge

23d0798

fixed the errors hihi, finished distributed part for compte_fixed_error

30881ce

style

c33faf2

now manage halpern update in multi-gpu pdlp

98e0ce6

small fix to calls of multi_gpu_engine_ and scale/unscale solutions.

84128bf

compiles and runs

comments

abe4dd2

added is multi gpu to pdhg

5c41497

Bubullzz added 10 commits June 17, 2026 15:04

cleaned pdhg.hpp and removed is_multi_gpu flag

e680422

cleaner distributed handling

6000b75

removed unused disrtibuted spmv in multi_gpu_engine

8c67f90

cleaned pdhg a bit more

1f42904

pdhg review ready

30319a0

finished cuopt_cli

2a14a4d

cleaned initial_scaling

9052e31

pdlp and solve.cu

26d7f9e

cnvergence info v2

733334e

FINISHED

56d5580

coderabbitai Bot reviewed Jun 17, 2026

Bubullzz marked this pull request as ready for review June 17, 2026 15:27

Kh4ster assigned Kh4ster and Bubullzz Jun 17, 2026

Kh4ster added the feature request, non-breaking, and pdlp labels Jun 17, 2026

Kh4ster changed the title from CuD-PDLP to cuD-PDLP Jun 17, 2026

style

e2a36ab

Bubullzz removed the do not merge label Jun 18, 2026

Bubullzz added 3 commits June 18, 2026 03:52

moved mgpu engine up so graph gets destroyed before the mgpuengine co…

45556da

…mmunicators emoji troll 🧌

removed a useless inclyde and moved an assert up for coderabbit

d5f1ce0

added nccl_try everywhere

9c1e345

style

34196b1
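
The commit above that moves the multi-GPU engine "up" relies on a general C++ rule: non-static data members are destroyed in reverse order of declaration. A small sketch of that rule follows; `tracker`, `solver_sketch`, and `destruction_order` are hypothetical names and do not reflect the actual member layout in the PR.

```cpp
#include <string>
#include <utility>
#include <vector>

// Appends its name to a shared log on destruction so the order is observable.
struct tracker {
  tracker(std::vector<std::string>* log_ptr, std::string tracker_name)
    : log(log_ptr), name(std::move(tracker_name))
  {
  }
  ~tracker() { log->push_back(name); }

  std::vector<std::string>* log;
  std::string name;
};

struct solver_sketch {
  explicit solver_sketch(std::vector<std::string>* log) : engine(log, "engine"), graph(log, "graph")
  {
  }

  // Declared first, therefore destroyed last.
  tracker engine;
  // Declared second, therefore destroyed first.
  tracker graph;
};

// Builds and destroys a solver_sketch, returning the recorded destruction order.
inline std::vector<std::string> destruction_order()
{
  std::vector<std::string> log;
  {
    solver_sketch solver(&log);
  }
  return log;
}
```

Declaring the member that owns the communicators before the member that depends on them means the dependent object is torn down while the communicators are still alive.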

cuD-PDLP#1391
Bubullzz wants to merge 130 commits into NVIDIA:main from Bubullzz:cuD-PDLP

2 participants
