Add latent-objective recognition eval to multi-objective skill by cafzal · Pull Request #1442 · NVIDIA/cuopt

cafzal · 2026-06-18T17:12:27Z

Description

Adds a fifth eval to the cuopt-multi-objective-exploration skill — multiobj-explore-eval-005-latent-objective — covering the boundary the existing four don't: a problem stated with a single objective while a second objective sits latent in the data, unstated.

The current evals all hand the agent both objectives (001 interpret, 002 explore, 004 dual-as-slope) or are explicitly single-objective (003 decoy). None test recognizing a latent objective. This one grades whether the skill makes the agent surface the latent cost objective and trace the supply-vs-cost frontier — rather than optimizing the stated objective alone or silently folding cost into a self-chosen weighted blend (maximize supply − λ·cost). It brackets the skill's activation boundary opposite the 003 decoy.

This is a behavioral eval (expected_script: null, LLM-graded against the behavior list), in the same house style as 001/002/004. validate_skills.sh picks up the new array entry, and the signature, BENCHMARK.md, and skill card regenerate via NVSkills-Eval. The latent-objective shape is the max-supply, supply-vs-cost case validated on cuOpt (Tesla T4) in NVIDIA/cuopt-examples#157.
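For reviewers who want a concrete picture of the two behaviors being contrasted, here is a minimal sketch. It is not the eval's problem data and it does not call cuOpt: the plant names, capacities, and per-unit numbers are made up, and the tiny integer problem is solved by brute-force enumeration. It only illustrates that a self-chosen weighted blend returns a single plan, while an epsilon-constraint sweep over cost budgets returns a set of supply-vs-cost points.

```python
from itertools import product

# Hypothetical toy data (not from the eval): two plants with integer unit caps.
PLANTS = {
    "A": {"cap": 3, "supply": 10, "cost": 4},
    "B": {"cap": 3, "supply": 6, "cost": 1},
}


def all_plans():
    """Enumerate every integer plan as (units_by_plant, supply, cost)."""
    names = list(PLANTS)
    ranges = [range(PLANTS[n]["cap"] + 1) for n in names]
    for units in product(*ranges):
        supply = sum(u * PLANTS[n]["supply"] for n, u in zip(names, units))
        cost = sum(u * PLANTS[n]["cost"] for n, u in zip(names, units))
        yield dict(zip(names, units)), supply, cost


def weighted_blend(lam):
    """Self-chosen blend: maximize supply - lam * cost. Returns one plan only."""
    return max(all_plans(), key=lambda p: p[1] - lam * p[2])


def epsilon_sweep(budgets):
    """Epsilon-constraint: for each budget, maximize supply with cost <= budget."""
    frontier = []
    for budget in budgets:
        feasible = [p for p in all_plans() if p[2] <= budget]
        # Break supply ties toward the cheaper plan.
        _, supply, cost = max(feasible, key=lambda p: (p[1], -p[2]))
        frontier.append((cost, supply))
    return frontier


if __name__ == "__main__":
    print(weighted_blend(1.0))
    print(epsilon_sweep([0, 3, 7, 11, 15]))
```

In this toy, the blend with lam = 1.0 picks the most expensive plan and says nothing about what the cheaper alternatives give up, which is the single-plan collapse the eval is meant to exclude.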

Checklist

I am familiar with the Contributing Guidelines.
Testing
- New or existing tests cover these changes
- Added tests
- Created an issue to follow-up
- NA
Documentation
- The documentation is up to date with these changes
- Added new documentation
- NA

Signed-off-by: cafzal <cameron.afzal@gmail.com>

copy-pr-bot · 2026-06-18T17:12:31Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.


coderabbitai · 2026-06-18T17:28:56Z

Walkthrough

A single new evaluation case, multiobj-explore-eval-005-latent-objective, is appended to the evals array in skills/cuopt-multi-objective-exploration/evals/evals.json. The case defines a multi-period production planning problem with unstated cost objectives and specifies expected skill behavior for latent-objective discovery via epsilon-constraint Pareto tracing.

Changes

Latent Objective Eval Case

Layer / File(s)	Summary
New latent-objective evaluation case `skills/cuopt-multi-objective-exploration/evals/evals.json`	Adds `multiobj-explore-eval-005-latent-objective` with a question describing a production planning scenario where cost data is present but not in the stated objective. Specifies expected outputs: epsilon-constraint Pareto frontier tracing, exchange-rate estimation via adjacent-point differencing (MILP/no duals), interpretable operating points with knee flagging, and exclusion of single-plan collapse or self-chosen weighted-sum behaviors.
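The exchange-rate and knee steps named in the summary can be sketched as follows. This is an illustration under stated assumptions, not the skill's implementation: the frontier points are hypothetical (cost, supply) pairs, and the knee rule used here (the interior point with the largest drop in marginal rate) is one simple heuristic chosen for the example. With a MILP there are no duals to read, so the rate is taken as the difference between adjacent frontier points.

```python
# Hypothetical frontier as (cost, supply) pairs, sorted by strictly increasing cost.
FRONTIER = [(0, 0), (3, 18), (7, 28), (11, 38), (15, 48)]


def exchange_rates(frontier):
    """Marginal supply gained per unit of cost between adjacent frontier points."""
    rates = []
    for (c0, s0), (c1, s1) in zip(frontier, frontier[1:]):
        rates.append((s1 - s0) / (c1 - c0))
    return rates


def knee_index(frontier):
    """Index of the interior point where the rate drops the most.

    Assumed heuristic for this sketch; needs at least three frontier points.
    """
    rates = exchange_rates(frontier)
    # drops[k] compares the segment into point k+1 with the segment out of it.
    drops = [rates[i - 1] - rates[i] for i in range(1, len(rates))]
    return 1 + max(range(len(drops)), key=drops.__getitem__)


if __name__ == "__main__":
    print(exchange_rates(FRONTIER))
    print(FRONTIER[knee_index(FRONTIER)])
```

On these made-up points the first segment buys 6 units of supply per unit of cost and every later segment buys 2.5, so the point at cost 3 is flagged as the knee.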

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~3 minutes

Possibly related PRs

NVIDIA/cuopt#1406: Updates cuopt-multi-objective-exploration evaluation expectations and documentation on exchange-rate derivation (dual when available, otherwise differencing), directly matching the method specified in the new eval case.

Suggested labels

non-breaking, improvement

Suggested reviewers

mlubin
rgsl888prabhu
tmckayus

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title clearly and specifically identifies the main change: adding a new evaluation case for latent-objective recognition to the multi-objective exploration skill.
Docstring Coverage	✅ Passed	No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Description check	✅ Passed	The PR description clearly explains the new evaluation case, its purpose, and how it complements existing evaluations by testing latent objective recognition.


cafzal · 2026-06-18T17:40:48Z

@rgsl888prabhu small skill eval – adds the latent-objective case (the under-trigger boundary the other four miss) to cuopt-multi-objective-exploration, mirroring the existing four. Ready when you/Miles have a cycle.

Add latent-objective recognition eval to multi-objective skill

8fc1a5d

Signed-off-by: cafzal <cameron.afzal@gmail.com>

eval-005: defer per-solve mechanics to api/formulation skills (match …

8275bf7

…002/004 house style) Signed-off-by: cafzal <cameron.afzal@gmail.com>

cafzal marked this pull request as ready for review June 18, 2026 17:26

cafzal requested a review from a team as a code owner June 18, 2026 17:26

cafzal requested a review from tmckayus June 18, 2026 17:26

Add latent-objective recognition eval to multi-objective skill #1442

cafzal wants to merge 2 commits into NVIDIA:main from cafzal:multiobj-latent-objective-eval
