
Add latent-objective recognition eval to multi-objective skill#1442

Open
cafzal wants to merge 2 commits into NVIDIA:main from cafzal:multiobj-latent-objective-eval


cafzal (Contributor) commented Jun 18, 2026

Description

Adds a fifth eval to the cuopt-multi-objective-exploration skill — multiobj-explore-eval-005-latent-objective — covering the boundary the existing four don't: a problem stated with a single objective while a second objective sits latent in the data, unstated.

The current evals all hand the agent both objectives (001 interpret, 002 explore, 004 dual-as-slope) or are explicitly single-objective (003 decoy). None test recognizing a latent objective. This one grades whether the skill makes the agent surface the latent cost objective and trace the supply-vs-cost frontier — rather than optimizing the stated objective alone or silently folding cost into a self-chosen weighted blend (maximize supply − λ·cost). It brackets the skill's activation boundary opposite the 003 decoy.
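To make the graded behavior concrete: tracing a supply-vs-cost frontier by epsilon-constraint means turning the latent cost objective into a budget constraint (cost ≤ ε) and re-solving the stated objective (maximize supply) at each budget. The sketch below is a minimal illustration under stated assumptions: the three sources, their supply and cost figures, and the integer run levels are hypothetical toy data, not the eval's problem, and exhaustive enumeration stands in for the MILP solve that cuOpt would perform at each ε.

```python
from itertools import product

# Hypothetical toy data (not from the eval): three sources, each run at an
# integer level 0..3; each level yields SUPPLY[i] units and costs COST[i].
SUPPLY = [4, 3, 5]
COST = [2, 1, 4]
LEVELS = range(4)


def best_supply_under_budget(eps):
    """Max supply s.t. cost <= eps, by brute-force enumeration.

    Enumeration is a stand-in for a MILP solve; ties on supply are broken
    toward lower cost so each returned point is non-dominated.
    """
    best = None
    for x in product(LEVELS, repeat=len(SUPPLY)):
        cost = sum(c * xi for c, xi in zip(COST, x))
        if cost > eps:
            continue  # violates the epsilon (budget) constraint
        supply = sum(s * xi for s, xi in zip(SUPPLY, x))
        if best is None or (supply, -cost) > (best[0], -best[1]):
            best = (supply, cost, x)
    return best


def epsilon_constraint_frontier(budgets):
    """Sweep increasing budgets and keep each distinct (cost, supply) point."""
    frontier = []
    for eps in budgets:
        supply, cost, _ = best_supply_under_budget(eps)
        point = (cost, supply)
        if not frontier or point != frontier[-1]:
            frontier.append(point)
    return frontier


# Sweep every integer budget from 0 up to the cost of running everything at max.
frontier = epsilon_constraint_frontier(range(0, 3 * sum(COST) + 1))
print(frontier)
```

A self-chosen weighted blend (maximize supply − λ·cost) would instead return one plan per λ, and in integer problems it can miss frontier points that no λ selects; surfacing the whole frontier is what separates the graded behavior from that shortcut.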

Behavioral eval (expected_script: null, LLM-graded against the behavior list), in the same house style as 001/002/004. validate_skills.sh picks up the new array entry, and the signature, BENCHMARK.md, and skill card are regenerated via NVSkills-Eval. The latent-objective shape is the max-supply, supply-vs-cost case validated on cuOpt (Tesla T4) in NVIDIA/cuopt-examples#157.
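A behavioral eval is one more entry in the evals array, with expected_script set to null so grading falls to the LLM against the behavior list. The sketch below appends such an entry. Only the eval id and expected_script: null come from this PR; the top-level "evals" key, the "id", "question", and "expected_behavior" field names, the duplicate-id check, and the helper name append_eval are hypothetical and are not a description of what validate_skills.sh actually checks.

```python
import json

# Hypothetical entry shape: only "id" value and expected_script=None are from the PR.
NEW_EVAL = {
    "id": "multiobj-explore-eval-005-latent-objective",
    "question": "<production-planning prompt with cost data but a supply-only objective>",
    "expected_script": None,  # serialized as JSON null: behavioral, LLM-graded
    "expected_behavior": [  # assumed field name; items paraphrase the PR summary
        "Surfaces the latent cost objective instead of optimizing supply alone",
        "Traces the supply-vs-cost frontier with epsilon-constraint",
        "Estimates exchange rates by differencing adjacent frontier points (MILP, no duals)",
        "Presents interpretable operating points and flags the knee",
        "Does not collapse to one plan or a self-chosen weighted sum",
    ],
}


def append_eval(doc_text, entry):
    """Append a behavioral eval to the (assumed) top-level "evals" array."""
    doc = json.loads(doc_text)
    existing_ids = {e["id"] for e in doc["evals"]}
    if entry["id"] in existing_ids:
        raise ValueError(f"duplicate eval id: {entry['id']}")
    if entry.get("expected_script") is not None:
        raise ValueError("behavioral evals carry expected_script: null")
    doc["evals"].append(entry)
    return json.dumps(doc, indent=2)
```

Usage: pass the current file contents and NEW_EVAL to append_eval, then write the returned string back to evals.json.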

Checklist

  • I am familiar with the Contributing Guidelines.
  • Testing
    • New or existing tests cover these changes
    • Added tests
    • Created an issue to follow-up
    • NA
  • Documentation
    • The documentation is up to date with these changes
    • Added new documentation
    • NA

Signed-off-by: cafzal <cameron.afzal@gmail.com>
copy-pr-bot (bot) commented Jun 18, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

…002/004 house style)

Signed-off-by: cafzal <cameron.afzal@gmail.com>
cafzal marked this pull request as ready for review June 18, 2026 17:26
cafzal requested a review from a team as a code owner June 18, 2026 17:26
cafzal requested a review from tmckayus June 18, 2026 17:26
coderabbitai (bot) commented Jun 18, 2026

📝 Walkthrough


A single new evaluation case, multiobj-explore-eval-005-latent-objective, is appended to the evals array in skills/cuopt-multi-objective-exploration/evals/evals.json. The case defines a multi-period production planning problem whose cost objective is present in the data but unstated, and specifies the expected skill behavior: discovering that latent objective and tracing the Pareto frontier with the epsilon-constraint method.

Changes

Latent Objective Eval Case

Layer: New latent-objective evaluation case
File(s): skills/cuopt-multi-objective-exploration/evals/evals.json
Summary: Adds multiobj-explore-eval-005-latent-objective, whose question describes a production planning scenario in which cost data is present but not part of the stated objective. Expected outputs: epsilon-constraint Pareto frontier tracing; exchange-rate estimation by differencing adjacent points (MILP, so no duals); interpretable operating points with the knee flagged; and exclusion of single-plan collapse and self-chosen weighted-sum behaviors.
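Because the problem is a MILP, there are no duals to read the exchange rate from, so it has to be estimated by differencing adjacent frontier points. A minimal sketch, assuming the frontier is a list of (cost, supply) pairs sorted by ascending cost; the frontier values are hypothetical, and the "largest drop in slope" knee rule is one common heuristic, not necessarily the one the skill prescribes.

```python
def exchange_rates(frontier):
    """Slope of each segment between adjacent frontier points:
    extra supply gained per extra unit of cost."""
    return [
        (s1 - s0) / (c1 - c0)
        for (c0, s0), (c1, s1) in zip(frontier, frontier[1:])
    ]


def knee_index(frontier):
    """Index of the interior point where the exchange rate drops the most.

    Heuristic: compare consecutive segment slopes and pick the point
    sitting between the pair with the largest decrease.
    """
    rates = exchange_rates(frontier)
    if len(rates) < 2:
        return None  # need at least two segments to see a bend
    drops = [rates[i] - rates[i + 1] for i in range(len(rates) - 1)]
    i = max(range(len(drops)), key=drops.__getitem__)
    return i + 1  # point shared by segment i and segment i + 1


# Hypothetical frontier of (cost, supply) points.
FRONTIER = [(0, 0), (3, 9), (6, 18), (8, 20), (12, 23)]
print(exchange_rates(FRONTIER))  # [3.0, 3.0, 1.0, 0.75]
print(FRONTIER[knee_index(FRONTIER)])  # (6, 18)
```

Past the knee at (6, 18), each extra unit of cost buys at most one unit of supply instead of three, which is the kind of trade-off an interpretable operating-point summary would call out.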

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~3 minutes

Possibly related PRs

  • NVIDIA/cuopt#1406: Updates cuopt-multi-objective-exploration evaluation expectations and documentation on exchange-rate derivation (dual when available, otherwise differencing), directly matching the method specified in the new eval case.

Suggested labels

non-breaking, improvement

Suggested reviewers

  • mlubin
  • rgsl888prabhu
  • tmckayus
🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
  • Title check (✅ Passed): The title clearly and specifically identifies the main change: adding a new evaluation case for latent-objective recognition to the multi-objective exploration skill.
  • Docstring Coverage (✅ Passed): No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
  • Linked Issues check (✅ Passed): Check skipped because no linked issues were found for this pull request.
  • Out of Scope Changes check (✅ Passed): Check skipped because no linked issues were found for this pull request.
  • Description check (✅ Passed): The PR description clearly explains the new evaluation case, its purpose, and how it complements existing evaluations by testing latent objective recognition.


cafzal (Contributor, Author) commented Jun 18, 2026

@rgsl888prabhu small skill eval – adds the latent-objective case (the under-trigger boundary the other four miss) to cuopt-multi-objective-exploration, mirroring the existing four. Ready when you/Miles have a cycle.
