fix: token mult prob error plot masking by 1ytic · Pull Request #2485 · NVIDIA-NeMo/RL

1ytic · 2026-05-13T15:22:56Z

What does this PR do?

Fixes the token_mult_prob_error debug plot so it does not select fully masked samples and labels the recomputed policy logprobs accurately.

Issues

N/A

Usage

N/A. This is a logging/debug-plot fix.

Before your PR is "Ready for review"

Pre checks:

Make sure you read and followed Contributor guidelines
Did you write any new necessary tests?
Did you run the unit tests and functional tests locally? Visit our Testing Guide for how to run tests
Did you add or update any necessary documentation? Visit our Document Development Guide for how to write, build and test the docs.

Additional Information

The plot previously computed the per-sequence multiplicative probability error after applying sample_mask, but still allowed rows with zero valid tokens to participate in argmax. Fully masked rows therefore produced 0 / 0 = nan, and the plot could show token_mult_prob_error=nan.
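The failure mode can be reproduced in a few lines. The sketch below is not the repository's code: it uses plain Python lists instead of tensors, and it assumes the per-sequence error is the masked mean of exp(|generation logprob - training logprob|). The function names `per_sequence_mult_prob_error` and `pick_sequence_to_plot` are hypothetical and exist only for illustration.

```python
import math

def per_sequence_mult_prob_error(gen_logprobs, train_logprobs, token_mask):
    """Masked mean of exp(|gen - train|) per sequence (assumed definition).

    A row with zero valid tokens yields nan, mimicking the 0 / 0 that a
    tensor division would produce (plain Python would raise instead).
    """
    errors = []
    for gen, train, mask in zip(gen_logprobs, train_logprobs, token_mask):
        valid = sum(mask)
        if valid == 0:
            errors.append(math.nan)
            continue
        total = sum(m * math.exp(abs(g - t)) for g, t, m in zip(gen, train, mask))
        errors.append(total / valid)
    return errors

def pick_sequence_to_plot(errors):
    """Index of the largest finite error, or None if every row is masked."""
    finite = [(e, i) for i, e in enumerate(errors) if not math.isnan(e)]
    if not finite:
        return None
    return max(finite, key=lambda pair: pair[0])[1]

# Three sequences of two tokens each; the last one is fully masked.
gen = [[-1.0, -2.0], [-0.5, -0.5], [-1.0, -1.0]]
train = [[-1.0, -2.0], [-0.5, -1.5], [-3.0, -3.0]]
mask = [[1, 1], [1, 1], [0, 0]]

errors = per_sequence_mult_prob_error(gen, train, mask)
chosen = pick_sequence_to_plot(errors)
```

Here the fully masked third row produces nan and is excluded from selection, so the second row (the largest finite error) is the one chosen for plotting.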

The orange logprob line was also labeled as reference policy, but the data comes from prev_logprobs, i.e. the training policy recomputation used for GRPO's behavior-policy comparison, not the frozen reference policy.

Before the fix, the debug plot could select a fully masked sequence and show both symptoms.

After the fix, the plot selects an unmasked sequence, reports a finite token_mult_prob_error, and labels the orange line as the training policy recompute.

Signed-off-by: Ivan Sorokin <27285181+1ytic@users.noreply.github.com>

copy-pr-bot · 2026-05-13T15:23:00Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

Fix token mult prob error plot masking

ffd92a4

Signed-off-by: Ivan Sorokin <27285181+1ytic@users.noreply.github.com>

1ytic marked this pull request as ready for review May 14, 2026 10:06

1ytic requested review from a team as code owners May 14, 2026 10:06

1ytic wants to merge 1 commit into NVIDIA-NeMo:main from 1ytic:codex/fix-token-mult-prob-error-plot