[PyTorch] Fix bug with PR 2677 (#2819)
Conversation
Signed-off-by: Sudhakar Singh <sudhakars@nvidia.com>
/te-ci pytorch L1
Greptile Summary: This PR fixes a bug introduced in #2677 where padding tokens were not excluded from the `max_logit` computation.
Confidence Score: 5/5 — Safe to merge. The fix is logically correct, both prior review concerns are resolved, and the only remaining note is a cosmetic no-op. No files require special attention.
Flowchart:

```mermaid
%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[return_max_logit=True] --> B{qkv_format == 'thd'?}
    B -- No --> G[amax over all dims → max_logit]
    B -- Yes --> C{max_tensor.ndim}
    C -- "4\ncuDNN ≤9.6 or sm120\nb x h x sq x 1" --> D[Build sq_idx mask from cu_seqlens_q\nmasked_fill padding positions with -inf]
    C -- "3\ncuDNN >9.6 non-sm120\ntq x h x 1" --> E{cu_seqlens_q_padded\nis not None?}
    E -- "Yes\npad between seqs" --> F[Compute actual & padded seqlens\nrepeat_interleave → valid mask\nmasked_fill padding with -inf]
    E -- "No\nno padding" --> G
    D --> G
    F --> G
```
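The ndim=4 path masks positions at or past each sequence's actual length before taking the max. A minimal plain-Python sketch of that idea (illustrative only, not the actual TransformerEngine code; `sq` and the toy `max_vals` values are assumptions for the example):

```python
# Sketch of the [b, h, sq, 1] masking: derive per-sequence lengths from
# cu_seqlens_q, then mark any position >= that length as padding (-inf).
NEG_INF = float("-inf")

cu_seqlens_q = [0, 3, 5]  # cumulative actual lengths for b=2 sequences
sq = 4                    # padded per-sequence length (assumed for the example)
seqlens = [cu_seqlens_q[i + 1] - cu_seqlens_q[i] for i in range(len(cu_seqlens_q) - 1)]

# mask[b][pos] is True where the position holds a real token
mask = [[pos < n for pos in range(sq)] for n in seqlens]

# Apply the mask to a toy per-position max tensor (h collapsed for brevity);
# 9.9 stands in for the junk values cuDNN leaves in padding positions.
max_vals = [[1.0, 2.0, 3.0, 9.9], [4.0, 5.0, 9.9, 9.9]]
masked = [
    [v if keep else NEG_INF for v, keep in zip(row, keep_row)]
    for row, keep_row in zip(max_vals, mask)
]
```

After this, a plain max over `masked` can no longer pick up junk from padding positions.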
Reviews (3). Last reviewed commit: "fixes from feedback"
/te-ci pytorch L1
```python
elif max_tensor.ndim == 3:
    if cu_seqlens_q_padded is not None:
        # For THD on newer cuDNN runtimes (non-sm120), Max is [tq, h, 1] with
        # padded positions containing junk. Mask them out with -inf.
```
Maybe point out that this is for THD + pad_between_seqs=True + non-sm120 + cuDNN>9.6 situations?
```python
# Expand: T×3, F×1, T×3, F×1, T×2, F×2, T×7, F×1 → TTTF|TTTF|TTFF|TTTTTTTF
valid = torch.repeat_interleave(values, counts)
# Finally, replace invalid (F) positions with -inf
max_tensor = max_tensor.masked_fill(~valid.view(-1, 1, 1), float("-inf"))
```
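The expansion in the comment above can be reproduced with a plain-Python analogue of `torch.repeat_interleave` (a sketch, not the actual code; the `actual`/`padded` lengths are chosen to match the comment's T/F example):

```python
# Plain-Python analogue of torch.repeat_interleave for 1-D inputs.
def repeat_interleave(values, counts):
    out = []
    for v, c in zip(values, counts):
        out.extend([v] * c)
    return out

# Per sequence: `actual` tokens of True, then (padded - actual) of False.
actual = [3, 3, 2, 7]  # actual sequence lengths (assumed for the example)
padded = [4, 4, 4, 8]  # padded sequence lengths (assumed for the example)
values, counts = [], []
for a, p in zip(actual, padded):
    values += [True, False]
    counts += [a, p - a]

# counts == [3, 1, 3, 1, 2, 2, 7, 1], matching the T×3, F×1, ... expansion.
valid = repeat_interleave(values, counts)
```

Positions where `valid` is False are exactly the pad-between-seqs slots that get filled with `-inf`.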
I feel the logic (in both the ndim=3 and ndim=4 branches) is a bit convoluted; it would generate quite a few small kernels. I don't know if there's a more elegant way to program this, or whether simply zeroing out the entire Max at initialization would be more efficient performance-wise. I'm going to approve for the functionality, given that this is also kind of a niche case (THD + return_max_logit=True + pad_between_seqs=True).
Do we have control over stats/max tensors? I thought they're passed over by cuDNN.
* cudnn now returns Stats always and Max only with `return_max_logit=true`
* fix a typo that caused a bug
* update doc strings
* [pre-commit.ci] auto fixes from pre-commit.com hooks
* fix more docs
* fixes from the feedback
* update cudnn-frontend to v1.19.1
* update the cudnn frontend
* fix a wrong omission
* [pre-commit.ci] auto fixes from pre-commit.com hooks
* bugfix: mask out padding tokens when THD
* [pre-commit.ci] auto fixes from pre-commit.com hooks
* fixes from greptile feedback
* [pre-commit.ci] auto fixes from pre-commit.com hooks
* minor nit
* fixes from feedback

Signed-off-by: Sudhakar Singh <sudhakars@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Description
#2677 didn't run L1 tests, and the downstream CIs failed because of a bug where padding tokens were not ignored when returning `max_logit`. This PR fixes it.
Changes
Please list the changes introduced in this PR:
* When `return_max_logit=True`, ignore the padding tokens correctly before calculating `max_logit`