
Add qwen3 moe experts only test #1274

Open

cjluo-nv wants to merge 5 commits into main from chenjiel/add_qwen_moe_test

Conversation

@cjluo-nv
Collaborator

@cjluo-nv cjluo-nv commented Apr 16, 2026

Summary

Type of change: New tests

Known issue

On transformers>=5.0, fused MoE experts (_QuantFusedExperts) are not recognized by get_quant_config, causing quant_algo=None in the exported config. This test currently fails on transformers 5.x and is intended to be fixed by a follow-up change.

Testing

  • transformers 4.57.6: PASSED
  • transformers 5.5.4: FAILED (quant_algo is None due to fused expert export gap)

Before your PR is "Ready for review"

Make sure you read and follow Contributor guidelines and your commits are signed (git commit -s -S).

Make sure you read and follow the Security Best Practices (e.g. avoiding hardcoded trust_remote_code=True, torch.load(..., weights_only=False), pickle, etc.).

  • Is this change backward compatible?: ✅
  • If you copied code from any other sources or added a new PIP dependency, did you follow guidance in CONTRIBUTING.md: N/A
  • Did you write any new necessary tests?: ✅
  • Did you update Changelog?: N/A

Summary by CodeRabbit

  • Tests
    • Added GPU test coverage for exporting Qwen3 Mixture-of-Experts models with NVFP4 quantization.
    • Verifies the exported checkpoint records the NVFP4 quantization algorithm and that module exclusion patterns correctly exclude attention and LM head components while not excluding routed expert paths.

@coderabbitai
Contributor

coderabbitai bot commented Apr 16, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: 43e2a44f-c5c8-4ba3-9b53-74f68f105e61

📥 Commits

Reviewing files that changed from the base of the PR and between 78046c4 and 65a8889.

📒 Files selected for processing (1)
  • tests/gpu/torch/export/test_export.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • tests/gpu/torch/export/test_export.py

📝 Walkthrough

Walkthrough

Adds a GPU test that builds a tiny Qwen3 MoE model on CUDA, quantizes it with NVFP4_EXPERTS_ONLY_CFG, exports an HF checkpoint, and verifies exported hf_quant_config.json excludes non-expert module patterns while not excluding expert module paths.

Changes

Cohort / File(s): Test Coverage for Qwen3 MoE NVFP4 Export (tests/gpu/torch/export/test_export.py)
Summary: Added json and fnmatch imports, extended test imports (get_tiny_qwen3_moe, export_hf_checkpoint, NVFP4_EXPERTS_ONLY_CFG), and introduced test_qwen3_moe_nvfp4_experts_only_export_exclude_modules, which builds a tiny Qwen3 MoE on CUDA, quantizes it with NVFP4_EXPERTS_ONLY_CFG, exports an HF checkpoint, reads hf_quant_config.json, and asserts exclusion patterns for non-expert modules while ensuring routed expert module paths are not excluded.
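The verification half of the walkthrough can be sketched with the standard library alone. This is a minimal sketch, not the actual test: the JSON payload below is illustrative, and the concrete module path strings are hypothetical examples rather than paths from the real exported checkpoint.

```python
import fnmatch
import json

# Illustrative payload mimicking the shape of hf_quant_config.json as
# described in the walkthrough (quant_algo plus glob exclusion patterns);
# the concrete values are assumptions, not the real exported file.
hf_quant_config = json.loads("""
{
  "quantization": {
    "quant_algo": "NVFP4",
    "exclude_modules": ["*self_attn*", "lm_head"]
  }
}
""")

quant_section = hf_quant_config["quantization"]
assert quant_section["quant_algo"] == "NVFP4"

exclude_modules = quant_section["exclude_modules"]
# Attention modules and the LM head must be excluded from quantization...
assert any(fnmatch.fnmatch("model.layers.0.self_attn.q_proj", p) for p in exclude_modules)
assert "lm_head" in exclude_modules
# ...while routed expert paths must not match any exclusion pattern.
expert_path = "model.layers.0.mlp.experts.3.up_proj"
assert not any(fnmatch.fnmatch(expert_path, p) for p in exclude_modules)
```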

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks | ✅ 3 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage: ⚠️ Warning. Docstring coverage is 33.33%, which is below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (3 passed)
  • Description Check: ✅ Passed. Check skipped because CodeRabbit's high-level summary is enabled.
  • Title Check: ✅ Passed. The title 'Add qwen3 moe experts only test' accurately and concisely describes the main change: adding a new test for Qwen3 MoE with the NVFP4 experts-only configuration.
  • Security Anti-Patterns: ✅ Passed. The PR modifies only a test file exempted by SECURITY.md and contains no security anti-patterns.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Comment @coderabbitai help to get the list of available commands and usage tips.

Signed-off-by: Chenjie Luo <chenjiel@nvidia.com>
@cjluo-nv cjluo-nv force-pushed the chenjiel/add_qwen_moe_test branch from 17fd111 to aba7223 on April 16, 2026 06:55
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `tests/unit/torch/quantization/plugins/test_huggingface.py`:
- Around line 271-285: The current substring checks on exclude_modules are
unsafe for glob-like patterns; update the test to detect glob patterns that
would match routed experts by treating each entry as a glob (e.g., use
fnmatch/fnmatchcase or equivalent) and assert there is no pattern that matches
"*mlp.experts*" while not matching "*shared*". Concretely, replace the
substring-based loop over exclude_modules with a glob-aware check that fails if
any pattern would match routed expert paths (pattern matches "*mlp.experts*" and
does not match "*shared*"), referencing the exclude_modules variable used in the
test.
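The glob-aware check requested above might look like the following sketch. The helper name and the sample module path are hypothetical; only `exclude_modules` and the pattern semantics come from the comment.

```python
import fnmatch

def routed_experts_are_excluded(exclude_modules: list[str]) -> bool:
    """Return True if any exclusion glob would match a routed-expert path.

    Each entry of exclude_modules is treated as a glob pattern; a pattern
    is flagged when it matches a routed expert path while not being a
    shared-expert pattern.
    """
    # Hypothetical routed-expert module path used as a probe.
    routed_expert_path = "model.layers.0.mlp.experts.3.up_proj"
    for pattern in exclude_modules:
        if fnmatch.fnmatchcase(routed_expert_path, pattern) and "shared" not in pattern:
            return True
    return False
```

In the test, the substring-based loop would then become `assert not routed_experts_are_excluded(exclude_modules)`.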

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: 882e6f45-1a6b-4b82-b938-f2a571d15985

📥 Commits

Reviewing files that changed from the base of the PR and between d45219b and 17fd111.

📒 Files selected for processing (1)
  • tests/unit/torch/quantization/plugins/test_huggingface.py

@github-actions
Contributor

github-actions bot commented Apr 16, 2026

PR Preview Action v1.8.1


🚀 View preview at
https://NVIDIA.github.io/Model-Optimizer/pr-preview/pr-1274/

Built to branch gh-pages at 2026-04-17 16:20 UTC.
Preview will be ready when the GitHub Pages deployment is complete.

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `tests/unit/torch/quantization/plugins/test_huggingface.py`:
- Around line 267-269: The test unconditionally asserts
quant_section["quant_algo"] == "NVFP4" but in transformers>=5.0
get_quant_config() cannot detect per-expert quantizers (see huggingface.py
comment), so gate this by checking transformers.__version__ (or parsing with
packaging.version.parse) and call pytest.xfail(...) with a clear reason about
the transformers>=5.0 fused expert gap before the assertion; place this check
just after quant_section = hf_quant_config["quantization"] (referencing
quant_section, hf_quant_config and the get_quant_config() behavior) so
environments with transformers 5.x mark the test as expected to fail until the
exporter is updated.
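Stripped down to the version logic, the gate described above could be sketched as follows. The helper name is hypothetical, and the real test would call `pytest.xfail(...)` at the assertion site (as shown in the comments) rather than just computing a boolean.

```python
def transformers_ge_5(version: str) -> bool:
    """True when a version string like '5.0.0' or '5.0.0rc1' has major >= 5."""
    major = version.split(".")[0]
    digits = "".join(ch for ch in major if ch.isdigit())
    return int(digits or "0") >= 5

# In the test, placed right after reading the quantization section:
#   if transformers_ge_5(transformers.__version__):
#       pytest.xfail("transformers>=5.0 fused MoE experts are not recognized "
#                    "by get_quant_config, so quant_algo is exported as None.")
#   assert quant_section["quant_algo"] == "NVFP4"
```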

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: 2f1b7646-bd3e-4ecf-bb8f-4f90182d2baf

📥 Commits

Reviewing files that changed from the base of the PR and between 17fd111 and aba7223.

📒 Files selected for processing (1)
  • tests/unit/torch/quantization/plugins/test_huggingface.py

Comment on lines +267 to +269
quant_section = hf_quant_config["quantization"]
assert quant_section["quant_algo"] == "NVFP4"
exclude_modules = quant_section["exclude_modules"]

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

(Verification scripts collapsed.) CodeRabbit ran a series of repository-inspection scripts to confirm the finding: locating the test function and its imports, searching the test suite for existing transformers version guards and pytest.xfail/skip markers, and inspecting get_quant_config and quant_algo handling in modelopt/torch/export/quant_utils.py along with the fused-expert comments in modelopt/torch/quantization/plugins/huggingface.py.


Gate the quant_algo assertion for transformers>=5.0 to avoid deterministic test failures.

Line 268 will fail unconditionally in environments with transformers 5.x due to the documented structural change in fused expert quantizer detection. The comment at modelopt/torch/quantization/plugins/huggingface.py ("transformers>=5.0 has batched experts, no per-expert quantizers") confirms that get_quant_config() cannot discover quantizers in the new fused expert format, leaving quant_algo as None. Add pytest.xfail() with reason explaining the transformers >= 5.0 gap until the exporter is updated.

Proposed patch
 def test_qwen3_moe_nvfp4_experts_only_export_exclude_modules(tmp_path):
     """..."""
     quant_section = hf_quant_config["quantization"]
+    if Version(transformers.__version__) >= Version("5.0"):
+        pytest.xfail(
+            "Known issue: transformers>=5.0 fused MoE experts are not recognized by "
+            "get_quant_config, so quant_algo is exported as None."
+        )
     assert quant_section["quant_algo"] == "NVFP4"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `tests/unit/torch/quantization/plugins/test_huggingface.py` around lines 267-269: The test unconditionally asserts quant_section["quant_algo"] == "NVFP4" but
in transformers>=5.0 get_quant_config() cannot detect per-expert quantizers (see
huggingface.py comment), so gate this by checking transformers.__version__ (or
parsing with packaging.version.parse) and call pytest.xfail(...) with a clear
reason about the transformers>=5.0 fused expert gap before the assertion; place
this check just after quant_section = hf_quant_config["quantization"]
(referencing quant_section, hf_quant_config and the get_quant_config() behavior)
so environments with transformers 5.x mark the test as expected to fail until
the exporter is updated.

@kevalmorabia97
Collaborator

Signed-off-by: Chenjie Luo <chenjiel@nvidia.com>
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `tests/gpu/torch/export/test_export.py`:
- Around line 503-505: The assertion that quant_section["quant_algo"] == "NVFP4"
fails on transformers ≥5.0; add a pytest skip guard using the existing
TRANSFORMERS_VERSION_GE_5_0 flag to skip the test when running under
transformers 5.x. Locate the test that contains the code referencing
hf_quant_config and quant_section (in tests/gpu/torch/export/test_export.py) and
decorate that test function with
`@pytest.mark.skipif`(TRANSFORMERS_VERSION_GE_5_0, reason="MoE expert fusion
change in transformers>=5.0 causes deterministic failure"), following the same
pattern used in
tests/unit/torch/quantization/plugins/test_sparse_sequential_moe.py.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: 19479a84-8ae4-402f-bf75-e547a33eab42

📥 Commits

Reviewing files that changed from the base of the PR and between aba7223 and 78046c4.

📒 Files selected for processing (1)
  • tests/gpu/torch/export/test_export.py

Comment on lines +503 to +505
quant_section = hf_quant_config["quantization"]
assert quant_section["quant_algo"] == "NVFP4"
exclude_modules = quant_section["exclude_modules"]

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

(Verification scripts collapsed.) CodeRabbit ran repository-inspection scripts to confirm the finding: checking the transformers constraint in pyproject.toml, locating the TRANSFORMERS_VERSION_GE_5_0 flag and its uses, reading the test function and imports in tests/gpu/torch/export/test_export.py, and reviewing the skipif pattern already used in tests/unit/torch/quantization/plugins/test_sparse_sequential_moe.py.


Add version guard to prevent deterministic test failure in transformers 5.x.

Line 504 unconditionally asserts quant_algo == "NVFP4", but this fails in transformers ≥5.0 due to how MoE expert fusion is handled (confirmed by PR context). The repository pins transformers>=4.56 in pyproject.toml, so v5.x environments are encountered in CI. Align with the established pattern in test_sparse_sequential_moe.py by using @pytest.mark.skipif with the existing TRANSFORMERS_VERSION_GE_5_0 flag.

Suggested approach
+from modelopt.torch.quantization.plugins.huggingface import (
+    TRANSFORMERS_VERSION_GE_5_0,
+)

+@pytest.mark.skipif(TRANSFORMERS_VERSION_GE_5_0, reason="Transformers v5 does not recognize fused MoE experts in get_quant_config; quant_algo may be None")
 def test_qwen3_moe_nvfp4_experts_only_export_exclude_modules(tmp_path):

This matches the skip pattern already used in tests/unit/torch/quantization/plugins/test_sparse_sequential_moe.py (lines 178, 306) for similar MoE-related transformers version constraints.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `tests/gpu/torch/export/test_export.py` around lines 503-505: The assertion
that quant_section["quant_algo"] == "NVFP4" fails on transformers ≥5.0; add a
pytest skip guard using the existing TRANSFORMERS_VERSION_GE_5_0 flag to skip
the test when running under transformers 5.x. Locate the test that contains the
code referencing hf_quant_config and quant_section (in
tests/gpu/torch/export/test_export.py) and decorate that test function with
`@pytest.mark.skipif`(TRANSFORMERS_VERSION_GE_5_0, reason="MoE expert fusion
change in transformers>=5.0 causes deterministic failure"), following the same
pattern used in
tests/unit/torch/quantization/plugins/test_sparse_sequential_moe.py.

@codecov

codecov bot commented Apr 16, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 66.42%. Comparing base (4e33368) to head (4d522da).

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1274      +/-   ##
==========================================
- Coverage   72.74%   66.42%   -6.32%     
==========================================
  Files         459      459              
  Lines       48611    48611              
==========================================
- Hits        35361    32289    -3072     
- Misses      13250    16322    +3072     
Flag | Coverage Δ
  • gpu: 28.26% <ø> (-23.94%) ⬇️
  • unit: 52.22% <ø> (+0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.


Signed-off-by: Chenjie Luo <chenjiel@nvidia.com>
@cjluo-nv cjluo-nv enabled auto-merge (squash) April 17, 2026 16:18
Contributor

@meenchen meenchen left a comment


This is a clean, well-written test PR that adds a single focused GPU test for HF export of a Qwen3 MoE model quantized with NVFP4_EXPERTS_ONLY_CFG. The test:

  1. Is correct: Creates a tiny Qwen3 MoE model, quantizes with NVFP4_EXPERTS_ONLY_CFG, exports via export_hf_checkpoint, then verifies the hf_quant_config.json content - specifically checking quant_algo == "NVFP4", that attention layers and lm_head are in exclude_modules, and that routed experts are NOT excluded.

  2. Is focused and small: Only 54 lines added to a single file, with meaningful semantic assertions (not just checking file existence).

  3. Uses existing utilities properly: Leverages existing get_tiny_qwen3_moe helper, existing NVFP4_EXPERTS_ONLY_CFG config, and existing export_hf_checkpoint function. No code duplication.

  4. Is well-documented: Clear docstring explaining what's being tested and a reference to a real production config. The PR description honestly documents the known transformers 5.x incompatibility.

  5. No new tests are duplicating existing ones: Checked and confirmed no existing tests cover this NVFP4_EXPERTS_ONLY_CFG + Qwen3 MoE export combination.

Minor note: The PR description states this test fails on transformers 5.x, but that's a known issue to be fixed in a follow-up. This is appropriate as a regression test for transformers 4.x behavior.

3 participants