
Add qwen3 moe experts only test #1274

Open

cjluo-nv wants to merge 5 commits into main from chenjiel/add_qwen_moe_test

Conversation

@cjluo-nv
Collaborator

@cjluo-nv cjluo-nv commented Apr 16, 2026

Summary

Type of change: New tests

Known issue

On transformers>=5.0, fused MoE experts (_QuantFusedExperts) are not recognized by get_quant_config, causing quant_algo=None in the exported config. This test currently fails on transformers 5.x and is intended to be fixed by a follow-up change.

Testing

  • transformers 4.57.6: PASSED
  • transformers 5.5.4: FAILED (quant_algo is None due to fused expert export gap)

Before your PR is "Ready for review"

Make sure you read and follow Contributor guidelines and your commits are signed (git commit -s -S).

Make sure you read and follow the Security Best Practices (e.g. avoiding hardcoded trust_remote_code=True, torch.load(..., weights_only=False), pickle, etc.).

  • Is this change backward compatible?: ✅
  • If you copied code from any other sources or added a new PIP dependency, did you follow guidance in CONTRIBUTING.md: N/A
  • Did you write any new necessary tests?: ✅
  • Did you update Changelog?: N/A

Summary by CodeRabbit

  • Tests
    • Added GPU test coverage for exporting Qwen3 Mixture-of-Experts models with NVFP4 quantization.
    • Verifies the exported checkpoint records the NVFP4 quantization algorithm and that module exclusion patterns correctly exclude attention and LM head components while not excluding routed expert paths.

@coderabbitai
Contributor

coderabbitai bot commented Apr 16, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: 43e2a44f-c5c8-4ba3-9b53-74f68f105e61

📥 Commits

Reviewing files that changed from the base of the PR and between 78046c4 and 65a8889.

📒 Files selected for processing (1)
  • tests/gpu/torch/export/test_export.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • tests/gpu/torch/export/test_export.py

📝 Walkthrough

Walkthrough

Adds a GPU test that builds a tiny Qwen3 MoE model on CUDA, quantizes it with NVFP4_EXPERTS_ONLY_CFG, exports an HF checkpoint, and verifies exported hf_quant_config.json excludes non-expert module patterns while not excluding expert module paths.

Changes

Cohort / File(s): Test Coverage for Qwen3 MoE NVFP4 Export (tests/gpu/torch/export/test_export.py)
Summary: Added json and fnmatch imports, extended test imports (get_tiny_qwen3_moe, export_hf_checkpoint, NVFP4_EXPERTS_ONLY_CFG), and introduced test_qwen3_moe_nvfp4_experts_only_export_exclude_modules, which builds a tiny Qwen3 MoE on CUDA, quantizes it with NVFP4_EXPERTS_ONLY_CFG, exports an HF checkpoint, reads hf_quant_config.json, and asserts exclusion patterns for non-expert modules while ensuring routed expert module paths are not excluded.
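The verification half of the walkthrough can be sketched with the standard library alone. This is a minimal sketch, not the actual test: the JSON payload below is illustrative, and the concrete module path strings are hypothetical examples rather than paths from the real exported checkpoint.

```python
import fnmatch
import json

# Illustrative payload mimicking the shape of hf_quant_config.json as
# described in the walkthrough (quant_algo plus glob exclusion patterns);
# the concrete values are assumptions, not the real exported file.
hf_quant_config = json.loads("""
{
  "quantization": {
    "quant_algo": "NVFP4",
    "exclude_modules": ["*self_attn*", "lm_head"]
  }
}
""")

quant_section = hf_quant_config["quantization"]
assert quant_section["quant_algo"] == "NVFP4"

exclude_modules = quant_section["exclude_modules"]
# Attention modules and the LM head must be excluded from quantization...
assert any(fnmatch.fnmatch("model.layers.0.self_attn.q_proj", p) for p in exclude_modules)
assert "lm_head" in exclude_modules
# ...while routed expert paths must not match any exclusion pattern.
expert_path = "model.layers.0.mlp.experts.3.up_proj"
assert not any(fnmatch.fnmatch(expert_path, p) for p in exclude_modules)
```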

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks | ✅ 3 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage: ⚠️ Warning. Docstring coverage is 33.33%, which is below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (3 passed)
  • Description Check: ✅ Passed. Check skipped because CodeRabbit's high-level summary is enabled.
  • Title Check: ✅ Passed. The title 'Add qwen3 moe experts only test' accurately and concisely describes the main change: adding a new test for Qwen3 MoE with the NVFP4 experts-only configuration.
  • Security Anti-Patterns: ✅ Passed. The PR modifies only a test file exempted by SECURITY.md and contains no security anti-patterns.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Comment @coderabbitai help to get the list of available commands and usage tips.

Signed-off-by: Chenjie Luo <chenjiel@nvidia.com>
@cjluo-nv cjluo-nv force-pushed the chenjiel/add_qwen_moe_test branch from 17fd111 to aba7223 on April 16, 2026 06:55
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `tests/unit/torch/quantization/plugins/test_huggingface.py`:
- Around line 271-285: The current substring checks on exclude_modules are
unsafe for glob-like patterns; update the test to detect glob patterns that
would match routed experts by treating each entry as a glob (e.g., use
fnmatch/fnmatchcase or equivalent) and assert there is no pattern that matches
"*mlp.experts*" while not matching "*shared*". Concretely, replace the
substring-based loop over exclude_modules with a glob-aware check that fails if
any pattern would match routed expert paths (pattern matches "*mlp.experts*" and
does not match "*shared*"), referencing the exclude_modules variable used in the
test.
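The glob-aware check requested above might look like the following sketch. The helper name and the sample module path are hypothetical; only `exclude_modules` and the pattern semantics come from the comment.

```python
import fnmatch

def routed_experts_are_excluded(exclude_modules: list[str]) -> bool:
    """Return True if any exclusion glob would match a routed-expert path.

    Each entry of exclude_modules is treated as a glob pattern; a pattern
    is flagged when it matches a routed expert path while not being a
    shared-expert pattern.
    """
    # Hypothetical routed-expert module path used as a probe.
    routed_expert_path = "model.layers.0.mlp.experts.3.up_proj"
    for pattern in exclude_modules:
        if fnmatch.fnmatchcase(routed_expert_path, pattern) and "shared" not in pattern:
            return True
    return False
```

In the test, the substring-based loop would then become `assert not routed_experts_are_excluded(exclude_modules)`.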

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: 882e6f45-1a6b-4b82-b938-f2a571d15985

📥 Commits

Reviewing files that changed from the base of the PR and between d45219b and 17fd111.

📒 Files selected for processing (1)
  • tests/unit/torch/quantization/plugins/test_huggingface.py

@github-actions
Contributor

github-actions bot commented Apr 16, 2026

PR Preview Action v1.8.1


🚀 View preview at
https://NVIDIA.github.io/Model-Optimizer/pr-preview/pr-1274/

Built to branch gh-pages at 2026-04-17 16:20 UTC.
Preview will be ready when the GitHub Pages deployment is complete.

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `tests/unit/torch/quantization/plugins/test_huggingface.py`:
- Around line 267-269: The test unconditionally asserts
quant_section["quant_algo"] == "NVFP4" but in transformers>=5.0
get_quant_config() cannot detect per-expert quantizers (see huggingface.py
comment), so gate this by checking transformers.__version__ (or parsing with
packaging.version.parse) and call pytest.xfail(...) with a clear reason about
the transformers>=5.0 fused expert gap before the assertion; place this check
just after quant_section = hf_quant_config["quantization"] (referencing
quant_section, hf_quant_config and the get_quant_config() behavior) so
environments with transformers 5.x mark the test as expected to fail until the
exporter is updated.
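Stripped down to the version logic, the gate described above could be sketched as follows. The helper name is hypothetical, and the real test would call `pytest.xfail(...)` at the assertion site (as shown in the comments) rather than just computing a boolean.

```python
def transformers_ge_5(version: str) -> bool:
    """True when a version string like '5.0.0' or '5.0.0rc1' has major >= 5."""
    major = version.split(".")[0]
    digits = "".join(ch for ch in major if ch.isdigit())
    return int(digits or "0") >= 5

# In the test, placed right after reading the quantization section:
#   if transformers_ge_5(transformers.__version__):
#       pytest.xfail("transformers>=5.0 fused MoE experts are not recognized "
#                    "by get_quant_config, so quant_algo is exported as None.")
#   assert quant_section["quant_algo"] == "NVFP4"
```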

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: 2f1b7646-bd3e-4ecf-bb8f-4f90182d2baf

📥 Commits

Reviewing files that changed from the base of the PR and between 17fd111 and aba7223.

📒 Files selected for processing (1)
  • tests/unit/torch/quantization/plugins/test_huggingface.py

Comment on lines +267 to +269
quant_section = hf_quant_config["quantization"]
assert quant_section["quant_algo"] == "NVFP4"
exclude_modules = quant_section["exclude_modules"]

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

(Verification scripts collapsed.) CodeRabbit ran a series of repository-inspection scripts to confirm the finding: locating the test function and its imports, searching the test suite for existing transformers version guards and pytest.xfail/skip markers, and inspecting get_quant_config and quant_algo handling in modelopt/torch/export/quant_utils.py along with the fused-expert comments in modelopt/torch/quantization/plugins/huggingface.py.


Gate the quant_algo assertion for transformers>=5.0 to avoid deterministic test failures.

Line 268 will fail unconditionally in environments with transformers 5.x due to the documented structural change in fused expert quantizer detection. The comment at modelopt/torch/quantization/plugins/huggingface.py ("transformers>=5.0 has batched experts, no per-expert quantizers") confirms that get_quant_config() cannot discover quantizers in the new fused expert format, leaving quant_algo as None. Add pytest.xfail() with reason explaining the transformers >= 5.0 gap until the exporter is updated.

Proposed patch
 def test_qwen3_moe_nvfp4_experts_only_export_exclude_modules(tmp_path):
     """..."""
     quant_section = hf_quant_config["quantization"]
+    if Version(transformers.__version__) >= Version("5.0"):
+        pytest.xfail(
+            "Known issue: transformers>=5.0 fused MoE experts are not recognized by "
+            "get_quant_config, so quant_algo is exported as None."
+        )
     assert quant_section["quant_algo"] == "NVFP4"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `tests/unit/torch/quantization/plugins/test_huggingface.py` around lines 267-269: The test unconditionally asserts quant_section["quant_algo"] == "NVFP4" but
in transformers>=5.0 get_quant_config() cannot detect per-expert quantizers (see
huggingface.py comment), so gate this by checking transformers.__version__ (or
parsing with packaging.version.parse) and call pytest.xfail(...) with a clear
reason about the transformers>=5.0 fused expert gap before the assertion; place
this check just after quant_section = hf_quant_config["quantization"]
(referencing quant_section, hf_quant_config and the get_quant_config() behavior)
so environments with transformers 5.x mark the test as expected to fail until
the exporter is updated.

@kevalmorabia97
Collaborator

Signed-off-by: Chenjie Luo <chenjiel@nvidia.com>
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `tests/gpu/torch/export/test_export.py`:
- Around line 503-505: The assertion that quant_section["quant_algo"] == "NVFP4"
fails on transformers ≥5.0; add a pytest skip guard using the existing
TRANSFORMERS_VERSION_GE_5_0 flag to skip the test when running under
transformers 5.x. Locate the test that contains the code referencing
hf_quant_config and quant_section (in tests/gpu/torch/export/test_export.py) and
decorate that test function with
`@pytest.mark.skipif`(TRANSFORMERS_VERSION_GE_5_0, reason="MoE expert fusion
change in transformers>=5.0 causes deterministic failure"), following the same
pattern used in
tests/unit/torch/quantization/plugins/test_sparse_sequential_moe.py.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: 19479a84-8ae4-402f-bf75-e547a33eab42

📥 Commits

Reviewing files that changed from the base of the PR and between aba7223 and 78046c4.

📒 Files selected for processing (1)
  • tests/gpu/torch/export/test_export.py

Comment on lines +503 to +505
quant_section = hf_quant_config["quantization"]
assert quant_section["quant_algo"] == "NVFP4"
exclude_modules = quant_section["exclude_modules"]

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

(Verification scripts collapsed.) CodeRabbit ran repository-inspection scripts to confirm the finding: checking the transformers constraint in pyproject.toml, locating the TRANSFORMERS_VERSION_GE_5_0 flag and its uses, reading the test function and imports in tests/gpu/torch/export/test_export.py, and reviewing the skipif pattern already used in tests/unit/torch/quantization/plugins/test_sparse_sequential_moe.py.


Add version guard to prevent deterministic test failure in transformers 5.x.

Line 504 unconditionally asserts quant_algo == "NVFP4", but this fails in transformers ≥5.0 due to how MoE expert fusion is handled (confirmed by PR context). The repository pins transformers>=4.56 in pyproject.toml, so v5.x environments are encountered in CI. Align with the established pattern in test_sparse_sequential_moe.py by using @pytest.mark.skipif with the existing TRANSFORMERS_VERSION_GE_5_0 flag.

Suggested approach
+from modelopt.torch.quantization.plugins.huggingface import (
+    TRANSFORMERS_VERSION_GE_5_0,
+)

+@pytest.mark.skipif(TRANSFORMERS_VERSION_GE_5_0, reason="Transformers v5 does not recognize fused MoE experts in get_quant_config; quant_algo may be None")
 def test_qwen3_moe_nvfp4_experts_only_export_exclude_modules(tmp_path):

This matches the skip pattern already used in tests/unit/torch/quantization/plugins/test_sparse_sequential_moe.py (lines 178, 306) for similar MoE-related transformers version constraints.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `tests/gpu/torch/export/test_export.py` around lines 503-505: The assertion
that quant_section["quant_algo"] == "NVFP4" fails on transformers ≥5.0; add a
pytest skip guard using the existing TRANSFORMERS_VERSION_GE_5_0 flag to skip
the test when running under transformers 5.x. Locate the test that contains the
code referencing hf_quant_config and quant_section (in
tests/gpu/torch/export/test_export.py) and decorate that test function with
`@pytest.mark.skipif`(TRANSFORMERS_VERSION_GE_5_0, reason="MoE expert fusion
change in transformers>=5.0 causes deterministic failure"), following the same
pattern used in
tests/unit/torch/quantization/plugins/test_sparse_sequential_moe.py.

@codecov

codecov bot commented Apr 16, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 66.42%. Comparing base (4e33368) to head (4d522da).

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1274      +/-   ##
==========================================
- Coverage   72.74%   66.42%   -6.32%     
==========================================
  Files         459      459              
  Lines       48611    48611              
==========================================
- Hits        35361    32289    -3072     
- Misses      13250    16322    +3072     
Flag | Coverage Δ
  • gpu: 28.26% <ø> (-23.94%) ⬇️
  • unit: 52.22% <ø> (+0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.


Signed-off-by: Chenjie Luo <chenjiel@nvidia.com>
@cjluo-nv cjluo-nv enabled auto-merge (squash) April 17, 2026 16:18
Contributor

@meenchen meenchen left a comment


This is a clean, well-written test PR that adds a single focused GPU test for HF export of a Qwen3 MoE model quantized with NVFP4_EXPERTS_ONLY_CFG. The test:

  1. Is correct: Creates a tiny Qwen3 MoE model, quantizes with NVFP4_EXPERTS_ONLY_CFG, exports via export_hf_checkpoint, then verifies the hf_quant_config.json content - specifically checking quant_algo == "NVFP4", that attention layers and lm_head are in exclude_modules, and that routed experts are NOT excluded.

  2. Is focused and small: Only 54 lines added to a single file, with meaningful semantic assertions (not just checking file existence).

  3. Uses existing utilities properly: Leverages existing get_tiny_qwen3_moe helper, existing NVFP4_EXPERTS_ONLY_CFG config, and existing export_hf_checkpoint function. No code duplication.

  4. Is well-documented: Clear docstring explaining what's being tested and a reference to a real production config. The PR description honestly documents the known transformers 5.x incompatibility.

  5. No new tests are duplicating existing ones: Checked and confirmed no existing tests cover this NVFP4_EXPERTS_ONLY_CFG + Qwen3 MoE export combination.

Minor note: The PR description states this test fails on transformers 5.x, but that's a known issue to be fixed in a follow-up. This is appropriate as a regression test for transformers 4.x behavior.

3 participants