Update excluded modules for Qwen3.5 dense PTQ #1284
amukkara wants to merge 1 commit into NVIDIA:main from
Conversation
📝 Walkthrough
Two new configuration entries were added to the default disabled quantizer configuration to prevent quantization of linear attention projection layer variants. These entries target quantizer names matching the `linear_attn.in_proj_a` and `linear_attn.in_proj_b` patterns.
Signed-off-by: Anurag Mukkara <134339030+amukkara@users.noreply.github.com>
🧹 Nitpick comments (1)
modelopt/torch/quantization/config.py (1)
231-232: Add a focused regression test for these new exclusion patterns. Lines 231-232 update the global default exclusions; please add a test that verifies quantizers matching `*linear_attn.in_proj_a*` and `*linear_attn.in_proj_b*` are disabled after config application. This helps lock in the intended Qwen3.5 PTQ behavior.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@modelopt/torch/quantization/config.py` around lines 231 - 232, Add a unit test that verifies the new exclusion patterns "*linear_attn.in_proj_a*" and "*linear_attn.in_proj_b*" actually disable matching quantizers: import the exclusion patterns from modelopt.torch.quantization.config (e.g., DEFAULT_EXCLUSIONS or the global exclusions variable), create mock quantizer names like "encoder.linear_attn.in_proj_a.weight" and "decoder.linear_attn.in_proj_b.bias" and then use the module's exclusion-matching helper (e.g., matches_exclusion, is_excluded, or the function that decides quantizer enablement) to assert those names are considered excluded/disabled after applying the config; fail the test if any of those quantizers remain enabled.
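The suggested regression test could be sketched as follows. This is a hypothetical stand-in: the exclusion dict shape and the `is_excluded` helper are assumptions, using glob-style matching via `fnmatch` in place of whatever matching helper `modelopt.torch.quantization.config` actually provides.

```python
from fnmatch import fnmatch

# Assumed shape of the new entries in _default_disabled_quantizer_config:
# glob-style quantizer-name patterns mapped to per-quantizer settings.
new_exclusions = {
    "*linear_attn.in_proj_a*": {"enable": False},
    "*linear_attn.in_proj_b*": {"enable": False},
}

def is_excluded(quantizer_name: str, exclusions: dict) -> bool:
    """Return True if any disabled-quantizer pattern matches the name."""
    return any(
        fnmatch(quantizer_name, pattern) and not cfg["enable"]
        for pattern, cfg in exclusions.items()
    )

def test_linear_attn_in_proj_excluded():
    # in_proj_a / in_proj_b quantizers must be disabled.
    assert is_excluded(
        "model.layers.0.linear_attn.in_proj_a.weight_quantizer", new_exclusions
    )
    assert is_excluded(
        "model.layers.3.linear_attn.in_proj_b.input_quantizer", new_exclusions
    )
    # Regular attention projections should remain quantized.
    assert not is_excluded(
        "model.layers.0.self_attn.q_proj.weight_quantizer", new_exclusions
    )
```

A real test would import the patterns and matching helper from the library instead of redefining them locally.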
📒 Files selected for processing (1)
modelopt/torch/quantization/config.py
What does this PR do?
Type of change: Bug fix
For Qwen3.5 dense models, in_proj modules in linear attention are to be left unquantized.
Example in Qwen3.5-27B-FP8: https://huggingface.co/Qwen/Qwen3.5-27B-FP8/blob/main/config.json#L148
This PR updates `_default_disabled_quantizer_config` so that all Qwen3.5 dense models are quantized with the same exclusion pattern.
Usage
Testing
Before your PR is "Ready for review"
Make sure you read and follow Contributor guidelines and your commits are signed (git commit -s -S).
Make sure you read and follow the Security Best Practices (e.g. avoiding hardcoded trust_remote_code=True, torch.load(..., weights_only=False), pickle, etc.).
CONTRIBUTING.md: ✅
Additional Information