Add LAQ (Learnable Amax Quantization) algorithm#1247

Open
realAsma wants to merge 1 commit into asma/new-qat-2 from asma/laq-algorithm

Conversation

@realAsma
Contributor

Summary

Add LAQ (Learnable Amax Quantization), a QAT algorithm that learns separate pre-quantization and post-dequantization amax values during training. Forward pass: w_q = Q_STE(w / s_pre) * s_post, where s = amax / Q_max.

Key options:

  • learnable_amax: controls which amax parameters are trainable — ["pre", "post"] (both), "post" (post-only, default), "pre" (pre-only), or [] (frozen)
  • tied_amax: when True, pre and post share a single tensor (requires both to have the same learnable state)
  • scale_algorithm: optional initial scale calibration (mse, local_hessian, or max) before learning begins
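
The forward pass above can be sketched in PyTorch. This is a minimal, hypothetical illustration of the described scheme (function and variable names are illustrative, not the actual modelopt API); it shows how gradients reach both amax parameters through a straight-through estimator (STE):

```python
import torch

def q_ste(x: torch.Tensor, q_max: float) -> torch.Tensor:
    """Clamp and round-to-nearest with a straight-through estimator:
    the rounded value is used in the forward pass, while gradients
    flow through as if the op were the identity."""
    x = x.clamp(-q_max, q_max)
    return (x.round() - x).detach() + x

def laq_forward(w, amax_pre, amax_post, q_max=127.0):
    """Sketch of w_q = Q_STE(w / s_pre) * s_post with s = amax / Q_max.
    With tied_amax=True, amax_pre and amax_post would be the same tensor."""
    s_pre = amax_pre / q_max
    s_post = amax_post / q_max
    return q_ste(w / s_pre, q_max) * s_post

# Usage: learnable_amax = ["pre", "post"] corresponds to both being Parameters.
w = torch.randn(4, 4)
amax_pre = torch.nn.Parameter(w.abs().amax())
amax_post = torch.nn.Parameter(w.abs().amax())
w_q = laq_forward(w, amax_pre, amax_post)
w_q.sum().backward()  # gradients reach both amax parameters via the STE
```

Freezing one side (e.g. the default post-only mode) would amount to registering the other amax as a buffer instead of a Parameter.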

Test plan

  • Run unit tests: pytest tests/unit/torch/quantization/test_laq.py
  • Run recipe tests: pytest tests/unit/recipe/test_laq_recipes.py
  • Run GPU tests: pytest tests/gpu/torch/quantization/test_laq_cuda.py
  • Verify LAQ with llm_qat example end-to-end

🤖 Generated with Claude Code

@realAsma realAsma requested review from a team as code owners April 13, 2026 16:40
@realAsma realAsma requested review from AAnoosheh, cjluo-nv, h-guo18, meenchen, mxinO and shengliangxu and removed request for a team April 13, 2026 16:40
@coderabbitai
Contributor

coderabbitai bot commented Apr 13, 2026

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

🗂️ Base branches to auto review (3)
  • main
  • release/.*
  • feature/.*

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: dd7abecf-0ff5-4179-b38e-0654f61ee792

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Comment @coderabbitai help to get the list of available commands and usage tips.

@codecov

codecov bot commented Apr 13, 2026

Codecov Report

❌ Patch coverage is 75.96567% with 56 lines in your changes missing coverage. Please review.
✅ Project coverage is 75.64%. Comparing base (f246115) to head (b159527).

Files with missing lines | Patch % | Lines
modelopt/torch/quantization/model_calib.py | 77.46% | 16 Missing ⚠️
modelopt/torch/quantization/tensor_quant.py | 48.38% | 16 Missing ⚠️
.../torch/quantization/nn/modules/tensor_quantizer.py | 82.55% | 15 Missing ⚠️
modelopt/torch/quantization/triton/fp4_kernel.py | 65.21% | 8 Missing ⚠️
modelopt/torch/quantization/config.py | 92.85% | 1 Missing ⚠️
Additional details and impacted files
@@                Coverage Diff                 @@
##           asma/new-qat-2    #1247      +/-   ##
==================================================
- Coverage           75.64%   75.64%   -0.01%     
==================================================
  Files                 462      462              
  Lines               50116    50340     +224     
==================================================
+ Hits                37912    38081     +169     
- Misses              12204    12259      +55     
Flag | Coverage Δ
examples | 41.68% <28.75%> (-0.07%) ⬇️
gpu | 58.48% <75.96%> (+0.11%) ⬆️
regression | 14.95% <28.32%> (+0.05%) ⬆️

Flags with carried forward coverage won't be shown.

☔ View full report in Codecov by Sentry.

@realAsma realAsma force-pushed the asma/laq-algorithm branch from 16832fb to 8866b80 Compare April 16, 2026 19:32
@realAsma realAsma force-pushed the asma/new-qat-2 branch 2 times, most recently from be7d8e2 to f246115 Compare April 19, 2026 12:23
Also clean up llm_qat example configs and fix pad_token_id handling.

Preserve weight dtype for LAQ amax and per-tensor scales:
- StaticBlockScaleQuantizer.enable_laq no longer forces float32 on
  _amax_pre, _amax_post, and _per_tensor_scale buffers/parameters;
  they now inherit the dtype of the passed tensors.
- laq() calibration casts amax and per_tensor_scale to the weight
  dtype before calling enable_laq so the quantizer matches module
  precision (bf16/fp16) instead of silently upcasting to fp32.
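
The dtype-preserving cast described above can be sketched as follows. This is a hypothetical helper, not the actual modelopt code; it only illustrates casting calibrated values to the weight's dtype before they become quantizer state:

```python
import torch

def prepare_laq_params(weight: torch.Tensor, amax: torch.Tensor,
                       per_tensor_scale: torch.Tensor):
    """Cast calibrated amax and per-tensor scale to the weight's dtype
    so downstream buffers/parameters match module precision (bf16/fp16)
    instead of silently upcasting to fp32."""
    return amax.to(weight.dtype), per_tensor_scale.to(weight.dtype)

# Usage: a bf16 module keeps bf16 quantizer state after calibration.
w = torch.randn(8, 8, dtype=torch.bfloat16)
amax, scale = prepare_laq_params(w, torch.tensor(3.0), torch.tensor(0.5))
```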

Signed-off-by: realAsma <akuriparambi@nvidia.com>
@realAsma realAsma force-pushed the asma/laq-algorithm branch from 2154693 to b159527 Compare April 19, 2026 12:31