keep deploy cases and Eagle fixes for merge by nvSiruiW · Pull Request #1287 · NVIDIA/Model-Optimizer

nvSiruiW · 2026-04-17T07:56:51Z

Summary by CodeRabbit

Tests
- Added coverage for additional EAGLE3/Nemotron-3 deployment variants.
- Added test support for FLUX.1-dev diffusion model with BF16/FP8 quantization case.
- Expanded LLM deployment tests with Qwen3-VL, Gemma-4, and Nemotron-3 Super variants.
- Introduced new GLM and MiniMax deployment tests across multiple backends.
- Enabled previously skipped Medusa tests and added extra Eagle3 scenarios.

copy-pr-bot · 2026-04-17T07:56:56Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

coderabbitai · 2026-04-17T07:57:07Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: 464ef70c-9f55-4881-8db9-e7a91fac7c46

📥 Commits

Reviewing files that changed from the base of the PR and between cbaeacf and 0c70ce0.

📒 Files selected for processing (4)

tests/_test_utils/deploy_utils.py
tests/_test_utils/examples/models.py
tests/examples/diffusers/test_diffusers.py
tests/examples/llm_ptq/test_deploy.py

🚧 Files skipped from review as they are similar to previous changes (3)

tests/_test_utils/deploy_utils.py
tests/_test_utils/examples/models.py
tests/examples/llm_ptq/test_deploy.py

📝 Walkthrough

Walkthrough

Added model identifiers and constants and expanded test parametrizations: EAGLE3/Nemotron-3-Nano model IDs added to deployment logic, new FLUX_DEV_PATH constant, and multiple new/extended tests for Qwen3-VL, Gemma-4, GLM, MiniMax, Nemotron, Kimi variants, and Medusa.

Changes

Cohort / File(s)	Summary
Deployment utils & model constants `tests/_test_utils/deploy_utils.py`, `tests/_test_utils/examples/models.py`	Extended `qwen3_models` with EAGLE3/Nemotron-3-Nano 30B BF16 identifiers (affects `KvCacheConfig(enable_block_reuse=...)` check). Added `FLUX_DEV_PATH` constant for `black-forest-labs/FLUX.1-dev`.
Diffusers tests `tests/examples/diffusers/test_diffusers.py`	Imported `FLUX_DEV_PATH` and added a `DiffuserModel` parametrization for `flux-dev` (BFloat16, fp8, quant_algo "max", collect_method "default").
LLM PTQ tests `tests/examples/llm_ptq/test_deploy.py`	Added/extended parametrizations and tests: added Qwen3-VL 235B, Gemma-4 31B, two Nemotron-3 Super 120B variants, two Kimi K2/K2.5 entries, added new `test_glm` and `test_minimax` functions, and removed the unconditional skip decorator from `test_medusa`.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 33.33% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.
Title check	❓ Inconclusive	The title 'keep deploy cases and Eagle fixes for merge' is vague and generic, using non-descriptive terms that don't clearly convey the main changes to someone reviewing the PR history.	Consider revising the title to be more specific about the primary changes, such as 'Add Flux-Dev and GLM/MiniMax deployment tests' or 'Update EAGLE3 models and expand LLM deployment test coverage'.

✅ Passed checks (2 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Security Anti-Patterns	✅ Passed	Pull request contains only test file modifications adding model identifiers and test parametrizations without security-sensitive patterns.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (2)

tests/examples/llm_ptq/test_deploy.py (1)

249-254: Consider gating the newly added heavy deploy matrix behind an opt-in selector.

These additions substantially increase tests/examples load (many TP=8, multi-backend cases). Keeping a default smoke subset and running the full matrix only when requested would keep turnaround manageable.

As per coding guidelines, "Integration tests in tests/examples/ should not take more than a few minutes to run".

Also applies to: 304-324, 377-383, 489-500, 655-668
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/examples/llm_ptq/test_deploy.py` around lines 249 - 254, The new heavy
deploy matrix entries (e.g., ModelDeployerList with model_id
"nvidia/Qwen3-VL-235B-A22B-Instruct-NVFP4", tensor_parallel_size=8, and
multi-backend backends tuple) should be gated behind an opt-in selector so
default example tests stay fast; update the test to skip these heavy cases
unless an opt-in condition is present (for example check an environment variable
like FULL_EXAMPLES or use a pytest marker/skipif such as
pytest.mark.skipif(os.environ.get("RUN_FULL_EXAMPLES") is None, reason="heavy
deploy matrix, opt-in required")), and apply the same gating approach to the
other heavy blocks referenced (around the ModelDeployerList blocks at the other
ranges) so only a small smoke subset runs by default.

tests/_test_utils/deploy_utils.py (1)

257-269: Make qwen3_models detection path-agnostic instead of hardcoding /s3.

The new /s3/... literal makes enable_block_reuse behavior dependent on one mount root. Any other local root path won’t match and can silently change cache behavior.

Proposed refactor

         qwen3_models = (
             "nvidia/Qwen3-Next-80B-A3B-Instruct-NVFP4",
             "nvidia/Qwen3-Next-80B-A3B-Thinking-NVFP4",
             "nvidia/EAGLE3-NVIDIA-Nemotron-3-Nano-30B-A3B-BF16",
-            "/s3/nvidia/EAGLE3-NVIDIA-Nemotron-3-Nano-30B-A3B-BF16",
         )
+        is_qwen3_model = self.model_id in qwen3_models or any(
+            self.model_id.endswith(f"/{model}") for model in qwen3_models
+        )
         nemotron_models = (
             "nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-FP8",
             "nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-NVFP4",
         )
         kv_cache_config = KvCacheConfig(
-            enable_block_reuse=self.model_id not in qwen3_models,
+            enable_block_reuse=not is_qwen3_model,
             free_gpu_memory_fraction=0.8,
             mamba_ssm_cache_dtype="float32" if self.model_id not in nemotron_models else "auto",
         )

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@tests/_test_utils/deploy_utils.py` around lines 257 - 269, The hardcoded
"/s3/..." entry in qwen3_models makes the check for enable_block_reuse path-root
dependent; change the detection to be path-agnostic by normalizing and matching
the model identifier regardless of leading path (e.g., use os.path.normpath or
pathlib.Path on self.model_id and check if any entry in qwen3_models matches the
tail of the path or if self.model_id.endswith(model) for each model), then use
that boolean to set KvCacheConfig(enable_block_reuse=...). Update the check
around qwen3_models and enable_block_reuse to use this normalized/endswith
comparison so local mounts with different roots still match.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@tests/_test_utils/examples/models.py`:
- Around line 72-75: FLUX_DEV_PATH currently points to the heavy remote
checkpoint "black-forest-labs/FLUX.1-dev" causing slow tests; update the
_select_path call for FLUX_DEV_PATH to use a lightweight default remote
checkpoint (or a local placeholder) instead of the full remote id so
tests/examples run quickly. Locate the FLUX_DEV_PATH definition and change the
remote_id argument passed to _select_path (and/or set a lightweight local_id) to
a small test-friendly model identifier to avoid fetching the full checkpoint
during CI.

---

Nitpick comments:
In `@tests/_test_utils/deploy_utils.py`:
- Around line 257-269: The hardcoded "/s3/..." entry in qwen3_models makes the
check for enable_block_reuse path-root dependent; change the detection to be
path-agnostic by normalizing and matching the model identifier regardless of
leading path (e.g., use os.path.normpath or pathlib.Path on self.model_id and
check if any entry in qwen3_models matches the tail of the path or if
self.model_id.endswith(model) for each model), then use that boolean to set
KvCacheConfig(enable_block_reuse=...). Update the check around qwen3_models and
enable_block_reuse to use this normalized/endswith comparison so local mounts
with different roots still match.

In `@tests/examples/llm_ptq/test_deploy.py`:
- Around line 249-254: The new heavy deploy matrix entries (e.g.,
ModelDeployerList with model_id "nvidia/Qwen3-VL-235B-A22B-Instruct-NVFP4",
tensor_parallel_size=8, and multi-backend backends tuple) should be gated behind
an opt-in selector so default example tests stay fast; update the test to skip
these heavy cases unless an opt-in condition is present (for example check an
environment variable like FULL_EXAMPLES or use a pytest marker/skipif such as
pytest.mark.skipif(os.environ.get("RUN_FULL_EXAMPLES") is None, reason="heavy
deploy matrix, opt-in required")), and apply the same gating approach to the
other heavy blocks referenced (around the ModelDeployerList blocks at the other
ranges) so only a small smoke subset runs by default.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: 686df9a1-c372-4333-a43c-5eed77553eef

📥 Commits

Reviewing files that changed from the base of the PR and between 7e82a5c and cbaeacf.

📒 Files selected for processing (4)

tests/_test_utils/deploy_utils.py
tests/_test_utils/examples/models.py
tests/examples/diffusers/test_diffusers.py
tests/examples/llm_ptq/test_deploy.py

coderabbitai · 2026-04-17T08:02:34Z

+FLUX_DEV_PATH = _select_path(
+    remote_id="black-forest-labs/FLUX.1-dev",
+    local_id="black-forest-labs/FLUX.1-dev",
+)


⚠️ Potential issue | 🟠 Major

Use a lightweight default remote for FLUX_DEV_PATH to avoid slow integration runs.

Line 73 points to the full black-forest-labs/FLUX.1-dev checkpoint when no local root is set; that can make the default tests/examples flow very slow and brittle.

⚙️ Proposed fix

+_FLUX_DEV_REMOTE_ID = os.getenv("MODELOPT_TEST_FLUX_DEV_REMOTE_ID", "hf-internal-testing/tiny-flux-pipe") + FLUX_DEV_PATH = _select_path( - remote_id="black-forest-labs/FLUX.1-dev", + remote_id=_FLUX_DEV_REMOTE_ID, local_id="black-forest-labs/FLUX.1-dev", )

As per coding guidelines, tests/examples/**/*.py: "Integration tests in tests/examples/ should not take more than a few minutes to run".

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@tests/_test_utils/examples/models.py` around lines 72 - 75, FLUX_DEV_PATH currently points to the heavy remote checkpoint "black-forest-labs/FLUX.1-dev" causing slow tests; update the _select_path call for FLUX_DEV_PATH to use a lightweight default remote checkpoint (or a local placeholder) instead of the full remote id so tests/examples run quickly. Locate the FLUX_DEV_PATH definition and change the remote_id argument passed to _select_path (and/or set a lightweight local_id) to a small test-friendly model identifier to avoid fetching the full checkpoint during CI.

Signed-off-by: Sirui Wang <siruiw@r6515-0097.ipp1a1.colossus.nvidia.com>

codecov · 2026-04-17T08:45:48Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 72.74%. Comparing base (04fcf24) to head (0c70ce0).
⚠️ Report is 4 commits behind head on main.

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #1287      +/-   ##
==========================================
- Coverage   72.74%   72.74%   -0.01%     
==========================================
  Files         459      459              
  Lines       48612    48615       +3     
==========================================
+ Hits        35365    35367       +2     
- Misses      13247    13248       +1

Flag	Coverage Δ
unit	`52.21% <ø> (-0.01%)`	⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

coderabbitai bot reviewed Apr 17, 2026

View reviewed changes

keep deploy cases and Eagle fixes for merge

0c70ce0

Signed-off-by: Sirui Wang <siruiw@r6515-0097.ipp1a1.colossus.nvidia.com>

nvSiruiW force-pushed the sirui/merge-ready-deploy-models-20260417 branch from cbaeacf to 0c70ce0 Compare April 17, 2026 08:31

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

keep deploy cases and Eagle fixes for merge#1287

keep deploy cases and Eagle fixes for merge#1287
nvSiruiW wants to merge 1 commit intoNVIDIA:mainfrom
noeyy-mino:sirui/merge-ready-deploy-models-20260417

nvSiruiW commented Apr 17, 2026 •

edited by coderabbitai bot

Loading

Uh oh!

copy-pr-bot bot commented Apr 17, 2026

Uh oh!

coderabbitai bot commented Apr 17, 2026 •

edited

Loading

Walkthrough

Changes

Estimated code review effort

❌ Failed checks (1 warning, 1 inconclusive)

Uh oh!

coderabbitai bot left a comment

Uh oh!

coderabbitai bot Apr 17, 2026

Uh oh!

codecov bot commented Apr 17, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

nvSiruiW commented Apr 17, 2026 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

Uh oh!

copy-pr-bot bot commented Apr 17, 2026

Uh oh!

coderabbitai bot commented Apr 17, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

❌ Failed checks (1 warning, 1 inconclusive)

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Apr 17, 2026

Choose a reason for hiding this comment

Uh oh!

codecov bot commented Apr 17, 2026

Codecov Report

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

nvSiruiW commented Apr 17, 2026 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Apr 17, 2026 •

edited

Loading