Skip to content

keep deploy cases and Eagle fixes for merge#1287

Open
nvSiruiW wants to merge 1 commit intoNVIDIA:mainfrom
noeyy-mino:sirui/merge-ready-deploy-models-20260417
Open

keep deploy cases and Eagle fixes for merge#1287
nvSiruiW wants to merge 1 commit intoNVIDIA:mainfrom
noeyy-mino:sirui/merge-ready-deploy-models-20260417

Conversation

@nvSiruiW
Copy link
Copy Markdown

@nvSiruiW nvSiruiW commented Apr 17, 2026

Summary by CodeRabbit

  • Tests
    • Added coverage for additional EAGLE3/Nemotron-3 deployment variants.
    • Added test support for FLUX.1-dev diffusion model with BF16/FP8 quantization case.
    • Expanded LLM deployment tests with Qwen3-VL, Gemma-4, and Nemotron-3 Super variants.
    • Introduced new GLM and MiniMax deployment tests across multiple backends.
    • Enabled previously skipped Medusa tests and added extra Eagle3 scenarios.

@copy-pr-bot
Copy link
Copy Markdown

copy-pr-bot bot commented Apr 17, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai bot commented Apr 17, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: 464ef70c-9f55-4881-8db9-e7a91fac7c46

📥 Commits

Reviewing files that changed from the base of the PR and between cbaeacf and 0c70ce0.

📒 Files selected for processing (4)
  • tests/_test_utils/deploy_utils.py
  • tests/_test_utils/examples/models.py
  • tests/examples/diffusers/test_diffusers.py
  • tests/examples/llm_ptq/test_deploy.py
🚧 Files skipped from review as they are similar to previous changes (3)
  • tests/_test_utils/deploy_utils.py
  • tests/_test_utils/examples/models.py
  • tests/examples/llm_ptq/test_deploy.py

📝 Walkthrough

Walkthrough

Added model identifiers and constants and expanded test parametrizations: EAGLE3/Nemotron-3-Nano model IDs added to deployment logic, new FLUX_DEV_PATH constant, and multiple new/extended tests for Qwen3-VL, Gemma-4, GLM, MiniMax, Nemotron, Kimi variants, and Medusa.

Changes

Cohort / File(s) Summary
Deployment utils & model constants
tests/_test_utils/deploy_utils.py, tests/_test_utils/examples/models.py
Extended qwen3_models with EAGLE3/Nemotron-3-Nano 30B BF16 identifiers (affects KvCacheConfig(enable_block_reuse=...) check). Added FLUX_DEV_PATH constant for black-forest-labs/FLUX.1-dev.
Diffusers tests
tests/examples/diffusers/test_diffusers.py
Imported FLUX_DEV_PATH and added a DiffuserModel parametrization for flux-dev (BFloat16, fp8, quant_algo "max", collect_method "default").
LLM PTQ tests
tests/examples/llm_ptq/test_deploy.py
Added/extended parametrizations and tests: added Qwen3-VL 235B, Gemma-4 31B, two Nemotron-3 Super 120B variants, two Kimi K2/K2.5 entries, added new test_glm and test_minimax functions, and removed the unconditional skip decorator from test_medusa.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 33.33% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
Title check ❓ Inconclusive The title 'keep deploy cases and Eagle fixes for merge' is vague and generic, using non-descriptive terms that don't clearly convey the main changes to someone reviewing the PR history. Consider revising the title to be more specific about the primary changes, such as 'Add Flux-Dev and GLM/MiniMax deployment tests' or 'Update EAGLE3 models and expand LLM deployment test coverage'.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Security Anti-Patterns ✅ Passed Pull request contains only test file modifications adding model identifiers and test parametrizations without security-sensitive patterns.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (2)
tests/examples/llm_ptq/test_deploy.py (1)

249-254: Consider gating the newly added heavy deploy matrix behind an opt-in selector.

These additions substantially increase tests/examples load (many TP=8, multi-backend cases). Keeping a default smoke subset and running the full matrix only when requested would keep turnaround manageable.

As per coding guidelines, "Integration tests in tests/examples/ should not take more than a few minutes to run".

Also applies to: 304-324, 377-383, 489-500, 655-668

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/examples/llm_ptq/test_deploy.py` around lines 249 - 254, The new heavy
deploy matrix entries (e.g., ModelDeployerList with model_id
"nvidia/Qwen3-VL-235B-A22B-Instruct-NVFP4", tensor_parallel_size=8, and
multi-backend backends tuple) should be gated behind an opt-in selector so
default example tests stay fast; update the test to skip these heavy cases
unless an opt-in condition is present (for example check an environment variable
like FULL_EXAMPLES or use a pytest marker/skipif such as
pytest.mark.skipif(os.environ.get("RUN_FULL_EXAMPLES") is None, reason="heavy
deploy matrix, opt-in required")), and apply the same gating approach to the
other heavy blocks referenced (around the ModelDeployerList blocks at the other
ranges) so only a small smoke subset runs by default.
tests/_test_utils/deploy_utils.py (1)

257-269: Make qwen3_models detection path-agnostic instead of hardcoding /s3.

The new /s3/... literal makes enable_block_reuse behavior dependent on one mount root. Any other local root path won’t match and can silently change cache behavior.

Proposed refactor
         qwen3_models = (
             "nvidia/Qwen3-Next-80B-A3B-Instruct-NVFP4",
             "nvidia/Qwen3-Next-80B-A3B-Thinking-NVFP4",
             "nvidia/EAGLE3-NVIDIA-Nemotron-3-Nano-30B-A3B-BF16",
-            "/s3/nvidia/EAGLE3-NVIDIA-Nemotron-3-Nano-30B-A3B-BF16",
         )
+        is_qwen3_model = self.model_id in qwen3_models or any(
+            self.model_id.endswith(f"/{model}") for model in qwen3_models
+        )
         nemotron_models = (
             "nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-FP8",
             "nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-NVFP4",
         )
         kv_cache_config = KvCacheConfig(
-            enable_block_reuse=self.model_id not in qwen3_models,
+            enable_block_reuse=not is_qwen3_model,
             free_gpu_memory_fraction=0.8,
             mamba_ssm_cache_dtype="float32" if self.model_id not in nemotron_models else "auto",
         )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/_test_utils/deploy_utils.py` around lines 257 - 269, The hardcoded
"/s3/..." entry in qwen3_models makes the check for enable_block_reuse path-root
dependent; change the detection to be path-agnostic by normalizing and matching
the model identifier regardless of leading path (e.g., use os.path.normpath or
pathlib.Path on self.model_id and check if any entry in qwen3_models matches the
tail of the path or if self.model_id.endswith(model) for each model), then use
that boolean to set KvCacheConfig(enable_block_reuse=...). Update the check
around qwen3_models and enable_block_reuse to use this normalized/endswith
comparison so local mounts with different roots still match.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@tests/_test_utils/examples/models.py`:
- Around line 72-75: FLUX_DEV_PATH currently points to the heavy remote
checkpoint "black-forest-labs/FLUX.1-dev" causing slow tests; update the
_select_path call for FLUX_DEV_PATH to use a lightweight default remote
checkpoint (or a local placeholder) instead of the full remote id so
tests/examples run quickly. Locate the FLUX_DEV_PATH definition and change the
remote_id argument passed to _select_path (and/or set a lightweight local_id) to
a small test-friendly model identifier to avoid fetching the full checkpoint
during CI.

---

Nitpick comments:
In `@tests/_test_utils/deploy_utils.py`:
- Around line 257-269: The hardcoded "/s3/..." entry in qwen3_models makes the
check for enable_block_reuse path-root dependent; change the detection to be
path-agnostic by normalizing and matching the model identifier regardless of
leading path (e.g., use os.path.normpath or pathlib.Path on self.model_id and
check if any entry in qwen3_models matches the tail of the path or if
self.model_id.endswith(model) for each model), then use that boolean to set
KvCacheConfig(enable_block_reuse=...). Update the check around qwen3_models and
enable_block_reuse to use this normalized/endswith comparison so local mounts
with different roots still match.

In `@tests/examples/llm_ptq/test_deploy.py`:
- Around line 249-254: The new heavy deploy matrix entries (e.g.,
ModelDeployerList with model_id "nvidia/Qwen3-VL-235B-A22B-Instruct-NVFP4",
tensor_parallel_size=8, and multi-backend backends tuple) should be gated behind
an opt-in selector so default example tests stay fast; update the test to skip
these heavy cases unless an opt-in condition is present (for example check an
environment variable like FULL_EXAMPLES or use a pytest marker/skipif such as
pytest.mark.skipif(os.environ.get("RUN_FULL_EXAMPLES") is None, reason="heavy
deploy matrix, opt-in required")), and apply the same gating approach to the
other heavy blocks referenced (around the ModelDeployerList blocks at the other
ranges) so only a small smoke subset runs by default.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: 686df9a1-c372-4333-a43c-5eed77553eef

📥 Commits

Reviewing files that changed from the base of the PR and between 7e82a5c and cbaeacf.

📒 Files selected for processing (4)
  • tests/_test_utils/deploy_utils.py
  • tests/_test_utils/examples/models.py
  • tests/examples/diffusers/test_diffusers.py
  • tests/examples/llm_ptq/test_deploy.py

Comment on lines +72 to +75
FLUX_DEV_PATH = _select_path(
remote_id="black-forest-labs/FLUX.1-dev",
local_id="black-forest-labs/FLUX.1-dev",
)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Use a lightweight default remote for FLUX_DEV_PATH to avoid slow integration runs.

Line 73 points to the full black-forest-labs/FLUX.1-dev checkpoint when no local root is set; that can make the default tests/examples flow very slow and brittle.

⚙️ Proposed fix
+_FLUX_DEV_REMOTE_ID = os.getenv("MODELOPT_TEST_FLUX_DEV_REMOTE_ID", "hf-internal-testing/tiny-flux-pipe")
+
 FLUX_DEV_PATH = _select_path(
-    remote_id="black-forest-labs/FLUX.1-dev",
+    remote_id=_FLUX_DEV_REMOTE_ID,
     local_id="black-forest-labs/FLUX.1-dev",
 )

As per coding guidelines, tests/examples/**/*.py: "Integration tests in tests/examples/ should not take more than a few minutes to run".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/_test_utils/examples/models.py` around lines 72 - 75, FLUX_DEV_PATH
currently points to the heavy remote checkpoint "black-forest-labs/FLUX.1-dev"
causing slow tests; update the _select_path call for FLUX_DEV_PATH to use a
lightweight default remote checkpoint (or a local placeholder) instead of the
full remote id so tests/examples run quickly. Locate the FLUX_DEV_PATH
definition and change the remote_id argument passed to _select_path (and/or set
a lightweight local_id) to a small test-friendly model identifier to avoid
fetching the full checkpoint during CI.

Signed-off-by: Sirui Wang <siruiw@r6515-0097.ipp1a1.colossus.nvidia.com>
@nvSiruiW nvSiruiW force-pushed the sirui/merge-ready-deploy-models-20260417 branch from cbaeacf to 0c70ce0 Compare April 17, 2026 08:31
@codecov
Copy link
Copy Markdown

codecov bot commented Apr 17, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 72.74%. Comparing base (04fcf24) to head (0c70ce0).
⚠️ Report is 4 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1287      +/-   ##
==========================================
- Coverage   72.74%   72.74%   -0.01%     
==========================================
  Files         459      459              
  Lines       48612    48615       +3     
==========================================
+ Hits        35365    35367       +2     
- Misses      13247    13248       +1     
Flag Coverage Δ
unit 52.21% <ø> (-0.01%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant