keep deploy cases and Eagle fixes for merge#1287
keep deploy cases and Eagle fixes for merge#1287nvSiruiW wants to merge 1 commit intoNVIDIA:mainfrom
Conversation
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Pro Plus Run ID: 📒 Files selected for processing (4)
🚧 Files skipped from review as they are similar to previous changes (3)
📝 WalkthroughWalkthroughAdded model identifiers and constants and expanded test parametrizations: EAGLE3/Nemotron-3-Nano model IDs added to deployment logic, new Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes 🚥 Pre-merge checks | ✅ 2 | ❌ 2❌ Failed checks (1 warning, 1 inconclusive)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (2)
tests/examples/llm_ptq/test_deploy.py (1)
249-254: Consider gating the newly added heavy deploy matrix behind an opt-in selector.These additions substantially increase
tests/examplesload (many TP=8, multi-backend cases). Keeping a default smoke subset and running the full matrix only when requested would keep turnaround manageable.As per coding guidelines, "Integration tests in
tests/examples/should not take more than a few minutes to run".Also applies to: 304-324, 377-383, 489-500, 655-668
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/examples/llm_ptq/test_deploy.py` around lines 249 - 254, The new heavy deploy matrix entries (e.g., ModelDeployerList with model_id "nvidia/Qwen3-VL-235B-A22B-Instruct-NVFP4", tensor_parallel_size=8, and multi-backend backends tuple) should be gated behind an opt-in selector so default example tests stay fast; update the test to skip these heavy cases unless an opt-in condition is present (for example check an environment variable like FULL_EXAMPLES or use a pytest marker/skipif such as pytest.mark.skipif(os.environ.get("RUN_FULL_EXAMPLES") is None, reason="heavy deploy matrix, opt-in required")), and apply the same gating approach to the other heavy blocks referenced (around the ModelDeployerList blocks at the other ranges) so only a small smoke subset runs by default.tests/_test_utils/deploy_utils.py (1)
257-269: Makeqwen3_modelsdetection path-agnostic instead of hardcoding/s3.The new
/s3/...literal makesenable_block_reusebehavior dependent on one mount root. Any other local root path won’t match and can silently change cache behavior.Proposed refactor
qwen3_models = ( "nvidia/Qwen3-Next-80B-A3B-Instruct-NVFP4", "nvidia/Qwen3-Next-80B-A3B-Thinking-NVFP4", "nvidia/EAGLE3-NVIDIA-Nemotron-3-Nano-30B-A3B-BF16", - "/s3/nvidia/EAGLE3-NVIDIA-Nemotron-3-Nano-30B-A3B-BF16", ) + is_qwen3_model = self.model_id in qwen3_models or any( + self.model_id.endswith(f"/{model}") for model in qwen3_models + ) nemotron_models = ( "nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-FP8", "nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-NVFP4", ) kv_cache_config = KvCacheConfig( - enable_block_reuse=self.model_id not in qwen3_models, + enable_block_reuse=not is_qwen3_model, free_gpu_memory_fraction=0.8, mamba_ssm_cache_dtype="float32" if self.model_id not in nemotron_models else "auto", )🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/_test_utils/deploy_utils.py` around lines 257 - 269, The hardcoded "/s3/..." entry in qwen3_models makes the check for enable_block_reuse path-root dependent; change the detection to be path-agnostic by normalizing and matching the model identifier regardless of leading path (e.g., use os.path.normpath or pathlib.Path on self.model_id and check if any entry in qwen3_models matches the tail of the path or if self.model_id.endswith(model) for each model), then use that boolean to set KvCacheConfig(enable_block_reuse=...). Update the check around qwen3_models and enable_block_reuse to use this normalized/endswith comparison so local mounts with different roots still match.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@tests/_test_utils/examples/models.py`:
- Around line 72-75: FLUX_DEV_PATH currently points to the heavy remote
checkpoint "black-forest-labs/FLUX.1-dev" causing slow tests; update the
_select_path call for FLUX_DEV_PATH to use a lightweight default remote
checkpoint (or a local placeholder) instead of the full remote id so
tests/examples run quickly. Locate the FLUX_DEV_PATH definition and change the
remote_id argument passed to _select_path (and/or set a lightweight local_id) to
a small test-friendly model identifier to avoid fetching the full checkpoint
during CI.
---
Nitpick comments:
In `@tests/_test_utils/deploy_utils.py`:
- Around line 257-269: The hardcoded "/s3/..." entry in qwen3_models makes the
check for enable_block_reuse path-root dependent; change the detection to be
path-agnostic by normalizing and matching the model identifier regardless of
leading path (e.g., use os.path.normpath or pathlib.Path on self.model_id and
check if any entry in qwen3_models matches the tail of the path or if
self.model_id.endswith(model) for each model), then use that boolean to set
KvCacheConfig(enable_block_reuse=...). Update the check around qwen3_models and
enable_block_reuse to use this normalized/endswith comparison so local mounts
with different roots still match.
In `@tests/examples/llm_ptq/test_deploy.py`:
- Around line 249-254: The new heavy deploy matrix entries (e.g.,
ModelDeployerList with model_id "nvidia/Qwen3-VL-235B-A22B-Instruct-NVFP4",
tensor_parallel_size=8, and multi-backend backends tuple) should be gated behind
an opt-in selector so default example tests stay fast; update the test to skip
these heavy cases unless an opt-in condition is present (for example check an
environment variable like FULL_EXAMPLES or use a pytest marker/skipif such as
pytest.mark.skipif(os.environ.get("RUN_FULL_EXAMPLES") is None, reason="heavy
deploy matrix, opt-in required")), and apply the same gating approach to the
other heavy blocks referenced (around the ModelDeployerList blocks at the other
ranges) so only a small smoke subset runs by default.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro Plus
Run ID: 686df9a1-c372-4333-a43c-5eed77553eef
📒 Files selected for processing (4)
tests/_test_utils/deploy_utils.pytests/_test_utils/examples/models.pytests/examples/diffusers/test_diffusers.pytests/examples/llm_ptq/test_deploy.py
| FLUX_DEV_PATH = _select_path( | ||
| remote_id="black-forest-labs/FLUX.1-dev", | ||
| local_id="black-forest-labs/FLUX.1-dev", | ||
| ) |
There was a problem hiding this comment.
Use a lightweight default remote for FLUX_DEV_PATH to avoid slow integration runs.
Line 73 points to the full black-forest-labs/FLUX.1-dev checkpoint when no local root is set; that can make the default tests/examples flow very slow and brittle.
⚙️ Proposed fix
+_FLUX_DEV_REMOTE_ID = os.getenv("MODELOPT_TEST_FLUX_DEV_REMOTE_ID", "hf-internal-testing/tiny-flux-pipe")
+
FLUX_DEV_PATH = _select_path(
- remote_id="black-forest-labs/FLUX.1-dev",
+ remote_id=_FLUX_DEV_REMOTE_ID,
local_id="black-forest-labs/FLUX.1-dev",
)As per coding guidelines, tests/examples/**/*.py: "Integration tests in tests/examples/ should not take more than a few minutes to run".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/_test_utils/examples/models.py` around lines 72 - 75, FLUX_DEV_PATH
currently points to the heavy remote checkpoint "black-forest-labs/FLUX.1-dev"
causing slow tests; update the _select_path call for FLUX_DEV_PATH to use a
lightweight default remote checkpoint (or a local placeholder) instead of the
full remote id so tests/examples run quickly. Locate the FLUX_DEV_PATH
definition and change the remote_id argument passed to _select_path (and/or set
a lightweight local_id) to a small test-friendly model identifier to avoid
fetching the full checkpoint during CI.
Signed-off-by: Sirui Wang <siruiw@r6515-0097.ipp1a1.colossus.nvidia.com>
cbaeacf to
0c70ce0
Compare
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #1287 +/- ##
==========================================
- Coverage 72.74% 72.74% -0.01%
==========================================
Files 459 459
Lines 48612 48615 +3
==========================================
+ Hits 35365 35367 +2
- Misses 13247 13248 +1
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
Summary by CodeRabbit