[3/3][Refactor]: Extract HFSpecDecMixin for HF spec-decoding plugins#1297

Draft
h-guo18 wants to merge 2 commits into haoguo/dflash-offline from haoguo/spec-mixin-new

Conversation

@h-guo18 (Contributor) commented Apr 19, 2026

What does this PR do?

Type of change: refactoring

Part 3 of a 3-PR series splitting #1271:

Changes:

  • New `modelopt/torch/speculative/plugins/hf_spec_mixin.py` containing `HFSpecDecMixin` with:
    • Properties: `_base_model`, `_base_model_embeddings`, `_base_model_lm_head`, `_base_llm_config` (VLM-aware).
    • `_find_base_model_parts()` — probe `modeling_fakebase` paths.
    • `_base_model_forward()` — generic base forward with optional freeze + CE loss.
    • `_nvtx_range()` for NVTX profiling spans; `_activate_torch_compile()`, which compiles the submodules listed in the subclass's `_compile_targets`.
  • `HFEagleModel` now `(HFSpecDecMixin, EagleModel)`; drops the duplicated helpers; sets `_compile_targets` and `self._enable_nvtx` in `modify()`.
  • `HFDFlashModel` now `(HFSpecDecMixin, DFlashModel)`; drops the duplicated helpers; `_dflash_base_model_forward` reuses the mixin's generic forward.
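The extracted helpers roughly follow this shape. This is an illustrative sketch only: the stand-in classes, the probe paths, and the constructor are assumptions for demonstration, and the real `HFSpecDecMixin` operates on HF `PreTrainedModel` instances rather than plain objects.

```python
# Hypothetical sketch of the mixin pattern described above; not the real
# ModelOpt implementation.
class HFSpecDecMixin:
    """Shared helpers for HF speculative-decoding plugin classes."""

    # Dotted attribute paths probed for the base LM (hypothetical examples;
    # a VLM-aware list would also cover vision-language wrappers).
    _BASE_MODEL_PATHS = ("model", "language_model.model")

    def _find_base_model_parts(self):
        """Probe candidate paths and return the first submodule that resolves."""
        for path in self._BASE_MODEL_PATHS:
            obj = self
            for attr in path.split("."):
                obj = getattr(obj, attr, None)
                if obj is None:
                    break
            if obj is not None:
                return obj
        raise AttributeError("could not locate base model")

    @property
    def _base_model(self):
        return self._find_base_model_parts()


class EagleModel:
    """Stand-in for the algorithm-level base class."""


class HFEagleModel(HFSpecDecMixin, EagleModel):
    """Plugin class: the mixin comes first so its helpers win in the MRO."""

    def __init__(self, base):
        self.model = base  # pretend HF base model
```

Listing the mixin before the algorithm base class keeps the HF-specific helpers ahead of any same-named methods on `EagleModel`/`DFlashModel` in attribute lookup.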

Testing

No behavioral change expected. Verified that the MRO of both plugin classes includes `HFSpecDecMixin` and that existing Eagle/DFlash training scripts run unchanged.
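The MRO claim can be checked with a few lines of stub code (stub classes shown here; the real classes live under `modelopt/torch/speculative/plugins/`):

```python
# Stub classes standing in for the real ones, just to show the MRO check.
class HFSpecDecMixin: ...
class EagleModel: ...
class HFEagleModel(HFSpecDecMixin, EagleModel): ...

mro = HFEagleModel.__mro__
assert HFSpecDecMixin in mro
# The mixin precedes the algorithm base class, so its helpers take priority:
assert mro.index(HFSpecDecMixin) < mro.index(EagleModel)
```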

Before your PR is "Ready for review"

Make sure you read and follow Contributor guidelines and your commits are signed (`git commit -s -S`).

Make sure you read and follow the Security Best Practices (e.g. avoiding hardcoded `trust_remote_code=True`, `torch.load(..., weights_only=False)`, `pickle`, etc.).

  • Is this change backward compatible?: ✅ — internal refactor; no public API change.
  • If you copied code from any other sources or added a new PIP dependency, did you follow guidance in `CONTRIBUTING.md`: N/A
  • Did you write any new necessary tests?: N/A — pure refactor.
  • Did you update Changelog?: ❌

Additional Information

Base branch is #1295. Retarget to `main` once #1296 and #1295 merge.

h-guo18 added 2 commits April 19, 2026 21:51
- Add `dflash_offline` config flag for training from pre-computed hidden states;
  deletes base model layers to save memory.
- Move `dflash_mask_token_id` auto-detection from `main.py` into `DFlashConfig`
  Pydantic validators; derive `dflash_offline` from `data_args.offline_data_path`.
- Add `DFlashBaseModelOutput.from_offline_dict` classmethod for consuming
  pre-computed hidden states in the forward path.

Signed-off-by: h-guo18 <67671475+h-guo18@users.noreply.github.com>
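The `from_offline_dict` classmethod named in this commit might look roughly like the following. Only the classmethod name comes from the commit message; the field names and the dict layout are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class DFlashBaseModelOutput:
    # Hypothetical fields; the real output class may carry more state.
    hidden_states: Any
    logits: Any = None

    @classmethod
    def from_offline_dict(cls, d: dict) -> "DFlashBaseModelOutput":
        # Consume pre-computed hidden states instead of running the base
        # model forward, as described for offline DFlash training.
        return cls(hidden_states=d["hidden_states"], logits=d.get("logits"))
```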
Extract duplicated base-model discovery, forward pass, NVTX profiling, and
torch.compile logic from HFEagleModel / HFDFlashModel into a shared mixin
(hf_spec_mixin.py). HFEagleModel and HFDFlashModel now inherit from
(HFSpecDecMixin, EagleModel/DFlashModel).

Signed-off-by: h-guo18 <67671475+h-guo18@users.noreply.github.com>
copy-pr-bot (Bot) commented Apr 19, 2026

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

coderabbitai (Bot) commented Apr 19, 2026

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the `.coderabbit.yaml` file in this repository. To trigger a single review, invoke the `@coderabbitai review` command.

⚙️ Run configuration

Configuration used: `.coderabbit.yaml`

Review profile: CHILL

Plan: Pro Plus

Run ID: 81e36b72-3144-46e9-b217-9598fb20a176

You can disable this status message by setting `reviews.review_status` to `false` in the CodeRabbit configuration file.

Comment @coderabbitai help to get the list of available commands and usage tips.

@h-guo18 h-guo18 changed the title [Refactor]: HFSpecDecMixin shared across HF spec-decoding plugins [3/3][Refactor]: Extract HFSpecDecMixin for HF spec-decoding plugins Apr 19, 2026
@h-guo18 h-guo18 force-pushed the haoguo/dflash-offline branch from f208109 to 178b191 Compare April 19, 2026 23:40
