[3/3][Refactor]: Extract HFSpecDecMixin for HF spec-decoding plugins#1297

Draft
h-guo18 wants to merge 2 commits into haoguo/dflash-offline from haoguo/spec-mixin-new

Conversation

@h-guo18 (Contributor) commented Apr 19, 2026

What does this PR do?

Type of change: refactoring

Part 3 of a 3-PR series splitting #1271:

Changes:

  • New `modelopt/torch/speculative/plugins/hf_spec_mixin.py` containing `HFSpecDecMixin` with:
    • Properties: `_base_model`, `_base_model_embeddings`, `_base_model_lm_head`, `_base_llm_config` (VLM-aware).
    • `_find_base_model_parts()` — probe `modeling_fakebase` paths.
    • `_base_model_forward()` — generic base forward with optional freeze + CE loss.
    • `_nvtx_range()` for NVTX profiling spans; `_activate_torch_compile()`, which compiles the submodules listed in the subclass's `_compile_targets`.
  • `HFEagleModel` now `(HFSpecDecMixin, EagleModel)`; drops the duplicated helpers; sets `_compile_targets` and `self._enable_nvtx` in `modify()`.
  • `HFDFlashModel` now `(HFSpecDecMixin, DFlashModel)`; drops the duplicated helpers; `_dflash_base_model_forward` reuses the mixin's generic forward.
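The extracted helpers roughly follow this shape. This is an illustrative sketch only: the stand-in classes, the probe paths, and the constructor are assumptions for demonstration, and the real `HFSpecDecMixin` operates on HF `PreTrainedModel` instances rather than plain objects.

```python
# Hypothetical sketch of the mixin pattern described above; not the real
# ModelOpt implementation.
class HFSpecDecMixin:
    """Shared helpers for HF speculative-decoding plugin classes."""

    # Dotted attribute paths probed for the base LM (hypothetical examples;
    # a VLM-aware list would also cover vision-language wrappers).
    _BASE_MODEL_PATHS = ("model", "language_model.model")

    def _find_base_model_parts(self):
        """Probe candidate paths and return the first submodule that resolves."""
        for path in self._BASE_MODEL_PATHS:
            obj = self
            for attr in path.split("."):
                obj = getattr(obj, attr, None)
                if obj is None:
                    break
            if obj is not None:
                return obj
        raise AttributeError("could not locate base model")

    @property
    def _base_model(self):
        return self._find_base_model_parts()


class EagleModel:
    """Stand-in for the algorithm-level base class."""


class HFEagleModel(HFSpecDecMixin, EagleModel):
    """Plugin class: the mixin comes first so its helpers win in the MRO."""

    def __init__(self, base):
        self.model = base  # pretend HF base model
```

Listing the mixin before the algorithm base class keeps the HF-specific helpers ahead of any same-named methods on `EagleModel`/`DFlashModel` in attribute lookup.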

Testing

No behavioral change expected. Verified that the MRO of both plugin classes includes `HFSpecDecMixin` and that existing Eagle/DFlash training scripts run unchanged.
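The MRO claim can be checked with a few lines of stub code (stub classes shown here; the real classes live under `modelopt/torch/speculative/plugins/`):

```python
# Stub classes standing in for the real ones, just to show the MRO check.
class HFSpecDecMixin: ...
class EagleModel: ...
class HFEagleModel(HFSpecDecMixin, EagleModel): ...

mro = HFEagleModel.__mro__
assert HFSpecDecMixin in mro
# The mixin precedes the algorithm base class, so its helpers take priority:
assert mro.index(HFSpecDecMixin) < mro.index(EagleModel)
```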

Before your PR is "Ready for review"

Make sure you read and follow Contributor guidelines and your commits are signed (`git commit -s -S`).

Make sure you read and follow the Security Best Practices (e.g. avoiding hardcoded `trust_remote_code=True`, `torch.load(..., weights_only=False)`, `pickle`, etc.).

  • Is this change backward compatible?: ✅ — internal refactor; no public API change.
  • If you copied code from any other sources or added a new PIP dependency, did you follow guidance in `CONTRIBUTING.md`: N/A
  • Did you write any new necessary tests?: N/A — pure refactor.
  • Did you update Changelog?: ❌

Additional Information

Base branch is #1295. Retarget to `main` once #1296 and #1295 merge.

h-guo18 added 2 commits April 19, 2026 21:51
- Add `dflash_offline` config flag for training from pre-computed hidden states;
  deletes base model layers to save memory.
- Move `dflash_mask_token_id` auto-detection from `main.py` into `DFlashConfig`
  Pydantic validators; derive `dflash_offline` from `data_args.offline_data_path`.
- Add `DFlashBaseModelOutput.from_offline_dict` classmethod for consuming
  pre-computed hidden states in the forward path.

Signed-off-by: h-guo18 <67671475+h-guo18@users.noreply.github.com>
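The `from_offline_dict` classmethod named in this commit might look roughly like the following. Only the classmethod name comes from the commit message; the field names and the dict layout are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class DFlashBaseModelOutput:
    # Hypothetical fields; the real output class may carry more state.
    hidden_states: Any
    logits: Any = None

    @classmethod
    def from_offline_dict(cls, d: dict) -> "DFlashBaseModelOutput":
        # Consume pre-computed hidden states instead of running the base
        # model forward, as described for offline DFlash training.
        return cls(hidden_states=d["hidden_states"], logits=d.get("logits"))
```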
Extract duplicated base-model discovery, forward pass, NVTX profiling, and
torch.compile logic from HFEagleModel / HFDFlashModel into a shared mixin
(hf_spec_mixin.py). HFEagleModel and HFDFlashModel now inherit from
(HFSpecDecMixin, EagleModel/DFlashModel).

Signed-off-by: h-guo18 <67671475+h-guo18@users.noreply.github.com>
copy-pr-bot (Bot) commented Apr 19, 2026

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

coderabbitai (Bot) commented Apr 19, 2026

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the `.coderabbit.yaml` file in this repository. To trigger a single review, invoke the `@coderabbitai review` command.

⚙️ Run configuration

Configuration used: `.coderabbit.yaml`

Review profile: CHILL

Plan: Pro Plus

Run ID: 81e36b72-3144-46e9-b217-9598fb20a176

You can disable this status message by setting `reviews.review_status` to `false` in the CodeRabbit configuration file.

Comment @coderabbitai help to get the list of available commands and usage tips.

@h-guo18 h-guo18 changed the title [Refactor]: HFSpecDecMixin shared across HF spec-decoding plugins [3/3][Refactor]: Extract HFSpecDecMixin for HF spec-decoding plugins Apr 19, 2026
@h-guo18 h-guo18 force-pushed the haoguo/dflash-offline branch from f208109 to 178b191 Compare April 19, 2026 23:40
