fix(geo3k-vlm-sft): remove --apply-chat-template from SFT launch script#1791

Merged
zhuzilin merged 1 commit into THUDM:main from DongzhuoranZhou:feat/search-r1-geo3k-vlm-sft on Apr 3, 2026

Conversation

Contributor

@DongzhuoranZhou commented Apr 1, 2026

Problem

--apply-chat-template in run_geo3k_vlm_sft.sh converts the messages list into a plain string before it reaches the SFT rollout. sft_rollout.generate_rollout() expects sample.prompt to be a list of message dicts, so passing a string causes a TypeError at runtime — training fails immediately.

Fix

Remove the flag. The messages column produced by prepare_sft_data.py is already a structured list; the SFT rollout consumes it directly to build the per-token loss mask.
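To make the failure mode concrete, here is a minimal sketch of why a flattened prompt breaks the rollout. The `sample` dict shape and the stub rollout are illustrative only, based on the PR description; the actual slime data structures may differ.

```python
# Illustrative only: sample shapes follow the PR description, not the repo's API.
sample_ok = {"prompt": [{"role": "user", "content": "Find the angle ABC."}]}
# With --apply-chat-template, the messages list is rendered to one string:
sample_bad = {"prompt": "<|user|>Find the angle ABC.<|assistant|>"}

def rollout_stub(sample):
    """Stub of a rollout that expects sample['prompt'] to be message dicts."""
    assistant_turns = []
    for msg in sample["prompt"]:
        # Iterating a plain string yields characters, and indexing a
        # character with msg["role"] raises TypeError at runtime.
        if msg["role"] == "assistant":
            assistant_turns.append(msg["content"])
    return assistant_turns
```

Calling `rollout_stub(sample_bad)` raises `TypeError` on the first iteration, which matches the immediate training failure described above.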

--apply-chat-template flattens the messages list into a plain string,
stripping the role boundaries that the SFT rollout needs to build the
per-token loss mask. With the flag present, user/system tokens are
included in the loss — the model trains on the wrong tokens.

sft_rollout.generate_rollout() expects messages as a structured
list so it can identify assistant turns and mask everything else out.
Removing the flag restores correct SFT loss masking behaviour.
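The masking logic described above can be sketched as follows. The function name, the toy whitespace tokenizer, and the mask convention (1 = contributes to the loss) are assumptions for illustration, not the repository's implementation.

```python
# Hedged sketch of per-token loss masking from structured messages.
# build_loss_mask and the toy tokenizer are hypothetical, not the repo's API.
def build_loss_mask(messages, tokenize):
    """Tokenize each turn; mask is 1 only on assistant tokens.

    Role boundaries come from the structured list. If the list were
    flattened to one string first, this per-turn attribution would be
    impossible and user/system tokens would leak into the loss.
    """
    token_ids, loss_mask = [], []
    for msg in messages:
        toks = tokenize(msg["content"])
        token_ids.extend(toks)
        flag = 1 if msg["role"] == "assistant" else 0
        loss_mask.extend([flag] * len(toks))
    return token_ids, loss_mask

# Toy whitespace "tokenizer" standing in for a real one.
tokenize = lambda s: s.split()
msgs = [
    {"role": "user", "content": "solve for x"},
    {"role": "assistant", "content": "x = 3"},
]
ids, mask = build_loss_mask(msgs, tokenize)
# mask is 1 exactly on the assistant turn's tokens
```

The key design point: the mask is computed per message, so it survives only as long as the role structure does.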

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@zhuzilin zhuzilin merged commit 887fba1 into THUDM:main Apr 3, 2026
