feat: preserve token IDs on messages by Hecate0821 · Pull Request #448 · eval-protocol/python-sdk

Hecate0821 · 2026-04-28T03:32:09Z

Summary

Adds token-native trace support to EP messages without turning EP into an RL framework.

Current Design

eval_protocol.models.Message now has an optional field, token_ids: list[int] | None. This lets rollout processors preserve the inference engine's token IDs alongside message content and provider logprob metadata.

The schema remains backward compatible:

Provider-specific logprobs dictionaries are still accepted unchanged.
Text-only messages still work.
A strict alignment check only applies to the clean token-native shape where both token_ids and flat list[float] logprobs are set. In that case, lengths must match.
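The compatibility rules above can be sketched as follows. This is a minimal illustration using a stdlib dataclass, not the actual eval_protocol.models.Message; the class name MessageSketch and the exact error message are hypothetical, and only the token_ids and logprobs field names come from this PR.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class MessageSketch:
    """Hypothetical stand-in for eval_protocol.models.Message (not the real class)."""

    role: str
    content: str = ""
    token_ids: Optional[list[int]] = None
    # Either a provider-specific dict (accepted unchanged) or a flat list of floats.
    logprobs: Any = None

    def __post_init__(self) -> None:
        # The strict check applies only to the clean token-native shape:
        # token_ids is set AND logprobs is a flat list of floats.
        if self.token_ids is None or not isinstance(self.logprobs, list):
            return
        if not all(isinstance(lp, float) for lp in self.logprobs):
            return
        if len(self.token_ids) != len(self.logprobs):
            raise ValueError(
                f"token_ids has {len(self.token_ids)} entries but "
                f"logprobs has {len(self.logprobs)}"
            )


# Text-only message: no token data, no check.
MessageSketch(role="user", content="hello")

# Provider-specific logprobs dict: accepted unchanged, no length check.
MessageSketch(role="assistant", content="hi", token_ids=[1, 2], logprobs={"content": []})

# Token-native shape: lengths must match.
MessageSketch(role="assistant", content="hi", token_ids=[1, 2], logprobs=[-0.1, -0.2])
```

Restricting the check to the flat list[float] case is what keeps existing provider payloads valid: a dict-shaped logprobs value never triggers the length comparison.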

SingleTurnRolloutProcessor now extracts token IDs from serialized provider logprobs when they are available:

logprobs.content[].token_id
logprobs.token_ids[]

Those IDs are stored on the assistant Message.token_ids. If the provider does not expose token IDs, the field remains unset.
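A rough sketch of that extraction, assuming the serialized logprobs arrive as a plain dict. The helper name extract_token_ids and the order in which the two shapes are tried are assumptions for illustration; the processor's real code may differ.

```python
from typing import Any, Optional


def extract_token_ids(logprobs: Optional[dict[str, Any]]) -> Optional[list[int]]:
    """Hypothetical helper: pull token IDs out of serialized provider logprobs."""
    if not isinstance(logprobs, dict):
        return None

    # Shape 1: logprobs.content[].token_id (one ID per content entry).
    content = logprobs.get("content")
    if isinstance(content, list) and content:
        ids = [entry.get("token_id") for entry in content if isinstance(entry, dict)]
        if len(ids) == len(content) and all(isinstance(i, int) for i in ids):
            return ids

    # Shape 2: logprobs.token_ids[] (a flat list of IDs).
    flat = logprobs.get("token_ids")
    if isinstance(flat, list) and flat and all(isinstance(i, int) for i in flat):
        return list(flat)

    # The provider did not expose token IDs: leave Message.token_ids unset.
    return None


per_entry = {"content": [{"token": "Hi", "token_id": 17, "logprob": -0.1},
                         {"token": "!", "token_id": 5, "logprob": -0.3}]}
print(extract_token_ids(per_entry))              # [17, 5]
print(extract_token_ids({"token_ids": [1, 2]}))  # [1, 2]
print(extract_token_ids({"content": [{"token": "Hi", "logprob": -0.1}]}))  # None
```

Returning None instead of an empty list matches the description above: when no IDs are available the field stays unset, so downstream code can tell "no token data" apart from "zero tokens".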

Why

The cookbook async RL path is token-native only. Multi-turn RL cannot safely re-tokenize assistant text after rollout because BPE merges can cross turn boundaries and inference logprobs can become misaligned. EP needs a simple message-level carrier for token IDs so downstream training can consume traces without reconstructing tokens from text.
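The re-tokenization hazard can be shown with a toy example. The tokenizer below is a greedy longest-match stand-in, not real BPE, and its vocabulary is invented; it only illustrates how text produced as two token sequences can collapse into a different sequence once the pieces are joined and tokenized again.

```python
# Invented toy vocabulary; a real BPE vocabulary is learned from data.
TOY_VOCAB = {"hel": 0, "lo": 1, "hello": 2, "h": 3, "e": 4, "l": 5, "o": 6}


def toy_tokenize(text: str) -> list[int]:
    """Greedy longest-match tokenizer (a simplification standing in for BPE)."""
    ids = []
    i = 0
    while i < len(text):
        # Try the longest remaining substring first, then shrink.
        for j in range(len(text), i, -1):
            piece = text[i:j]
            if piece in TOY_VOCAB:
                ids.append(TOY_VOCAB[piece])
                i = j
                break
        else:
            raise ValueError(f"no vocab entry covers {text[i]!r}")
    return ids


# Tokens as the engine produced them, one sequence per piece of text.
sampled = toy_tokenize("hel") + toy_tokenize("lo")
# Tokens obtained by re-tokenizing the joined text after the rollout.
retokenized = toy_tokenize("hel" + "lo")

print(sampled)      # [0, 1]
print(retokenized)  # [2]
```

Two logprobs were recorded at sampling time, but the re-tokenized sequence has one token, so there is no valid way to line them up. Carrying the sampled IDs on the message avoids the problem entirely.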

Tests / Checks

python3.11 -m pytest tests/test_rollout_logprobs.py tests/test_eval_protocol_import.py::TestRewardProtocolFunctionality::test_message_preserves_token_ids tests/test_eval_protocol_import.py::TestRewardProtocolFunctionality::test_message_rejects_misaligned_float_logprobs
python3.11 -m ruff check eval_protocol/models.py eval_protocol/pytest/default_single_turn_rollout_process.py tests/test_rollout_logprobs.py tests/test_eval_protocol_import.py
git diff --check

Note: the full tests/test_eval_protocol_import.py suite currently has an unrelated failure in TestRewardProtocolImports.test_star_import_works because eval_protocol.adapters does not expose LangfuseAdapter.

Hecate0821 wants to merge 1 commit into main from chengxi/message-token-ids