feat: add R3/v1 router replay deserialization support#450

Merged: SunnySoldier357 merged 8 commits into main from sandeep/router-replay-support on May 11, 2026

@SunnySoldier357 SunnySoldier357 commented May 4, 2026

Summary

  • Adds an r3_deserializer module that decompresses and unpacks R3/v1 binary router-replay payloads (base64-encoded, zstd-compressed) into per-token routing matrices. Supports the ALL, SUFFIX, and BITMAP selector modes with uint8/uint16 dtypes.
  • Threads a new include_payloads parameter through FireworksTracingAdapter.get_evaluation_rows(), RemoteRolloutProcessor, DataLoaderConfig, and update_row_with_remote_trace() so callers can opt in to fetching and extracting router replay data from traces.
  • When include_payloads=True, convert_trace_dict_to_evaluation_row automatically decompresses any payloads.router_replay.data blob and attaches routing_matrices and routing_metadata to execution_metadata.extra.
  • Adds zstandard>=0.19.0 as a dependency.
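The payload lookup described in the summary can be sketched as follows. This is a minimal illustration, not the code in this PR: `extract_router_replay_blob` and `attach_routing` are hypothetical helper names, the trace is assumed to be a plain nested dict, and `decode` stands in for the zstd decompression plus R3/v1 parsing step (it is passed in so the sketch stays stdlib-only).

```python
import base64
from typing import Any, Callable, Dict, Optional, Tuple


def extract_router_replay_blob(trace: Dict[str, Any]) -> Optional[bytes]:
    """Return the base64-decoded payloads.router_replay.data blob, or None.

    Hypothetical helper: missing keys and empty data are both treated as
    "no payload" rather than as errors.
    """
    payloads = trace.get("payloads") or {}
    router_replay = payloads.get("router_replay") or {}
    data = router_replay.get("data")
    if not data:
        return None
    # The result is still zstd-compressed; only the base64 layer is removed.
    return base64.b64decode(data)


def attach_routing(
    extra: Dict[str, Any],
    trace: Dict[str, Any],
    decode: Callable[[bytes], Tuple[Any, Dict[str, Any]]],
) -> Dict[str, Any]:
    """Attach routing_matrices and routing_metadata to `extra` if a blob exists.

    `decode` is an assumed stand-in for the real decompress-and-parse step.
    """
    blob = extract_router_replay_blob(trace)
    if blob is None:
        return extra
    matrices, metadata = decode(blob)
    extra["routing_matrices"] = matrices
    extra["routing_metadata"] = metadata
    return extra
```

Treating an absent or empty blob as a no-op keeps rows without router replay data unchanged, which matches the "with payload, without, empty data" cases in the test plan.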

Test plan

  • Unit tests in tests/adapters/test_r3_deserializer.py covering:
    • Header parsing (valid, bad magic, too short, unsupported version)
    • Bitmap position reading (all set, none set, sparse, multi-byte)
    • Full decompress+parse for ALL, SUFFIX, and BITMAP selector modes
    • uint16 dtype support
    • Zero replayed tokens edge case
    • Round-trip test against the gateway's r3_serializer (skips if serializer not available)
    • Integration with convert_trace_dict_to_evaluation_row (with payload, without, empty data)
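For context on the bitmap cases listed above, here is a rough sketch of what reading set-bit positions from a bitmap might look like. The function name and the least-significant-bit-first ordering are assumptions made for illustration; the PR does not state the bit order used by the R3/v1 format.

```python
from typing import List


def read_bitmap_positions(bitmap: bytes, num_bits: int) -> List[int]:
    """Return the indices of set bits among the first `num_bits` bits.

    Assumption: bit i lives in byte i // 8 at bit offset i % 8 (LSB first).
    """
    if num_bits > len(bitmap) * 8:
        raise ValueError("bitmap is too short for the requested bit count")
    positions = []
    for i in range(num_bits):
        if (bitmap[i // 8] >> (i % 8)) & 1:
            positions.append(i)
    return positions
```

Under that assumption, the four listed cases correspond to inputs such as `b"\xff"` (all set), `b"\x00"` (none set), `b"\x05"` (sparse), and a two-byte bitmap (multi-byte).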

Made with Cursor


Note

Medium Risk
Adds zstd decompression and binary parsing of trace payloads (potentially large/untrusted data) and threads a new opt-in flag through rollout/tracing code paths, which could impact performance and error handling when enabled.

Overview
Adds opt-in router replay extraction from Fireworks/Langfuse traces. A new include_payloads flag is threaded through FireworksTracingAdapter.get_evaluation_rows(), remote rollout processing, and DataLoaderConfig, so callers can request trace payloads from the gateway.

When payloads are present, convert_trace_dict_to_evaluation_row now attempts to decompress and deserialize payloads.router_replay.data (R3/v1) and attaches routing_matrices plus routing_metadata onto execution_metadata.extra.

Introduces a new adapters/r3_deserializer.py implementing the R3/v1 zstd+base64 binary format (ALL/SUFFIX/BITMAP selectors; uint8/uint16) with comprehensive unit and integration tests, and adds the zstandard dependency.

Reviewed by Cursor Bugbot for commit f1e393d.

@SunnySoldier357 SunnySoldier357 self-assigned this May 4, 2026

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: e769ac1e1a

Comment thread eval_protocol/adapters/r3_deserializer.py Outdated
Comment thread eval_protocol/adapters/r3_deserializer.py Outdated
Comment thread eval_protocol/adapters/r3_deserializer.py Outdated
Comment thread eval_protocol/adapters/r3_deserializer.py Outdated
@SunnySoldier357 SunnySoldier357 force-pushed the sandeep/router-replay-support branch from f10a6a9 to 6be8056 Compare May 6, 2026 01:52
@SunnySoldier357 SunnySoldier357 requested a review from benjibc May 6, 2026 17:44
  # Build final model base URL with tracing metadata
  final_model_base_url = model_base_url
- if model_base_url and ("tracing.fireworks.ai" in model_base_url or model_base_url.startswith("http://localhost")):
+ if model_base_url and ("tracing.fireworks.ai" in model_base_url or model_base_url.startswith("http://localhost") or "litellm-gateway" in model_base_url):

Why do we need the check for tracing.fireworks.ai or litellm-gateway? Which one is it? Are there cases where it's one and not the other, and vice versa?

Collaborator Author

It's for dev testing, since that uses litellm-gateway.
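One way the condition under discussion could be made easier to read is to name the accepted hosts in one place. This is only a sketch: `is_tracing_base_url` and `TRACING_URL_MARKERS` are hypothetical names, not part of this PR, and the substring checks mirror the diff as written rather than doing proper URL parsing.

```python
from typing import Optional

# Substrings that mark a base URL as a tracing gateway (taken from the diff).
TRACING_URL_MARKERS = ("tracing.fireworks.ai", "litellm-gateway")


def is_tracing_base_url(model_base_url: Optional[str]) -> bool:
    """Return True if tracing metadata should be attached to this base URL."""
    if not model_base_url:
        return False
    if model_base_url.startswith("http://localhost"):
        return True
    return any(marker in model_base_url for marker in TRACING_URL_MARKERS)
```

Keeping the dev-only marker in a named tuple makes it visible which entries are production hosts and which exist for local or dev testing.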

Comment thread eval_protocol/adapters/r3_deserializer.py Outdated
Comment thread eval_protocol/adapters/r3_deserializer.py Outdated
Comment thread eval_protocol/adapters/r3_deserializer.py Outdated
Comment thread eval_protocol/adapters/r3_deserializer.py Outdated
SunnySoldier357 and others added 6 commits May 7, 2026 13:53
Mirrors the gateway-side r3_serializer change: the per-token matrix
shape (num_moe_layers, top_k) is no longer required and is no longer
written into the r3/v1 binary header.  Per-token matrix byte size is
recovered as matrix_byte_length / replayed_token_count.

- HEADER_FORMAT: "<4sBBBBIIHHIIQ" (36 bytes) -> "<4sBBBBIIIIQ" (32 bytes).
- Drop num_moe_layers/top_k from _parse_header() and the metadata dict
  returned by decompress_and_parse_r3().
- Compute matrix_elem_size from matrix_byte_length / replayed_token_count
  with a divisibility check that surfaces malformed payloads early.
- Update unit tests to use matrix_elem_size as the parameter and drop
  assertions on the removed header fields; round-trip test no longer
  passes num_moe_layers/top_k to RouterReplayData.

Co-authored-by: Cursor <cursoragent@cursor.com>
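The header sizes and the size derivation from the commit above can be checked with a short sketch. The two format strings are copied from the commit message; `per_token_matrix_size` is a hypothetical helper name, and its handling of a zero token count is an assumption rather than the behavior of `decompress_and_parse_r3()`.

```python
import struct

OLD_HEADER_FORMAT = "<4sBBBBIIHHIIQ"  # included num_moe_layers/top_k as two uint16s
NEW_HEADER_FORMAT = "<4sBBBBIIIIQ"    # shape fields dropped

# "<" means little-endian with no padding, so sizes are plain field sums.
OLD_HEADER_SIZE = struct.calcsize(OLD_HEADER_FORMAT)
NEW_HEADER_SIZE = struct.calcsize(NEW_HEADER_FORMAT)


def per_token_matrix_size(matrix_byte_length: int, replayed_token_count: int) -> int:
    """Recover the per-token matrix byte size from the header totals.

    Raises ValueError when the total is not an exact multiple of the token
    count, which would indicate a malformed payload.
    """
    if replayed_token_count == 0:
        # Assumption: zero tokens is valid only when there are no matrix bytes.
        if matrix_byte_length != 0:
            raise ValueError("matrix bytes present but no replayed tokens")
        return 0
    size, remainder = divmod(matrix_byte_length, replayed_token_count)
    if remainder:
        raise ValueError(
            f"matrix_byte_length={matrix_byte_length} is not divisible by "
            f"replayed_token_count={replayed_token_count}"
        )
    return size
```

For example, 96 matrix bytes across 4 replayed tokens gives 24 bytes per token, while 10 bytes across 3 tokens is rejected.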
ZstdCompressor.compress() (used by the gateway-side r3_serializer)
embeds the uncompressed size in the frame header, so passing
max_output_size=len(compressed)*20 was both unnecessary and incorrect:
highly compressible router-replay payloads (e.g. tokens routing to a
small subset of experts) routinely exceed a 20:1 ratio, and would have
failed deserialization with ZstdError.

Removing the cap lets the library auto-allocate from the embedded
content size.  Verified locally: a 64 KiB zero-filled matrix payload
compresses to ~35 bytes (>1800x ratio) and now deserializes cleanly.

Adds a regression test covering the high-compression case.

Co-authored-by: Cursor <cursoragent@cursor.com>
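The ratio argument in the commit above can be illustrated with the standard library. Note that this sketch swaps in `zlib` for zstd so that it runs without third-party packages; the exact compressed size and the failure mode differ from zstandard (zlib truncates at the cap instead of raising), but it shows why a cap of 20 times the compressed length is too small for highly compressible data.

```python
import zlib

# A 64 KiB zero-filled buffer, similar in spirit to the case in the commit.
raw = bytes(64 * 1024)
compressed = zlib.compress(raw)
ratio = len(raw) / len(compressed)

# The kind of cap the old code applied: 20x the compressed length.
cap = len(compressed) * 20

# Decompress with the cap: zlib stops once `cap` output bytes are produced.
d = zlib.decompressobj()
partial = d.decompress(compressed, cap)

print(f"compressed to {len(compressed)} bytes, ratio {ratio:.0f}:1")
print(f"capped output: {len(partial)} of {len(raw)} bytes, finished={d.eof}")
```

Without the cap, `zlib.decompress(compressed)` returns the full buffer, which parallels letting the zstd library size its output from the content size embedded in the frame header.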
_RoutingDtype(int) and _SelectorMode(int) raise ValueError for any
value not in the enum, so the .get() fallback was unreachable: a
future routing_dtype=3 in the header would crash metadata
construction before str(int) could run.

Look up names by raw int instead — IntEnum keys hash-equal their int
values, so known modes resolve to their lowercase name and unknown
ones fall back to str(int) without ever constructing the enum.
Adds a regression test exercising routing_dtype=99.

Co-authored-by: Cursor <cursoragent@cursor.com>
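The lookup-by-raw-int approach from the commit above can be sketched like this. The enum name, its numeric values, and `dtype_name` are hypothetical stand-ins for the private `_RoutingDtype` and its metadata code, which are not shown in this conversation.

```python
from enum import IntEnum


class RoutingDtype(IntEnum):
    # Numeric values are assumed for illustration only.
    UINT8 = 1
    UINT16 = 2


# IntEnum members compare and hash equal to their int values, so a dict
# keyed by members can be queried with a plain int.
DTYPE_NAMES = {member: member.name.lower() for member in RoutingDtype}


def dtype_name(raw: int) -> str:
    """Map a raw header value to a name without constructing the enum."""
    return DTYPE_NAMES.get(raw, str(raw))
```

Calling `RoutingDtype(99)` raises `ValueError`, whereas `dtype_name(99)` falls back to the string `"99"`, which is the behavior the regression test for `routing_dtype=99` exercises.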
decompress_and_parse_r3 now derives matrix_elem_size from
matrix_byte_length / replayed_token_count, so the dtype's per-element
byte width is no longer referenced anywhere. Removing dead code.

Co-authored-by: Cursor <cursoragent@cursor.com>
@SunnySoldier357 SunnySoldier357 force-pushed the sandeep/router-replay-support branch from 6be8056 to 6639c01 Compare May 7, 2026 21:00
Comment thread eval_protocol/adapters/r3_deserializer.py

@cursor cursor Bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Reviewed by Cursor Bugbot for commit f1e393d.

Comment thread eval_protocol/pytest/tracing_utils.py
@SunnySoldier357 SunnySoldier357 merged commit 99e49fa into main May 11, 2026
17 checks passed
@SunnySoldier357 SunnySoldier357 deleted the sandeep/router-replay-support branch May 11, 2026 17:42