Pull requests: eval-protocol/python-sdk

Pull requests list

feat: preserve token IDs on messages
#448 opened Apr 28, 2026 by Hecate0821
Fix duplicate CLI shorthand flag registration
#441 opened Mar 31, 2026 by benjibc Contributor
Fix argparse conflict for duplicate TypedDict field short aliases
#433 opened Mar 9, 2026 by wirjo
4 tasks done
temp hot fix
#423 opened Jan 26, 2026 by xzrderek Contributor
Fix bug in preprocess_fn codepath
#420 opened Jan 24, 2026 by RNHTTR Draft
Add IFEval
#414 opened Jan 16, 2026 by SandyYuan Collaborator
perf: Add server-side filtering for ep logs to improve performance
#408 opened Jan 11, 2026 by benjibc Contributor
Verify Fireworks API key once for CLI flows codex
#407 opened Jan 10, 2026 by benjibc Contributor
Record rollout start time and show rollout latency in UI codex
#398 opened Jan 7, 2026 by benjibc Contributor
enforce single evaluator upload per command
#387 opened Dec 24, 2025 by benjibc Contributor
updated tests
#382 opened Dec 18, 2025 by shreymodi1 Contributor
support extra headers
#373 opened Dec 15, 2025 by benjibc Contributor
warn if large datasets + force 1 run
#365 opened Dec 12, 2025 by xzrderek Contributor
Shrey/modelquality
#353 opened Dec 2, 2025 by shreymodi1 Contributor
18 tasks
support for tokenids logprobs
#350 opened Nov 26, 2025 by shreymodi1 Contributor
18 tasks
calibration evaluator
#345 opened Nov 24, 2025 by benjibc Contributor Draft
18 tasks
adding response quality validation for retry
#344 opened Nov 24, 2025 by morgendave Collaborator
10 tasks
tests fix
#341 opened Nov 21, 2025 by shreymodi1 Contributor
18 tasks
Shrey/trl
#335 opened Nov 17, 2025 by shreymodi1 Contributor
18 tasks
Update Klavis MCP use case
#330 opened Nov 14, 2025 by LLiuZheng Contributor
Text to SQL RFT example
#324 opened Nov 10, 2025 by benjibc Contributor
swe-bench
#280 opened Oct 15, 2025 by shreymodi1 Contributor
reasoning effort string change
#267 opened Oct 10, 2025 by shreymodi1 Contributor
18 tasks