fix(benchmarks): guard against empty choices and message=None in LLM eval calls #219
Open
qizwiz wants to merge 1 commit into
…eval calls

client.chat.completions.create() can return choices=[] on content-policy rejections or provider errors, and choices[0].message=None on filtered responses (e.g. Gemini PROHIBITED_CONTENT via an OpenAI-compatible endpoint). The first case raises IndexError and the second raises AttributeError. The existing try/except blocks catch both as generic 'LLM evaluation failed' errors, making them hard to diagnose. Explicit guards surface the root cause clearly.
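The failure mode described above can be reproduced without a live provider. The sketch below is illustrative only: `evaluate_unguarded` is a hypothetical stand-in for an evaluator call site with a broad `except`, and the `SimpleNamespace` objects merely mimic the two response shapes; neither is taken from the EvoAgentBench code.

```python
from types import SimpleNamespace


def evaluate_unguarded(resp):
    """Hypothetical call site: unguarded access wrapped in a broad except."""
    try:
        return resp.choices[0].message.content
    except Exception as exc:
        # The original cause is flattened into a generic message.
        return f"LLM evaluation failed: {exc}"


# Shape 1: the provider returned no choices at all.
empty_choices = SimpleNamespace(choices=[])

# Shape 2: a choice exists, but its message is None (filtered response).
filtered = SimpleNamespace(
    choices=[SimpleNamespace(message=None, finish_reason="PROHIBITED_CONTENT")]
)

print(evaluate_unguarded(empty_choices))
print(evaluate_unguarded(filtered))
```

In both cases the reported error names a Python-level symptom (an index or attribute problem) rather than the provider-level cause, which is what makes the benchmark logs hard to read.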
What
Add guards at three LLM evaluation call sites in the EvoAgentBench domain evaluators before accessing choices[0].message.content.

Why
client.chat.completions.create() can return two empty-response shapes:
- choices = []: on content-policy rejections, rate-limit errors, or provider failures
- choices[0].message = None: e.g. Gemini 2.5 Flash (via OpenAI-compatible endpoint) returns HTTP 200 with finish_reason: PROHIBITED_CONTENT and message=None

Both crash with IndexError or AttributeError. The existing try/except blocks catch these as generic "LLM evaluation failed: list index out of range" errors, making benchmark runs hard to diagnose.

Files changed
- benchmarks/EvoAgentBench/src/domains/information_retrieval/judge.py: resp.choices[0].message.content or ""
- benchmarks/EvoAgentBench/src/domains/knowledge_work/evaluate.py: resp.choices[0].message.content
- benchmarks/EvoAgentBench/src/domains/reasoning/evaluate.py: response.choices[0].message.content

Corpus context
Detected by pact (llm_response_unguarded mode), a Z3-verified static analyzer for LLM crash vectors. This pattern was found across 13.8k violations in 800+ repos.
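For reviewers, a minimal sketch of the kind of guard described under "What" might look like the following. It is not the diff from this PR: `EmptyLLMResponse` and `extract_content` are hypothetical names, and the responses are simulated with `SimpleNamespace` so the example runs without any provider client.

```python
from types import SimpleNamespace


class EmptyLLMResponse(RuntimeError):
    """Hypothetical error type for a response that carries no usable message."""


def extract_content(resp):
    """Return the first choice's content, or raise with the root cause named."""
    choices = getattr(resp, "choices", None)
    if not choices:
        raise EmptyLLMResponse(
            "LLM returned no choices (content-policy rejection or provider error?)"
        )
    first = choices[0]
    message = getattr(first, "message", None)
    if message is None:
        # Surface finish_reason when present, since it usually explains the filter.
        reason = getattr(first, "finish_reason", None)
        raise EmptyLLMResponse(f"LLM returned message=None (finish_reason={reason!r})")
    return message.content or ""


# Simulated responses (assumed shapes, not real API objects).
ok = SimpleNamespace(choices=[SimpleNamespace(message=SimpleNamespace(content="yes"))])
empty = SimpleNamespace(choices=[])
filtered = SimpleNamespace(
    choices=[SimpleNamespace(message=None, finish_reason="PROHIBITED_CONTENT")]
)

print(extract_content(ok))
for bad in (empty, filtered):
    try:
        extract_content(bad)
    except EmptyLLMResponse as exc:
        print(f"LLM evaluation failed: {exc}")
```

Raising a dedicated exception keeps the existing try/except flow intact while making the logged message state which of the two empty-response shapes occurred.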