fix(benchmarks): guard against empty choices and message=None in LLM eval calls #219

Open
qizwiz wants to merge 1 commit into EverMind-AI:main from qizwiz:fix/guard-empty-llm-response

@qizwiz qizwiz commented May 18, 2026

What

Add guards at three LLM evaluation call sites in the EvoAgentBench domain evaluators before accessing choices[0].message.content.

Why

client.chat.completions.create() can return two empty-response shapes:

  1. choices = [] — on content-policy rejections, rate-limit errors, or provider failures
  2. choices[0].message = None — e.g. Gemini 2.5 Flash (via OpenAI-compatible endpoint) returns HTTP 200 with finish_reason: PROHIBITED_CONTENT and message=None

Unguarded access to choices[0].message.content raises IndexError in the first case and AttributeError in the second. The existing try/except blocks catch both and report them as generic errors such as "LLM evaluation failed: list index out of range", which makes benchmark runs hard to diagnose.
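
The two failure shapes can be reproduced without calling any provider. The sketch below is illustrative only: it uses SimpleNamespace stand-ins rather than real client response objects, and unguarded_read is a hypothetical helper that is not part of this PR.

```python
from types import SimpleNamespace

# Hypothetical stand-ins for the two response shapes; not real API objects.
empty_choices = SimpleNamespace(choices=[])
filtered = SimpleNamespace(
    choices=[SimpleNamespace(message=None, finish_reason="PROHIBITED_CONTENT")]
)


def unguarded_read(resp):
    """Mimic the unguarded access and report which exception it raises."""
    try:
        return resp.choices[0].message.content
    except (IndexError, AttributeError) as exc:
        return f"{type(exc).__name__}: {exc}"


print(unguarded_read(empty_choices))  # IndexError: list index out of range
print(unguarded_read(filtered))       # AttributeError: 'NoneType' object has no attribute 'content'
```

Neither message mentions filtering or an empty response, which is why the resulting log line gives little to go on.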

Files changed

| File | Fix |
| --- | --- |
| benchmarks/EvoAgentBench/src/domains/information_retrieval/judge.py | Guard before resp.choices[0].message.content or "" |
| benchmarks/EvoAgentBench/src/domains/knowledge_work/evaluate.py | Guard before resp.choices[0].message.content |
| benchmarks/EvoAgentBench/src/domains/reasoning/evaluate.py | Guard before response.choices[0].message.content |

# Before
eval_text = resp.choices[0].message.content

# After
if not resp.choices or resp.choices[0].message is None:
    raise ValueError("LLM returned empty or filtered response")
eval_text = resp.choices[0].message.content
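
If the same guard is needed at several call sites, it could be factored into one helper. The following is a minimal sketch, not the code in this PR: extract_content is a hypothetical name, the responses are SimpleNamespace stand-ins, and reading finish_reason from the choice assumes the provider populates that field.

```python
from types import SimpleNamespace


def extract_content(resp):
    """Return the first choice's text, or raise ValueError naming the empty shape."""
    if not resp.choices:
        raise ValueError("LLM returned no choices")
    choice = resp.choices[0]
    if choice.message is None:
        # finish_reason is assumed to be present; fall back to None if it is not.
        reason = getattr(choice, "finish_reason", None)
        raise ValueError(f"LLM returned message=None (finish_reason={reason})")
    # Treat content=None as an empty string, as the judge.py call site does.
    return choice.message.content or ""


# Stand-in responses for illustration only.
ok = SimpleNamespace(
    choices=[SimpleNamespace(message=SimpleNamespace(content="PASS"), finish_reason="stop")]
)
filtered = SimpleNamespace(
    choices=[SimpleNamespace(message=None, finish_reason="PROHIBITED_CONTENT")]
)

print(extract_content(ok))
try:
    extract_content(filtered)
except ValueError as exc:
    print(exc)
```

Separate messages for the two shapes keep the provider's finish_reason visible in the error, at the cost of a slightly longer guard than the single combined check above.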

Corpus context

Detected by pact (llm_response_unguarded mode), a Z3-verified static analyzer for LLM crash vectors. This pattern accounts for 13.8k violations found in 800+ repos.

…eval calls

client.chat.completions.create() can return choices=[] on content-policy
rejections or provider errors, and choices[0].message=None on filtered
responses (e.g. Gemini PROHIBITED_CONTENT via OpenAI-compatible endpoint).
Both crash with IndexError/AttributeError. The existing try/except blocks
catch these as generic 'LLM evaluation failed' errors, making them hard
to diagnose. Explicit guards surface the root cause clearly.