
fix: graceful recovery from malformed LLM tool call output #1458

Open
gdeyoung wants to merge 1 commit into agent0ai:main from gdeyoung:fix/validation-recovery-malformed-llm-v17

Conversation

@gdeyoung (Contributor) commented Apr 6, 2026

TL;DR — Agent crashes on malformed LLM output instead of recovering gracefully

When validate_tool_request() raises a ValueError (e.g., malformed JSON from non-OpenAI models like GLM, MiniMax, Qwen), the exception propagates uncaught and crashes the entire agent. A simple try/except wrapper routes the output to the existing misformat warning path, allowing the agent to self-correct and continue.

One line of defensive wrapping prevents a crash loop that requires manual agent restart.


Problem

In agent.py, the validate_tool_request() call at line ~878 is unguarded:

if tool_request is not None:
    await self.validate_tool_request(tool_request)  # Can raise ValueError

When the LLM produces malformed JSON (common with GLM-5.1, MiniMax, Qwen), validate_tool_request() raises ValueError. This exception is not caught, causing the agent to crash instead of using the existing misformat recovery path.
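An illustrative reproduction of the failure mode (the malformed string here is made up; `json.loads` stands in for the parsing step that ultimately raises):

```python
import json

# Truncated tool-call JSON of the kind weaker models sometimes emit
raw = '{"tool_name": "search", "tool_args": '

try:
    json.loads(raw)
    crashed = False
except ValueError:
    # json.JSONDecodeError subclasses ValueError; without a guard,
    # this exception propagates and takes down the agent loop
    crashed = True

assert crashed
```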

Solution

Wrap the call in try/except ValueError:

if tool_request is not None:
    try:
        await self.validate_tool_request(tool_request)
    except ValueError:
        # Malformed JSON from LLM — treat as misformat, not fatal crash
        tool_request = None

When tool_request is set to None, the existing misformat warning logic handles it — prompting the agent to self-correct on the next turn.
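End to end, the guarded flow looks roughly like this (a minimal sketch; the validator body and the misformat-warning wording are hypothetical, not the repo's actual code):

```python
import asyncio

async def validate_tool_request(tool_request):
    # Stand-in for the real validator: reject requests whose
    # tool_args is not a dict (hypothetical rule for illustration)
    if not isinstance(tool_request.get("tool_args"), dict):
        raise ValueError("malformed tool call")

async def process_turn(tool_request):
    if tool_request is not None:
        try:
            await validate_tool_request(tool_request)
        except ValueError:
            # Malformed JSON from LLM -- treat as misformat, not fatal crash
            tool_request = None
    if tool_request is None:
        # Existing misformat path: warn and let the model retry next turn
        return "warning: malformed tool call, please retry"
    return "ok"

print(asyncio.run(process_turn({"tool_name": "search", "tool_args": "oops"})))
# -> warning: malformed tool call, please retry
```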

Impact

One defensive wrapper turns a hard crash (which previously required a manual agent restart) into a recoverable misformat warning. Well-formed tool calls are unaffected.

Changes

agent.py: Wrap validate_tool_request() in try/except ValueError

Testing

  • 2 regression tests verify ValueError is caught and tool_request is set to None
  • 21/21 tests passing across all critical patches
  • Running in production for 3+ days
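The regression tests might be sketched like so (assumed structure with a stub agent; the repo's actual tests differ):

```python
import asyncio

class StubAgent:
    # Hypothetical stub: validation always raises, mimicking malformed output
    async def validate_tool_request(self, tool_request):
        raise ValueError("malformed tool call JSON")

    async def guard(self, tool_request):
        if tool_request is not None:
            try:
                await self.validate_tool_request(tool_request)
            except ValueError:
                tool_request = None  # routed to misformat path
        return tool_request

# Regression check 1: ValueError is caught, not propagated
result = asyncio.run(StubAgent().guard({"tool_name": "x"}))
# Regression check 2: tool_request is reset to None for recovery
assert result is None
```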


When validate_tool_request() raises ValueError on malformed JSON from the LLM, the agent crashes instead of recovering. Wrapping in try/except ValueError routes the output to the misformat warning path, allowing the agent to self-correct and continue operating.
gdeyoung added a commit to gdeyoung/agent-zero that referenced this pull request Apr 9, 2026
When LLMs output valid JSON with tool_name but missing/malformed tool_args, json_parse_dirty() returned the incomplete dict as-is, causing ValueError crashes in validate_tool_request().

Root cause fix: normalize tool requests in json_parse_dirty() before returning them:

- Missing tool_args -> defaults to {}

- tool_args as string -> converts to dict

- tool_args as non-dict -> defaults to {}

- tool key -> normalized to tool_name

Non-tool requests are left untouched.

Related: agent0ai#1458, agent0ai#1466
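The normalization rules above can be sketched as follows (a minimal sketch; the key names mirror json_parse_dirty()'s output but this is not the repo's implementation):

```python
import json

def normalize_tool_request(data):
    """Normalize a parsed tool request per the rules described above."""
    if not isinstance(data, dict):
        return data
    # 'tool' key -> normalized to 'tool_name'
    if "tool" in data and "tool_name" not in data:
        data["tool_name"] = data.pop("tool")
    if "tool_name" not in data:
        return data  # non-tool requests are left untouched
    args = data.get("tool_args")
    if isinstance(args, str):
        # tool_args as string -> try to convert to a dict
        try:
            parsed = json.loads(args)
            args = parsed if isinstance(parsed, dict) else {}
        except ValueError:
            args = {}
    elif not isinstance(args, dict):
        # missing or non-dict tool_args -> defaults to {}
        args = {}
    data["tool_args"] = args
    return data
```

With this in place, validate_tool_request() only ever sees a dict-shaped tool_args, so the ValueError path becomes a last-resort safety net rather than a routine occurrence.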
