Skip to content

feat(a2a): implement full A2A task lifecycle state support#2245

Open
agent-of-mkmeral wants to merge 5 commits intostrands-agents:mainfrom
agent-of-mkmeral:feat/a2a-lifecycle-states
Open

feat(a2a): implement full A2A task lifecycle state support#2245
agent-of-mkmeral wants to merge 5 commits intostrands-agents:mainfrom
agent-of-mkmeral:feat/a2a-lifecycle-states

Conversation

@agent-of-mkmeral
Copy link
Copy Markdown
Contributor

@agent-of-mkmeral agent-of-mkmeral commented May 4, 2026

Summary

Implements full A2A task lifecycle state support in the Strands SDK. Previously, A2A tasks could only end in completed — errors crashed, cancellation was unsupported, and there was no way for an agent to ask the user for more input mid-task. This PR adds proper support for failed, canceled, input_required, rejected, and auth_required states.

Related to: #1371


What Actually Changes (Before → After)

1. Agent Errors No Longer Crash the Task

Before After
Behavior If agent.stream_async() threw an exception, it propagated as an unhandled ServerError — the client got a JSON-RPC error and no task state transition The exception is caught; the task gracefully transitions to TaskState.failed with a status message
Client sees {"error": {"code": -32603, "message": "Internal error"}} A proper TaskStatusUpdateEvent with state: "failed" and a message like "Agent execution failed"
Task state Stuck in working forever (zombie task) Cleanly in failed (terminal state)

2. Cancellation Now Works

Before After
Behavior cancel() always raised UnsupportedOperationError cancel() transitions the task to canceled and attempts cooperative agent shutdown
Client sees Error response saying operation not supported TaskStatusUpdateEvent with state: "canceled" + message
Agent Not signaled If agent has a cancel() method, it's called (best-effort)

3. Strands Interrupts → A2A input_required

This is the big one. Strands agents that use interrupts can now participate in multi-turn A2A workflows.

Before After
Behavior If agent returned with stop_reason="interrupt", the executor ignored it and called updater.complete() anyway The interrupt is detected; task transitions to input_required with a status message listing what input is needed
Client sees Premature completed with possibly empty/partial content TaskStatusUpdateEvent with state: "input-required" + message describing each interrupt (name + reason)
Workflow One-shot only — no HITL (human-in-the-loop) possible Client can send a follow-up message to resume the task

Example flow:

Client → send_message("Book me a flight to NYC")
Server → TaskStatus: working (streaming...)
Server → TaskStatus: input_required
         message: "Agent requires input:\n- confirm_booking: Please confirm: NYC flight on May 10 for $450"
Client → send_message("Yes, confirm it")
Server → TaskStatus: working (streaming...)
Server → TaskStatus: completed
         artifact: "Flight booked! Confirmation: ABC123"

4. Client-Side (A2AAgent) Handles All States

Before After
_is_complete_event() Only recognized TaskState.completed Recognizes all 6 lifecycle states as "stream complete" events
AgentResult.stop_reason Always "end_turn" "end_turn" for terminal states (completed/failed/canceled/rejected); "interrupt" for pausing states (input_required/auth_required)
AgentResult.state Empty {} Contains {"a2a_task_state": "failed"} (or whichever state) for programmatic inspection

Impact on A2A Tasks

Task State Machine (what this PR enables)

                    ┌──────────────────────────────────────────┐
                    │                                          │
  ┌─────────┐   ┌──▼───────┐   ┌────────────────┐   ┌─────────────┐
  │submitted│──►│ working  │──►│ input_required │──►│  completed  │
  └─────────┘   └──────────┘   └────────────────┘   └─────────────┘
                    │                                    
                    ├──────────────────────►┌────────┐
                    │         error         │ failed │
                    │                       └────────┘
                    │                                    
                    ├──────────────────────►┌──────────┐
                    │       cancel()        │ canceled │
                    │                       └──────────┘
                    │                                    
                    └──────────────────────►┌──────────┐
                          (future)          │ rejected │
                                            └──────────┘

Who Is Affected

User Impact
A2A server operators (using StrandsA2AExecutor) Tasks no longer zombie on errors. Cancel button works. Interrupt-based agents can now be exposed via A2A.
A2A clients (using A2AAgent) AgentResult.stop_reason now differentiates terminal vs. pausing states. state["a2a_task_state"] gives exact A2A state.
Existing users No breaking changes. Happy-path completed behavior is identical. The only visible difference: errors return failed status instead of crashing.

Backwards Compatibility

  • No dependency changes — uses existing a2a-sdk 0.3.26 APIs (TaskUpdater.failed(), .requires_input(), .cancel() were already there, just unused)
  • No API signature changesStrandsA2AExecutor, A2AAgent have same constructors and methods
  • Behavioral change (errors only): Previously, agent errors caused a JSON-RPC error response. Now they cause a proper failed task state. This is spec-correct and strictly better for clients.

Changes

Server-side (executor.py)

  • execute(): Wraps streaming in try/except → updater.failed() on error
  • _execute_streaming(): Extracts result from stream loop; checks stop_reason == "interrupt" after streaming
  • _handle_interrupt_result(): New method — builds descriptive message from interrupt list, calls updater.requires_input()
  • cancel(): Calls agent.cancel() if available, then updater.cancel(). Raises ServerError only if no task or already terminal
  • Error messages on InternalError: Now includes "No valid content found..." / "Request message is missing..."

Client-side (a2a_agent.py)

  • _is_complete_event(): Recognizes _COMPLETE_STATES (all 6 lifecycle states)
  • State sets derived from _STATE_TO_STOP_REASON in _converters.py (single source of truth)

Converters (_converters.py)

  • _STATE_TO_STOP_REASON: Canonical mapping of A2A state → Strands stop_reason
  • _extract_task_state(): Helper to pull TaskState from response tuples
  • convert_response_to_agent_result(): Uses mapping for stop_reason; adds a2a_task_state to AgentResult.state

Testing

  • 182 tests pass (existing + 22 new)
  • 98% coverage on executor.py (0 missing statements)
  • New test cases cover:
    • Error → failed transition
    • Error when task already terminal (graceful handling)
    • Single interrupt → input_required
    • Multiple interrupts → input_required with all listed
    • Cancel with valid task
    • Cancel with no task (error)
    • Cancel when already terminal (error)
    • Cancel with agent.cancel() exception (best-effort)
    • Client recognition of all 6 states
    • State-to-stop_reason mapping for all states
    • _extract_task_state helper

Quality

  • ✅ Lint clean (ruff check)
  • ✅ Format clean (ruff format)
  • ✅ 98% patch coverage (codecov threshold met)
  • ✅ Backwards compatible
  • ✅ No dependency changes
  • ✅ Structured logging throughout (task_id=<%s> | message)

cc @mkmeral

Implements all lifecycle states for the A2A protocol integration:

1. Error mapping → failed state:
   - Agent exceptions now gracefully transition tasks to TaskState.failed
   - Error details are communicated via the status message
   - ServerErrors (setup failures) still propagate as before

2. Cancel support:
   - cancel() now transitions tasks to TaskState.canceled with a message
   - Raises UnsupportedOperationError only when no current task exists
   - Handles already-terminal tasks gracefully

3. input_required via Strands interrupts:
   - Agent stop_reason='interrupt' maps to TaskState.input_required
   - Interrupt names and reasons are communicated to the client
   - Multiple interrupts are listed in the status message

4. Client-side handling of all states:
   - _is_complete_event recognizes all terminal states (completed, failed,
     canceled, rejected) and pausing states (input_required, auth_required)
   - convert_response_to_agent_result maps A2A states to Strands stop_reasons:
     * completed/failed/canceled/rejected → end_turn
     * input_required/auth_required → interrupt
   - AgentResult.state includes a2a_task_state for downstream inspection

All changes use the existing a2a-sdk 0.3.26 - no dependency changes needed.

Resolves: strands-agents#1371
- Fix B904: add 'from None' to re-raised ServerError in cancel()
- Fix F401: remove unused imports (TaskStatus, Part, TextPart, Role) in tests
- Fix I001: sort import blocks in test files
- Fix ruff formatting in executor.py and test_executor.py
@codecov
Copy link
Copy Markdown

codecov Bot commented May 4, 2026

Codecov Report

❌ Patch coverage is 98.70130% with 1 line in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
src/strands/multiagent/a2a/executor.py 98.07% 0 Missing and 1 partial ⚠️

📢 Thoughts on this report? Let us know!

@mkmeral mkmeral self-assigned this May 4, 2026
Copy link
Copy Markdown
Contributor Author

@agent-of-mkmeral agent-of-mkmeral left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TL;DR: FAIL — 2 bugs found. Both are TypeError crashes in _converters.py when A2A responses contain None parts lists.

Adversarial Testing Report

Scope: Full A2A lifecycle state support (executor error/interrupt/cancel handling, client-side state recognition, converter state mapping)
Tests written: 32
Tests passing: 30
Tests failing (findings): 2

# Category Severity Description
1 Bug Medium convert_response_to_agent_result crashes with TypeError when artifact.parts is None
2 Bug Medium convert_response_to_agent_result crashes with TypeError when status.message.parts is None
Finding 1 — Bug (Medium): TypeError when artifact.parts is None

Location: src/strands/multiagent/a2a/_converters.py, line 136

Root Cause: The code checks hasattr(update_event.artifact, "parts") but does NOT check if parts is actually not None before iterating. Since hasattr returns True even when the attribute is set to None, this crashes.

Reproduction:

from unittest.mock import MagicMock
from a2a.types import TaskArtifactUpdateEvent
from strands.multiagent.a2a._converters import convert_response_to_agent_result

task = MagicMock()
task.artifacts = None

event = MagicMock(spec=TaskArtifactUpdateEvent)
artifact = MagicMock()
artifact.parts = None  # Has attribute, but it's None
event.artifact = artifact

# CRASH: TypeError: 'NoneType' object is not iterable
result = convert_response_to_agent_result((task, event))

Observed behavior: TypeError: 'NoneType' object is not iterable at line 136
Expected behavior: Should return an empty AgentResult gracefully

Fix:

# Line 135-136: Change from:
if update_event.artifact and hasattr(update_event.artifact, "parts"):
    for part in update_event.artifact.parts:
# To:
if update_event.artifact and hasattr(update_event.artifact, "parts") and update_event.artifact.parts:
    for part in update_event.artifact.parts:
Finding 2 — Bug (Medium): TypeError when status.message.parts is None

Location: src/strands/multiagent/a2a/_converters.py, line 142

Root Cause: The code checks that update_event.status.message is truthy, but does NOT check if message.parts is not None before iterating.

Reproduction:

from unittest.mock import MagicMock
from a2a.types import TaskState, TaskStatusUpdateEvent
from strands.multiagent.a2a._converters import convert_response_to_agent_result

task = MagicMock()
task.artifacts = None

status = MagicMock()
status.state = TaskState.failed
message = MagicMock()
message.parts = None  # Message exists but parts is None
status.message = message

event = MagicMock(spec=TaskStatusUpdateEvent)
event.status = status

# CRASH: TypeError: 'NoneType' object is not iterable
result = convert_response_to_agent_result((task, event))

Observed behavior: TypeError: 'NoneType' object is not iterable at line 142
Expected behavior: Should handle gracefully and fall back to task.artifacts

Fix:

# Line 141-142: Change from:
if update_event.status and hasattr(update_event.status, "message") and update_event.status.message:
    for part in update_event.status.message.parts:
# To:
if update_event.status and hasattr(update_event.status, "message") and update_event.status.message and update_event.status.message.parts:
    for part in update_event.status.message.parts:
What Survived (30 tests passed)

Executor error handling — exceptions correctly transition to failed state
Executor interrupt handlingstop_reason="interrupt" correctly maps to input_required
Cancel support — properly transitions to canceled, handles already-terminal tasks
ServerError re-raise — setup failures propagate correctly (not caught as generic)
Client-side state recognition — all 6 lifecycle states correctly recognized
State-to-stop_reason mapping — all states map correctly to end_turn/interrupt
A2A-compliant streaming + lifecycle — interrupt/error work in new artifact mode
State cleanup on error_current_artifact_id and _is_first_chunk properly reset
Executor reuse after failure — state is clean for subsequent calls
Backward compatibility — normal completion, max_tokens, empty results all work
Edge cases — empty interrupts, None interrupts, None reasons, multiple results


🤖 AI agent response. Strands Agents. Feedback welcome!

Comment thread src/strands/multiagent/a2a/executor.py Outdated
Comment thread src/strands/multiagent/a2a/executor.py Outdated
Comment thread src/strands/multiagent/a2a/executor.py Outdated
Comment thread src/strands/multiagent/a2a/executor.py
Comment thread src/strands/agent/a2a_agent.py Outdated
Comment thread src/strands/agent/a2a_agent.py Outdated
Comment thread src/strands/multiagent/a2a/_converters.py Outdated
@github-actions
Copy link
Copy Markdown

github-actions Bot commented May 4, 2026

Assessment: Comment

This PR is a well-structured implementation of the A2A task lifecycle states. The architectural decisions (error→failed, interrupt→input_required, cancel support) are sound and align well with the A2A protocol spec.

Review Categories
  • DRY / Single Source of Truth: State classification knowledge is duplicated between a2a_agent.py (_TERMINAL_STATES/_INPUT_STATES) and _converters.py (_STATE_TO_STOP_REASON). Consider consolidating to avoid divergence.
  • Security: Raw exception messages are passed to A2A clients in the failed state message, which could leak internal details.
  • Error Handling: Broad except RuntimeError catches could mask unrelated errors if the A2A SDK behavior changes.
  • Logging Style: New log messages don't follow the repo's structured logging format (field=<value> | message).
  • Documentation Accuracy: The cancel() docstring implies the agent execution is stopped, but the implementation only transitions the task state.

Good test coverage with both positive and negative cases, and backwards compatibility is maintained for existing completed state flows.

Comment thread src/strands/multiagent/a2a/executor.py Outdated
logger.warning("Cancellation requested but no current task found")
raise ServerError(error=UnsupportedOperationError()) from None

updater = TaskUpdater(event_queue, task.id, task.context_id)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agent has cancel method. where do we actually use it? I don't see it. where do we cancel the current agent run?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch — fixed in the latest push. cancel() now calls self.agent.cancel() if the method exists (cooperative cancellation):

# Attempt to stop the agent if it supports cancellation
if hasattr(self.agent, "cancel") and callable(self.agent.cancel):
    try:
        self.agent.cancel()
    except Exception:
        logger.debug("task_id=<%s> | agent cancel signal failed (non-critical)", task.id)

This is best-effort — if agent.cancel() fails, we still transition the A2A task state to canceled. The agent's current model call may complete, but no new iterations will start.

Docstring updated to reflect this accurately.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: isn't the type of self.agent Agent? why do we need ducktyping checks?

Fixes:
- mypy error: _STATE_TO_STOP_REASON now typed as dict[TaskState, StopReason]
  instead of implicit str values (was: 'str' incompatible with Literal type)
- Bug: None parts crash in convert_response_to_agent_result (artifact.parts
  and message.parts checked for None before iteration)
- Security: error messages no longer expose raw exception details to clients
- Lint: removed unused variable 'e' in outer except clause (F841)

Review feedback addressed:
- Structured logging: all log messages now use 'task_id=<%s> | message' format
- cancel() docstring: accurately describes state-only transition + best-effort
  agent.cancel() call
- cancel() now calls agent.cancel() if method exists (cooperative cancellation)
- DRY: added _COMPLETE_STATES = _TERMINAL_STATES | _INPUT_STATES and simplified
  _is_complete_event to use single set membership test
- Type annotation: state dict uses dict[str, str] instead of bare dict
- RuntimeError catches: added comments explaining they guard TaskUpdater's
  terminal state enforcement

Coverage improvement:
- Added tests for: task-already-terminal error path, agent.cancel() call,
  agent.cancel() exception handling
- 181 tests pass (up from 178)
@agent-of-mkmeral
Copy link
Copy Markdown
Contributor Author

CI Fix Pushed + All Review Feedback Addressed

Why CI Failed

mypy type error — the _STATE_TO_STOP_REASON dict had implicit str values, but AgentResult.stop_reason expects StopReason (a Literal type):

src/strands/multiagent/a2a/_converters.py:169: error: Argument "stop_reason" to "AgentResult" 
has incompatible type "str"; expected "Literal['cancelled', 'checkpoint', ...]"  [arg-type]

Fix: Explicitly typed the dict as dict[TaskState, StopReason] and imported StopReason from types.event_loop.

What Else Was Fixed

Issue Fix
mypy arg-type error _STATE_TO_STOP_REASON: dict[TaskState, StopReason]
Bug: artifact.parts = None crash Added and update_event.artifact.parts guard
Bug: message.parts = None crash Added and update_event.status.message.parts guard
Security: raw exception in response Generic message: "Agent execution failed" (details only in logs)
Logging: non-structured format All messages now use task_id=<%s> | message format
cancel(): no agent stop Now calls self.agent.cancel() if method exists
DRY: duplicate state sets Added _COMPLETE_STATES = _TERMINAL_STATES | _INPUT_STATES
Type safety: bare dict Changed to dict[str, str]
Lint: F841 unused var Removed as e from outer except Exception
Coverage: 6 lines missing Added 3 new tests covering edge cases → 181 tests total

All tests pass locally. CI should be green now. @mkmeral

Comment thread src/strands/agent/a2a_agent.py Outdated
Comment thread src/strands/multiagent/a2a/executor.py Outdated
@github-actions
Copy link
Copy Markdown

github-actions Bot commented May 4, 2026

Assessment: Approve

Great job addressing the previous review feedback. The security fix (generic error messages), cooperative agent.cancel() call, structured logging, accurate docstrings, and _COMPLETE_STATES simplification all look good.

Remaining Minor Items
  • Comment accuracy: The "Derived from _STATE_TO_STOP_REASON" comment on the state sets is misleading since they're still manually defined (not programmatically derived). Consider fixing the wording or actually deriving them.
  • One log line: Line 289 doesn't follow the structured format (low priority since there's no task context available).

These are non-blocking. The implementation is solid and well-tested.

Copy link
Copy Markdown
Contributor Author

@agent-of-mkmeral agent-of-mkmeral left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TL;DR: PASS — All previously-found bugs are fixed. 26 additional tests pass including A2A spec conformance validation against Google ADK and a2a-sdk 0.3.26.

Adversarial Testing Report (Re-run on 451cae9)

Scope: Full A2A lifecycle state support re-tested on latest commit + spec conformance against open source
Tests written: 26 (5 regression + 21 spec conformance)
Tests passing: 26
Tests failing: 0

Previously-Found Bugs: FIXED ✅

# Bug Status
1 artifact.parts is None crash Fixed (line 136 now checks and update_event.artifact.parts)
2 status.message.parts is None crash Fixed (line 142 now checks and update_event.status.message.parts)
3 task.artifacts[].parts fallback also had same pattern Fixed (line 150 now checks and artifact.parts)

Spec Conformance Validation

Validated against:

  • A2A Specification (google/A2A repo, spec 1.0)
  • a2a-sdk 0.3.26 (reference Python library)
  • Google ADK (google-adk, Agent Development Kit — the reference A2A executor implementation)
TaskState Alignment (7 tests)

✅ All 9 a2a-sdk TaskState enum values accounted for
✅ Terminal states match spec: {completed, failed, canceled, rejected}
✅ Pausing states match spec: {input_required, auth_required}
✅ Non-complete states (submitted, working, unknown) correctly ignored
✅ All lifecycle states mapped to stop_reasons
✅ Terminal → end_turn, Pausing → interrupt

Executor Behavior vs Google ADK (3 tests)

✅ Error → TaskState.failed (same pattern as Google ADK)
✅ Cancel with no task → ServerError (Strands actually does BETTER than ADK which just raises NotImplementedError)
✅ Cancel with task → TaskState.canceled (Strands implements proper cancel; ADK doesn't)

Client-Side Event Recognition (3 tests)

✅ Stream closes on terminal states (spec: "stream MUST close when task reaches terminal state")
✅ Stream pauses on input_required/auth_required
✅ Stream continues on submitted/working

Converter Output (4 tests)

completedend_turn with a2a_task_state: "completed"
failedend_turn with error content preserved
input_requiredinterrupt with input message
✅ Direct Message response → no state metadata

TaskUpdater API Compatibility (4 tests)

✅ All required methods exist: failed, cancel, requires_input, complete, add_artifact, update_status, new_agent_message
failed(), requires_input(), cancel() all accept message parameter

Previous Adversarial Tests Still Passing (5 tests)

artifact.parts = None no longer crashes
message.parts = None no longer crashes
task.artifacts[].parts = None fallback also protected
_STATE_TO_STOP_REASON properly typed (mypy fix)
✅ All 6 lifecycle states have mappings

Key Findings from Open Source Comparison

  1. Google ADK's cancel() raises NotImplementedError('Cancellation is not supported') — Strands actually provides superior A2A spec compliance by implementing proper cancel support.

  2. Google ADK's input_required uses LongRunningFunctions to detect when an agent needs user input and transitions to input_required — Strands maps this from the Strands interrupt mechanism, which is a clean and correct mapping.

  3. Error handling — Both Google ADK and Strands catch exceptions and transition to TaskState.failed with error messages. Strands additionally handles the edge case where updater.failed() itself raises (task already terminal).


🤖 AI agent response. Strands Agents. Feedback welcome!

Changes:
1. InternalError now includes descriptive messages:
   - 'No valid content found in request message parts'
   - 'Request message is missing or has no parts'
   (was: bare InternalError() with no explanation)

2. State sets derived from _STATE_TO_STOP_REASON (single source of truth):
   - _TERMINAL_STATES = {s for s, r in _STATE_TO_STOP_REASON.items() if r == 'end_turn'}
   - _INPUT_STATES = {s for s, r in _STATE_TO_STOP_REASON.items() if r == 'interrupt'}
   - Removed manual TaskState enum listing (no more duplication)

3. Structured logging on cancel no-task path:
   - context_id=<%s> | cancel requested but no current task found

4. Coverage: executor.py now 98% (0 missing statements)
   - Fixed test_execute_error_when_task_already_terminal: patches TaskUpdater
   - Added test_cancel_raises_when_task_already_terminal: covers RuntimeError path
   - 182 tests pass
@agent-of-mkmeral
Copy link
Copy Markdown
Contributor Author

Latest Push (2c8539f) — All Review Comments Addressed

What Changed

Issue Fix
"why mysterious server error?" InternalError(message="No valid content found...") — now tells client what went wrong
"+1 unify these" State sets now actually derived from _STATE_TO_STOP_REASON via set comprehensions
Line 289 logging Now uses context_id=<%s> | cancel requested...
codecov/patch < threshold executor.py now 98% coverage (0 missing statements, only branch partials)

Coverage Fix

Previous tests mocked at wrong level (event_queue). New tests properly patch TaskUpdater to raise RuntimeError:

  • test_execute_error_when_task_already_terminal — patches TaskUpdater.failed()
  • test_cancel_raises_when_task_already_terminal — patches TaskUpdater.cancel()

182 tests pass. @mkmeral

@github-actions
Copy link
Copy Markdown

github-actions Bot commented May 4, 2026

Assessment: Approve

All previous review feedback has been addressed in the latest push. The implementation is clean, well-tested, and ready to merge.

Verified Fixes
  • ✅ State sets now properly derived from _STATE_TO_STOP_REASON (single source of truth)
  • ✅ Structured logging format used consistently (including context_id for cancel-without-task)
  • InternalError now includes descriptive messages for setup failures
  • cancel() calls self.agent.cancel() cooperatively + accurate docstring
  • ✅ Coverage gaps addressed with proper TaskUpdater mocking
  • TaskState import removed from a2a_agent.py (no longer needed)

No further comments.

Copy link
Copy Markdown
Contributor Author

@agent-of-mkmeral agent-of-mkmeral left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TL;DR: CONCERNED — Tests provide good coverage for the happy paths but have significant gaps in edge cases that could produce silent failures in production. 2 critical, 3 major, 2 minor findings.

🔍 Devil's Advocate Test Review

Scope: All tests added/modified in PR #2245 (lifecycle state support)
Tests reviewed: 26 new tests across 3 files
Findings: 2 critical, 3 major, 2 minor

Summary Verdict

The tests accurately verify the designed happy paths and state mappings. However, they fail to test the boundaries where the code's behavioral decisions are most questionable: empty interrupt lists silently completing tasks, asyncio.CancelledError leaving tasks in indeterminate states, and an inconsistent assertion in the most important state test. The test suite gives confidence that designed flows work but provides no assurance about unexpected flows.


Critical Findings

Finding 1 — Critical: asyncio.CancelledError leaves task in indeterminate state (UNTESTED)

Test: None (gap)
Source: executor.py lines 99-109 (execute() method)
Claims to verify: "Agent execution errors now gracefully transition the task to the failed state"
Actually verifies: Only RuntimeError (an Exception subclass) — test: test_execute_transitions_to_failed_on_streaming_error
Gap: asyncio.CancelledError is a BaseException, not Exception. The except Exception: block at line 107 does NOT catch it. When an asyncio task is cancelled (e.g., HTTP client timeout, server shutdown, concurrent task group cancellation), CancelledError propagates uncaught, and the A2A task is left in working state forever — it never transitions to failed or canceled.
Risk: In production with async frameworks (uvicorn, starlette), HTTP request cancellation raises CancelledError. The remote client sees the task stuck in working with no terminal state — a zombie task.
What good looks like: Either: (1) catch BaseException and transition to canceled/failed, or (2) explicitly test and document that CancelledError propagation is intentional and the framework handles cleanup.

Proof:

async def mock_stream(content_blocks, **kwargs):
    yield {"data": "partial"}
    raise asyncio.CancelledError()

# Result: CancelledError PROPAGATES. Task stays in working state.
Finding 2 — Critical: stop_reason="interrupt" with empty interrupts silently completes task (UNTESTED)

Test: None (gap)
Source: executor.py line 153: if result.stop_reason == "interrupt" and result.interrupts:
Claims to verify: "When the Strands Agent returns with stop_reason="interrupt", the task transitions to input_required"
Actually verifies: Only the case where result.interrupts has actual Interrupt objects
Gap: If result.interrupts = [] (empty list, which is falsy in Python), the condition evaluates to False and the task transitions to completed instead of input_required. The agent explicitly signaled it needs input, but the A2A task says "completed". No test validates this behavior or proves it's intentional.
Risk: A Strands agent with stop_reason="interrupt" but an empty interrupts list (possible during concurrent modifications or edge cases in interrupt handling) will silently succeed instead of pausing. The calling agent gets end_turn instead of interrupt, losing the signal entirely.
What good looks like: A test that explicitly asserts the behavior for interrupts=[] — either validating it should be treated as completion (with a docstring explaining why) or revealing it as a bug that should transition to input_required.

Proof:

result.stop_reason = "interrupt"
result.interrupts = []  # Falsy!
# Result: Task transitions to `completed`, NOT `input_required`

Major Findings

Finding 3 — Major: test_convert_response_completed_state_maps_to_end_turn missing state assertion

Test: test_converters.py::test_convert_response_completed_state_maps_to_end_turn
Claims to verify: "completed state maps to end_turn stop_reason" (name) + state metadata
Actually verifies: Only assert result.stop_reason == "end_turn" — does NOT assert result.state
Gap: Every other state test (failed, canceled, rejected, input_required, auth_required) asserts BOTH result.stop_reason AND result.state.get("a2a_task_state"). The completed test — arguably the MOST important one since it's the happy path — is the only one missing the state assertion.
Risk: If a regression removes the a2a_task_state: "completed" from the result state, no test catches it. Downstream consumers relying on result.state["a2a_task_state"] would break silently.
What good looks like: Add assert result.state.get("a2a_task_state") == "completed" (one line).

Finding 4 — Major: No test for TaskState.unknown mapping

Test: None (gap)
Source: _converters.py line 122: _STATE_TO_STOP_REASON.get(task_state, "end_turn") if task_state else "end_turn"
Claims to verify: PR description says "proper handling of all A2A task states"
Actually verifies: 6 of 9 TaskState values are tested. unknown, submitted, working are not tested in converters.
Gap: The a2a-sdk has TaskState.unknown. The code handles it via the .get() default (end_turn). But there's no explicit test proving that unknownend_turn is intentional. If the a2a-sdk adds new states in the future and the default is wrong, there's no guardrail.
Risk: Low-probability but high-impact: if a real A2A server returns unknown state, the client treats it as a successful completion with end_turn. Is that correct? Without a test, nobody knows if this was a deliberate decision or an oversight.
What good looks like: One test: assert _STATE_TO_STOP_REASON.get(TaskState.unknown, "end_turn") == "end_turn" with a docstring explaining the design choice.

Finding 5 — Major: No end-to-end wire format validation between server and client

Test: None (gap)
Source: executor.py (server) + a2a_agent.py (client)
Claims to verify: "A2A task lifecycle" works end-to-end
Actually verifies: Server tests verify events are enqueued correctly. Client tests verify events are recognized correctly. But there's NO test proving the server's output format is what the client expects.
Gap: The server-side test for input_required creates a TaskStatusUpdateEvent via TaskUpdater.requires_input(). The client-side test creates a TaskStatusUpdateEvent via MagicMock(spec=TaskStatusUpdateEvent). These could have different structures (e.g., the final flag, message format, context_id presence). No integration test validates compatibility.
Risk: If TaskUpdater.requires_input() produces a slightly different event shape than what _is_complete_event() checks for (e.g., the event has status=None in some edge case), the client would loop forever waiting for completion.
What good looks like: One integration test that creates a real TaskUpdater, calls .requires_input(), captures the enqueued event, and passes it to _is_complete_event().


Minor Findings

Finding 6 — Minor: test_execute_error_when_task_already_terminal doesn't verify event_queue state

Test: test_executor.py::test_execute_error_when_task_already_terminal
Claims to verify: "error handled gracefully when task already in terminal state"
Actually verifies: mock_updater.failed.assert_called_once() (that the attempt was made)
Gap: Doesn't verify that event_queue received no spurious events. If the code inadvertently enqueues a partial event before the error, this test wouldn't catch it.
What good looks like: Add assert mock_event_queue.enqueue_event.call_count == 1 (only the initial task creation event).

Finding 7 — Minor: State tests could be parameterized (DRY concern)

Tests: All 7 test_is_complete_event_*_state_* tests in test_a2a_agent.py
Issue: Each test is an identical 10-line copy with only the state value changed. Using @pytest.mark.parametrize would reduce 70 lines to ~15 lines and make it trivial to add new states when the a2a-sdk evolves.
Risk: None (correctness is fine). But when a2a-sdk 0.4.0 adds new states, it's easy to forget to add the corresponding test.
What good looks like: @pytest.mark.parametrize("state,expected", [(TaskState.failed, True), (TaskState.working, False), ...])


Missing Test Coverage (Priority Order)

Priority Scenario Why It Matters
Critical asyncio.CancelledError during stream Leaves zombie tasks in production
Critical stop_reason="interrupt" + interrupts=[] Silent data loss of interrupt signal
Major TaskState.unknown converter behavior Future-proofing against SDK changes
Major Server→Client wire format round-trip Integration gap between components
Minor _extract_task_state with TaskArtifactUpdateEvent input Returns None (untested path)

What's Actually Good

The tests that DO exist are well-structured: they use real A2A types (not just strings), verify both the state transition AND the message content, test the error-already-terminal edge case, and verify cooperative cancellation (agent.cancel() called + exception swallowed). The multi-interrupt test is particularly good — it validates that all interrupt names and reasons appear in the status message.


Recommendations

  1. Add asyncio.CancelledError handling in execute() — either except BaseException with proper cancel transition, or document the deliberate propagation. Either way, add a test.
  2. Decide on interrupts=[] behavior — add a test that either asserts completion (documenting the intentional fallthrough) or fix it to treat any stop_reason="interrupt" as input_required.
  3. **Add assert result.state.get("a2a_task_state") == "completed"** to the completed converter test (1-line fix).
  4. Add one TaskState.unknown test to document the default behavior.
  5. Consider one integration test that runs the executor and feeds its output to the client's _is_complete_event.

🤖 AI agent response. Strands Agents. Feedback welcome!

…bugs

Devil's Advocate Review Findings Addressed:

Critical (2):
1. asyncio.CancelledError now transitions task to 'canceled' state before
   re-raising. Previously, CancelledError (BaseException, not Exception) would
   propagate uncaught, leaving the A2A task stuck in 'working' forever (zombie).
   - Added explicit 'except asyncio.CancelledError' handler in execute()
   - Transitions to canceled, then re-raises for framework cleanup
   - Handles edge case where task is already terminal (RuntimeError)

2. stop_reason='interrupt' with empty/None interrupts list no longer silently
   completes the task. The stop_reason is now the authoritative signal — if the
   agent says 'interrupt', we transition to input_required regardless of whether
   the interrupts list is populated.
   - Removed 'and result.interrupts' from the condition
   - Added fallback message: 'Agent requires additional input to continue'

Major (3):
3. test_convert_response_completed_state now asserts result.state metadata
   (was the only lifecycle test missing this assertion)
4. Added test for TaskState.unknown → end_turn default behavior
5. Added test_state_to_stop_reason_covers_all_lifecycle_states (guards
   against future a2a-sdk additions we miss)

Minor (2):
6. Added test_extract_task_state_from_artifact_update_returns_none
7. Added parametrized test covering ALL 9 TaskState values for
   _is_complete_event (replaces verbose individual tests)

Code fixes:
- cancel(): Removed hasattr/callable duck-typing (nit from review),
  now uses try/except (AttributeError, NotImplementedError) directly
- Added 'import asyncio' to executor.py

Tests: 201 pass (was 182)
@agent-of-mkmeral
Copy link
Copy Markdown
Contributor Author

All Devil's Advocate Findings Addressed

Commit: a66e210
Tests: 201 pass (was 182)

Critical Fixes (Code + Tests)

# Finding Fix
1 asyncio.CancelledError left tasks as zombies in working state Added except asyncio.CancelledError handler → transitions to canceled before re-raising
2 stop_reason='interrupt' with empty interrupts=[] silently completed task Removed and result.interrupts condition — stop_reason is now the authoritative signal

Major Fixes (Tests)

# Finding Fix
3 Completed state test missing result.state assertion Added assert result.state.get('a2a_task_state') == 'completed'
4 No test for TaskState.unknown mapping Added test documenting unknown → end_turn default (intentional design decision)
5 No guard against future a2a-sdk state additions Added test_state_to_stop_reason_covers_all_lifecycle_states

Minor Fixes

# Finding Fix
6 _extract_task_state with artifact event untested Added test (returns None)
7 State tests verbose/hard to extend Added @pytest.mark.parametrize covering all 9 TaskState values

Also Fixed (from mkmeral's nit)

  • Removed duck-typing hasattr/callable check for agent.cancel() — now uses try/except directly

🤖 AI agent response. Strands Agents. Feedback welcome!

raise ServerError(error=UnsupportedOperationError()) from None

# Attempt to cooperatively cancel the agent's execution (best-effort).
# Agent.cancel() may not exist on all implementations, so we guard with hasattr.
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Issue: The comment says "we guard with hasattr" but the implementation actually uses try/except on AttributeError (which is the right approach per mkmeral's feedback). The comment is a leftover from the previous iteration.

Suggestion: Update the comment to match the implementation:

# Attempt to cooperatively cancel the agent's execution (best-effort).
# Catches AttributeError/NotImplementedError if agent doesn't support cancel().

@github-actions
Copy link
Copy Markdown

github-actions Bot commented May 4, 2026

Assessment: Approve

Excellent iteration. The two critical findings (asyncio.CancelledError zombie prevention and empty-interrupts edge case) are real bugs that have been properly fixed and tested. The code is now robust, well-documented, and comprehensive.

New Changes Verified
  • asyncio.CancelledError handling — transitions to canceled before re-raising (prevents zombie tasks)
  • stop_reason="interrupt" is now the authoritative signal (empty/None interrupts still trigger input_required)
  • cancel() uses try/except AttributeError instead of hasattr (per mkmeral's feedback)
  • ✅ Parametrized tests covering all 9 TaskState values
  • ✅ Guard test (test_state_to_stop_reason_covers_all_lifecycle_states) prevents future state drift
  • TaskState.unknown default behavior documented and tested

One trivial nit: line 318 comment says "we guard with hasattr" but the implementation uses try/except. Non-blocking.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants