
Throw descriptive error when sandbox is killed mid-request#291

Open
mishushakov wants to merge 6 commits into main from mishushakov/handle-sandbox-killed-econnreset


@mishushakov


When the sandbox is killed or times out while a request is in flight, runCode/run_code and the context-management methods surfaced a raw socket error (e.g. ECONNRESET on Bun, TypeError: fetch failed on Node, httpx.ReadError/RemoteProtocolError in Python). Both SDKs now detect the closed connection across runtimes, confirm via the sandbox health check that it's actually gone, and throw a descriptive SandboxError/SandboxException suggesting timeoutMs/.setTimeout instead. If the sandbox is still running or its state can't be determined, the original error propagates unchanged so genuine network issues aren't masked. Includes kill-during-execution tests for JS, Python sync, and Python async (verified against the live API on Node and Bun), plus a patch changeset for both packages.

🤖 Generated with Claude Code

When the sandbox is killed or times out while a request to the Jupyter
server is in flight (runCode/run_code or context management), the SDKs
surfaced a raw socket error (e.g. ECONNRESET). Now they detect the
closed connection, confirm the sandbox is gone via its health check,
and throw a descriptive SandboxError/SandboxException instead. If the
sandbox is still running (or its state can't be determined), the
original error propagates unchanged.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
@cursor

cursor Bot commented Jun 11, 2026


PR Summary

Low Risk
Narrow change to error mapping on existing request paths; real network blips stay as original errors when the sandbox still appears running or state cannot be checked.

Overview
When a sandbox dies or times out during runCode/run_code or context API calls, clients used to surface low-level connection failures (ECONNRESET, fetch TypeError, httpx.ReadError/RemoteProtocolError). Both JS and Python now treat those as possible mid-flight kills, confirm with isRunning() (and leave the original error if the check fails or the sandbox is still up), then raise a TimeoutError/TimeoutException that explains the sandbox was killed and points at timeout options.

JS adds isConnectionClosedError for runtime-specific socket-reset shapes and routes request failures through handleRequestError instead of only formatRequestTimeoutError. Python adds format_sandbox_killed_error and _handle_connection_error on the same code paths. Integration tests kill the sandbox during a long sleep for JS, sync Python, and async Python; a patch changeset covers both packages.
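The detection step described above can be sketched for the Python side. This is only an illustration under stated assumptions: `ReadError` and `RemoteProtocolError` are local stand-ins for the httpx classes named in the PR (so the snippet runs without httpx installed), `is_connection_closed_error` is a hypothetical name mirroring the JS helper, and walking the exception cause chain is an assumption rather than something the PR confirms.

```python
class ReadError(Exception):
    """Stand-in for httpx.ReadError (not the real class)."""


class RemoteProtocolError(Exception):
    """Stand-in for httpx.RemoteProtocolError (not the real class)."""


# Error shapes treated as "the peer closed the connection".
# ConnectionResetError is the built-in Python exception for ECONNRESET.
CONNECTION_CLOSED_TYPES = (ReadError, RemoteProtocolError, ConnectionResetError)


def is_connection_closed_error(err: BaseException) -> bool:
    """Return True if err, or anything in its cause chain, looks like a closed connection."""
    seen = set()
    current = err
    # Follow __cause__ / __context__ links, guarding against cycles.
    while current is not None and id(current) not in seen:
        if isinstance(current, CONNECTION_CLOSED_TYPES):
            return True
        seen.add(id(current))
        current = current.__cause__ or current.__context__
    return False
```

A predicate like this only classifies the transport error. Whether the sandbox is actually gone is decided separately by the health check.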

Reviewed by Cursor Bugbot for commit bcaab38. Bugbot is set up for automated code reviews on this repo. Configure here.

mishushakov and others added 2 commits June 11, 2026 14:10
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Matches the existing 502 mapping in extractError/extract_exception and
the base SDK convention: a dead sandbox surfaces as TimeoutError /
TimeoutException. When the health probe is inconclusive or the sandbox
is still running, the original transport error propagates unchanged.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
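The behaviour this commit describes (dead sandbox becomes a timeout error, anything else keeps the original transport error) could look roughly like the sketch below. It is not the SDK's code: `TimeoutException` here is a local stand-in, `handle_connection_error` and its `is_running` callback are hypothetical, and the message wording is invented for illustration.

```python
from typing import Callable


class TimeoutException(Exception):
    """Stand-in for the SDK's TimeoutException (not the real class)."""


def handle_connection_error(
    err: Exception, is_running: Callable[[], bool]
) -> Exception:
    """Return the exception the caller should raise for a closed-connection error."""
    try:
        running = is_running()
    except Exception:
        # Health probe is inconclusive: keep the original transport error.
        return err
    if running:
        # Sandbox is still up, so this is a genuine network problem.
        return err
    # Sandbox is gone: surface a descriptive timeout error (hypothetical wording).
    killed = TimeoutException(
        "The sandbox was killed or timed out while the request was in flight. "
        "Consider setting a longer sandbox timeout."
    )
    killed.__cause__ = err  # keep the original error for debugging
    return killed
```

Returning the error instead of raising inside the helper lets the call site stay a single `raise handle_connection_error(...)` statement.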

@cursor cursor Bot left a comment


Cursor Bugbot has reviewed your changes and found 1 potential issue.


Reviewed by Cursor Bugbot for commit 17cce6a. Configure here.

Comment thread js/tests/killedSandbox.test.ts Outdated
mishushakov and others added 3 commits June 16, 2026 15:54
Consolidate the two-line catch handler into a single
formatRequestError call that returns the error to throw, matching
the main SDK pattern (e2b-dev/E2B#1419).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Align with e2b-dev/E2B#1419, which names the health-check-aware
error wrappers handle*Error / handle_*_exception:
- JS: formatRequestError -> handleRequestError
- Python: _raise_if_sandbox_killed -> _handle_connection_error

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The kill-during-execution tests used time.sleep(60), which in JS
matched the default execution timeout (DEFAULT_TIMEOUT_MS = 60s). A
slow kill could let the body-timer abort (or the sleep completing)
end the request instead of the connection reset, masking the
sandbox-killed path the test asserts.

Bump the sleep to 300s and set an explicit execution timeout well
beyond the kill + disconnect-detection window so the sandbox kill is
the only thing that ends the request, matching the interrupt test's
convention. Add a 60s vitest timeout to the JS test for the
disconnect-detection window.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
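The timing argument in this commit can be checked with a toy model that asks which event ends the request first. The function is hypothetical, and the 70 s "kill plus disconnect detection" delay and the 600 s execution timeout are made-up numbers chosen only to illustrate the race. The 60 s and 300 s sleeps come from the commit message.

```python
def first_event(sleep_s: float, exec_timeout_s: float, kill_detected_s: float) -> str:
    """Return the name of whichever event would end the request first."""
    events = {
        "sleep completes": sleep_s,
        "execution timeout": exec_timeout_s,
        "sandbox kill": kill_detected_s,
    }
    # min() over the dict keys, ordered by each event's time.
    return min(events, key=events.get)


# Old test shape: a slow kill loses the race to the sleep or the timeout.
old = first_event(sleep_s=60, exec_timeout_s=60, kill_detected_s=70)

# New test shape: the kill is the only thing that can end the request.
new = first_event(sleep_s=300, exec_timeout_s=600, kill_detected_s=70)
```

Under these assumed numbers, `old` names something other than the kill, while `new` is `"sandbox kill"`, which is the path the test is meant to assert.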
