fix: add panic recovery to goroutines that execute user code by blackwell-systems · Pull Request #959 · modelcontextprotocol/go-sdk

blackwell-systems · 2026-05-14T19:45:36Z

Implementation for #958.

Problem

Library goroutines that dispatch user-provided handlers (tool handlers, middleware, etc.) lack defer recover(). A panic in user code crashes the entire host process.

As a library, the SDK doesn't own the process, doesn't control the supervisor, and doesn't know what else runs in the same address space. A panicking goroutine takes down the consumer's HTTP server, metrics exporters, graceful shutdown hooks, and any other in-flight work.

Fix

Add recovery to 5 goroutine sites:

File	Goroutine	Recovery behavior
`internal/jsonrpc2/conn.go`	`handleAsync` handler dispatch	Sends INTERNAL_ERROR JSON-RPC response to client
`mcp/shared.go`	keepalive goroutine	Logs via structured logger, goroutine exits
`mcp/sse.go`	SSE reader goroutine	Logs via `slog.Default()`, `defer Close()` still fires
`mcp/streamable.go`	`handleJSON` response processor	Calls `c.fail()` to signal connection error
`mcp/streamable.go`	`handleSSE` stream processor	Calls `c.fail()` to signal connection error
`mcp/transport.go`	stdio reader goroutine	Sends error on incoming channel

The critical path is handleAsync in internal/jsonrpc2/conn.go: this is where ALL tool, prompt, and resource handlers execute. A panicking tool handler currently kills the process; with this fix it returns an error response to the client and the server continues serving other sessions.

Tests

TestToolHandler_PanicRecovery: registers a tool that panics, calls it, verifies process survives and error is returned
TestToolHandler_PanicDoesNotAffectSubsequentCalls: panics one tool, then calls a different tool successfully, proving the server continues operating

Both tests crash without the fix (verified by reverting and running).

Notes

Defer order is verified correct in all sites (LIFO semantics)
go vet, -race, and full test suite pass
Happy to adjust the approach based on feedback from Library goroutines lack panic recovery (9 goroutines can crash host process) #958

Library goroutines that dispatch user-provided handlers (tool handlers, middleware, etc.) lack defer recover(). A panic in user code crashes the entire host process. Add recovery to 5 goroutine sites: - internal/jsonrpc2/conn.go: handleAsync handler dispatch (critical path: ALL tool/prompt/resource handlers execute here) - mcp/shared.go: keepalive goroutine - mcp/sse.go: SSE reader goroutine - mcp/streamable.go: handleJSON and handleSSE response processors - mcp/transport.go: stdio reader goroutine Recovery behavior: - handleAsync: sends INTERNAL_ERROR JSON-RPC response to client (graceful) - handleJSON/handleSSE: calls c.fail() to signal connection error - keepalive: logs via structured logger, goroutine exits - SSE reader: logs via slog.Default(), defer Close() still fires - stdio reader: sends error on incoming channel Tests added: - TestToolHandler_PanicRecovery: proves panic doesn't crash process - TestToolHandler_PanicDoesNotAffectSubsequentCalls: proves server continues serving after a panic Fixes modelcontextprotocol#958

guglielmo-san · 2026-05-15T10:53:07Z

+	defer func() {
+		if r := recover(); r != nil {
+			c.fail(fmt.Errorf("%s: panic in handleJSON: %v", requestSummary, r))
+		}


What is the threat this recover() is intended to address?
A recover() here wouldn't catch panics from user handlers since those execute on the goroutine spawned in handleAsync.

blackwell-systems · 2026-05-15T16:14:23Z

The recover is inside the go func() spawned by handleAsync (line 640), on the same goroutine that calls c.handler.Handle(ctx, req.Request). The full call chain from that goroutine is synchronous:

Handle → ServerSession.handle (server.go:1445)
ServerSession.handle → handleReceive (shared.go:157)
handleReceive → mh(ctx, method, req) (shared.go:171, comment: "mh might be user code")
mh → the user's tool handler

No additional goroutine is spawned between the recover and the user handler. jsonrpc2.Async(ctx) (called at server.go:1466) releases a concurrency semaphore so the next request can start being handled; it does not spawn a new goroutine.

The test TestToolHandler_PanicRecovery confirms: a tool handler calling panic("deliberate panic") is caught, the client receives an error response, and the process survives. Without the recover, that test crashes the binary.

jba · 2026-05-16T00:51:00Z

Please see my comment on #958. Recovering from panics is not the right approach for these sorts of faults.

guglielmo-san reviewed May 15, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: add panic recovery to goroutines that execute user code#959

fix: add panic recovery to goroutines that execute user code#959
blackwell-systems wants to merge 1 commit into
modelcontextprotocol:mainfrom
blackwell-systems:fix/goroutine-panic-recovery

blackwell-systems commented May 14, 2026

Uh oh!

guglielmo-san May 15, 2026

Uh oh!

blackwell-systems commented May 15, 2026

Uh oh!

jba commented May 16, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

blackwell-systems commented May 14, 2026

Problem

Fix

Tests

Notes

Uh oh!

guglielmo-san May 15, 2026

Choose a reason for hiding this comment

Uh oh!

blackwell-systems commented May 15, 2026

Uh oh!

jba commented May 16, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants