fix(streamable-http): prevent SSE reconnect loops with shadow channels by its-mash · Pull Request #660 · modelcontextprotocol/rust-sdk

its-mash · 2026-02-14T01:11:23Z

Summary

Fixes infinite SSE reconnect loops caused by multiple competing EventSource connections in the Streamable HTTP server session layer. Tested with Cursor, VS Code, and Claude Desktop.

Problem

When the server sends POST SSE responses with a retry field (priming event), the browser's EventSource API automatically reconnects via GET after the stream ends. This creates multiple competing EventSource connections — one from the initial standalone GET, and additional ones from completed POST responses (initialize, tools/list, etc.).

Each reconnecting GET called resume() which replaced self.common.tx (the common channel sender). Dropping the old sender closed the old receiver, terminating the OTHER EventSource's stream. Both reconnected every sse_retry seconds (3s), leapfrogging each other in an infinite loop.

Root Cause Analysis

The connection lifecycle that causes the loop:

1. Client sends POST initialize → SSE response with priming (retry: 3000) → stream ends
2. Client sends GET (standalone) → becomes primary common channel
3. POST EventSource reconnects via GET (3s later) → replaces common.tx → kills stream from step 2
4. GET from step 2's EventSource reconnects → replaces common.tx → kills stream from step 3
5. Repeat every 3 seconds indefinitely

Solution — Shadow Channels

Instead of always replacing the common channel sender on resume, the new resume_or_shadow_common() method checks whether the primary common channel is still active:

Primary dead (tx.is_closed()) → Replace it. The new stream becomes the primary notification channel.
Primary alive → Create a shadow stream — an idle SSE connection kept alive by SSE keep-alive pings (: comments) that does NOT receive notifications and does NOT replace the primary channel.

Shadow streams prevent competing EventSource connections from killing each other while keeping the HTTP connections alive (so EventSource doesn't reconnect again).

Changes

Commit	Description
`8bd424e`	Initial 409 Conflict approach (returned error on duplicate standalone stream)
`0d03eb5`	Handle resume with completed request-wise channels (fall through to common)
`a7bb822`	Remove 409 Conflict — allow channel replacement per MCP spec ("client MAY remain connected to multiple SSE streams")
`7cf5406`	Skip cache replay (`sync`) when replacing active streams (replaying list_changed notifications caused notification loops)
`a7df58c`	Shadow channels — the final fix that prevents the leapfrog loop entirely

Key Design Decisions

No cache replay on common channel resume: Server-initiated notifications (tools/list_changed, resources/list_changed) are idempotent signals. Replaying cached ones causes clients to re-process old events, triggering unnecessary re-fetches. Missing one is harmless — the next real event will arrive naturally.
Shadow streams are idle: They don't receive notifications — only the primary common channel does. This is intentional: the reconnecting POST EventSources don't need notifications (their purpose — delivering the POST response — is already fulfilled). They just need to stay alive so EventSource stops reconnecting.
Automatic cleanup: Closed shadow senders (from client disconnection) are cleaned up via retain(!is_closed()) on each resume() call and clear() on close_sse_stream().
Request-wise channels still sync: Only the common channel skips replay. Request-wise channels (POST response streams) still use sync() for proper resumption, since those carry actual request/response data that may need replay.

Files Changed

crates/rmcp/src/transport/streamable_http_server/session/local.rs
- Added shadow_txs: Vec<Sender<ServerSseMessage>> to LocalSessionWorker
- New method resume_or_shadow_common() with primary-alive check
- Updated resume() to use shadow logic for both direct common and request-wise fallback paths
- Updated close_sse_stream() to clear shadow senders
- Updated create_local_session() to initialize shadow_txs

Test Plan

Cursor connects and initializes successfully (no 409/500 errors)
Cursor does NOT enter infinite GET reconnect loop after connection
Feature changes trigger exactly one batch of list_changed notifications
Cursor receives and processes notifications correctly (re-fetches tools/resources)
No notification replay loop (no repeated ResourceListChanged every 3s)
VS Code connects and works correctly (unaffected by changes)
cargo check --workspace passes

crates/rmcp/src/transport/streamable_http_server/session/local.rs

crates/rmcp/Cargo.toml

DaleSeo

Thanks for addressing my feedback, @its-mash. The code looks great now! I just have a couple of small suggestions.

To fix the failing checks, please rebase your branch onto the latest main. I think it's because we recently upgraded to reqwest 0.13.

DaleSeo · 2026-02-18T21:18:05Z

crates/rmcp/src/transport/streamable_http_server/session/local.rs

+        &mut self,
+        last_event_index: usize,
+    ) -> Result<StreamableHttpMessageReceiver, SessionError> {
+        let (tx, rx) = tokio::sync::mpsc::channel(self.session_config.channel_capacity);


Shadow streams are idle by design. They don’t receive notifications, just SSE keep-alive pings. However, they still get the full channel_capacity here.

Since only the primary replacement path needs that full capacity, we might want to use a minimal buffer for shadows:

Suggested change

let (tx, rx) = tokio::sync::mpsc::channel(self.session_config.channel_capacity);

let is_replacing_dead_primary = self.common.tx.is_closed();

let capacity = if is_replacing_dead_primary {

self.session_config.channel_capacity

} else {

1 // Shadow streams only need keep-alive pings

};

let (tx, rx) = tokio::sync::mpsc::channel(capacity);

agree, updated

DaleSeo · 2026-02-18T21:18:55Z

crates/rmcp/src/transport/streamable_http_server/session/local.rs

+                shadow_count = self.shadow_txs.len(),
+                "Common channel active, creating shadow stream"
+            );
+            self.shadow_txs.push(tx);


The shadow_txs: Vec<Sender<ServerSseMessage>> vector can grow indefinitely. If a client misbehaves or has a bug, it could open hundreds of GET streams, each creating a new sender. The cleanup with retain(!is_closed()) only happens at the beginning of resume(), so there's a period when all shadows are active at the same time. It would be helpful to add an upper limit.

Suggested change

self.shadow_txs.push(tx);

const MAX_SHADOW_STREAMS: usize = 32;

if self.shadow_txs.len() >= MAX_SHADOW_STREAMS {

tracing::warn!(

shadow_count = self.shadow_txs.len(),

"Shadow stream limit reached, dropping oldest"

);

self.shadow_txs.remove(0);

}

self.shadow_txs.push(tx);

agree, updated

…already active LocalSessionWorker::resume() unconditionally replaced self.common.tx on every GET request, orphaning the receiver the first SSE stream was reading from. All subsequent server-to-client notifications were sent to the new sender while the original client was still listening on the old, now-dead receiver. notify_tool_list_changed().await returned Ok(()) silently. This is triggered by VS Code's MCP extension which reconnects SSE every ~5 minutes with the same session ID. Fix: Check tx.is_closed() before replacing the common channel sender. If an active stream exists, return SessionError::Conflict which is propagated as HTTP 409 Conflict. This matches the TypeScript SDK behavior (streamableHttp.ts:423). Signed-off-by: Mohammod Al Amin Ashik <maa.ashik00@gmail.com>

When a client sends GET with Last-Event-ID from a completed POST SSE response, the request-wise channel no longer exists in tx_router. Previously this returned ChannelClosed -> 500, causing clients like Cursor to enter an infinite re-initialization loop. Now falls back to the common channel when the request-wise channel is completed, per MCP spec: "Resumption applies regardless of how the original stream was initiated (POST or GET)."

Per MCP spec §Streamable HTTP, "The client MAY remain connected to multiple SSE streams simultaneously." Returning 409 Conflict when a second GET arrives causes Cursor to enter an infinite re-initialization loop (~3s cycle). Instead of rejecting, replace the old common channel sender. Dropping the old sender closes the old receiver, cleanly terminating the previous SSE stream so the client can reconnect on the new stream. This fixes both code paths: - GET with Last-Event-ID from a completed POST SSE response - GET without Last-Event-ID (standalone stream reconnection)

When a client opens a new GET SSE stream while a previous one is still active, the old sender is dropped (terminating the old stream) and a new channel is created. Previously, sync() replayed all cached events to the new stream, but the client already received those events on the old stream. This caused an infinite notification loop: 1. Client receives notifications (e.g. ResourceListChanged) 2. Old SSE stream dies (sender replaced) 3. Client reconnects after sse_retry (3s) 4. sync() replays cached notifications the client already handled 5. Client processes them again → goto 2 Fix: check tx.is_closed() BEFORE replacing the sender. If the old stream was still alive, skip replay entirely — the client already has those events. Only replay when the old stream was genuinely dead (network failure, timeout) so the client catches up on missed events.

When POST SSE responses include a `retry` field, the browser's EventSource automatically reconnects via GET after the stream ends. This creates multiple competing EventSource connections that each replace the common channel sender, killing the other stream's receiver. Both reconnect every sse_retry seconds, creating an infinite loop. Instead of always replacing the common channel, check if the primary is still active. If so, create a "shadow" stream — an idle SSE connection kept alive by keep-alive pings that doesn't receive notifications or interfere with the primary channel. Also removes cache replay (sync) on common channel resume, as replaying server-initiated list_changed notifications causes clients to re-process old signals. Signed-off-by: Myko Ash <myko@mcpmux.com> Signed-off-by: Mohammod Al Amin Ashik <maa.ashik00@gmail.com>

Rewrite test suite for SSE channel replacement fix: - Shadow creation: standalone GET returns 200, multiple GETs coexist - Dead primary: replacement, notification delivery, repeated cycles - Notification routing: primary receives, shadow does not - Resume paths: completed request-wise, common alive/dead - Real scenarios: Cursor leapfrog, VS Code reconnect - Edge cases: invalid session, missing header, shadow cleanup Fix Accept header bug (was missing text/event-stream for notifications/initialized POST, causing 406 rejection).

MCP spec (2025-11-25) section "Session Management" requires: - Missing session ID header → 400 Bad Request (not 401) - Unknown/terminated session → 404 Not Found (not 401) Using 401 Unauthorized caused MCP clients (e.g. VS Code) to trigger full OAuth re-authentication on server restart, instead of simply re-initializing the session. Signed-off-by: Mohammod Al Amin Ashik <maa.ashik00@gmail.com>

…sync on resume, rename test - Remove unused SessionError::Conflict and dead string-matching in tower.rs (leftover from abandoned 409 approach) - Restore sync() replay when replacing a dead primary common channel so server-initiated requests and cached notifications are not lost on reconnect - Rename test from test_sse_channel_replacement_bug to test_sse_concurrent_streams per reviewer suggestion (describe what tests verify, not what triggered them) - Add test for cache replay on dead primary replacement - Use generic "MCP clients" in comments instead of specific client names Signed-off-by: Mohammod Al Amin Ashik <maa.ashik00@gmail.com>

- Shadow streams only receive SSE keep-alive pings, so use capacity 1 instead of full channel_capacity - Cap shadow_txs at 32 to prevent unbounded growth from misbehaving clients, dropping the oldest shadow when the limit is reached - Add test verifying primary works after exceeding shadow limit Signed-off-by: Mohammod Al Amin Ashik <maa.ashik00@gmail.com>

alexhancock · 2026-02-19T16:24:28Z

@its-mash Can you fix this minor lint issue

error: this import is redundant
  --> crates/rmcp/tests/test_sse_concurrent_streams.rs:18:1
   |
18 | use reqwest;
   | ^^^^^^^^^^^^ help: remove it entirely
   |
   = help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#single_component_path_imports
   = note: `-D clippy::single-component-path-imports` implied by `-D warnings`
   = help: to override `-D warnings` add `#[allow(clippy::single_component_path_imports)]`

github-actions bot added T-dependencies Dependencies related changes T-test Testing related changes T-config Configuration file changes T-core Core library changes T-transport Transport layer changes labels Feb 14, 2026

its-mash force-pushed the fix/sse-channel-replacement-conflict branch from a727949 to 8bd424e Compare February 14, 2026 01:13

its-mash changed the title ~~fix(streamable-http): return 409 Conflict when standalone SSE stream already active~~ fix(streamable-http): prevent SSE reconnect loops with shadow channels Feb 14, 2026

its-mash mentioned this pull request Feb 14, 2026

Streamable HTTP: Multiple SSE streams cause infinite reconnect loop #659

Open

7 tasks

its-mash force-pushed the fix/sse-channel-replacement-conflict branch from a4b8ffe to c3a557e Compare February 14, 2026 07:02

its-mash mentioned this pull request Feb 14, 2026

feat: Streamable HTTP transport with SSE notifications and E2E tests mcpmux/mcp-mux#61

Merged

6 tasks

DaleSeo reviewed Feb 17, 2026

View reviewed changes

crates/rmcp/src/transport/streamable_http_server/session/local.rs Outdated Show resolved Hide resolved

crates/rmcp/src/transport/streamable_http_server/session/local.rs Outdated Show resolved Hide resolved

crates/rmcp/Cargo.toml Outdated Show resolved Hide resolved

its-mash force-pushed the fix/sse-channel-replacement-conflict branch from 84cf667 to 4fd9ae7 Compare February 18, 2026 07:12

DaleSeo reviewed Feb 18, 2026

View reviewed changes

its-mash added 9 commits February 19, 2026 12:28

its-mash force-pushed the fix/sse-channel-replacement-conflict branch from 4fd9ae7 to e01dacb Compare February 19, 2026 09:09

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(streamable-http): prevent SSE reconnect loops with shadow channels#660

fix(streamable-http): prevent SSE reconnect loops with shadow channels#660
its-mash wants to merge 9 commits intomodelcontextprotocol:mainfrom
mcpmux:fix/sse-channel-replacement-conflict

its-mash commented Feb 14, 2026 •

edited

Loading

Uh oh!

Uh oh!

Uh oh!

Uh oh!

DaleSeo left a comment •

edited

Loading

Uh oh!

DaleSeo Feb 18, 2026

Uh oh!

its-mash Feb 19, 2026

Uh oh!

DaleSeo Feb 18, 2026

Uh oh!

its-mash Feb 19, 2026

Uh oh!

alexhancock commented Feb 19, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Comments

-        let (tx, rx) = tokio::sync::mpsc::channel(self.session_config.channel_capacity);
+        let is_replacing_dead_primary = self.common.tx.is_closed();
+        let capacity = if is_replacing_dead_primary {
+            self.session_config.channel_capacity
+        } else {
+// Shadow streams only need keep-alive pings
+        };
+        let (tx, rx) = tokio::sync::mpsc::channel(capacity);

-            self.shadow_txs.push(tx);
+            const MAX_SHADOW_STREAMS: usize = 32;
+            if self.shadow_txs.len() >= MAX_SHADOW_STREAMS {
+                tracing::warn!(
+                    shadow_count = self.shadow_txs.len(),
+                    "Shadow stream limit reached, dropping oldest"
+                );
+                self.shadow_txs.remove(0);
+            }
+            self.shadow_txs.push(tx);

Conversation

its-mash commented Feb 14, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Problem

Root Cause Analysis

Solution — Shadow Channels

Changes

Key Design Decisions

Files Changed

Test Plan

Uh oh!

Uh oh!

Uh oh!

Uh oh!

DaleSeo left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

DaleSeo Feb 18, 2026

Choose a reason for hiding this comment

Uh oh!

its-mash Feb 19, 2026

Choose a reason for hiding this comment

Uh oh!

DaleSeo Feb 18, 2026

Choose a reason for hiding this comment

Uh oh!

its-mash Feb 19, 2026

Choose a reason for hiding this comment

Uh oh!

alexhancock commented Feb 19, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Comments

its-mash commented Feb 14, 2026 •

edited

Loading

DaleSeo left a comment •

edited

Loading