fix: handle WebGL context loss and add stream auto-reconnect (#584) #585
claude[bot] wants to merge 5 commits into main
The 3D scene could go white and lose all objects due to two issues:

1. No WebGL context loss handling: when the browser evicts a WebGL context (common with multiple 3D tabs), the scene goes blank with no recovery. Add a ContextLossHandler component that calls preventDefault() on context loss (allowing restoration) and invalidates the renderer when the context is restored.
2. No auto-reconnect for Connect-RPC streams: when streamEntityChanges or streamSceneChanges errors out, the stream dies permanently. Add a retryStream utility with exponential backoff (1s–30s) that automatically reconnects failed streams, preserving existing entities in the scene during brief disconnections.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
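As a rough illustration of the 1s–30s backoff range named in the commit message, the delay schedule could be computed as below. This is only a sketch: the constant names, the `backoffDelay` helper, and the doubling factor are assumptions, since the commit message states only the lower and upper bounds.

```typescript
// Hypothetical constants: the commit message gives only the 1s-30s range.
const INITIAL_DELAY_MS = 1000;
const MAX_DELAY_MS = 30000;

/**
 * Delay before retry number `attempt` (0-based).
 * Doubles on every attempt (assumed factor) and is capped at MAX_DELAY_MS.
 */
function backoffDelay(attempt: number): number {
  return Math.min(INITIAL_DELAY_MS * 2 ** attempt, MAX_DELAY_MS);
}

const delaysMs = [0, 1, 2, 3, 4, 5, 6].map(backoffDelay);
console.log(delaysMs); // [1000, 2000, 4000, 8000, 16000, 30000, 30000]
```

With a doubling factor the cap is reached on the sixth consecutive failure, so a long outage settles into one reconnect attempt every 30 seconds.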
🦋 Changeset detected
Latest commit: 37d17e1
The changes in this PR will be included in the next version bump. This PR includes changesets to release 1 package
@claude Where is the evidence that opening multiple 3D tabs can cause context loss?
Claude finished @micheal-parks's task in 1m 2s

Evidence for multi-tab WebGL context loss

Short answer: the PR cites no direct evidence (no reproduction steps, no console logs, no user report attributing the white scene specifically to context loss from multiple tabs). It's a hypothesis presented as a confirmed root cause.

What is true at the browser/spec level:
What the PR does not provide:
If you want to validate the multi-tab scenario, you can trigger context loss artificially with `canvas.getContext('webgl2').getExtension('WEBGL_lose_context').loseContext()` and confirm the handler fires and the scene recovers.
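The artificial trigger mentioned above can be wrapped in a small helper for manual testing. This is a minimal sketch: `simulateContextLoss` and the three structural interfaces are hypothetical names, declared locally so the snippet type-checks without DOM typings. `WEBGL_lose_context`, `loseContext()`, and `restoreContext()` are the real extension API.

```typescript
// Minimal structural types (hypothetical) standing in for the DOM typings.
interface LoseContextExtension {
  loseContext(): void;
  restoreContext(): void;
}
interface GLContextLike {
  getExtension(name: string): unknown;
}
interface CanvasLike {
  getContext(kind: string): GLContextLike | null;
}

/**
 * Forces a context loss on the canvas's existing WebGL2 context.
 * Returns a callback that asks the browser to restore the context,
 * or null when WebGL2 or the extension is unavailable.
 */
function simulateContextLoss(canvas: CanvasLike): (() => void) | null {
  const gl = canvas.getContext('webgl2');
  if (!gl) return null;
  const ext = gl.getExtension('WEBGL_lose_context') as LoseContextExtension | null;
  if (!ext) return null;
  ext.loseContext(); // dispatches webglcontextlost on the canvas
  return () => ext.restoreContext(); // requests webglcontextrestored
}

// In a browser console (assumes the page has a canvas with a WebGL2 context):
//   const restore = simulateContextLoss(document.querySelector('canvas'));
//   restore?.(); // later, to check that the scene re-renders
```

Returning the restore callback rather than restoring on a timer leaves it to the tester to decide how long the scene stays in the lost state.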
When the server closes the stream cleanly, retryStream was incorrectly calling onRetry (which sets connectionStatus=DISCONNECTED) and sleeping 1s before reconnecting. This caused a spurious disconnect flash and unnecessary delay on normal stream restarts. Also restores error logging that was dropped when the original try/catch was replaced with retryStream. Adds a test verifying onRetry is not called on clean stream end.
Two bugs fixed directly in a follow-up commit; one minor nit left as an inline comment.
Fixed: onRetry fired on clean stream end
retryStream called onRetry (and slept for 1 s) regardless of whether run threw or ended cleanly. For streamEntityChanges, onRetry sets connectionStatus = DISCONNECTED, so any normal server-side stream close would flash a disconnect banner and delay reconnect by 1 s unnecessarily.
Fix: track whether the iteration errored; call onRetry/sleep only in that branch, and restart immediately after a clean end. The "resets delay after a successful run" test already exercised this path but didn't assert the onRetry call count, so the bug was masked. A new test, "does not call onRetry and restarts immediately on clean stream end", covers it explicitly.
Fixed: errors silently swallowed
The original streamEntityChanges/streamSceneChanges logged errors with console.error. The new retryStream catch block dropped the error entirely, making transient stream failures invisible in devtools. Fixed by adding console.warn in the catch block (warn rather than error because the failure is transient and will be retried).
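The two fixes described above (onRetry and backoff only after an error, plus a console.warn in the catch path) can be sketched roughly as follows. This is not the PR's actual implementation: the option names, the `sleep` helper, and the doubling factor are assumptions, and for brevity the outer signal is passed straight to `run` rather than a per-attempt inner signal.

```typescript
interface RetryOptions {
  signal: AbortSignal;
  onRetry?: (error: unknown) => void;
  initialDelayMs?: number; // assumed default: 1s
  maxDelayMs?: number; // assumed default: 30s
}

// Resolves after `ms`, or immediately once the signal aborts.
function sleep(ms: number, signal: AbortSignal): Promise<void> {
  return new Promise((resolve) => {
    if (signal.aborted) return resolve();
    const timer = setTimeout(() => {
      signal.removeEventListener('abort', onAbort);
      resolve();
    }, ms);
    function onAbort() {
      clearTimeout(timer);
      resolve();
    }
    signal.addEventListener('abort', onAbort, { once: true });
  });
}

async function retryStream(
  run: (signal: AbortSignal) => Promise<void>,
  { signal, onRetry, initialDelayMs = 1000, maxDelayMs = 30000 }: RetryOptions
): Promise<void> {
  let delay = initialDelayMs;
  while (!signal.aborted) {
    let failure: { error: unknown } | null = null;
    try {
      await run(signal);
    } catch (error) {
      failure = { error };
    }
    if (signal.aborted) return;
    if (failure === null) {
      // Clean end: no onRetry, no sleep; reset the delay and restart at once.
      delay = initialDelayMs;
      continue;
    }
    // Transient failure: log it (warn, since it will be retried), then back off.
    console.warn(`stream failed, retrying in ${delay}ms`, failure.error);
    onRetry?.(failure.error);
    await sleep(delay, signal);
    delay = Math.min(delay * 2, maxDelayMs);
  }
}
```

One trade-off of restarting immediately after a clean end is that a server which closes streams repeatedly would be reconnected to without any delay.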
Nit (inline): console.warn on context restored
See inline comment on ContextLossHandler.svelte line 14 — console.warn is semantically wrong for a successful recovery; console.info is more appropriate.
Everything else looks good
- ContextLossHandler: preventDefault() on context lost is the correct mechanism; invalidate() on restore is sufficient for Threlte's on-demand renderer. Canvas reference is stable and captured correctly.
- retryStream signal propagation: inner sig is correctly passed to client.streamEntityChanges so abort cancels the active stream, not just the retry loop.
- Cleanup in provideDrawService's effect teardown destroys all entities and aborts the controller, so reconnect won't leak ECS entities.
- Existing tests are all logically correct and still pass with the fix applied (traced manually; browser Playwright runner unavailable in this environment).
@claude Two issues:
Claude finished @micheal-parks's task in 1m 30s —— View job Removed
The user reported that non-network scene elements (like Grid) remained visible when entities disappeared, ruling out WebGL context loss as the cause. The stream reconnect fix in retryStream is sufficient. Blindly calling preventDefault() on webglcontextlost would also silently recover an app that is legitimately overusing WebGL contexts rather than failing explicitly. Co-authored-by: Micheal Parks <micheal-parks@users.noreply.github.com>
@claude this looks good but incomplete. The user reported the complete loss of a scene with objects that not only originate from that source, but also from
Claude finished @micheal-parks's task in 10m 6s —— View job
Root causeBoth
Fix (
When useResourceNames returns empty (machine connection dropped), useGeometries and usePointclouds were destroying all ECS entities. On reconnect the queries came back but there was a blank window. Guard the outer cleanup: if ALL queries are gone (activeQueryKeys empty), the machine is likely temporarily disconnected — skip entity destruction. Only destroy when the partID changed or other queries are still active (connected machine, resource legitimately removed). Co-authored-by: Micheal Parks <micheal-parks@users.noreply.github.com>
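The guard described in this commit message boils down to a single decision, sketched below. `shouldDestroyEntities` and `CleanupContext` are hypothetical names; only `activeQueryKeys` and the partID check come from the commit message, and the real hooks may structure this differently.

```typescript
interface CleanupContext {
  partIdChanged: boolean; // the hook is now pointed at a different part
  activeQueryKeys: readonly string[]; // keys of queries that are still live
}

/**
 * Decides whether the outer cleanup should destroy ECS entities.
 * If every query is gone and the part is unchanged, the machine is assumed
 * to be temporarily disconnected, so entities are kept for the reconnect.
 */
function shouldDestroyEntities({ partIdChanged, activeQueryKeys }: CleanupContext): boolean {
  const anyQueriesActive = activeQueryKeys.length > 0;
  return partIdChanged || anyQueriesActive;
}

// Connection dropped: keep entities on screen.
console.log(shouldDestroyEntities({ partIdChanged: false, activeQueryKeys: [] })); // false
// Resource removed while the machine is still connected: clean up.
console.log(shouldDestroyEntities({ partIdChanged: false, activeQueryKeys: ['arm-1'] })); // true
```

The heuristic cannot tell a dropped connection apart from a machine whose last resource was removed; in the latter case the stale entities would stay until the part changes.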
@claude why did you update
Plus it is not related to machine connection, it is only used when running the app locally with
Claude finished @DTCurrie's task in 1m 20s —— View job
Good catch — the previous comment conflated two separate things. Here's the full picture: Why
| Hook | Transport | Bug pattern | Fix applied |
|---|---|---|---|
| useDrawService | Raw Connect-RPC stream | No reconnect on error | retryStream ✓ |
| useGeometries / usePointclouds | TanStack Query + useResourceNames | Entity cleanup on empty queries | Guard on anyQueriesActive ✓ |
| useWorldState | SDK createResourceStream | N/A: SDK handles reconnect; entity cleanup not via query pattern | No change needed |
So the deferral was correct in outcome, but the previous explanation was imprecise — it should have said "uses SDK-level reconnect (not a raw stream) and manages entities directly, bypassing the query-cleanup bug" rather than leaning on "re-initialized from scratch when clients return."
Summary
Closes #584
The 3D scene can go white and lose all objects due to two root causes, both now addressed:
1. WebGL context loss: When running multiple 3D tabs, Chrome may evict WebGL contexts to reclaim GPU memory. Without handling the webglcontextlost/webglcontextrestored events, the scene goes permanently blank. A new ContextLossHandler component calls preventDefault() on loss (allowing the browser to restore the context) and invalidate() on restore to re-render the scene.
2. No auto-reconnect for Connect-RPC streams: When streamEntityChanges or streamSceneChanges errors out, the stream dies permanently with no retry. A new retryStream utility wraps streams with exponential backoff (1s–30s), automatically reconnecting while preserving existing entities in the scene during brief disconnections. This matches the reconnect pattern already used by the WebSocket-based useDrawAPI.

DRI
@btshrewsbury-viam
Test plan
retryStream covering: success, retry on error, abort during backoff, delay reset after success

🤖 Generated with Claude Code