fix(swarm): WindowsTerminalBackend pidFile health check + 5-state lifecycle #1237
amDosion wants to merge 2 commits into
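…ecycle

Fixes several problems caused by the fire-and-forget nature of wt.exe split-pane: teammates hanging as if dead, TeamDelete getting stuck, and a kill-while-spawn race.

- Add waitForPidFile(): after wt.exe returns, wait until powershell.exe has actually started and written the pidFile. The default timeout is 8s and can be overridden with the env variable CLAUDE_WT_PANE_TIMEOUT_MS; on timeout it throws with full diagnostics.
- Add a 5-state lifecycle (registered/spawning/ready/killing/dead). sendCommandToPane wraps its work in an inner Promise stored as spawnPromise, and re-spawning a pane that is already ready throws immediately.
- Fix the killPane TOCTOU: re-read the status after awaiting spawnPromise; prefer the cached pane.pid to avoid a disk read; if Stop-Process fails, still clear the cache and mark the pane dead so that a reused PID is not killed by mistake.
- Stricter pid parsing: /^\d+$/ + Number.isFinite + > 0; remove the dead try/catch.
- The constructor accepts an options object for injecting pidFileDir (the original positional parameters remain supported).
- Delete any stale pidFile before launch; the killPane fallback path retries 3×500ms as a safety net.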
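The polling and pid validation described in this commit message can be sketched roughly as follows. This is a minimal illustration, not the PR's actual code: the helper names parsePid and resolveTimeoutMs, the trimming of whitespace, the handling of invalid env overrides, and the wording of the timeout error are all assumptions. The 200ms poll interval is taken from the follow-up note in the PR description.

```typescript
import { readFile } from "node:fs/promises";

// Hypothetical helper: accept only a plain positive decimal integer.
// Trimming surrounding whitespace is an assumption of this sketch.
function parsePid(raw: string): number | null {
  const text = raw.trim();
  if (!/^\d+$/.test(text)) return null; // rejects "123abc", "-5", "1.5", ""
  const pid = Number(text);
  return Number.isFinite(pid) && pid > 0 ? pid : null;
}

// Hypothetical helper: use the env override when it is a positive number,
// otherwise fall back to the 8s default mentioned in the commit message.
function resolveTimeoutMs(env: Record<string, string | undefined>): number {
  const override = Number(env.CLAUDE_WT_PANE_TIMEOUT_MS);
  return Number.isFinite(override) && override > 0 ? override : 8000;
}

// Poll the pidFile until it holds a valid pid or the deadline passes.
async function waitForPidFile(
  pidFile: string,
  timeoutMs: number,
  pollMs = 200,
): Promise<number> {
  const deadline = Date.now() + timeoutMs;
  let lastSeen = "<no file>";
  for (;;) {
    try {
      const raw = await readFile(pidFile, "utf8");
      lastSeen = JSON.stringify(raw);
      const pid = parsePid(raw);
      if (pid !== null) return pid;
    } catch {
      // The file is not there yet (or is unreadable); keep polling.
    }
    if (Date.now() >= deadline) {
      throw new Error(
        `Timed out after ${timeoutMs}ms waiting for ${pidFile}; last content: ${lastSeen}`,
      );
    }
    await new Promise((resolve) => setTimeout(resolve, pollMs));
  }
}
```

A caller would presumably invoke something like waitForPidFile(pidFile, resolveTimeoutMs(process.env)) only after wt.exe has returned, so that an exit code of 0 alone is never treated as proof that the pane is alive.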
…, pid validation
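Adds 12 tests for WindowsTerminalBackend covering all new v2 behavior: 5 for v1 compatibility + 7 new v2 scenarios. Together with the constructor options object, the tests pass pidFileDir: tempDir for isolation so that nothing leaks into the real OS tmpdir.
New scenarios covered: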
- unlinks stale pidFile so a stale pid is not adopted
- rejects re-spawn on a ready pane
- throws on unknown paneId in sendCommandToPane
- rejects corrupted pidFile content ("123abc") and times out
- killPane awaits in-flight spawn before killing (kill-while-spawn race)
- Stop-Process failure clears cached pid and marks pane dead
- killPane uses cached pid and returns false when pane is unknown
The createBackend helper now takes an options object and uses simulatePidWrite to simulate PowerShell writing the pidFile. pidFileDir is injected as tempDir, and the env variable CLAUDE_WT_PANE_TIMEOUT_MS is set in beforeEach and cleaned up in afterEach.
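The isolation approach described above might look something like the fixture below. It is only a sketch under stated assumptions: createPidFixture, setTimeoutEnv, the file naming, and the shape of the returned object are hypothetical, and the PR's real createBackend and simulatePidWrite helpers may differ. Only Node's standard fs, os, and path modules are used.

```typescript
import { existsSync, mkdtempSync, readFileSync, rmSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Hypothetical fixture: each test gets its own temp directory, which would be
// passed to the backend as pidFileDir so that no pidFile lands in the shared
// OS tmpdir itself.
function createPidFixture() {
  const tempDir = mkdtempSync(join(tmpdir(), "wt-backend-test-"));
  return {
    tempDir,
    // Stand-in for the child PowerShell process writing its PID.
    simulatePidWrite(fileName: string, content: string): string {
      const pidFile = join(tempDir, fileName);
      writeFileSync(pidFile, content);
      return pidFile;
    },
    // Returns the file content, or null when the file does not exist.
    read(fileName: string): string | null {
      const pidFile = join(tempDir, fileName);
      return existsSync(pidFile) ? readFileSync(pidFile, "utf8") : null;
    },
    // Intended for afterEach: remove the directory and everything in it.
    cleanup(): void {
      rmSync(tempDir, { recursive: true, force: true });
    },
  };
}

// Hypothetical helper: set the timeout override and return a function that
// restores the previous value (call it from afterEach).
function setTimeoutEnv(ms: number): () => void {
  const previous = process.env.CLAUDE_WT_PANE_TIMEOUT_MS;
  process.env.CLAUDE_WT_PANE_TIMEOUT_MS = String(ms);
  return () => {
    if (previous === undefined) delete process.env.CLAUDE_WT_PANE_TIMEOUT_MS;
    else process.env.CLAUDE_WT_PANE_TIMEOUT_MS = previous;
  };
}
```

Keeping the timeout short in tests matters for the "corrupted pidFile" scenario, which by design has to run until the timeout fires.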
Walkthrough
WindowsTerminalBackend pane lifecycle management is refactored to support dependency injection, explicit state tracking via a PaneStatus enum, and timeout-driven PID file polling. sendCommandToPane and killPane are rewritten as state machines that properly handle concurrent spawn/kill races and prevent invalid lifecycle transitions. Tests validate state transitions, timeout behavior, and edge cases.
Sequence Diagram
sequenceDiagram
participant Caller
participant Backend
participant PowerShell
participant FileSystem
Caller->>Backend: sendCommandToPane(paneId, cmd)
Backend->>Backend: validate pane status not spawning/ready/killing/dead
Backend->>Backend: create & attach spawnPromise
Backend->>PowerShell: wt.exe via PowerShell -EncodedCommand
Backend->>FileSystem: delete existing PID file
Backend->>FileSystem: waitForPidFile (poll with timeout)
FileSystem-->>Backend: PID file appears & validates
Backend->>Backend: update pid, status=ready, resolve spawnPromise
Backend-->>Caller: resolved promise
note over Backend: on error: status=dead, reject promise, clear promise
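The flow in the diagram, together with the kill-while-spawn handling mentioned in the walkthrough, can be sketched as a small state machine. This is an illustrative approximation rather than the PR's implementation: the Pane shape, the function names spawnPane and assertCanSpawn, the injected launch and stop callbacks, and the exact return values of killPane are assumptions, and the pidFile fallback with retries is left out.

```typescript
type PaneStatus = "registered" | "spawning" | "ready" | "killing" | "dead";

interface Pane {
  status: PaneStatus;
  pid?: number;
  spawnPromise?: Promise<void>;
}

// In this sketch, only a freshly registered pane may be spawned.
function assertCanSpawn(pane: Pane): void {
  if (pane.status !== "registered") {
    throw new Error(`cannot spawn pane in status "${pane.status}"`);
  }
}

// `launch` stands in for "run wt.exe, then wait for the pidFile".
function spawnPane(pane: Pane, launch: () => Promise<number>): Promise<void> {
  assertCanSpawn(pane);
  pane.status = "spawning";
  const spawnPromise = (async () => launch())()
    .then(
      (pid) => {
        pane.pid = pid;
        pane.status = "ready";
      },
      (err: unknown) => {
        pane.status = "dead"; // on error: mark dead and reject
        throw err;
      },
    )
    .finally(() => {
      pane.spawnPromise = undefined; // clear the in-flight marker
    });
  pane.spawnPromise = spawnPromise;
  return spawnPromise;
}

// `stop` stands in for Stop-Process. Returns false for an unknown pane.
async function killPane(
  pane: Pane | undefined,
  stop: (pid: number) => Promise<void>,
): Promise<boolean> {
  if (!pane) return false;
  if (pane.spawnPromise) {
    // Wait for the in-flight spawn; a failed spawn has already marked the pane dead.
    await pane.spawnPromise.catch(() => undefined);
  }
  // TOCTOU guard: read status and pid again after the await.
  if (pane.status === "dead" || pane.status === "killing" || pane.pid === undefined) {
    return false;
  }
  const pid = pane.pid; // cached pid, no disk read
  pane.status = "killing";
  try {
    await stop(pid);
    return true;
  } catch {
    return false;
  } finally {
    // Clear the cache even on failure so a reused PID is never targeted later.
    pane.pid = undefined;
    pane.status = "dead";
  }
}
```

With this shape, a killPane call that arrives while the pane is still spawning waits for spawnPane to settle and then acts on the final status, instead of acting on a pid that has not been written yet.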
Estimated code review effort: 4 (Complex) | ~45 minutes
Summary
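Fixes two classes of problems in the fork's in-house Windows Terminal Agent Teams backend (src/utils/swarm/backends/WindowsTerminalBackend.ts):

- wt.exe returning exit 0 does not mean PowerShell actually started. The original code wrote to the mailbox immediately, which left teammates hanging and TeamDelete stuck on "active teammate". This PR adds waitForPidFile(), which polls the pidFile after wt.exe returns until the child PowerShell has actually written its PID. The default timeout is 8s (overridable via the env variable CLAUDE_WT_PANE_TIMEOUT_MS); on timeout it throws with full diagnostic information.
- A 5-state lifecycle (registered/spawning/ready/killing/dead). While a pane is spawning, killPane first awaits the in-flight Promise and only then decides what to do (including a TOCTOU re-read of the status). It prefers the cached pane.pid to avoid a disk read, and any Stop-Process failure clears the cache and marks the pane dead so that a reused PID is not killed by mistake.

The PR also makes pid parsing stricter (/^\d+$/ + Number.isFinite + > 0, rejecting values such as "123abc"), changes the constructor to take an options object that supports pidFileDir injection (for test isolation), and turns makePidFile from a module-level function into a private method.

Test plan
- bun test src/utils/swarm/backends/__tests__/WindowsTerminalBackend.test.ts: 12 pass / 0 fail (5 adapted v1 tests + 7 new v2 scenarios: kill-while-spawn race, re-spawn in the ready state, corrupted pid, Stop-Process failure clearing the cache, etc.)
- bun run tsc --noEmit: zero new errors (the 4 pre-existing errors in doubaoSTT.ts about the missing doubaoime-asr module are unrelated to this PR)
- bun test src/utils/swarm/backends/__tests__/PaneBackendExecutor.test.ts: 2 pass (existing PaneBackendExecutor cases are not broken)
- wt.exe is a UWP app and can only be tested on Windows; reviewers are asked to help verify.

Scope
- src/utils/swarm/backends/WindowsTerminalBackend.ts (+215/-38)
- src/utils/swarm/backends/__tests__/WindowsTerminalBackend.test.ts (+247/-13)
- Not touched: PaneBackendExecutor / registry / detection / other backends / CI / workflow / other fork-specific configuration

Follow-up (does not block this PR)
- … isFirstTeammate check)
- Replace the 200ms polling with fs.watch (performance optimization)
- Have a false result from killPane distinguish "pane does not exist" from "kill actually failed" (observability)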