fix: 10min idle timeout + poll stdin for dead clients (#148)#233
fix: 10min idle timeout + poll stdin for dead clients (#148)#233
Conversation
Two changes: 1. Idle timeout increased from 2min to 10min — allows long debugging sessions where the agent thinks for >2min 2. Watchdog now polls stdin every 10s for POLLHUP — detects dead clients immediately instead of waiting for the full timeout. When the client pipe closes, server exits within 10 seconds. This means: - Client disconnects cleanly → stdin EOF → server exits immediately - Client crashes/killed → POLLHUP detected → server exits in ≤10s - Client connected but idle → 10min timeout (plenty for debugging) - Orphaned process (no client ever) → 10min timeout Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Verify idle_timeout_ms is 10 minutes (600,000ms) - Verify POLLHUP detected when pipe write end is closed (client gone) - Verify open pipe does NOT trigger false HUP Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 252a9f1fed
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| // Wait up to 15 seconds for the process to exit | ||
| // (watchdog polls every 10s, so it should detect POLLHUP within ~10s) | ||
| const start = std.time.milliTimestamp(); | ||
| const term = child.wait() catch { |
There was a problem hiding this comment.
Add a bounded wait when asserting MCP exit on stdin close
child.wait() blocks indefinitely, so if the MCP process fails to exit after stdin is closed, this test hangs the entire suite until an external timeout kills it. The test comment says it waits “up to 15 seconds,” but there is no actual timeout enforcement before calling wait(), which means this regression test cannot fail fast when the shutdown path breaks.
Useful? React with 👍 / 👎.
| test "issue-148: codedb mcp exits when stdin is closed" { | ||
| // Integration test: spawn codedb mcp, close stdin, verify it exits | ||
| var child = std.process.Child.init( | ||
| &.{ "zig", "build", "run", "--", "--mcp" }, |
There was a problem hiding this comment.
Remove build latency from the 15s MCP shutdown assertion
This integration test launches zig build run -- --mcp, so compile/link startup time is included in elapsed before the < 15_000 assertion. On slower or cold-cache environments, build time alone can exceed 15s even when stdin-HUP shutdown works correctly, making the test flaky and failing for reasons unrelated to watchdog behavior.
Useful? React with 👍 / 👎.
# Conflicts: # src/main.zig # src/tests.zig
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 051d6c84e9
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| switch (term) { | ||
| .Exited => |code| _ = code, | ||
| else => {}, |
There was a problem hiding this comment.
Assert successful MCP termination in stdin-close test
The new integration test accepts any termination outcome because the switch ignores non-Exited cases and also discards the exit code, so crashes or non-zero exits (for example from a panic or failed zig build run) still satisfy the test as long as the process returns quickly. This can mask regressions in the stdin-HUP shutdown path instead of proving a clean exit.
Useful? React with 👍 / 👎.
Benchmark Regression ReportThreshold: 10.00%
|
Summary
POLLHUPso dead clients are detected soonerVerification
Refs #148