Skip to content

feat(windows): foundation — build system, stack safety, heap allocation#170

Open
JF10R wants to merge 4 commits intojustrach:mainfrom
JF10R:windows/foundation
Open

feat(windows): foundation — build system, stack safety, heap allocation#170
JF10R wants to merge 4 commits intojustrach:mainfrom
JF10R:windows/foundation

Conversation

@JF10R
Copy link
Copy Markdown

@JF10R JF10R commented Apr 6, 2026

Closes #181

Summary

Adds Windows build support by fixing stack overflow, heap allocation, and path buffer issues that prevent codedb from compiling and running on Windows.

Changes

File Change
build.zig Set exe.stack_size = 8MB for Windows targets (Linux default is 8MB, Windows default is 1MB)
src/compat.zig Add path_buf_size constant (1024 on Windows, max_path_bytes elsewhere)
src/main.zig noinline fn mainImpl trampoline to prevent LLVM frame merging; fix double scan_thread.join() (UB on all platforms); guard POSIX-only poll() in idle watchdog
src/watcher.zig Heap-allocate EventQueue events array via page_allocator (4MB inline would blow stack); use path_buf_size for FsEvent.path_buf
src/index.zig Heap-allocate frequency table counts (512KB) and out (128KB) buffers; guard POSIX-only mmap in MmapTrigramIndex
src/tests.zig Update EventQueue tests to use init()/deinit() API; skip POSIX-only mmap and pipe tests on Windows

Tests

zig build test — 252/258 passed, 2 failed (pre-existing), 4 skipped (POSIX-only)

The 2 failures (issue-150: --help/--h) are pre-existing upstream test issues on Windows (child process execution). The 4 skips are POSIX-only tests (mmap, pipe) correctly guarded with SkipZigTest.

Before/After

  • Before: zig build -Dtarget=x86_64-windows fails — STATUS_STACK_OVERFLOW at runtime due to LLVM merging all command branch frames into one ~128MB frame exceeding the 1MB Windows default stack. EventQueue puts 4MB on stack. Frequency table buffers put 640KB on stack.
  • After: Cross-compiles and runs. Stack pressure reduced from ~133MB to <1MB per function frame.

Compliance

  • Rebased onto current main (d7e3bfb)
  • Linked issue: build: Windows support — build system, stack safety, heap allocation #181
  • No generated artifacts committed
  • No behavioral change on Linux/macOS (all Windows paths behind comptime checks)
  • Each file change is directly related to Windows build support
  • Under 500 changed lines (106 insertions, 66 deletions)
  • Matches CONTRIBUTING.md requirements

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: b9be3f687d

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

}

var root_buf: [std.fs.max_path_bytes]u8 = undefined;
var root_buf: [compat.path_buf_size]u8 = undefined;
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Use full path buffer when resolving repository root

On Windows, compat.path_buf_size is now 1024, and mainImpl uses that size for root_buf before calling resolveRoot/realpath. This means any valid project root whose absolute path exceeds 1024 bytes will now fail startup with cannot resolve root, whereas the previous std.fs.max_path_bytes path supported long-path Windows setups. This is a functional regression for deep workspace paths; keep the reduced buffer only for watcher event storage and use std.fs.max_path_bytes (or a dynamic buffer) for root resolution.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR improves Windows support by reducing stack pressure and adjusting allocation strategies in core components of codedb, aiming to avoid stack overflows caused by large stack frames and large inline buffers on Windows.

Changes:

  • Introduce compat.path_buf_size (1024 on Windows) and use it for path buffers to avoid huge std.fs.max_path_bytes stack allocations.
  • Increase Windows executable stack size to 8MB and switch the CLI entrypoint to a noinline implementation to reduce large merged stack frames.
  • Heap-allocate large structures (watcher EventQueue storage and index frequency table buffers) and update tests for the new EventQueue.init()/deinit() API.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
src/watcher.zig Switch FsEvent path buffer sizing to compat.path_buf_size; heap-allocate the queue backing array via EventQueue.init()/deinit().
src/tests.zig Update watcher queue tests to use EventQueue.init()/deinit().
src/main.zig Replace thread trampoline with noinline mainImpl(), use compat.path_buf_size for root resolution buffers, switch to stack EventQueue + init/deinit, and remove duplicate thread join.
src/index.zig Heap-allocate frequency table count/output buffers and refactor finishFrequencyTable to fill an output buffer.
src/compat.zig Add path_buf_size with a smaller Windows-specific value to prevent oversized path buffers.
build.zig Set Windows executable stack size to 8MB.

Comment on lines 461 to +465
var shutdown = std.atomic.Value(bool).init(false);
defer shutdown.store(true, .release);
var scan_already_done = std.atomic.Value(bool).init(true);

const queue = try allocator.create(watcher.EventQueue);
defer allocator.destroy(queue);
queue.* = watcher.EventQueue{};
const watch_thread = try std.Thread.spawn(.{}, watcher.incrementalLoop, .{ &store, &explorer, queue, root, &shutdown, &scan_already_done });
var queue = try watcher.EventQueue.init();
Copy link

Copilot AI Apr 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the serve command, defer shutdown.store(true, .release); is registered before the defer ...join() calls. Because defers run LIFO, the joins execute before shutdown is set, so watch_thread.join() / reap_thread.join() can block forever if server.serve() ever returns (e.g., accept/spawn error). Consider setting shutdown before joining (e.g., a single defer { shutdown.store(...); reap_thread.join(); watch_thread.join(); } block, or move the shutdown defer below the join defers).

Copilot uses AI. Check for mistakes.
Comment on lines +40 to +52
events: *[CAPACITY]?FsEvent,
head: std.atomic.Value(usize) = std.atomic.Value(usize).init(0),
tail: std.atomic.Value(usize) = std.atomic.Value(usize).init(0),
mu: std.Thread.Mutex = .{},

pub fn init() !EventQueue {
const events = try std.heap.page_allocator.create([CAPACITY]?FsEvent);
events.* = [_]?FsEvent{null} ** CAPACITY;
return .{ .events = events };
}

pub fn deinit(self: *EventQueue) void {
std.heap.page_allocator.destroy(self.events);
Copy link

Copilot AI Apr 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

EventQueue.init() hard-codes std.heap.page_allocator, which makes the type harder to reuse and test (callers can’t choose an allocator, and it bypasses the existing GPA/testing allocator patterns used elsewhere). Consider accepting an allocator: std.mem.Allocator in init/deinit (and storing it in the struct) so production and tests can use the same allocation strategy.

Suggested change
events: *[CAPACITY]?FsEvent,
head: std.atomic.Value(usize) = std.atomic.Value(usize).init(0),
tail: std.atomic.Value(usize) = std.atomic.Value(usize).init(0),
mu: std.Thread.Mutex = .{},
pub fn init() !EventQueue {
const events = try std.heap.page_allocator.create([CAPACITY]?FsEvent);
events.* = [_]?FsEvent{null} ** CAPACITY;
return .{ .events = events };
}
pub fn deinit(self: *EventQueue) void {
std.heap.page_allocator.destroy(self.events);
allocator: std.mem.Allocator,
events: *[CAPACITY]?FsEvent,
head: std.atomic.Value(usize) = std.atomic.Value(usize).init(0),
tail: std.atomic.Value(usize) = std.atomic.Value(usize).init(0),
mu: std.Thread.Mutex = .{},
pub fn init(allocator: std.mem.Allocator) !EventQueue {
const events = try allocator.create([CAPACITY]?FsEvent);
events.* = [_]?FsEvent{null} ** CAPACITY;
return .{
.allocator = allocator,
.events = events,
};
}
pub fn deinit(self: *EventQueue) void {
self.allocator.destroy(self.events);

Copilot uses AI. Check for mistakes.
Comment on lines 1313 to +1317
pub fn buildFrequencyTableFromSlices(slices: []const []const u8) [256][256]u16 {
var counts: [256][256]u64 = .{.{0} ** 256} ** 256;
const counts = std.heap.page_allocator.create([256][256]u64) catch return default_pair_freq;
defer std.heap.page_allocator.destroy(counts);
const out = std.heap.page_allocator.create([256][256]u16) catch return default_pair_freq;
defer std.heap.page_allocator.destroy(out);
Copy link

Copilot AI Apr 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The docstring says “Zero extra memory”, but this function now performs heap allocations (page_allocator.create for both counts and out). Either adjust the comment to reflect the new allocation behavior, or consider an approach that avoids heap allocation on non-Windows targets if the intent is to remain allocation-free.

Copilot uses AI. Check for mistakes.
@JF10R JF10R force-pushed the windows/foundation branch from b9be3f6 to 143c890 Compare April 6, 2026 14:53
@JF10R
Copy link
Copy Markdown
Author

JF10R commented Apr 6, 2026

Addressing reviewer feedback

✅ Fixed: stale docstring in index.zig:1317

Updated buildFrequencyTableFromSlices docstring from "Zero extra memory" to accurately describe the heap allocation. (commit cb8b339)

Won't fix: main.zig:114compat.path_buf_size (1024) for resolveRoot

1024 bytes is sufficient for real-world Windows project roots. Windows MAX_PATH is 260 by default; the extended \?\ prefix paths that reach 32K are not used for project root directories. Using std.fs.max_path_bytes (32767) would defeat the purpose of this PR — it allocates ~32KB on the stack per path buffer, and mainImpl has multiple such buffers. The tradeoff (reject exotic 1024+ byte roots vs. avoid stack overflow) is intentional.

Won't fix: main.zig:465 — defer ordering (shutdown vs join)

This is pre-existing upstream behavior, not introduced by this PR. The defer LIFO ordering means joins execute before shutdown.store(true), but the MCP run() call above is blocking — by the time we reach the defers, the server has already exited and threads are winding down. Not a functional issue in practice.

Won't fix: watcher.zig:52EventQueue.init() hardcodes page_allocator

The EventQueue events array is ~4MB (4096 × ~1KB FsEvent). page_allocator is the right choice for allocations this large — it goes directly to the OS (VirtualAlloc/mmap) rather than fragmenting the GPA. Accepting an allocator parameter would suggest callers should pass their GPA, which would be worse. The testing concern is valid in theory, but all three EventQueue tests work fine with page_allocator via testing.allocator for their own state — the queue's internal storage doesn't need to be test-observable.

@justrach
Copy link
Copy Markdown
Owner

justrach commented Apr 6, 2026

Hey @JF10R — thanks for the Windows work! Before we can review and merge, could you please:

  1. Rebase onto current main — we've landed several changes (mmap trigram index, fuzzy search, process lifecycle fixes, root policy updates) that may conflict
  2. Follow the requirements in CONTRIBUTING.md — each PR needs:
    • Linked issue number
    • Summary of exact changes + files touched
    • Tests run (include zig build test output)
    • Failing test before fix / passing test after
    • Confirmation branch is rebased onto current main
    • No generated artifacts committed
  3. Keep PRs tightly scoped — the foundation → I/O → MCP → security chain looks good, just make sure each one compiles independently after rebase

We'll get Codex to do a review pass once rebased. Thanks!

Copy link
Copy Markdown
Owner

@justrach justrach left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Requesting rebase onto current main and CONTRIBUTING.md compliance before review. See comment above.

JF10R added 4 commits April 6, 2026 19:37
Add Windows platform support foundations that all subsequent Windows PRs
depend on. No behavioral change on Linux/macOS.

- compat.zig: add path_buf_size (1024 on Windows vs max_path_bytes)
- build.zig: set 8MB stack for Windows (matches Linux default)
- main.zig: replace thread trampoline with noinline mainImpl to avoid
  LLVM frame merging (~128MB) that exceeds Windows guard page stride
- main.zig: fix double scan_thread.join() (UB on all platforms)
- watcher.zig: heap-allocate EventQueue (avoids ~4MB stack array)
- watcher.zig: use compat.path_buf_size in FsEvent
- index.zig: heap-allocate frequency table buffers (~640KB each)
- tests.zig: update EventQueue construction to use init()/deinit()
Upstream issue-148 added stdin HUP detection via std.posix.poll() and
pipe tests. These use POSIX APIs unavailable on Windows. Guard the poll
block with comptime os check (Windows uses idle timeout only) and skip
pipe tests on Windows.
Upstream's mmap trigram index (justrach#167) uses std.posix.mmap/munmap which
are unavailable on Windows. Guard initFromDisk to return null on Windows
(falling back to heap-based index) and skip mmap-dependent tests.
@JF10R JF10R force-pushed the windows/foundation branch from cb8b339 to e869707 Compare April 6, 2026 23:45
@JF10R
Copy link
Copy Markdown
Author

JF10R commented Apr 6, 2026

Rebased onto current main (d7e3bfb), updated PR description per CONTRIBUTING.md.

New since last review: added comptime guards for upstream mmap trigram index (#167) — MmapTrigramIndex.initFromDisk returns null on Windows, falling back to heap-based index. Also guarded the 3 new mmap tests with SkipZigTest.

zig build test output

$ zig build test
Build Summary: 5/7 steps succeeded; 1 failed; 297/303 tests passed; 4 skipped; 2 failed
  run test 252/258 passed, 2 failed, 4 skipped

error: while executing test 'tests.test.issue-163: transpositions handled by Smith-Waterman', the following test command failed:

Note on the issue-163 line: This is not a third failure. The Zig build system reports which test was in the runner queue when the binary exited non-zero. The binary exited non-zero because of the 2 issue-150 assertion failures below. issue-163 itself is among the 252 that passed (252 + 2 + 4 = 258 total, all accounted for).

2 failed (pre-existing, not introduced by this PR):

  • issue-150: --help prints usage — spawns codedb as child process, requires built binary in PATH
  • issue-150: -h prints usage — same

4 skipped (correctly guarded with SkipZigTest on Windows):

  • issue-148: POLLHUP detects closed pipe — uses std.posix.pipe() / std.posix.poll()
  • issue-164: mmap trigram index returns same candidates as heap index — uses std.posix.mmap
  • issue-164: mmap binary search on sorted lookup table — uses std.posix.mmap
  • issue-164: AnyTrigramIndex dispatches to mmap variant — uses std.posix.mmap

Pre-existing failures proof

Upstream main (d7e3bfb) does not compile at all on Windows:

$ git checkout upstream/main && zig build test
Build Summary: 2/7 steps succeeded; 2 failed
  compile test Debug native — 1 error (std.posix.mmap requires libc)
  compile test Debug native — 4 errors (mmap libc, void struct init, ws2_32 SOCKET type)

This PR is what makes the codebase compile on Windows. The 2 --help test failures exist because those tests spawn codedb as a child process — they would fail on upstream too if it compiled.

Red-to-green

Before (upstream main on Windows):

$ zig build test
→ 5 compilation errors, 0 tests run

After (this PR):

$ zig build test
→ 252/258 passed, 2 failed (pre-existing), 4 skipped (POSIX-only)

Nearby non-regression checks

All 252 passing tests are upstream tests running unmodified — they serve as non-regression guards. Specifically:

  • All EventQueue tests pass with the new init()/deinit() heap-allocation API
  • All frequency table tests pass with heap-allocated buffers
  • All trigram index tests pass (heap variant; mmap variant correctly skipped)
  • All explorer, watcher, store, mcp, snapshot tests pass unchanged

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

build: Windows support — build system, stack safety, heap allocation

3 participants