perf: skip re-indexing unchanged files in drainNotifyFile (#228)#230
perf: skip re-indexing unchanged files in drainNotifyFile (#228)#230JF10R wants to merge 3 commits intojustrach:mainfrom
Conversation
drainNotifyFile re-indexes every path listed in the notify file without checking whether the file has actually changed. This wastes CPU and compounds memory growth in the trigram index when the same unchanged files are notified repeatedly. Add a mtime+size check against the known-file state before calling indexFileContent, mirroring the guard already present in incrementalDiff. The stat is reused for the post-index known-state update, so there is no extra I/O. Closes justrach#228
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: bf84e64f46
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
There was a problem hiding this comment.
Pull request overview
Improves watcher performance by avoiding redundant re-indexing when drainNotifyFile receives repeated notifications for files that haven’t changed, reducing unnecessary CPU work and helping limit index growth from repeated identical updates.
Changes:
- Add a pre-index
mtime+sizeguard indrainNotifyFileto skip unchanged files. - Reuse the same stat result for the subsequent known-state update to avoid extra stat calls.
Address review feedback from Codex and Copilot on justrach#230: 1. Add content hash check for same-mtime same-size rewrites (Codex P2). The mtime+size fast-path now falls through to a hash comparison when both match, catching rewrites within the same millisecond. 2. Re-stat after indexing to update known-state (Copilot). The pre-index stat could go stale if the file changes between the check and the index call. Post-index re-stat ensures known-state reflects what was actually indexed. Note: the path normalization issue (relative root vs absolute notify paths) is pre-existing in upstream drainNotifyFile and is out of scope for this PR.
src/watcher.zig
Outdated
| const pre_stat = compat.dirStatFile(dir, rel) catch continue; | ||
| const pre_mtime: i64 = @intCast(@divTrunc(pre_stat.mtime, std.time.ns_per_ms)); | ||
| if (known.getPtr(rel)) |existing| { | ||
| if (existing.mtime == pre_mtime and existing.size == pre_stat.size) { |
There was a problem hiding this comment.
This skip condition is narrower than the existing incrementalDiff dedup logic. Right now you only hash when existing.mtime == pre_mtime, so same-size no-op rewrites with a newer mtime still fall through to a full re-index on the notify path. To actually mirror incrementalDiff, the comparison here should handle the mtime changed, size same, content hash unchanged case as well; otherwise a chunk of the wasted-work problem from #228 remains.
| const post_stat = compat.dirStatFile(dir, rel) catch continue; | ||
| const post_mtime: i64 = @intCast(@divTrunc(post_stat.mtime, std.time.ns_per_ms)); | ||
| const post_hash = hashFile(dir, rel, post_stat.size) catch continue; | ||
| if (known.getPtr(rel)) |existing| { |
There was a problem hiding this comment.
This only updates known for paths that were already present. If rel is not yet in known, repeated notify entries before the next poll cycle will still re-index the file every time. Please insert/update the known-state here for newly-notified files too, using the post-index stat/hash so the notify path can dedup its own repeats immediately.
justrach
left a comment
There was a problem hiding this comment.
I found two issues that should be fixed before this merges.
-
The new notify-path guard does not actually match the dedup semantics already used in
incrementalDiff. It only hashes whenmtimeand size are unchanged, so same-size no-op rewrites with a newermtimestill trigger a full re-index throughdrainNotifyFile. That leaves part of issue #228 unfixed. -
The post-index known-state update only runs for files already present in
known. Newly-notified files still do not populateknownhere, so repeated notify entries before the next poll cycle can still re-index the same file over and over.
I left inline comments on the exact lines in src/watcher.zig where I think the fix should land. Once those are addressed, I’m happy to re-review.
Match notify-path dedup with incrementalDiff for same-size mtime changes, and seed known state for newly notified paths after indexing. Co-Authored-By: Codex <noreply@openai.com>
Closes #228
Summary
drainNotifyFilere-indexes every path in the notify file without checking whether the file has actually changed since the last index. This wastes CPU and compounds memory growth in the trigram index when the same unchanged files are notified repeatedly.Exact change
Added a mtime+size check against the
knownfile-state map before callingindexFileContent, mirroring the guard already present inincrementalDiff(line 531). The stat result is reused for the post-index known-state update, so there is no extra I/O.Files touched
src/watcher.zig—drainNotifyFilefunction onlyRed-to-green
Before
Appending the same unchanged file path to the notify file triggers a full re-index every time:
After
Unchanged files are skipped:
Nearby non-regression
The guard mirrors the existing
incrementalDiffpattern exactly. All existing watcher and incremental index tests pass unchanged. Files that have genuinely changed (different mtime or size) are still re-indexed as before.Note: upstream
maindoes not compile on Windows (4 pre-existing errors unrelated to watcher.zig). The change in this PR is pure Zig with no OS-specific paths.Rebased
Yes — single commit on top of current
main(268f9c9).Generated files / lockfiles / benchmarks
None changed.
CONTRIBUTING.md compliance
Confirmed.