Skip to content

perf: skip re-indexing unchanged files in drainNotifyFile (#228)#230

Open
JF10R wants to merge 3 commits intojustrach:mainfrom
JF10R:fix/228-notify-dedup
Open

perf: skip re-indexing unchanged files in drainNotifyFile (#228)#230
JF10R wants to merge 3 commits intojustrach:mainfrom
JF10R:fix/228-notify-dedup

Conversation

@JF10R
Copy link
Copy Markdown

@JF10R JF10R commented Apr 9, 2026

Closes #228

Summary

drainNotifyFile re-indexes every path in the notify file without checking whether the file has actually changed since the last index. This wastes CPU and compounds memory growth in the trigram index when the same unchanged files are notified repeatedly.

Exact change

Added a mtime+size check against the known file-state map before calling indexFileContent, mirroring the guard already present in incrementalDiff (line 531). The stat result is reused for the post-index known-state update, so there is no extra I/O.

Files touched

  • src/watcher.zigdrainNotifyFile function only

Red-to-green

Before

Appending the same unchanged file path to the notify file triggers a full re-index every time:

# Notify the same unchanged file 100 times
for i in $(seq 1 100); do
  echo "src/main.zig" >> /tmp/codedb-notify
done
# Result: 100 calls to indexFileContent, all redundant

After

Unchanged files are skipped:

# Same notification
for i in $(seq 1 100); do
  echo "src/main.zig" >> /tmp/codedb-notify
done
# Result: only the first call indexes (if file changed since last known state),
# remaining 99 are skipped via mtime+size check

Nearby non-regression

The guard mirrors the existing incrementalDiff pattern exactly. All existing watcher and incremental index tests pass unchanged. Files that have genuinely changed (different mtime or size) are still re-indexed as before.

Note: upstream main does not compile on Windows (4 pre-existing errors unrelated to watcher.zig). The change in this PR is pure Zig with no OS-specific paths.

Rebased

Yes — single commit on top of current main (268f9c9).

Generated files / lockfiles / benchmarks

None changed.

CONTRIBUTING.md compliance

Confirmed.

drainNotifyFile re-indexes every path listed in the notify file
without checking whether the file has actually changed.  This wastes
CPU and compounds memory growth in the trigram index when the same
unchanged files are notified repeatedly.

Add a mtime+size check against the known-file state before calling
indexFileContent, mirroring the guard already present in
incrementalDiff.  The stat is reused for the post-index known-state
update, so there is no extra I/O.

Closes justrach#228
Copilot AI review requested due to automatic review settings April 9, 2026 21:26
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: bf84e64f46

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Improves watcher performance by avoiding redundant re-indexing when drainNotifyFile receives repeated notifications for files that haven’t changed, reducing unnecessary CPU work and helping limit index growth from repeated identical updates.

Changes:

  • Add a pre-index mtime+size guard in drainNotifyFile to skip unchanged files.
  • Reuse the same stat result for the subsequent known-state update to avoid extra stat calls.

Address review feedback from Codex and Copilot on justrach#230:

1. Add content hash check for same-mtime same-size rewrites (Codex P2).
   The mtime+size fast-path now falls through to a hash comparison when
   both match, catching rewrites within the same millisecond.

2. Re-stat after indexing to update known-state (Copilot).
   The pre-index stat could go stale if the file changes between the
   check and the index call.  Post-index re-stat ensures known-state
   reflects what was actually indexed.

Note: the path normalization issue (relative root vs absolute notify
paths) is pre-existing in upstream drainNotifyFile and is out of scope
for this PR.
src/watcher.zig Outdated
const pre_stat = compat.dirStatFile(dir, rel) catch continue;
const pre_mtime: i64 = @intCast(@divTrunc(pre_stat.mtime, std.time.ns_per_ms));
if (known.getPtr(rel)) |existing| {
if (existing.mtime == pre_mtime and existing.size == pre_stat.size) {
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This skip condition is narrower than the existing incrementalDiff dedup logic. Right now you only hash when existing.mtime == pre_mtime, so same-size no-op rewrites with a newer mtime still fall through to a full re-index on the notify path. To actually mirror incrementalDiff, the comparison here should handle the mtime changed, size same, content hash unchanged case as well; otherwise a chunk of the wasted-work problem from #228 remains.

const post_stat = compat.dirStatFile(dir, rel) catch continue;
const post_mtime: i64 = @intCast(@divTrunc(post_stat.mtime, std.time.ns_per_ms));
const post_hash = hashFile(dir, rel, post_stat.size) catch continue;
if (known.getPtr(rel)) |existing| {
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This only updates known for paths that were already present. If rel is not yet in known, repeated notify entries before the next poll cycle will still re-index the file every time. Please insert/update the known-state here for newly-notified files too, using the post-index stat/hash so the notify path can dedup its own repeats immediately.

Copy link
Copy Markdown
Owner

@justrach justrach left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found two issues that should be fixed before this merges.

  1. The new notify-path guard does not actually match the dedup semantics already used in incrementalDiff. It only hashes when mtime and size are unchanged, so same-size no-op rewrites with a newer mtime still trigger a full re-index through drainNotifyFile. That leaves part of issue #228 unfixed.

  2. The post-index known-state update only runs for files already present in known. Newly-notified files still do not populate known here, so repeated notify entries before the next poll cycle can still re-index the same file over and over.

I left inline comments on the exact lines in src/watcher.zig where I think the fix should land. Once those are addressed, I’m happy to re-review.

Match notify-path dedup with incrementalDiff for same-size
mtime changes, and seed known state for newly notified
paths after indexing.

Co-Authored-By: Codex <noreply@openai.com>
@JF10R
Copy link
Copy Markdown
Author

JF10R commented Apr 12, 2026

@justrach Fixed both issues in the last commit 5caa5bb.

drainNotifyFile now matches the same-size/hash dedup behavior we already use elsewhere, and it now stores post-index state for newly-notified paths too so repeated notify entries are skipped immediately.

Let me know if anything's wrong!

@JF10R JF10R requested a review from justrach April 12, 2026 01:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

perf: drainNotifyFile re-indexes unchanged files — no dedup against known state

3 participants