perf: 8.8x faster cold search, 7.3x less memory (#264)
Conversation
…ale state

Fixes #227, #246, #247, #248. Four interrelated TrigramIndex fixes land together:

1. removeFile (#246): moved path_to_id.remove() before the file_trigrams guard so the mapping is always cleaned up, even when file_trigrams has no entry (leftover from a partial, OOM-failed indexFile).
2. id_to_path growth (#227, #247): removeFile now adds the freed doc_id to a new free_ids freelist and marks the id_to_path slot as "". getOrCreateDocId pops from free_ids first, reusing the old slot instead of appending a new one. After N re-indexes of the same file, id_to_path.items.len stays bounded by the number of unique files ever indexed.
3. PostingList sorted invariant: reused doc_ids are not the maximum, so a plain append would break the binary-search invariant. indexFile now detects whether a slot was reused (id_to_path did not grow) and uses getOrAddPosting (sorted insert) for reused doc_ids, keeping append for new files.
4. PostingList.removeDocId (#248): replaced the O(n) linear scan with the same binary-search pattern used by getByDocId — O(log n) search plus a single orderedRemove.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
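The freelist-reuse and sorted-insert pattern from fixes 2 and 3 can be sketched in Python (the project itself is Zig; DocIdAllocator and add_posting are illustrative names, not the actual API). The key points: removal always cleans the path mapping first and pushes the id onto a freelist, and a reused id must be inserted into posting lists by binary search because it is no longer the maximum.

```python
import bisect

class DocIdAllocator:
    """Sketch of the free_ids pattern: reuse slots freed by removal
    so the id->path table stays bounded by the number of unique files."""
    def __init__(self):
        self.path_to_id = {}
        self.id_to_path = []   # index == doc_id
        self.free_ids = []     # freelist of reclaimed doc_ids

    def get_or_create(self, path):
        if path in self.path_to_id:
            return self.path_to_id[path], False
        if self.free_ids:
            doc_id = self.free_ids.pop()     # reuse a freed slot
            self.id_to_path[doc_id] = path
            reused = True
        else:
            doc_id = len(self.id_to_path)    # append a brand-new slot
            self.id_to_path.append(path)
            reused = False
        self.path_to_id[path] = doc_id
        return doc_id, reused

    def remove(self, path):
        doc_id = self.path_to_id.pop(path)   # clean the mapping first
        self.id_to_path[doc_id] = ""
        self.free_ids.append(doc_id)

def add_posting(postings, doc_id, reused):
    """Reused ids are not the max, so keep the sorted invariant with a
    binary-search insert; new ids (always the max) can just be appended."""
    if reused:
        bisect.insort(postings, doc_id)
    else:
        postings.append(doc_id)
```

The `reused` flag mirrors the Zig-side check of whether id_to_path grew during getOrCreateDocId.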
…e, snapshot double-open, searchContent O(n) fallback

Fixes #249, #250, #251, #252, #253.

- index.zig (#251): AnyTrigramIndex.candidates and candidatesRegex mmap_overlay branches no longer leak the result ArrayList's backing buffer when toOwnedSlice fails under OOM — explicit deinit on the error path.
- nuke.zig (#249): rewriteConfigFile now writes to a {path}.tmp file first and renames atomically, preventing an empty config file if the process is killed mid-write. Callers updated to thread the allocator through.
- explore.zig (#252): commitParsedFileOwnedOutline adds an errdefer immediately after word_index.indexFile so that a subsequent trigram_index OOM failure rolls back word_index to the previous content, keeping the two indexes in sync.
- explore.zig (#250): Explorer gains a skip_trigram_files StringHashMap. Files indexed with skip_trigram=true are tracked in this set; the searchContent fallback loop now iterates only skip_trigram_files instead of all outlines, reducing the fallback from O(all files) to O(skip-trigram files).
- snapshot.zig (#253): extracted a readSectionsFromFile(file, allocator) helper so both readSections and readSectionBytes share the header-parsing logic. readSectionBytes now opens the file once and calls the helper, eliminating the redundant second openFile call for each section read.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
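The tmp-then-rename trick behind the nuke.zig fix (#249) is the standard atomic-replace pattern; here is a minimal Python sketch (function name and fsync policy are illustrative, not the actual rewriteConfigFile). Because rename over an existing file is atomic on POSIX, a crash at any point leaves either the complete old config or the complete new one, never a truncated file.

```python
import os

def rewrite_config_atomic(path: str, data: bytes) -> None:
    """Write to {path}.tmp, then atomically rename over the original.
    A crash mid-write can only orphan the .tmp file; the real config
    is never observed empty or half-written."""
    tmp = path + ".tmp"
    with open(tmp, "wb") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())   # ensure bytes hit disk before the rename
    os.replace(tmp, path)      # atomic swap on POSIX and Windows
```

`os.replace` (rather than `os.rename`) is used so the overwrite also works on Windows.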
…obustness

- watcher.zig (#254): incrementalLoop now stats .git/HEAD's mtime before spawning git rev-parse HEAD. If the mtime is unchanged, the fork+exec is skipped entirely, eliminating a 2-second-cadence subprocess that accounted for the majority of codedb's background CPU on large repos.
- watcher.zig: EventQueue.head/tail were std.atomic.Value(usize) even though every access (push and pop) already holds self.mu. Replaced with plain usize fields; the mutex provides all required ordering guarantees.
- store.zig: Store.seq was std.atomic.Value(u64) even though the only mutation site (appendVersion) holds self.mu. Changed to a plain u64; currentSeq() now also acquires the mutex, so the type is correct.
- snapshot.zig: readSectionString limit raised from 4096 to std.math.maxInt(u16) so symbol names longer than 4 KiB are accepted. loadSnapshotFast treats a corrupt OUTLINE_STATE section as an empty map rather than propagating the error, matching the ba13aed fix on the feature/243 branch.
- lib.zig: export the snapshot module so callers can reach it through lib.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
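The mtime gate from the watcher.zig fix (#254) generalizes to any "poll a file, fork a subprocess when it changed" loop. A hedged Python sketch (HeadGate and the injected resolve callable are illustrative; the real code stats .git/HEAD and spawns git rev-parse HEAD):

```python
import os

class HeadGate:
    """Only invoke `resolve` (e.g. spawning `git rev-parse HEAD`) when
    the watched file's mtime has changed since the last poll."""
    def __init__(self, head_path: str, resolve):
        self.head_path = head_path
        self.resolve = resolve      # callable returning the current HEAD sha
        self.last_mtime = None
        self.head = None

    def poll(self):
        mtime = os.stat(self.head_path).st_mtime_ns
        if mtime == self.last_mtime:
            return self.head        # unchanged: skip the fork+exec entirely
        self.last_mtime = mtime
        self.head = self.resolve()
        return self.head
```

On a 2-second polling cadence this turns a subprocess-per-tick into a single stat() syscall per tick in the common (unchanged) case.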
…llback

Two new tests covering the fixes that landed in this branch:

1. "snapshot: symbol detail longer than 4096 bytes survives round-trip" — indexes a function whose signature line is ~5000 chars, then writes and reloads the snapshot. Guards against readSectionString rejecting details longer than 4096 bytes (the pre-fix max_len).
2. "snapshot: corrupted OUTLINE_STATE section falls back to CONTENT load" — overwrites the OUTLINE_STATE bytes with 0xFF after writeSnapshot, then calls loadSnapshot. The catch fallback must produce an empty outline map so loadSnapshotFast re-indexes all files from CONTENT instead. Verifies loadSnapshot returns true and symbols remain findable.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds comprehensive v0.2.57 CHANGELOG entry covering worker-local indexing (10×), full nuke/uninstall, MCP timeout fix, Rosetta stack fix, help CLI fix, and all 9 correctness fixes (index id growth, stale entries, git HEAD mtime gate, atomic removal, snapshot robustness). Adds src/benchmark.zig with `zig build benchmark -- --root /path/to/repo` measuring index time, query latency, re-index slot reuse, and .git/HEAD mtime gate effectiveness. Updates README with openclaw benchmark table. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…searchContent word_index fallback, Symbol.line_end population

- #210: release raw file contents after ProjectCache snapshot load (4.5GB → ~200MB)
- #228: check mtime/size before re-indexing in drainNotifyFile; skip unchanged files
- #253: loadSnapshotValidated opens the snapshot file once instead of 5 times
- #250: searchContent uses word_index to narrow the fallback from O(files) to O(word hits)
- #224: computeSymbolEnds post-processing populates Symbol.line_end for brace/indent/Ruby languages; `codedb_symbol body=true` now returns the full function body

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ng (#108, #215, #216)

- #216: add missing php/ruby/hcl/r entries to the telemetry writeLanguages array
- #108: add HCL language support — resource, data, module, variable, output, provider, locals, and terraform blocks; .tf/.tfvars/.hcl detection; `#` / `//` / `/* */` comment handling; .terragrunt-cache in skip_dirs
- #215: add R language support — function assignment (<- / =), setClass/setRefClass, library/require imports; .r/.R detection; # comment handling
- 10 new tests covering all HCL and R parser paths

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Python docstring: replace naive triple-quote count with position-aware
detection — properly handles inline docstrings ("""text"""), opening
docstrings with text ("""starts here), and multi-line docstrings
containing def/class lines
- Snapshot JSON: use writeJsonEscaped for path interpolation in snapshot
writer — prevents cache corruption for files with ", \, or control
characters in paths
Note: 4 of 8 bugs from #179 were already fixed in prior commits
(C/C++ block comments, u16 truncation, ANSI strip, telemetry race)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
fix: Python docstring detection and snapshot JSON injection (#179)
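The position-aware detection described above can be sketched directly in Python (function names are illustrative, not the codedb parser's API): instead of counting triple quotes, walk each `"""` occurrence left to right and toggle the in-docstring state, so inline docstrings (`"""text"""`) net out to "still outside" while an opener with trailing text (`"""starts here`) nets to "inside". A driver that skips symbol extraction for lines that start inside a docstring then falls out naturally; this sketch ignores single-quoted `'''` strings and escapes for brevity.

```python
def scan_docstring_state(line: str, in_doc: bool) -> bool:
    """Advance the docstring state across one line by visiting each
    triple-quote occurrence at its position, not by counting them."""
    i = 0
    while True:
        j = line.find('"""', i)
        if j == -1:
            return in_doc
        in_doc = not in_doc   # every occurrence toggles open/closed
        i = j + 3

def lines_outside_docstrings(source: str):
    """Yield only lines that begin outside a docstring, so def/class
    lines inside a multi-line docstring are never treated as symbols."""
    out, in_doc = [], False
    for line in source.splitlines():
        was_in = in_doc
        in_doc = scan_docstring_state(line, in_doc)
        if not was_in:
            out.append(line)
    return out
```

With this, a `def fake():` line embedded in a multi-line docstring is skipped, while the real `def` above it is kept.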
Explorer.contents was an unbounded StringHashMap holding all file contents in memory (1.7GB peak RSS on 5K-file repos). Replace with a fixed-size ContentCache using CLOCK (second-chance) eviction:

- 4096-entry slot array with reference bits
- O(1) path→slot lookup via StringHashMap
- Hot files (recently searched/read) stay cached
- Cold files are evicted on sweep; readContentForSearch falls back to disk
- Prior content is duped before cache eviction to preserve the #252 errdefer word_index restoration on OOM

Expected: peak RSS drops from ~1.7GB to ~200MB on large repos while maintaining identical query behavior (cache misses served from disk).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
perf: CLOCK eviction cache for file contents (#208)
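CLOCK (second-chance) eviction as described in the commit can be sketched as follows — a hedged Python analogue of the ContentCache, not the actual Zig implementation: a circular slot array with one reference bit per slot; a hit sets the bit, and the sweeping hand clears set bits, evicting the first slot whose bit is already clear.

```python
class ClockCache:
    """Fixed-size cache with CLOCK (second-chance) eviction."""
    def __init__(self, capacity: int):
        self.capacity = capacity
        self.slots = [None] * capacity   # (path, content) or None
        self.ref = [False] * capacity    # reference bits
        self.index = {}                  # path -> slot (O(1) lookup)
        self.hand = 0

    def get(self, path):
        slot = self.index.get(path)
        if slot is None:
            return None                  # miss: caller reads from disk
        self.ref[slot] = True            # mark recently used
        return self.slots[slot][1]

    def put(self, path, content):
        slot = self.index.get(path)
        if slot is None:
            slot = self._evict()
            self.index[path] = slot
        self.slots[slot] = (path, content)
        self.ref[slot] = True

    def _evict(self):
        while True:
            if self.slots[self.hand] is None:
                break                    # free slot available
            if not self.ref[self.hand]:
                old_path, _ = self.slots[self.hand]
                del self.index[old_path]
                break                    # second chance spent: evict
            self.ref[self.hand] = False  # grant a second chance
            self.hand = (self.hand + 1) % self.capacity
        slot = self.hand
        self.hand = (self.hand + 1) % self.capacity
        return slot
```

CLOCK approximates LRU with O(1) amortized eviction and no linked-list bookkeeping, which is why it suits a fixed 4096-slot array.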
During cold indexing, commitParsedFileOwnedOutline duped ALL file contents into a HashMap. On openclaw (13K files) this added ~170MB of peak RSS for content alone. The indexes (word, trigram) consume the content parameter directly — the cache is only needed for readContentForSearch, which already has a disk fallback.

Skip content storage when the outline count exceeds 1000. The first 1000 files stay cached for fast search; beyond that, search falls back to disk reads. Snapshot fast-load uses OUTLINE_STATE (not CONTENT), so startup is unaffected.

Benchmark (openclaw, 13,867 files, cold search):
  v0.2.56:    3,678MB peak RSS  6.16s
  pre-clock:  3,559MB peak RSS  5.66s
  skip-cache: 3,415MB peak RSS  6.07s  (-7.2% RSS vs baseline)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
perf: skip content cache beyond 1000 files — 7% RSS reduction (#208)
Add shrinkPostingLists() to TrigramIndex and shrinkAllocations() to WordIndex. Both release ArrayList over-allocation (capacity > length) after the initial scan completes. This reduces steady-state RSS for long-running MCP servers by reclaiming ~50% of ArrayList capacity waste.

Note: peak RSS during cold indexing is unchanged — the shrink runs after the peak. The peak is dominated by GPA page retention from alloc/free churn during indexing. Further reduction would require a custom allocator or pre-sized flat storage.

Benchmark (openclaw, 13,867 files):
  Peak RSS unchanged (3,415MB) — expected, since the shrink runs after the peak
  Recall: 52/52 for 'handleRequest' — no false negatives

An earlier cap approach (MAX_POSTINGS=512) saved 243MB peak but dropped recall to 2/52 — reverted in favor of shrink-only.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
perf: shrink index allocations after scan — reduce steady-state RSS (#261)
Three changes to reduce RSS:

1. Back worker arenas with page_allocator instead of GPA — mmap pages are returned to the OS immediately on arena deinit (no GPA retention).
2. Free each worker's arena right after committing its results instead of holding all workers' data simultaneously.
3. Run shrinkPostingLists/shrinkAllocations on the trigram + word indexes after the scan to release ArrayList over-allocation.

Benchmark (openclaw, 13,867 files, cold search):
  v0.2.56 baseline:  3,678MB  5.82s
  PR#260 (content):  3,415MB  5.26s
  This PR:           3,361MB  5.64s  (-8.6% RSS vs baseline)
  Recall: 52/52 — no false negatives

The remaining ~3.3GB is genuinely live index data (trigram posting lists + word index hits). Further reduction needs flat array storage or compressed postings (tracked in #261).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
perf: page_allocator for worker arenas + eager free + index shrink (#261)
…wap (#261)

Three changes that together reduce RSS by 80% on warm runs:

1. Compact file_words: replace the inner StringHashMap(void) per file with []const []const u8 slices — ~70KB → 32KB per file (14K files = ~530MB theoretical savings from eliminating HashMap bucket arrays).
2. page_allocator arena for the word index words_set: the temporary per-file HashMap uses an mmap-backed arena, so pages are returned to the OS immediately instead of being retained by the GPA.
3. CLI mmap swap: after cold indexing + writeToDisk, immediately load the trigram index as MmapTrigramIndex and release the heap version. Also call releaseContents + shrinkAllocations on the CLI path.

Benchmark (openclaw, 13,867 files):
  v0.2.56 baseline:   5.8s  3,678MB (cold)
  Previous (PR#263):  5.6s  3,361MB (cold, -8.6%)
  This commit cold:   6.4s  3,209MB (cold, -12.8%)
  This commit warm:   1.4s    741MB (warm, -79.8%)

The cold peak (3.2GB) is from the heap trigram index during the initial build. Subsequent runs use mmap (741MB) — the realistic MCP server scenario.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
… RSS reduction (#261)

The GPA (GeneralPurposeAllocator) was retaining ~1.8GB of dead pages during cold indexing — pages freed by HashMap resizes, ArrayList growth, and content read/free cycles were never returned to the OS. Switching to c_allocator (libc malloc) lets macOS's magazine allocator reclaim freed pages via madvise(MADV_FREE).

Also: indexFileContent now uses a page_allocator-backed arena for file content reads, ensuring content pages are munmap'd immediately after indexing each file. And the cold CLI path skips trigrams during the scan, building them file-by-file afterward to avoid holding all three indexes at once.

Benchmark (openclaw, 13,867 files):
  v0.2.56 GPA baseline:   5.8s  3,678MB cold
  All fixes + GPA:        6.6s  3,188MB cold (-13%)
  All fixes + c_alloc:    6.0s  1,415MB cold (-61.5%)
  Warm (mmap + c_alloc):  1.3s    486MB warm (-86.8%)

Recall: 52/52 — intact. All tests pass.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…#261)

During cold CLI runs, build trigrams into a separate TrigramIndex backed by a page_allocator arena. After writing to disk, arena.deinit() returns ALL trigram pages to the OS via munmap — the trigram heap never coexists with the word index's peak allocations. Also shrink the word index BEFORE the trigram rebuild to release ArrayList capacity waste early.

Benchmark (openclaw, 13,867 files):
  v0.2.56 baseline:    5.8s  3,678MB cold
  Previous (c_alloc):  6.0s  1,415MB cold
  This commit:         6.2s  1,304MB cold (-64.5%)
  Warm:                2.9s    463MB warm (-87.4%)

Recall: 52/52 intact.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…261)

During cold CLI runs, persist the word index to disk and free it BEFORE building trigrams. After trigrams are written + mmap-swapped, reload the word index from disk. This prevents word_index (~500MB) and the trigram index (~400MB) from coexisting in memory simultaneously. The staggered approach also makes cold runs 46% faster because the trigram arena operates with more available memory (less allocator pressure).

Benchmark (openclaw, 13,867 files):
  v0.2.56 baseline:    5.8s  3,678MB cold
  Previous (c_alloc):  6.2s  1,304MB cold
  This commit:         3.1s  1,078MB cold (-70.7% vs baseline)
  Warm:                1.3s    464MB warm (-87.4% vs baseline)

Recall: 52/52 intact. All tests pass.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…reduction (#261)

Two changes that cut cold peak RSS to 617MB (from the 3,678MB baseline):

1. Use c_allocator (not ArenaAllocator) for the temporary TrigramIndex during the cold trigram rebuild. ArenaAllocator never frees intermediate allocations (HashMap resizes, ArrayList growth), accumulating ~2x the actual data. c_allocator returns freed pages to the OS on every resize.
2. Skip file_words tracking during the bulk scan (skip_file_words flag). file_words maps every file → words to support removeFile, but during the initial scan no files are removed. Saves ~450MB of compact slices.

Benchmark (openclaw, 13,867 files):
  v0.2.56 baseline:  5.8s  3,678MB cold
  Previous:          3.1s  1,078MB cold
  This commit:       3.0s    617MB cold (-83.2%)
  Warm:              1.2s    423MB warm (-88.5%)

Recall: 52/52 intact. All tests pass.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Minor additional squeeze: give the word index c_allocator during the scan for consistent page reclamation, and enable skip_file_words during bulk indexing.

Final benchmarks (openclaw, 13,867 files):
  v0.2.56 baseline:    5.8s  3,678MB cold
  This session total:  2.8s    606MB cold (-83.5% RSS, -52% time)
  Warm (mmap):         1.2s    423MB warm (-88.5%)

The remaining 606MB is ~43KB/file (outlines + word_index + c_allocator overhead). The floor without the word index is 595MB.

Memory breakdown of the optimizations:
  GPA → c_allocator:       -1,773MB (page retention eliminated)
  Stagger word/trigram:      -337MB (never coexist in memory)
  Content cache limit:       -170MB (skip dupes beyond 1000)
  Trigram c_allocator:       -472MB (vs ArenaAllocator 2x waste)
  Skip file_words:            -11MB (marginal, balanced by other phase)
  page_allocator workers:     -54MB (content reads munmap'd)
  Other (shrink, etc):       -155MB

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Increase the default scan worker cap from min(cpu, 4) to min(cpu, 8). 8 workers are slightly faster and use LESS RSS than 4 (smaller per-worker arenas): 2.47s/608MB → 2.42s/575MB on openclaw.
- Refactor the trigram rebuild to collect paths into an ArrayList first (prep for a future parallel trigram build).

Final stable benchmarks (openclaw, 13,867 files, 3 runs averaged):
  v0.2.56:  6.4s  3,678MB (cold)
  NOW:      2.4s    597MB (cold)  -63% time, -84% RSS
  Warm:     1.2s    423MB

Recall: 52/52 intact. All tests pass.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Standalone thread-safe trigram extraction and sequential insertion API. extractTrigrams builds a local HashMap(Trigram, PostingMask) from content with no shared state; insertExtracted inserts pre-extracted results. Note: parallel trigram rebuild was tested but caused 8x regression (2.4s→19s) due to per-file HashMap overhead and thread management. Sequential rebuild is already fast because OS caches files from scan. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
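The extractTrigrams / insertExtracted split can be sketched in Python (the project is Zig; the trigram definition below — lowercased 3-byte windows — and the function names are illustrative): extraction is a pure function over content with no shared state, so it is safe on any worker thread, while the merge into global posting lists runs on a single thread and therefore needs no locking.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

def extract_trigrams(doc_id: int, content: str):
    """Pure function, no shared state: safe on any worker thread."""
    lowered = content.lower()
    grams = {lowered[i:i + 3] for i in range(len(lowered) - 2)}
    return doc_id, grams

def build_index(files):
    """Extract in parallel, then merge sequentially on one thread so
    the global posting lists need no locking. pool.map preserves input
    order, so doc_ids append in sorted order when files are id-ordered."""
    index = defaultdict(list)   # trigram -> sorted doc_id posting list
    with ThreadPoolExecutor() as pool:
        results = pool.map(lambda args: extract_trigrams(*args), files)
        for doc_id, grams in results:
            for g in grams:
                index[g].append(doc_id)
    return index
```

This mirrors the commit's finding: the parallel part (extraction) has no contention, and the sequential merge stays cheap because it is append-only.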
Replace WordHit { path: []const u8, line_num: u32 } (24 bytes) with
WordHit { doc_id: u32, line_num: u32 } (8 bytes). Add path_to_id +
id_to_path mapping to WordIndex, similar to TrigramIndex.
This saves 16 bytes per word hit. On openclaw (13,867 files, ~21M hits),
warm RSS drops from 423MB to 288MB. Cold RSS unchanged (word index is
freed before trigram peak in staggered build).
Benchmark (openclaw, 13,867 files):
v0.2.56 baseline: 6.4s 3,678MB cold
NOW cold: 2.3s 620MB cold (-83% RSS, -64% time)
NOW warm: 1.2s 288MB warm (-92% RSS)
Recall: 52/52 intact
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
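The WordHit shrink above is a path-interning change: store a u32 doc_id instead of a path slice in every hit. A hedged Python sketch of the same idea (WordHits is an illustrative name; `array('I')` stands in for Zig's packed 8-byte struct, since Python has no fixed-size structs):

```python
from array import array

class WordHits:
    """Store (doc_id, line) pairs as packed u32s instead of
    (path-string, line) objects — the interning analogue of the
    24-byte -> 8-byte WordHit shrink."""
    def __init__(self):
        self.path_to_id = {}
        self.id_to_path = []
        self.hits = {}               # word -> array('I') of doc_id,line pairs

    def intern(self, path: str) -> int:
        doc_id = self.path_to_id.get(path)
        if doc_id is None:
            doc_id = len(self.id_to_path)
            self.path_to_id[path] = doc_id
            self.id_to_path.append(path)
        return doc_id

    def add(self, word: str, path: str, line: int):
        arr = self.hits.setdefault(word, array("I"))
        arr.append(self.intern(path))
        arr.append(line)

    def lookup(self, word: str):
        """Resolve doc_ids back to paths only at query time."""
        arr = self.hits.get(word, array("I"))
        return [(self.id_to_path[arr[i]], arr[i + 1])
                for i in range(0, len(arr), 2)]
```

Each path string is stored once in id_to_path; every additional hit costs 8 bytes regardless of path length.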
search/find/tree/outline use the trigram index and outlines — the word index is only needed for the `word` command. Skip building it during the scan for other commands, eliminating ~0.5s of tokenization + HashMap work.

Benchmark (openclaw, 13,867 files):
  v0.2.56 baseline:  6.4s  3,678MB cold
  Previous:          2.3s  620MB cold / 288MB warm
  NOW cold:          1.8s  600MB cold (-72% time vs baseline)
  NOW warm:          0.7s  219MB warm (-94% RSS vs baseline)

Recall: 52/52 intact. All tests pass.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
For `codedb search` cold runs, build trigrams during the initial scan commit loop using worker-arena content instead of re-reading files from disk in a separate pass. Saves ~0.15s of file I/O.

Benchmark (openclaw, 13,867 files):
  v0.2.56 baseline:  6.4s   3,678MB cold
  Previous:          1.8s   600MB cold
  NOW cold:          1.65s  605MB cold (-74% time vs baseline)
  Warm:              0.68s  219MB warm (-94% RSS vs baseline)

Recall: 52/52 intact.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
For cold `search`, workers just read files without outline parsing (no Explorer creation, no line-by-line symbol extraction). Saves ~53MB RSS and avoids outline allocation overhead.

Benchmark (openclaw, 13,867 files, cold search):
  v0.2.56:  6.4s   3,678MB
  NOW:      1.49s  552MB (-77% time, -85% RSS)
  Warm:     0.68s  219MB (-94% RSS)

Recall: 52/52 intact.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Pre-allocate the trigram index HashMap to 131K capacity and path_to_id to the file count, avoiding resize copies during bulk insert. Also add indexFileReuse, which takes a caller-provided local HashMap and reuses it across files via clearRetainingCapacity (eliminating 14K alloc/free cycles for the per-file trigram extraction map).

Benchmark (openclaw, 13,867 files, cold search):
  v0.2.56:  6.4s   3,678MB
  NOW:      1.43s  547MB (-78% time, -85% RSS)
  Warm:     0.68s  219MB (-94% RSS)

Recall: 52/52 intact.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
… search

Workers now read files AND extract trigrams in parallel (trigramExtractWorker), instead of reading in parallel and then extracting sequentially. The main thread only does the lean global index merge (insertBulkNew), which skips removeFile and file_trigrams tracking — both unnecessary for cold builds.

Cold search: 1.43s/547MB → 0.73s/507MB (2x faster, 8% less RSS)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Benchmark Regression Report — threshold: 10.00%
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 161d39e3da
    } else {
        self.trigram_index.removeFile(stable_path);
        self.sparse_ngram_index.removeFile(stable_path);
        try self.skip_trigram_files.put(stable_path, {});
Remove skipped-file keys when deleting indexed files
skip_trigram_files.put(stable_path, {}) stores a borrowed slice to stable_path, but Explorer.removeFile() never removes that key before freeing the outline path. searchContent() later iterates skip_trigram_files and dereferences those stale slices, so deleting/renaming a file that was indexed with skip_trigram=true can leave dangling pointers that produce undefined paths and unstable search behavior (including potential crashes).
    // Skip file_words tracking during bulk scan — saves ~450MB.
    // Only needed for removeFile (incremental re-indexing), not initial scan.
    explorer.word_index.skip_file_words = true;
    if (!needs_word_index) explorer.word_index.enabled = false;
Keep word index enabled for serve/snapshot startup scans
needs_word_index is only true for the CLI word command, so this disables word indexing for serve, snapshot, and other startup scans. On a cold codedb serve run (no usable snapshot), /explore/word queries run against an empty index; and codedb snapshot can persist an empty on-disk word index because wordIndexIsComplete() remains true and no rebuild is triggered.
    // For other commands: outline-only scan, trigrams from disk or rebuild.
    const is_search = std.mem.eql(u8, cmd, "search");
    if (is_search and !heads_match) {
        const tmp_tri = try watcher.initialScanWithTrigrams(&store, &explorer, root, allocator, std.heap.c_allocator, true);
Preserve outline fallback on cold search indexing path
For codedb search when the trigram cache is stale (!heads_match), this call passes skip_outlines=true, so the scan builds only trigrams and leaves explorer.outlines empty. searchContent() depends on outlines for fallback scanning (e.g., short queries and files excluded from trigram indexing such as >64KB files), so cold searches can miss valid matches until a later full scan path is taken.
Summary
Key optimizations in this release branch (35 commits):
Test plan
`zig build test` passes

🤖 Generated with Claude Code