perf: 8.8x faster cold search, 7.3x less memory (#264)
Conversation
…ale state

Fixes #227, #246, #247, #248. Four interrelated TrigramIndex fixes land together:

1. removeFile (#246): moved path_to_id.remove() before the file_trigrams guard so the mapping is always cleaned up, even when file_trigrams has no entry (leftover from a partial, OOM-failed indexFile).
2. id_to_path growth (#227, #247): removeFile now adds the freed doc_id to a new free_ids freelist and marks the id_to_path slot as "". getOrCreateDocId pops from free_ids first, reusing the old slot instead of appending a new one. After N re-indexes of the same file, id_to_path.items.len stays bounded by the number of unique files ever indexed.
3. PostingList sorted invariant: reused doc_ids are not the maximum, so a plain append would break the binary-search invariant. indexFile now detects whether a slot was reused (id_to_path did not grow) and uses getOrAddPosting (sorted insert) for reused doc_ids, keeping append for new files.
4. PostingList.removeDocId (#248): replaced the O(n) linear scan with the same binary-search pattern used by getByDocId — O(log n) search plus a single orderedRemove.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
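The freelist-reuse and sorted-insert pattern from fixes 2 and 3 can be sketched in Python (the project itself is Zig; DocIdAllocator and add_posting are illustrative names, not the actual API). The key points: removal always cleans the path mapping first and pushes the id onto a freelist, and a reused id must be inserted into posting lists by binary search because it is no longer the maximum.

```python
import bisect

class DocIdAllocator:
    """Sketch of the free_ids pattern: reuse slots freed by removal
    so the id->path table stays bounded by the number of unique files."""
    def __init__(self):
        self.path_to_id = {}
        self.id_to_path = []   # index == doc_id
        self.free_ids = []     # freelist of reclaimed doc_ids

    def get_or_create(self, path):
        if path in self.path_to_id:
            return self.path_to_id[path], False
        if self.free_ids:
            doc_id = self.free_ids.pop()     # reuse a freed slot
            self.id_to_path[doc_id] = path
            reused = True
        else:
            doc_id = len(self.id_to_path)    # append a brand-new slot
            self.id_to_path.append(path)
            reused = False
        self.path_to_id[path] = doc_id
        return doc_id, reused

    def remove(self, path):
        doc_id = self.path_to_id.pop(path)   # clean the mapping first
        self.id_to_path[doc_id] = ""
        self.free_ids.append(doc_id)

def add_posting(postings, doc_id, reused):
    """Reused ids are not the max, so keep the sorted invariant with a
    binary-search insert; new ids (always the max) can just be appended."""
    if reused:
        bisect.insort(postings, doc_id)
    else:
        postings.append(doc_id)
```

The `reused` flag mirrors the Zig-side check of whether id_to_path grew during getOrCreateDocId.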
…e, snapshot double-open, searchContent O(n) fallback

Fixes #249, #250, #251, #252, #253.

- index.zig (#251): AnyTrigramIndex.candidates and candidatesRegex mmap_overlay branches no longer leak the result ArrayList's backing buffer when toOwnedSlice fails under OOM — explicit deinit on the error path.
- nuke.zig (#249): rewriteConfigFile now writes to a {path}.tmp file first and renames atomically, preventing an empty config file if the process is killed mid-write. Callers updated to thread the allocator through.
- explore.zig (#252): commitParsedFileOwnedOutline adds an errdefer immediately after word_index.indexFile so that a subsequent trigram_index OOM failure rolls back word_index to the previous content, keeping the two indexes in sync.
- explore.zig (#250): Explorer gains a skip_trigram_files StringHashMap. Files indexed with skip_trigram=true are tracked in this set; the searchContent fallback loop now iterates only skip_trigram_files instead of all outlines, reducing the fallback from O(all files) to O(skip-trigram files).
- snapshot.zig (#253): extracted a readSectionsFromFile(file, allocator) helper so both readSections and readSectionBytes share the header-parsing logic. readSectionBytes now opens the file once and calls the helper, eliminating the redundant second openFile call for each section read.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
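The tmp-then-rename trick behind the nuke.zig fix (#249) is the standard atomic-replace pattern; here is a minimal Python sketch (function name and fsync policy are illustrative, not the actual rewriteConfigFile). Because rename over an existing file is atomic on POSIX, a crash at any point leaves either the complete old config or the complete new one, never a truncated file.

```python
import os

def rewrite_config_atomic(path: str, data: bytes) -> None:
    """Write to {path}.tmp, then atomically rename over the original.
    A crash mid-write can only orphan the .tmp file; the real config
    is never observed empty or half-written."""
    tmp = path + ".tmp"
    with open(tmp, "wb") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())   # ensure bytes hit disk before the rename
    os.replace(tmp, path)      # atomic swap on POSIX and Windows
```

`os.replace` (rather than `os.rename`) is used so the overwrite also works on Windows.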
…obustness

- watcher.zig (#254): incrementalLoop now stats .git/HEAD's mtime before spawning git rev-parse HEAD. If the mtime is unchanged, the fork+exec is skipped entirely, eliminating a 2-second-cadence subprocess that accounted for the majority of codedb's background CPU on large repos.
- watcher.zig: EventQueue.head/tail were std.atomic.Value(usize) even though every access (push and pop) already holds self.mu. Replaced with plain usize fields; the mutex provides all required ordering guarantees.
- store.zig: Store.seq was std.atomic.Value(u64) even though the only mutation site (appendVersion) holds self.mu. Changed to a plain u64; currentSeq() now also acquires the mutex, so the type is correct.
- snapshot.zig: readSectionString limit raised from 4096 to std.math.maxInt(u16) so symbol names longer than 4 KiB are accepted. loadSnapshotFast treats a corrupt OUTLINE_STATE section as an empty map rather than propagating the error, matching the ba13aed fix on the feature/243 branch.
- lib.zig: export the snapshot module so callers can reach it through lib.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
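The mtime gate from the watcher.zig fix (#254) generalizes to any "poll a file, fork a subprocess when it changed" loop. A hedged Python sketch (HeadGate and the injected resolve callable are illustrative; the real code stats .git/HEAD and spawns git rev-parse HEAD):

```python
import os

class HeadGate:
    """Only invoke `resolve` (e.g. spawning `git rev-parse HEAD`) when
    the watched file's mtime has changed since the last poll."""
    def __init__(self, head_path: str, resolve):
        self.head_path = head_path
        self.resolve = resolve      # callable returning the current HEAD sha
        self.last_mtime = None
        self.head = None

    def poll(self):
        mtime = os.stat(self.head_path).st_mtime_ns
        if mtime == self.last_mtime:
            return self.head        # unchanged: skip the fork+exec entirely
        self.last_mtime = mtime
        self.head = self.resolve()
        return self.head
```

On a 2-second polling cadence this turns a subprocess-per-tick into a single stat() syscall per tick in the common (unchanged) case.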
…llback

Two new tests covering the fixes that landed in this branch:

1. "snapshot: symbol detail longer than 4096 bytes survives round-trip" — indexes a function whose signature line is ~5000 chars, then writes and reloads the snapshot. Guards against readSectionString rejecting details longer than 4096 bytes (the pre-fix max_len).
2. "snapshot: corrupted OUTLINE_STATE section falls back to CONTENT load" — overwrites the OUTLINE_STATE bytes with 0xFF after writeSnapshot, then calls loadSnapshot. The catch fallback must produce an empty outline map so loadSnapshotFast re-indexes all files from CONTENT instead. Verifies loadSnapshot returns true and symbols remain findable.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds comprehensive v0.2.57 CHANGELOG entry covering worker-local indexing (10×), full nuke/uninstall, MCP timeout fix, Rosetta stack fix, help CLI fix, and all 9 correctness fixes (index id growth, stale entries, git HEAD mtime gate, atomic removal, snapshot robustness). Adds src/benchmark.zig with `zig build benchmark -- --root /path/to/repo` measuring index time, query latency, re-index slot reuse, and .git/HEAD mtime gate effectiveness. Updates README with openclaw benchmark table. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…searchContent word_index fallback, Symbol.line_end population

- #210: release raw file contents after ProjectCache snapshot load (4.5GB → ~200MB)
- #228: check mtime/size before re-indexing in drainNotifyFile; skip unchanged files
- #253: loadSnapshotValidated opens the snapshot file once instead of 5 times
- #250: searchContent uses word_index to narrow the fallback from O(files) to O(word hits)
- #224: computeSymbolEnds post-processing populates Symbol.line_end for brace/indent/Ruby languages; `codedb_symbol body=true` now returns the full function body

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ng (#108, #215, #216)

- #216: add missing php/ruby/hcl/r entries to the telemetry writeLanguages array
- #108: add HCL language support — resource, data, module, variable, output, provider, locals, and terraform blocks; .tf/.tfvars/.hcl detection; `#` / `//` / `/* */` comment handling; .terragrunt-cache in skip_dirs
- #215: add R language support — function assignment (<- / =), setClass/setRefClass, library/require imports; .r/.R detection; # comment handling
- 10 new tests covering all HCL and R parser paths

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Python docstring: replace naive triple-quote count with position-aware
detection — properly handles inline docstrings ("""text"""), opening
docstrings with text ("""starts here), and multi-line docstrings
containing def/class lines
- Snapshot JSON: use writeJsonEscaped for path interpolation in snapshot
writer — prevents cache corruption for files with ", \, or control
characters in paths
Note: 4 of 8 bugs from #179 were already fixed in prior commits
(C/C++ block comments, u16 truncation, ANSI strip, telemetry race)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
fix: Python docstring detection and snapshot JSON injection (#179)
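The position-aware detection described above can be sketched directly in Python (function names are illustrative, not the codedb parser's API): instead of counting triple quotes, walk each `"""` occurrence left to right and toggle the in-docstring state, so inline docstrings (`"""text"""`) net out to "still outside" while an opener with trailing text (`"""starts here`) nets to "inside". A driver that skips symbol extraction for lines that start inside a docstring then falls out naturally; this sketch ignores single-quoted `'''` strings and escapes for brevity.

```python
def scan_docstring_state(line: str, in_doc: bool) -> bool:
    """Advance the docstring state across one line by visiting each
    triple-quote occurrence at its position, not by counting them."""
    i = 0
    while True:
        j = line.find('"""', i)
        if j == -1:
            return in_doc
        in_doc = not in_doc   # every occurrence toggles open/closed
        i = j + 3

def lines_outside_docstrings(source: str):
    """Yield only lines that begin outside a docstring, so def/class
    lines inside a multi-line docstring are never treated as symbols."""
    out, in_doc = [], False
    for line in source.splitlines():
        was_in = in_doc
        in_doc = scan_docstring_state(line, in_doc)
        if not was_in:
            out.append(line)
    return out
```

With this, a `def fake():` line embedded in a multi-line docstring is skipped, while the real `def` above it is kept.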
Explorer.contents was an unbounded StringHashMap holding all file contents in memory (1.7GB peak RSS on 5K-file repos). Replace with a fixed-size ContentCache using CLOCK (second-chance) eviction:

- 4096-entry slot array with reference bits
- O(1) path→slot lookup via StringHashMap
- Hot files (recently searched/read) stay cached
- Cold files are evicted on sweep; readContentForSearch falls back to disk
- Prior content is duped before cache eviction to preserve the #252 errdefer word_index restoration on OOM

Expected: peak RSS drops from ~1.7GB to ~200MB on large repos while maintaining identical query behavior (cache misses served from disk).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
perf: CLOCK eviction cache for file contents (#208)
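CLOCK (second-chance) eviction as described in the commit can be sketched as follows — a hedged Python analogue of the ContentCache, not the actual Zig implementation: a circular slot array with one reference bit per slot; a hit sets the bit, and the sweeping hand clears set bits, evicting the first slot whose bit is already clear.

```python
class ClockCache:
    """Fixed-size cache with CLOCK (second-chance) eviction."""
    def __init__(self, capacity: int):
        self.capacity = capacity
        self.slots = [None] * capacity   # (path, content) or None
        self.ref = [False] * capacity    # reference bits
        self.index = {}                  # path -> slot (O(1) lookup)
        self.hand = 0

    def get(self, path):
        slot = self.index.get(path)
        if slot is None:
            return None                  # miss: caller reads from disk
        self.ref[slot] = True            # mark recently used
        return self.slots[slot][1]

    def put(self, path, content):
        slot = self.index.get(path)
        if slot is None:
            slot = self._evict()
            self.index[path] = slot
        self.slots[slot] = (path, content)
        self.ref[slot] = True

    def _evict(self):
        while True:
            if self.slots[self.hand] is None:
                break                    # free slot available
            if not self.ref[self.hand]:
                old_path, _ = self.slots[self.hand]
                del self.index[old_path]
                break                    # second chance spent: evict
            self.ref[self.hand] = False  # grant a second chance
            self.hand = (self.hand + 1) % self.capacity
        slot = self.hand
        self.hand = (self.hand + 1) % self.capacity
        return slot
```

CLOCK approximates LRU with O(1) amortized eviction and no linked-list bookkeeping, which is why it suits a fixed 4096-slot array.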
During cold indexing, commitParsedFileOwnedOutline duped ALL file contents into a HashMap. On openclaw (13K files) this added ~170MB of peak RSS for content alone. The indexes (word, trigram) consume the content parameter directly — the cache is only needed for readContentForSearch, which already has a disk fallback.

Skip content storage when the outline count exceeds 1000. The first 1000 files stay cached for fast search; beyond that, search falls back to disk reads. Snapshot fast-load uses OUTLINE_STATE (not CONTENT), so startup is unaffected.

Benchmark (openclaw, 13,867 files, cold search):
  v0.2.56:    3,678MB peak RSS  6.16s
  pre-clock:  3,559MB peak RSS  5.66s
  skip-cache: 3,415MB peak RSS  6.07s  (-7.2% RSS vs baseline)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
perf: skip content cache beyond 1000 files — 7% RSS reduction (#208)
Add shrinkPostingLists() to TrigramIndex and shrinkAllocations() to WordIndex. Both release ArrayList over-allocation (capacity > length) after the initial scan completes. This reduces steady-state RSS for long-running MCP servers by reclaiming ~50% of ArrayList capacity waste.

Note: peak RSS during cold indexing is unchanged — the shrink runs after the peak. The peak is dominated by GPA page retention from alloc/free churn during indexing. Further reduction would require a custom allocator or pre-sized flat storage.

Benchmark (openclaw, 13,867 files):
  Peak RSS unchanged (3,415MB) — expected, since the shrink runs after the peak
  Recall: 52/52 for 'handleRequest' — no false negatives

An earlier cap approach (MAX_POSTINGS=512) saved 243MB peak but dropped recall to 2/52 — reverted in favor of shrink-only.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
perf: shrink index allocations after scan — reduce steady-state RSS (#261)
Three changes to reduce RSS:

1. Back worker arenas with page_allocator instead of GPA — mmap pages are returned to the OS immediately on arena deinit (no GPA retention).
2. Free each worker's arena right after committing its results instead of holding all workers' data simultaneously.
3. Run shrinkPostingLists/shrinkAllocations on the trigram + word indexes after the scan to release ArrayList over-allocation.

Benchmark (openclaw, 13,867 files, cold search):
  v0.2.56 baseline:  3,678MB  5.82s
  PR#260 (content):  3,415MB  5.26s
  This PR:           3,361MB  5.64s  (-8.6% RSS vs baseline)
  Recall: 52/52 — no false negatives

The remaining ~3.3GB is genuinely live index data (trigram posting lists + word index hits). Further reduction needs flat array storage or compressed postings (tracked in #261).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
perf: page_allocator for worker arenas + eager free + index shrink (#261)
…wap (#261)

Three changes that together reduce RSS by 80% on warm runs:

1. Compact file_words: replace the inner StringHashMap(void) per file with []const []const u8 slices — ~70KB → 32KB per file (14K files = ~530MB theoretical savings from eliminating HashMap bucket arrays).
2. page_allocator arena for the word index words_set: the temporary per-file HashMap uses an mmap-backed arena, so pages are returned to the OS immediately instead of being retained by the GPA.
3. CLI mmap swap: after cold indexing + writeToDisk, immediately load the trigram index as MmapTrigramIndex and release the heap version. Also call releaseContents + shrinkAllocations on the CLI path.

Benchmark (openclaw, 13,867 files):
  v0.2.56 baseline:   5.8s  3,678MB (cold)
  Previous (PR#263):  5.6s  3,361MB (cold, -8.6%)
  This commit cold:   6.4s  3,209MB (cold, -12.8%)
  This commit warm:   1.4s    741MB (warm, -79.8%)

The cold peak (3.2GB) is from the heap trigram index during the initial build. Subsequent runs use mmap (741MB) — the realistic MCP server scenario.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
… RSS reduction (#261)

The GPA (GeneralPurposeAllocator) was retaining ~1.8GB of dead pages during cold indexing — pages freed by HashMap resizes, ArrayList growth, and content read/free cycles were never returned to the OS. Switching to c_allocator (libc malloc) lets macOS's magazine allocator reclaim freed pages via madvise(MADV_FREE).

Also: indexFileContent now uses a page_allocator-backed arena for file content reads, ensuring content pages are munmap'd immediately after indexing each file. And the cold CLI path skips trigrams during the scan, building them file-by-file afterward to avoid holding all three indexes at once.

Benchmark (openclaw, 13,867 files):
  v0.2.56 GPA baseline:   5.8s  3,678MB cold
  All fixes + GPA:        6.6s  3,188MB cold (-13%)
  All fixes + c_alloc:    6.0s  1,415MB cold (-61.5%)
  Warm (mmap + c_alloc):  1.3s    486MB warm (-86.8%)

Recall: 52/52 — intact. All tests pass.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…#261)

During cold CLI runs, build trigrams into a separate TrigramIndex backed by a page_allocator arena. After writing to disk, arena.deinit() returns ALL trigram pages to the OS via munmap — the trigram heap never coexists with the word index's peak allocations. Also shrink the word index BEFORE the trigram rebuild to release ArrayList capacity waste early.

Benchmark (openclaw, 13,867 files):
  v0.2.56 baseline:    5.8s  3,678MB cold
  Previous (c_alloc):  6.0s  1,415MB cold
  This commit:         6.2s  1,304MB cold (-64.5%)
  Warm:                2.9s    463MB warm (-87.4%)

Recall: 52/52 intact.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…261)

During cold CLI runs, persist the word index to disk and free it BEFORE building trigrams. After trigrams are written + mmap-swapped, reload the word index from disk. This prevents word_index (~500MB) and the trigram index (~400MB) from coexisting in memory simultaneously. The staggered approach also makes cold runs 46% faster because the trigram arena operates with more available memory (less allocator pressure).

Benchmark (openclaw, 13,867 files):
  v0.2.56 baseline:    5.8s  3,678MB cold
  Previous (c_alloc):  6.2s  1,304MB cold
  This commit:         3.1s  1,078MB cold (-70.7% vs baseline)
  Warm:                1.3s    464MB warm (-87.4% vs baseline)

Recall: 52/52 intact. All tests pass.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…reduction (#261)

Two changes that cut cold peak RSS to 617MB (from the 3,678MB baseline):

1. Use c_allocator (not ArenaAllocator) for the temporary TrigramIndex during the cold trigram rebuild. ArenaAllocator never frees intermediate allocations (HashMap resizes, ArrayList growth), accumulating ~2x the actual data. c_allocator returns freed pages to the OS on every resize.
2. Skip file_words tracking during the bulk scan (skip_file_words flag). file_words maps every file → words to support removeFile, but during the initial scan no files are removed. Saves ~450MB of compact slices.

Benchmark (openclaw, 13,867 files):
  v0.2.56 baseline:  5.8s  3,678MB cold
  Previous:          3.1s  1,078MB cold
  This commit:       3.0s    617MB cold (-83.2%)
  Warm:              1.2s    423MB warm (-88.5%)

Recall: 52/52 intact. All tests pass.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Minor additional squeeze: give the word index c_allocator during the scan for consistent page reclamation, and enable skip_file_words during bulk indexing.

Final benchmarks (openclaw, 13,867 files):
  v0.2.56 baseline:    5.8s  3,678MB cold
  This session total:  2.8s    606MB cold (-83.5% RSS, -52% time)
  Warm (mmap):         1.2s    423MB warm (-88.5%)

The remaining 606MB is ~43KB/file (outlines + word_index + c_allocator overhead). The floor without the word index is 595MB.

Memory breakdown of the optimizations:
  GPA → c_allocator:       -1,773MB (page retention eliminated)
  Stagger word/trigram:      -337MB (never coexist in memory)
  Content cache limit:       -170MB (skip dupes beyond 1000)
  Trigram c_allocator:       -472MB (vs ArenaAllocator 2x waste)
  Skip file_words:            -11MB (marginal, balanced by other phase)
  page_allocator workers:     -54MB (content reads munmap'd)
  Other (shrink, etc):       -155MB

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Increase the default scan worker cap from min(cpu, 4) to min(cpu, 8). 8 workers are slightly faster and use LESS RSS than 4 (smaller per-worker arenas): 2.47s/608MB → 2.42s/575MB on openclaw.
- Refactor the trigram rebuild to collect paths into an ArrayList first (prep for a future parallel trigram build).

Final stable benchmarks (openclaw, 13,867 files, 3 runs averaged):
  v0.2.56:  6.4s  3,678MB (cold)
  NOW:      2.4s    597MB (cold)  -63% time, -84% RSS
  Warm:     1.2s    423MB

Recall: 52/52 intact. All tests pass.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Standalone thread-safe trigram extraction and sequential insertion API. extractTrigrams builds a local HashMap(Trigram, PostingMask) from content with no shared state; insertExtracted inserts pre-extracted results. Note: parallel trigram rebuild was tested but caused 8x regression (2.4s→19s) due to per-file HashMap overhead and thread management. Sequential rebuild is already fast because OS caches files from scan. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
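The extractTrigrams / insertExtracted split can be sketched in Python (the project is Zig; the trigram definition below — lowercased 3-byte windows — and the function names are illustrative): extraction is a pure function over content with no shared state, so it is safe on any worker thread, while the merge into global posting lists runs on a single thread and therefore needs no locking.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

def extract_trigrams(doc_id: int, content: str):
    """Pure function, no shared state: safe on any worker thread."""
    lowered = content.lower()
    grams = {lowered[i:i + 3] for i in range(len(lowered) - 2)}
    return doc_id, grams

def build_index(files):
    """Extract in parallel, then merge sequentially on one thread so
    the global posting lists need no locking. pool.map preserves input
    order, so doc_ids append in sorted order when files are id-ordered."""
    index = defaultdict(list)   # trigram -> sorted doc_id posting list
    with ThreadPoolExecutor() as pool:
        results = pool.map(lambda args: extract_trigrams(*args), files)
        for doc_id, grams in results:
            for g in grams:
                index[g].append(doc_id)
    return index
```

This mirrors the commit's finding: the parallel part (extraction) has no contention, and the sequential merge stays cheap because it is append-only.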
Replace WordHit { path: []const u8, line_num: u32 } (24 bytes) with
WordHit { doc_id: u32, line_num: u32 } (8 bytes). Add path_to_id +
id_to_path mapping to WordIndex, similar to TrigramIndex.
This saves 16 bytes per word hit. On openclaw (13,867 files, ~21M hits),
warm RSS drops from 423MB to 288MB. Cold RSS unchanged (word index is
freed before trigram peak in staggered build).
Benchmark (openclaw, 13,867 files):
v0.2.56 baseline: 6.4s 3,678MB cold
NOW cold: 2.3s 620MB cold (-83% RSS, -64% time)
NOW warm: 1.2s 288MB warm (-92% RSS)
Recall: 52/52 intact
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
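The WordHit shrink above is a path-interning change: store a u32 doc_id instead of a path slice in every hit. A hedged Python sketch of the same idea (WordHits is an illustrative name; `array('I')` stands in for Zig's packed 8-byte struct, since Python has no fixed-size structs):

```python
from array import array

class WordHits:
    """Store (doc_id, line) pairs as packed u32s instead of
    (path-string, line) objects — the interning analogue of the
    24-byte -> 8-byte WordHit shrink."""
    def __init__(self):
        self.path_to_id = {}
        self.id_to_path = []
        self.hits = {}               # word -> array('I') of doc_id,line pairs

    def intern(self, path: str) -> int:
        doc_id = self.path_to_id.get(path)
        if doc_id is None:
            doc_id = len(self.id_to_path)
            self.path_to_id[path] = doc_id
            self.id_to_path.append(path)
        return doc_id

    def add(self, word: str, path: str, line: int):
        arr = self.hits.setdefault(word, array("I"))
        arr.append(self.intern(path))
        arr.append(line)

    def lookup(self, word: str):
        """Resolve doc_ids back to paths only at query time."""
        arr = self.hits.get(word, array("I"))
        return [(self.id_to_path[arr[i]], arr[i + 1])
                for i in range(0, len(arr), 2)]
```

Each path string is stored once in id_to_path; every additional hit costs 8 bytes regardless of path length.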
search/find/tree/outline use the trigram index and outlines — the word index is only needed for the `word` command. Skip building it during the scan for other commands, eliminating ~0.5s of tokenization + HashMap work.

Benchmark (openclaw, 13,867 files):
  v0.2.56 baseline:  6.4s  3,678MB cold
  Previous:          2.3s  620MB cold / 288MB warm
  NOW cold:          1.8s  600MB cold (-72% time vs baseline)
  NOW warm:          0.7s  219MB warm (-94% RSS vs baseline)

Recall: 52/52 intact. All tests pass.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
For `codedb search` cold runs, build trigrams during the initial scan commit loop using worker-arena content instead of re-reading files from disk in a separate pass. Saves ~0.15s of file I/O.

Benchmark (openclaw, 13,867 files):
  v0.2.56 baseline:  6.4s   3,678MB cold
  Previous:          1.8s   600MB cold
  NOW cold:          1.65s  605MB cold (-74% time vs baseline)
  Warm:              0.68s  219MB warm (-94% RSS vs baseline)

Recall: 52/52 intact.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
For cold `search`, workers just read files without outline parsing (no Explorer creation, no line-by-line symbol extraction). Saves ~53MB RSS and avoids outline allocation overhead.

Benchmark (openclaw, 13,867 files, cold search):
  v0.2.56:  6.4s   3,678MB
  NOW:      1.49s  552MB (-77% time, -85% RSS)
  Warm:     0.68s  219MB (-94% RSS)

Recall: 52/52 intact.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Pre-allocate the trigram index HashMap to 131K capacity and path_to_id to the file count, avoiding resize copies during bulk insert. Also add indexFileReuse, which takes a caller-provided local HashMap and reuses it across files via clearRetainingCapacity (eliminating 14K alloc/free cycles for the per-file trigram extraction map).

Benchmark (openclaw, 13,867 files, cold search):
  v0.2.56:  6.4s   3,678MB
  NOW:      1.43s  547MB (-78% time, -85% RSS)
  Warm:     0.68s  219MB (-94% RSS)

Recall: 52/52 intact.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
… search

Workers now read files AND extract trigrams in parallel (trigramExtractWorker), instead of reading in parallel and then extracting sequentially. The main thread only does the lean global index merge (insertBulkNew), which skips removeFile and file_trigrams tracking — both unnecessary for cold builds.

Cold search: 1.43s/547MB → 0.73s/507MB (2x faster, 8% less RSS)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Benchmark Regression Report — threshold: 10.00%
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 161d39e3da
    } else {
        self.trigram_index.removeFile(stable_path);
        self.sparse_ngram_index.removeFile(stable_path);
        try self.skip_trigram_files.put(stable_path, {});
Remove skipped-file keys when deleting indexed files
skip_trigram_files.put(stable_path, {}) stores a borrowed slice to stable_path, but Explorer.removeFile() never removes that key before freeing the outline path. searchContent() later iterates skip_trigram_files and dereferences those stale slices, so deleting/renaming a file that was indexed with skip_trigram=true can leave dangling pointers that produce undefined paths and unstable search behavior (including potential crashes).
    // Skip file_words tracking during bulk scan — saves ~450MB.
    // Only needed for removeFile (incremental re-indexing), not initial scan.
    explorer.word_index.skip_file_words = true;
    if (!needs_word_index) explorer.word_index.enabled = false;
Keep word index enabled for serve/snapshot startup scans
needs_word_index is only true for the CLI word command, so this disables word indexing for serve, snapshot, and other startup scans. On a cold codedb serve run (no usable snapshot), /explore/word queries run against an empty index; and codedb snapshot can persist an empty on-disk word index because wordIndexIsComplete() remains true and no rebuild is triggered.
    // For other commands: outline-only scan, trigrams from disk or rebuild.
    const is_search = std.mem.eql(u8, cmd, "search");
    if (is_search and !heads_match) {
        const tmp_tri = try watcher.initialScanWithTrigrams(&store, &explorer, root, allocator, std.heap.c_allocator, true);
Preserve outline fallback on cold search indexing path
For codedb search when the trigram cache is stale (!heads_match), this call passes skip_outlines=true, so the scan builds only trigrams and leaves explorer.outlines empty. searchContent() depends on outlines for fallback scanning (e.g., short queries and files excluded from trigram indexing such as >64KB files), so cold searches can miss valid matches until a later full scan path is taken.
Summary
Key optimizations in this release branch (35 commits):
Test plan
`zig build test` passes

🤖 Generated with Claude Code