perf: skip content cache beyond 1000 files — 7% RSS reduction (#208) by justrach · Pull Request #260 · justrach/codedb

justrach · 2026-04-13T06:28:07Z

Summary

During cold indexing, commitParsedFileOwnedOutline duplicated every file's contents into a HashMap. The indexes (word, trigram, sparse) consume the content parameter directly, so the cache was needed only by readContentForSearch, which already falls back to disk.

Skip content storage when outlines.count() > 1000. First 1000 files stay cached for fast search; beyond that, search uses disk reads. Snapshot fast-load uses OUTLINE_STATE (not CONTENT), so startup is unaffected.
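
For readers who want the shape of the change without opening the diff, here is a minimal Python sketch of a bounded content cache with a disk fallback. It is not the project's code: `ContentStore`, `commit`, and `read_content_for_search` are hypothetical stand-ins for commitParsedFileOwnedOutline and readContentForSearch, and it assumes the outline is recorded before the count is checked.

```python
CACHE_LIMIT = 1000  # threshold named in the PR description


class ContentStore:
    """Hypothetical stand-in for the indexer's outline and content maps."""

    def __init__(self, cache_limit=CACHE_LIMIT):
        self.cache_limit = cache_limit
        self.outlines = {}  # path -> outline, always stored
        self.contents = {}  # path -> file content, bounded by cache_limit

    def commit(self, path, outline, content):
        # Assumption: the outline is recorded first, then the count is checked.
        self.outlines[path] = outline
        if len(self.outlines) <= self.cache_limit:
            self.contents[path] = content
        else:
            # Past the limit: keep no copy, and drop any stale one.
            self.contents.pop(path, None)

    def read_content_for_search(self, path):
        cached = self.contents.get(path)
        if cached is not None:
            return cached
        # Disk fallback for files that were never cached.
        with open(path, "r", encoding="utf-8") as f:
            return f.read()
```

With this shape, the indexes still receive `content` at commit time; only the long-lived copy is skipped, which is why search past the limit pays for a disk read.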

Benchmark (openclaw, 13,867 files, cold search)

Version	Time	Peak RSS	RSS delta vs baseline
v0.2.56 baseline	6.16s	3,678MB	—
PR#258 (current)	5.66s	3,559MB	-3.2%
This PR	6.07s	3,415MB	-7.2%
Zero cache (limit=0)	5.90s	3,390MB	floor

Content cache accounts for ~170MB. The remaining ~3.3GB comes from trigram index posting lists; reducing it further would need a flat array (#208 original scope) or compressed postings.
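
The percentages and the ~170MB figure can be re-derived from the peak RSS column. The short Python check below uses only the numbers in the table; the function and variable names are illustrative.

```python
BASELINE_MB = 3678  # v0.2.56 peak RSS from the table


def rss_delta_pct(peak_mb, baseline_mb=BASELINE_MB):
    """Peak RSS change relative to the baseline, in percent."""
    return (peak_mb - baseline_mb) / baseline_mb * 100.0


runs = {
    "PR#258 (current)": 3559,
    "This PR": 3415,
    "Zero cache (limit=0)": 3390,
}

for name, peak_mb in runs.items():
    print(f"{name}: {rss_delta_pct(peak_mb):+.1f}%")

# Full cache (PR#258) minus zero cache approximates the content cache cost.
cache_cost_mb = runs["PR#258 (current)"] - runs["Zero cache (limit=0)"]
print(f"content cache: ~{cache_cost_mb}MB")
```

The difference between the PR#258 run and the zero-cache run is 169MB, which matches the ~170MB attributed to the content cache.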

Test plan

All existing tests pass
Benchmarked on openclaw: 3,415MB peak RSS (-7.2% vs baseline)
No speed regression (within variance)
readContentForSearch disk fallback verified working
Snapshot load unaffected (uses OUTLINE_STATE fast path)

🤖 Generated with Claude Code

During cold indexing, commitParsedFileOwnedOutline duped ALL file contents into a HashMap. On openclaw (13K files) this added ~170MB of peak RSS for content alone. The indexes (word, trigram) consume the content parameter directly; the cache is only needed for readContentForSearch, which already has a disk fallback.

Skip content storage when outline count > 1000. First 1000 files stay cached for fast search; beyond that, search falls back to disk reads. Snapshot fast-load uses OUTLINE_STATE (not CONTENT), so startup is unaffected.

Benchmark (openclaw, 13,867 files, cold search):
v0.2.56: 3,678MB peak RSS, 6.16s
pre-clock: 3,559MB peak RSS, 5.66s
skip-cache: 3,415MB peak RSS, 6.07s (-7.2% RSS vs baseline)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

github-actions · 2026-04-13T06:29:53Z

Benchmark Regression Report

Threshold: 10.00%

Tool	Base (ns)	Head (ns)	Delta	Status
`codedb_bundle`	661116	664396	+0.50%	OK
`codedb_changes`	110276	108506	-1.61%	OK
`codedb_deps`	29908	29494	-1.38%	OK
`codedb_edit`	24764	23026	-7.02%	OK
`codedb_find`	138477	141168	+1.94%	OK
`codedb_hot`	147832	148196	+0.25%	OK
`codedb_outline`	462079	461798	-0.06%	OK
`codedb_read`	142738	144358	+1.13%	OK
`codedb_search`	282027	286296	+1.51%	OK
`codedb_snapshot`	4517878	4496363	-0.48%	OK
`codedb_status`	263375	256919	-2.45%	OK
`codedb_symbol`	64627	64493	-0.21%	OK
`codedb_tree`	87323	86963	-0.41%	OK
`codedb_word`	90962	99448	+9.33%	OK
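
The Delta and Status columns follow from the Base and Head timings. Below is a minimal Python sketch of that check. It assumes a row is flagged only when Head is slower than Base by more than the threshold; the CI script itself is not shown here, and the "REGRESSION" label and the inclusive boundary are hypothetical.

```python
THRESHOLD_PCT = 10.0  # threshold stated in the report


def delta_pct(base_ns, head_ns):
    """Relative change from base to head, in percent."""
    return (head_ns - base_ns) / base_ns * 100.0


def status(base_ns, head_ns, threshold_pct=THRESHOLD_PCT):
    # "REGRESSION" is a hypothetical label; the report only shows "OK".
    if delta_pct(base_ns, head_ns) <= threshold_pct:
        return "OK"
    return "REGRESSION"


# codedb_word is the row closest to the threshold.
print(f"{delta_pct(90962, 99448):+.2f}%", status(90962, 99448))
```

On these numbers `codedb_word` comes out at +9.33%, under the 10% threshold, so it still reports OK.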

justrach merged commit 49382dd into release/v0.2.57 Apr 13, 2026
1 check passed

justrach merged 1 commit into release/v0.2.57 from perf/208-skip-content-cache
