Conversation
Multi-line Zig function where line_end should reach the closing brace. Currently line_end == line_start because parsers in src/explore.zig hardcode line_end = line_num at the signature line, so getSymbolBody (used by codedb_symbol body=true) only returns the signature.
Per-language parsers in explore.zig only recorded line_start, leaving
line_end == line_num at the signature line. This made codedb_symbol
body=true return a single-line slice instead of the symbol body.
Add computeSymbolEnds(), run after the streaming parse loop in
indexFileInner and before the global lock. Three strategies:
- Brace-balance (Zig/C/C++/TS/JS/Rust/Go/PHP): scan forward from the
signature, skipping "..." strings and // or /* */ comments, until
the first { is closed. Bail out after 10 lines without an opener
so abstract/forward declarations stay single-line.
- Indent (Python): first contiguous run of lines with indent greater
than the signature's. Tolerates multi-line signatures and blank
or comment lines inside the body.
- Indent + end (Ruby): same as Python, plus snap to a standalone
end at sig indent.
The brace scanner additionally tracks angle-bracket depth for generics
so that '{' / '}' inside '<...>' (e.g. Promise<{ ok: boolean }>, a
generic constraint like <T extends { id: string }>) are ignored. The
filters '=>', '->', '<=', '>=' prevent arrow types and comparisons
from skewing the angle count. Finally, a '}' that drops depth to 0 is
only accepted as the body close once the block has survived at least
one newline — same-line balanced '{...}' are treated as type literals,
destructured defaults, or one-liners that are not the function body.
Naturally single-line kinds (import, variable, constant, type_alias,
macro_def, comment_block) are skipped. struct_def/enum_def/union_def in
Zig go through brace scanning as expected.
Tests: 16 cases in tests.zig covering the original regression plus
one per language (Zig fn, Zig struct_def, Python def, Python class,
TS function, Rust fn, Go func, PHP function, Ruby def), an abstract
signature edge case, and 7 TypeScript/Rust cases exercising the angle
depth + newline-crossing rules (Promise<{...}> return, bare object
return, destructured typed param, generic constraint with object
literal, arrow-type parameter, nested Array<Array<T>> generics, and
Rust trailing return type). zig build test: 340/340 passing.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 34486b6478
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
Two bugs in the post-parse block scanner identified in PR review:
- scanBraceBlock incremented angle_depth on every `<`, so code like
`if (a < b) { ... }` entered phantom generic mode and the inner `{`
was skipped, truncating line_end. Now `<` only opens a generic when
the previous char is an identifier (Vec<T>, Map<K,V>, impl<T>,
Foo<{a:string}>), which also covers the `<=`/`<<` cases naturally.
- findPythonBlockEnd classified indented parameter continuation lines
(`def foo(\n a,\n b,\n) -> int:`) as body, and the dedented
`):` line ended the scan prematurely. It now walks past the signature
until it finds the line ending in `:` (ignoring inline comments and
strings via pythonLineEndsWithColon) before starting the body scan.
Regression tests added for Zig/Rust `if a < b { ... }` and for Python
multi-line parenthesized signatures.
justrach
left a comment
There was a problem hiding this comment.
I found two blocking issues that need to be addressed before this should merge.
-
Existing snapshots keep the broken
line_endvalues, so the fix does not actually apply for many upgraded users.computeSymbolEnds()only runs during indexing, but snapshots serialize and restoreline_endunchanged. Startup and MCP project loading both prefercodedb.snapshotwhen it matches the current git HEAD, so unchanged repos can continue serving single-line symbol bodies indefinitely until a manual reindex. Please either repairline_endon snapshot load / invalidate old snapshots / force recomputation somewhere on load, or otherwise make the upgrade path actually apply the fix. -
The brace scanner still truncates bodies on valid syntax because it explicitly does not track backticks or single-quoted literals. I reproduced this locally against the PR branch: a TypeScript function containing
const s = `}`;and a Zig function containingconst c: u8 = '}';both producedline_end = 2instead of the closing brace line. That meanscodedb_symbol body=trueis still wrong on real code, not just pathological input. Please either hardenscanBraceBlock()for these cases or narrow the claimed language coverage and add regression tests for the remaining supported cases.
Once those are fixed, I’m happy to re-review.
@codex can you take a look as well?
…Block scanBraceBlock now skips content inside '...' (char literals) and `...` (template literals) with escape handling. Previously, a closing brace inside a Zig char literal (const c: u8 = '}') or a TS template literal (const s = `}`) was miscounted as the block end, truncating line_end. Regression tests added for both cases.
e1ba6f2 to
a37360f
Compare
|
@justrach For For the stale snapshots issue, I have a local fix ready but wanted your input before pushing. I see a few options:
Which approach do you prefer? |
Fixes #224.
Summary
src/explore.zighardcodedSymbol.line_end = line_numat the signature line, socodedb_symbol body=true(which callsgetSymbolBody(path, line_start, line_end, ...)) always returned a single-line slice instead of the symbol body. Affected all 7 languages (Zig, Python, TS/JS, Rust, PHP, Go, Ruby).computeSymbolEnds()as a post-parse pass inindexFileInner, run between the streaming loop and the global lock whilecontentis still in scope. Three strategies: brace-balance for C-family languages (with string/comment hygiene), indent for Python, indent +endsnapping for Ruby. Parsers themselves are untouched.import,variable,constant,type_alias,macro_def,comment_block) are short-circuited. Forward declarations / abstract methods (no{within 10 lines of the signature) stay withline_end == line_start.Commits
9aeca60test(issue-224): add failing test for codedb_symbol body=true8919fb8fix(issue-224): populate Symbol.line_end via post-parse block scanThe failing test is intentionally on its own commit per the repo's issue workflow (
CLAUDE.md).Test plan
zig build test --summary all→ 333/333 passing (324 before + 9 new).do/end, and an abstract-signature edge case that must stay single-line.codedb_symbol name=computeSymbolEnds body=truereturns the full 36-line body of the new helper (L1939–L1974 insrc/explore.zig), not just the signature.codedb_outline,codedb_read, andgetSymbolBodystill pass (no changes to those code paths).Notes for reviewers
parseXLine. The parsers are line-streaming — giving them the rest of the file would require restructuring all 7, and any future language would need the same care. One fix location covers every current and future parser.scanBraceBlocktracks"..."strings and////* */comments, but intentionally does not track single-quoted strings (to avoid breaking Rust lifetimes / Zig char literals) or JS template literals. A'{'inside a char literal or`${`inside a template string could, in theory, throw off the count — accepted as rare imprecision in exchange for a ~40-line implementation. Indent-based fallback for Python/Ruby also trades a small amount of precision (e.g., Ruby heredocs) for simplicity.codedb.snapshotfiles on disk were written withline_end == line_start. Users will need to re-index after pulling this fix to see full bodies — worth a line in the next changelog.handleSymbolinsrc/mcp.zig:764only frees the outer slice fromfindAllSymbols, leaking the dupedpath/name/detailstrings of each result. It's a separate bug in code this PR doesn't touch — happy to open a follow-up issue.