fix/issue 224 symbol line end by ocordeiro · Pull Request #225 · justrach/codedb

ocordeiro · 2026-04-09T15:31:41Z

Fixes #224.

Disclaimer: I'm not a Zig programmer. This patch was produced with Claude Code (Opus 4.6) — I designed the approach, reviewed the code and test output, and validated the fix end-to-end via MCP, but the Zig itself was written by the model under my direction. Please review with that in mind; I'll happily iterate on any idioms, naming, or error handling that don't match the project's conventions.

Summary

Parsers in src/explore.zig hardcoded Symbol.line_end = line_num at the signature line, so codedb_symbol body=true (which calls getSymbolBody(path, line_start, line_end, ...)) always returned a single-line slice instead of the symbol body. Affected all 7 languages (Zig, Python, TS/JS, Rust, PHP, Go, Ruby).
Added computeSymbolEnds() as a post-parse pass in indexFileInner, run between the streaming loop and the global lock while content is still in scope. Three strategies: brace-balance for C-family languages (with string/comment hygiene), indent for Python, indent + end snapping for Ruby. Parsers themselves are untouched.
Naturally single-line kinds (import, variable, constant, type_alias, macro_def, comment_block) are short-circuited. Forward declarations / abstract methods (no { within 10 lines of the signature) stay with line_end == line_start.

Commits

9aeca60 test(issue-224): add failing test for codedb_symbol body=true
8919fb8 fix(issue-224): populate Symbol.line_end via post-parse block scan

The failing test is intentionally on its own commit per the repo's issue workflow (CLAUDE.md).

Test plan

zig build test --summary all → 333/333 passing (324 before + 9 new).
New tests cover: Zig fn, Zig struct_def, Python def, Python class, TS function, Rust fn, Go func, PHP function, Ruby def with nested do/end, and an abstract-signature edge case that must stay single-line.
End-to-end smoke test via MCP against the built binary: codedb_symbol name=computeSymbolEnds body=true returns the full 36-line body of the new helper (L1939–L1974 in src/explore.zig), not just the signature.
Existing tests for codedb_outline, codedb_read, and getSymbolBody still pass (no changes to those code paths).

Notes for reviewers

Design choice: a single post-parse pass instead of refactoring each parseXLine. The parsers are line-streaming — giving them the rest of the file would require restructuring all 7, and any future language would need the same care. One fix location covers every current and future parser.
Accuracy trade-offs: scanBraceBlock tracks "..." strings and // / /* */ comments, but intentionally does not track single-quoted strings (to avoid breaking Rust lifetimes / Zig char literals) or JS template literals. A '{' inside a char literal or `${` inside a template string could, in theory, throw off the count — accepted as rare imprecision in exchange for a ~40-line implementation. Indent-based fallback for Python/Ruby also trades a small amount of precision (e.g., Ruby heredocs) for simplicity.
Stale snapshots: existing codedb.snapshot files on disk were written with line_end == line_start. Users will need to re-index after pulling this fix to see full bodies — worth a line in the next changelog.
Pre-existing leak (not in scope): while smoke-testing, I noticed handleSymbol in src/mcp.zig:764 only frees the outer slice from findAllSymbols, leaking the duped path/name/detail strings of each result. It's a separate bug in code this PR doesn't touch — happy to open a follow-up issue.

Multi-line Zig function where line_end should reach the closing brace. Currently line_end == line_start because parsers in src/explore.zig hardcode line_end = line_num at the signature line, so getSymbolBody (used by codedb_symbol body=true) only returns the signature.

Per-language parsers in explore.zig only recorded line_start, leaving line_end == line_num at the signature line. This made codedb_symbol body=true return a single-line slice instead of the symbol body. Add computeSymbolEnds(), run after the streaming parse loop in indexFileInner and before the global lock. Three strategies: - Brace-balance (Zig/C/C++/TS/JS/Rust/Go/PHP): scan forward from the signature, skipping "..." strings and // or /* */ comments, until the first { is closed. Bail out after 10 lines without an opener so abstract/forward declarations stay single-line. - Indent (Python): first contiguous run of lines with indent greater than the signature's. Tolerates multi-line signatures and blank or comment lines inside the body. - Indent + end (Ruby): same as Python, plus snap to a standalone end at sig indent. The brace scanner additionally tracks angle-bracket depth for generics so that '{' / '}' inside '<...>' (e.g. Promise<{ ok: boolean }>, a generic constraint like <T extends { id: string }>) are ignored. The filters '=>', '->', '<=', '>=' prevent arrow types and comparisons from skewing the angle count. Finally, a '}' that drops depth to 0 is only accepted as the body close once the block has survived at least one newline — same-line balanced '{...}' are treated as type literals, destructured defaults, or one-liners that are not the function body. Naturally single-line kinds (import, variable, constant, type_alias, macro_def, comment_block) are skipped. struct_def/enum_def/union_def in Zig go through brace scanning as expected. Tests: 16 cases in tests.zig covering the original regression plus one per language (Zig fn, Zig struct_def, Python def, Python class, TS function, Rust fn, Go func, PHP function, Ruby def), an abstract signature edge case, and 7 TypeScript/Rust cases exercising the angle depth + newline-crossing rules (Promise<{...}> return, bare object return, destructured typed param, generic constraint with object literal, arrow-type parameter, nested Array<Array<T>> generics, and Rust trailing return type). zig build test: 340/340 passing.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 34486b6478

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

src/explore.zig

Two bugs in the post-parse block scanner identified in PR review: - scanBraceBlock incremented angle_depth on every `<`, so code like `if (a < b) { ... }` entered phantom generic mode and the inner `{` was skipped, truncating line_end. Now `<` only opens a generic when the previous char is an identifier (Vec<T>, Map<K,V>, impl<T>, Foo<{a:string}>), which also covers the `<=`/`<<` cases naturally. - findPythonBlockEnd classified indented parameter continuation lines (`def foo(\n a,\n b,\n) -> int:`) as body, and the dedented `):` line ended the scan prematurely. It now walks past the signature until it finds the line ending in `:` (ignoring inline comments and strings via pythonLineEndsWithColon) before starting the body scan. Regression tests added for Zig/Rust `if a < b { ... }` and for Python multi-line parenthesized signatures.

justrach

I found two blocking issues that need to be addressed before this should merge.

Existing snapshots keep the broken line_end values, so the fix does not actually apply for many upgraded users. computeSymbolEnds() only runs during indexing, but snapshots serialize and restore line_end unchanged. Startup and MCP project loading both prefer codedb.snapshot when it matches the current git HEAD, so unchanged repos can continue serving single-line symbol bodies indefinitely until a manual reindex. Please either repair line_end on snapshot load / invalidate old snapshots / force recomputation somewhere on load, or otherwise make the upgrade path actually apply the fix.
The brace scanner still truncates bodies on valid syntax because it explicitly does not track backticks or single-quoted literals. I reproduced this locally against the PR branch: a TypeScript function containing const s = `}`; and a Zig function containing const c: u8 = '}'; both produced line_end = 2 instead of the closing brace line. That means codedb_symbol body=true is still wrong on real code, not just pathological input. Please either harden scanBraceBlock() for these cases or narrow the claimed language coverage and add regression tests for the remaining supported cases.

Once those are fixed, I’m happy to re-review.

@codex can you take a look as well?

…Block scanBraceBlock now skips content inside '...' (char literals) and `...` (template literals) with escape handling. Previously, a closing brace inside a Zig char literal (const c: u8 = '}') or a TS template literal (const s = `}`) was miscounted as the block end, truncating line_end. Regression tests added for both cases.

ocordeiro · 2026-04-10T14:59:41Z

@justrach
Thanks for the thorough review!

For scanBraceBlock, I've added single-quote and backtick tracking with escape handling, along with regression tests for Zig char literals ('}') and TS template literals (`}`), in commit a37360f.

For the stale snapshots issue, I have a local fix ready but wanted your input before pushing. I see a few options:

Bump FORMAT_VERSION from 2 to 3: rejects old snapshots and forces a full reindex on next startup. Clean but heavy-handed since the user loses all cached state.
Call computeSymbolEnds inside insertRestoredFile: fixes line_end on load for unchanged files. Targeted, but adds recomputation to a function designed as a fast path.
Post-load sweep in loadSnapshotFast: after restoring all files, iterate the outlines and call computeSymbolEnds for each. Runs once on startup, doesn't touch insertRestoredFile, and corrected values get persisted when the snapshot is saved again.

Which approach do you prefer?

Alan Cordeiro added 2 commits April 9, 2026 08:48

chatgpt-codex-connector bot reviewed Apr 9, 2026

View reviewed changes

src/explore.zig Outdated Show resolved Hide resolved

src/explore.zig Show resolved Hide resolved

justrach requested changes Apr 10, 2026

View reviewed changes

ocordeiro force-pushed the fix/issue-224-symbol-line-end branch from e1ba6f2 to a37360f Compare April 10, 2026 14:53

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix/issue 224 symbol line end#225

fix/issue 224 symbol line end#225
ocordeiro wants to merge 4 commits intojustrach:mainfrom
ocordeiro:fix/issue-224-symbol-line-end

ocordeiro commented Apr 9, 2026

Uh oh!

chatgpt-codex-connector bot left a comment

Uh oh!

Uh oh!

Uh oh!

justrach left a comment

Uh oh!

ocordeiro commented Apr 10, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

ocordeiro commented Apr 9, 2026

Summary

Commits

Test plan

Notes for reviewers

Uh oh!

chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

Uh oh!

Uh oh!

justrach left a comment

Choose a reason for hiding this comment

Uh oh!

ocordeiro commented Apr 10, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

ocordeiro commented Apr 10, 2026 •

edited

Loading