Skip to content

fix/issue 224 symbol line end#225

Open
ocordeiro wants to merge 4 commits intojustrach:mainfrom
ocordeiro:fix/issue-224-symbol-line-end
Open

fix/issue 224 symbol line end#225
ocordeiro wants to merge 4 commits intojustrach:mainfrom
ocordeiro:fix/issue-224-symbol-line-end

Conversation

@ocordeiro
Copy link
Copy Markdown

Fixes #224.

Disclaimer: I'm not a Zig programmer. This patch was produced with Claude Code (Opus 4.6) — I designed the approach, reviewed the code and test output, and validated the fix end-to-end via MCP, but the Zig itself was written by the model under my direction. Please review with that in mind; I'll happily iterate on any idioms, naming, or error handling that don't match the project's conventions.

Summary

  • Parsers in src/explore.zig hardcoded Symbol.line_end = line_num at the signature line, so codedb_symbol body=true (which calls getSymbolBody(path, line_start, line_end, ...)) always returned a single-line slice instead of the symbol body. Affected all 7 languages (Zig, Python, TS/JS, Rust, PHP, Go, Ruby).
  • Added computeSymbolEnds() as a post-parse pass in indexFileInner, run between the streaming loop and the global lock while content is still in scope. Three strategies: brace-balance for C-family languages (with string/comment hygiene), indent for Python, indent + end snapping for Ruby. Parsers themselves are untouched.
  • Naturally single-line kinds (import, variable, constant, type_alias, macro_def, comment_block) are short-circuited. Forward declarations / abstract methods (no { within 10 lines of the signature) stay with line_end == line_start.

Commits

  • 9aeca60 test(issue-224): add failing test for codedb_symbol body=true
  • 8919fb8 fix(issue-224): populate Symbol.line_end via post-parse block scan

The failing test is intentionally on its own commit per the repo's issue workflow (CLAUDE.md).

Test plan

  • zig build test --summary all333/333 passing (324 before + 9 new).
  • New tests cover: Zig fn, Zig struct_def, Python def, Python class, TS function, Rust fn, Go func, PHP function, Ruby def with nested do/end, and an abstract-signature edge case that must stay single-line.
  • End-to-end smoke test via MCP against the built binary: codedb_symbol name=computeSymbolEnds body=true returns the full 36-line body of the new helper (L1939–L1974 in src/explore.zig), not just the signature.
  • Existing tests for codedb_outline, codedb_read, and getSymbolBody still pass (no changes to those code paths).

Notes for reviewers

  • Design choice: a single post-parse pass instead of refactoring each parseXLine. The parsers are line-streaming — giving them the rest of the file would require restructuring all 7, and any future language would need the same care. One fix location covers every current and future parser.
  • Accuracy trade-offs: scanBraceBlock tracks "..." strings and // / /* */ comments, but intentionally does not track single-quoted strings (to avoid breaking Rust lifetimes / Zig char literals) or JS template literals. A '{' inside a char literal or `${` inside a template string could, in theory, throw off the count — accepted as rare imprecision in exchange for a ~40-line implementation. Indent-based fallback for Python/Ruby also trades a small amount of precision (e.g., Ruby heredocs) for simplicity.
  • Stale snapshots: existing codedb.snapshot files on disk were written with line_end == line_start. Users will need to re-index after pulling this fix to see full bodies — worth a line in the next changelog.
  • Pre-existing leak (not in scope): while smoke-testing, I noticed handleSymbol in src/mcp.zig:764 only frees the outer slice from findAllSymbols, leaking the duped path/name/detail strings of each result. It's a separate bug in code this PR doesn't touch — happy to open a follow-up issue.

Alan Cordeiro added 2 commits April 9, 2026 08:48
Multi-line Zig function where line_end should reach the closing brace.
Currently line_end == line_start because parsers in src/explore.zig
hardcode line_end = line_num at the signature line, so getSymbolBody
(used by codedb_symbol body=true) only returns the signature.
Per-language parsers in explore.zig only recorded line_start, leaving
line_end == line_num at the signature line. This made codedb_symbol
body=true return a single-line slice instead of the symbol body.

Add computeSymbolEnds(), run after the streaming parse loop in
indexFileInner and before the global lock. Three strategies:

  - Brace-balance (Zig/C/C++/TS/JS/Rust/Go/PHP): scan forward from the
    signature, skipping "..." strings and // or /* */ comments, until
    the first { is closed. Bail out after 10 lines without an opener
    so abstract/forward declarations stay single-line.
  - Indent (Python): first contiguous run of lines with indent greater
    than the signature's. Tolerates multi-line signatures and blank
    or comment lines inside the body.
  - Indent + end (Ruby): same as Python, plus snap to a standalone
    end at sig indent.

The brace scanner additionally tracks angle-bracket depth for generics
so that '{' / '}' inside '<...>' (e.g. Promise<{ ok: boolean }>, a
generic constraint like <T extends { id: string }>) are ignored. The
filters '=>', '->', '<=', '>=' prevent arrow types and comparisons
from skewing the angle count. Finally, a '}' that drops depth to 0 is
only accepted as the body close once the block has survived at least
one newline — same-line balanced '{...}' are treated as type literals,
destructured defaults, or one-liners that are not the function body.

Naturally single-line kinds (import, variable, constant, type_alias,
macro_def, comment_block) are skipped. struct_def/enum_def/union_def in
Zig go through brace scanning as expected.

Tests: 16 cases in tests.zig covering the original regression plus
one per language (Zig fn, Zig struct_def, Python def, Python class,
TS function, Rust fn, Go func, PHP function, Ruby def), an abstract
signature edge case, and 7 TypeScript/Rust cases exercising the angle
depth + newline-crossing rules (Promise<{...}> return, bare object
return, destructured typed param, generic constraint with object
literal, arrow-type parameter, nested Array<Array<T>> generics, and
Rust trailing return type). zig build test: 340/340 passing.
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 34486b6478

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Two bugs in the post-parse block scanner identified in PR review:

- scanBraceBlock incremented angle_depth on every `<`, so code like
  `if (a < b) { ... }` entered phantom generic mode and the inner `{`
  was skipped, truncating line_end. Now `<` only opens a generic when
  the previous char is an identifier (Vec<T>, Map<K,V>, impl<T>,
  Foo<{a:string}>), which also covers the `<=`/`<<` cases naturally.

- findPythonBlockEnd classified indented parameter continuation lines
  (`def foo(\n    a,\n    b,\n) -> int:`) as body, and the dedented
  `):` line ended the scan prematurely. It now walks past the signature
  until it finds the line ending in `:` (ignoring inline comments and
  strings via pythonLineEndsWithColon) before starting the body scan.

Regression tests added for Zig/Rust `if a < b { ... }` and for Python
multi-line parenthesized signatures.
Copy link
Copy Markdown
Owner

@justrach justrach left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found two blocking issues that need to be addressed before this should merge.

  1. Existing snapshots keep the broken line_end values, so the fix does not actually apply for many upgraded users. computeSymbolEnds() only runs during indexing, but snapshots serialize and restore line_end unchanged. Startup and MCP project loading both prefer codedb.snapshot when it matches the current git HEAD, so unchanged repos can continue serving single-line symbol bodies indefinitely until a manual reindex. Please either repair line_end on snapshot load / invalidate old snapshots / force recomputation somewhere on load, or otherwise make the upgrade path actually apply the fix.

  2. The brace scanner still truncates bodies on valid syntax because it explicitly does not track backticks or single-quoted literals. I reproduced this locally against the PR branch: a TypeScript function containing const s = `}`; and a Zig function containing const c: u8 = '}'; both produced line_end = 2 instead of the closing brace line. That means codedb_symbol body=true is still wrong on real code, not just pathological input. Please either harden scanBraceBlock() for these cases or narrow the claimed language coverage and add regression tests for the remaining supported cases.

Once those are fixed, I’m happy to re-review.

@codex can you take a look as well?

…Block

scanBraceBlock now skips content inside '...' (char literals) and
`...` (template literals) with escape handling. Previously, a closing
brace inside a Zig char literal (const c: u8 = '}') or a TS template
literal (const s = `}`) was miscounted as the block end, truncating
line_end.

Regression tests added for both cases.
@ocordeiro ocordeiro force-pushed the fix/issue-224-symbol-line-end branch from e1ba6f2 to a37360f Compare April 10, 2026 14:53
@ocordeiro
Copy link
Copy Markdown
Author

ocordeiro commented Apr 10, 2026

@justrach
Thanks for the thorough review!

For scanBraceBlock, I've added single-quote and backtick tracking with escape handling, along with regression tests for Zig char literals ('}') and TS template literals (`}`), in commit a37360f.

For the stale snapshots issue, I have a local fix ready but wanted your input before pushing. I see a few options:

  1. Bump FORMAT_VERSION from 2 to 3: rejects old snapshots and forces a full reindex on next startup. Clean but heavy-handed since the user loses all cached state.
  2. Call computeSymbolEnds inside insertRestoredFile: fixes line_end on load for unchanged files. Targeted, but adds recomputation to a function designed as a fast path.
  3. Post-load sweep in loadSnapshotFast: after restoring all files, iterate the outlines and call computeSymbolEnds for each. Runs once on startup, doesn't touch insertRestoredFile, and corrected values get persisted when the snapshot is saved again.

Which approach do you prefer?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

bug: codedb_symbol body=true returns only the signature line — Symbol.line_end is never populated

2 participants