MCP T15 — Tree-sitter analyzer base class refactor #663

## Context

Today, `PythonAnalyzer`, `JavaScriptAnalyzer`, and `KotlinAnalyzer` all use tree-sitter, but each is hand-rolled with duplicated parser setup, query helpers, and traversal logic. Adding a new tree-sitter language (T16: Go, Rust, TypeScript, Ruby, C++) means copying and editing ~300 lines per language. This refactor extracts the shared scaffolding into a base class so each new language is a small subclass declaring only what's actually language-specific.

This is a **strictly behavior-preserving refactor**: existing analyzer behavior and graph outputs must be byte-identical.

## Scope (in)

1. **New `api/analyzers/tree_sitter_base.py`** — `TreeSitterAnalyzer(AbstractAnalyzer)` base class exposing hooks each subclass fills in:
   - `language: tree_sitter.Language`
   - `node_type_to_label: dict[str, str]` — tree-sitter node type → entity label
   - `query_find_calls: str`, `query_find_classes: str`, `query_find_imports: str` — tree-sitter query templates
   - `extract_docstring(node) -> str | None`
2. **Migrate the 3 existing analyzers** onto the base class:
   - `api/analyzers/python/analyzer.py`
   - `api/analyzers/javascript/analyzer.py`
   - `api/analyzers/kotlin/analyzer.py`
3. **Documentation** — base class docstring describes the contract subclasses must implement.
4. **Regression guard** — new test that indexes a tiny multi-language project and asserts each analyzer produces the same node/edge counts as a recorded baseline.
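
A minimal sketch of the subclass contract listed above. It assumes the hooks are class attributes and that the base class checks for them when a subclass is defined. `AbstractAnalyzer` is stubbed out so the snippet runs on its own, and `tree_sitter` is not imported (the `language` hook is typed as `Any`). `ToyPythonAnalyzer`, its labels, and its query strings are hypothetical placeholders, not what the existing `PythonAnalyzer` uses.

```python
from __future__ import annotations

from abc import ABC, abstractmethod
from typing import Any, ClassVar


class AbstractAnalyzer(ABC):
    """Hypothetical stand-in for the project's real AbstractAnalyzer."""


class TreeSitterAnalyzer(AbstractAnalyzer):
    """Shared scaffolding for tree-sitter-backed analyzers.

    Subclass contract:
      language            tree_sitter.Language for the grammar
      node_type_to_label  tree-sitter node type -> entity label
      query_find_calls, query_find_classes, query_find_imports
                          tree-sitter query source strings
      extract_docstring   return the docstring attached to a node, or None
    """

    REQUIRED_HOOKS: ClassVar[tuple[str, ...]] = (
        "language",
        "node_type_to_label",
        "query_find_calls",
        "query_find_classes",
        "query_find_imports",
    )

    # Declared but not assigned, so hasattr() stays False until a subclass sets them.
    language: ClassVar[Any]
    node_type_to_label: ClassVar[dict[str, str]]
    query_find_calls: ClassVar[str]
    query_find_classes: ClassVar[str]
    query_find_imports: ClassVar[str]

    def __init_subclass__(cls, **kwargs: Any) -> None:
        super().__init_subclass__(**kwargs)
        # Fail when the subclass is defined (at import), not halfway through indexing.
        missing = [name for name in cls.REQUIRED_HOOKS if not hasattr(cls, name)]
        if missing:
            raise TypeError(f"{cls.__name__} is missing hooks: {', '.join(missing)}")

    @abstractmethod
    def extract_docstring(self, node: Any) -> str | None:
        """Return the docstring attached to `node`, or None."""


class ToyPythonAnalyzer(TreeSitterAnalyzer):
    """Hypothetical subclass: declares only language-specific data."""

    # With py-tree-sitter 0.22+ this would be
    # tree_sitter.Language(tree_sitter_python.language()); None keeps the sketch standalone.
    language = None
    node_type_to_label = {"class_definition": "Class", "function_definition": "Function"}
    query_find_calls = "(call function: (identifier) @name)"
    query_find_classes = "(class_definition name: (identifier) @name)"
    query_find_imports = "(import_statement name: (dotted_name) @name)"

    def extract_docstring(self, node: Any) -> str | None:
        return None  # placeholder; a real implementation would inspect the node's body
```

Checking in `__init_subclass__` means a forgotten hook breaks the import, not a later indexing run. The trade-off is that any intermediate abstract subclass would also have to declare every hook. If that becomes a problem, the check can move into `__init__`.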

## Scope (out)

- New languages (T16).
- Re-enabling C analyzer (T16).
- Changing graph schema or analyzer outputs.
- Performance optimization.

## Files

- new `api/analyzers/tree_sitter_base.py`
- modified `api/analyzers/python/analyzer.py`
- modified `api/analyzers/javascript/analyzer.py`
- modified `api/analyzers/kotlin/analyzer.py`
- new `tests/analyzers/test_tree_sitter_base.py`

## Acceptance criteria

- [ ] All existing analyzer tests in `tests/` pass unchanged.
- [ ] Each migrated analyzer file is shorter than before and contains no parser-setup boilerplate.
- [ ] New base class is documented with a clear docstring describing the subclass contract.
- [ ] Regression test indexes a tiny multi-language fixture and asserts node/edge counts match the recorded baseline (catches any silent behavior change).
- [ ] `make lint` and `make test` clean.
- [ ] No changes to graph schema (labels, relations) — verifiable by diffing fixture-graph snapshots before/after.
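
One way to make the count and schema checks concrete is to summarize counts per label and per relation type, then diff them against the baseline. This sketch assumes the fixture graph can be read back as one label per node and one relation type per edge. How that readout is queried depends on the graph store and is left out. A label or relation that appears or disappears shows up as a count moving to or from zero, so one diff catches both count drift and schema changes. The function names and the label strings in the test data are hypothetical.

```python
from collections import Counter
from typing import Iterable


def graph_summary(node_labels: Iterable[str], edge_types: Iterable[str]) -> dict[str, dict[str, int]]:
    """Count nodes per label and edges per relation type (one label per node assumed)."""
    return {
        "nodes": dict(Counter(node_labels)),
        "edges": dict(Counter(edge_types)),
    }


def diff_summaries(
    baseline: dict[str, dict[str, int]], current: dict[str, dict[str, int]]
) -> list[str]:
    """List every label/relation whose count changed; an empty list means no drift."""
    problems: list[str] = []
    for kind in ("nodes", "edges"):
        before_counts = baseline.get(kind, {})
        after_counts = current.get(kind, {})
        # Union of keys: an entry missing on one side counts as 0 there,
        # so added or dropped labels/relations are reported too.
        for key in sorted(set(before_counts) | set(after_counts)):
            before = before_counts.get(key, 0)
            after = after_counts.get(key, 0)
            if before != after:
                problems.append(f"{kind}[{key}]: baseline={before} current={after}")
    return problems
```

In the regression test this would be `assert diff_summaries(baseline, current) == []`. On failure, pytest then prints every mismatching label or relation, not just a changed total.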

## Dependencies

- #648 (T1 — scaffold) — soft dep, only to avoid merge conflicts with the MCP module work.

## Notes for the implementer

- Start by reading the three existing analyzers side-by-side and listing the duplicated patterns. The base class should absorb exactly those patterns and nothing more — no speculative hooks.
- Run `make test` after each analyzer migration, not just at the end. If Python migrates cleanly but JS doesn't, you want to know which step broke it.
- Snapshot the node/edge counts of the test fixtures **before** starting the refactor; that's your regression baseline.
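
A sketch of recording that baseline, assuming it is stored as a JSON file of per-analyzer counts. The file path and the shape of the counts dict are hypothetical. Sorted keys and a fixed indent keep the file diff-friendly. Refusing to overwrite an existing baseline by default stops a post-refactor re-run of the snapshot from silently blessing a regression.

```python
import json
from pathlib import Path
from typing import Any


def record_baseline(counts: dict[str, Any], path: Path, *, overwrite: bool = False) -> None:
    """Write the pre-refactor counts snapshot; refuse to clobber an existing one."""
    if path.exists() and not overwrite:
        raise FileExistsError(f"{path} already exists; pass overwrite=True to re-record")
    path.parent.mkdir(parents=True, exist_ok=True)
    # sort_keys + fixed indent + trailing newline -> identical bytes for identical counts
    path.write_text(json.dumps(counts, indent=2, sort_keys=True) + "\n", encoding="utf-8")


def load_baseline(path: Path) -> dict[str, Any]:
    """Read a snapshot written by record_baseline."""
    return json.loads(path.read_text(encoding="utf-8"))
```

Call `record_baseline` once on the unmodified code, commit the resulting file, and have the regression test compare against `load_baseline` from then on.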
