Context
Today, PythonAnalyzer, JavaScriptAnalyzer, and KotlinAnalyzer all use tree-sitter, but each is hand-rolled with duplicated parser setup, query helpers, and traversal logic. Adding a new tree-sitter language (T16: Go, Rust, TypeScript, Ruby, C++) means copying and editing ~300 lines per language. This refactor extracts the shared scaffolding into a base class so each new language is a small subclass declaring only what's actually language-specific.
This is a strictly non-functional refactor — existing analyzer behavior and graph outputs must be byte-identical.
Scope (in)
- New api/analyzers/tree_sitter_base.py: TreeSitterAnalyzer(AbstractAnalyzer) base class exposing hooks each subclass fills in:
  - language: tree_sitter.Language
  - node_type_to_label: dict[str, str] (tree-sitter node type → entity label)
  - query_find_calls: str, query_find_classes: str, query_find_imports: str (tree-sitter query templates)
  - extract_docstring(node) -> str | None
- Migrate the 3 existing analyzers onto the base class:
  - api/analyzers/python/analyzer.py
  - api/analyzers/javascript/analyzer.py
  - api/analyzers/kotlin/analyzer.py
- Documentation: base class docstring describes the contract subclasses must implement.
- Regression guard: new test that indexes a tiny multi-language project and asserts each analyzer produces the same node/edge counts as a recorded baseline.
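The hook list above could translate into a base class along these lines. This is only a sketch under stated assumptions, not the intended implementation: AbstractAnalyzer is replaced by an empty stand-in, language is typed as Any so the example runs without tree-sitter installed, and label_for, ToyPythonAnalyzer, the __init_subclass__ check, and the query strings are all hypothetical illustrations rather than code from the existing analyzers.

```python
from __future__ import annotations

from abc import ABC, abstractmethod
from typing import Any, ClassVar


class AbstractAnalyzer(ABC):
    """Stand-in for the project's AbstractAnalyzer; its real interface is not shown here."""


class TreeSitterAnalyzer(AbstractAnalyzer):
    """Shared tree-sitter scaffolding; subclasses declare only language specifics.

    Subclass contract:
      language            tree_sitter.Language for the grammar
      node_type_to_label  tree-sitter node type -> entity label
      query_find_*        tree-sitter query templates
      extract_docstring   returns the docstring for a node, or None
    """

    language: ClassVar[Any]  # tree_sitter.Language in the real class
    node_type_to_label: ClassVar[dict[str, str]]
    query_find_calls: ClassVar[str]
    query_find_classes: ClassVar[str]
    query_find_imports: ClassVar[str]

    _REQUIRED_ATTRS = (
        "language",
        "node_type_to_label",
        "query_find_calls",
        "query_find_classes",
        "query_find_imports",
    )

    def __init_subclass__(cls, **kwargs: Any) -> None:
        # Fail at class-definition time if a subclass forgets a declared hook.
        super().__init_subclass__(**kwargs)
        missing = [name for name in cls._REQUIRED_ATTRS if not hasattr(cls, name)]
        if missing:
            raise TypeError(f"{cls.__name__} must define: {', '.join(missing)}")

    @abstractmethod
    def extract_docstring(self, node: Any) -> str | None:
        """Return the docstring attached to `node`, or None if there is none."""

    def label_for(self, node_type: str) -> str | None:
        """Hypothetical shared helper: map a node type to its entity label."""
        return self.node_type_to_label.get(node_type)


class ToyPythonAnalyzer(TreeSitterAnalyzer):
    """Illustrative subclass; values are examples, not the real PythonAnalyzer's."""

    language = None  # placeholder; a real subclass would hold a tree_sitter.Language
    node_type_to_label = {
        "class_definition": "Class",
        "function_definition": "Function",
    }
    query_find_calls = "(call function: (_) @callee) @call"
    query_find_classes = "(class_definition name: (identifier) @name) @class"
    query_find_imports = "(import_statement) @import"

    def extract_docstring(self, node: Any) -> str | None:
        # Placeholder logic: a real implementation would inspect the node's children.
        return getattr(node, "docstring", None)
```

Checking the required attributes when the subclass is defined is one possible way to enforce the contract. Whether it belongs in the base class depends on whether the three existing analyzers already share an equivalent check.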
Scope (out)
- New languages (T16).
- Re-enabling C analyzer (T16).
- Changing graph schema or analyzer outputs.
- Performance optimization.
Files
- new: api/analyzers/tree_sitter_base.py
- modified: api/analyzers/python/analyzer.py
- modified: api/analyzers/javascript/analyzer.py
- modified: api/analyzers/kotlin/analyzer.py
- new: tests/analyzers/test_tree_sitter_base.py
Acceptance criteria
- tests/ pass unchanged.
- make lint and make test clean.
Dependencies
Notes for the implementer
- Start by reading the three existing analyzers side-by-side and listing the duplicated patterns. The base class should absorb exactly those patterns and nothing more: no speculative hooks.
- Run make test after each analyzer migration, not just at the end. If Python migrates cleanly but JS doesn't, you want to know which step broke it.
- Snapshot the node/edge counts of the test fixtures before starting the refactor; that's your regression baseline.
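The baseline snapshot and the regression guard could share a small helper like the one sketched below. It assumes a hypothetical graph shape (nodes as (label, name) pairs, edges as (source, type, target) triples) and a JSON baseline file. The function names summarize, record_baseline, and diff_against_baseline are illustrative and do not come from the existing codebase.

```python
import json
from collections import Counter
from pathlib import Path


def summarize(nodes, edges):
    """Reduce a graph to per-label node counts and per-type edge counts."""
    node_counts = Counter(label for label, _name in nodes)
    edge_counts = Counter(kind for _src, kind, _dst in edges)
    return {
        "nodes": dict(sorted(node_counts.items())),
        "edges": dict(sorted(edge_counts.items())),
    }


def record_baseline(summary, path: Path) -> None:
    """Write the summary to disk; run this once, before the refactor starts."""
    path.write_text(json.dumps(summary, indent=2, sort_keys=True))


def diff_against_baseline(summary, path: Path) -> list[str]:
    """Return one message per count that differs from the recorded baseline."""
    baseline = json.loads(path.read_text())
    problems = []
    for section in ("nodes", "edges"):
        keys = set(baseline[section]) | set(summary[section])
        for key in sorted(keys):
            before = baseline[section].get(key, 0)
            after = summary[section].get(key, 0)
            if before != after:
                problems.append(f"{section}[{key}]: {before} -> {after}")
    return problems
```

A test in tests/analyzers/test_tree_sitter_base.py could then index the fixture project once per analyzer and assert that diff_against_baseline returns an empty list. Keeping one baseline file per analyzer makes it easier to see which migration introduced a difference.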