[MCP T9] GraphRAG init module (reuse existing ontology) #657

@DvirDukhan

Description

Phase 1 ticket T9. Depends on #648 (T1 scaffold).

Context

The MCP ask tool (T11) needs a KnowledgeGraph per (project, branch) to call kg.ask(). Building one is non-trivial: it needs an ontology, a model, and prompt configuration. All three already exist in api/llm.py, but the entry point is private and the existing module-level singleton bakes in a single project. This ticket extracts and exposes the construction logic so the MCP tool can call it cleanly with caching.

Critical reuse requirement: The existing hand-coded ontology in api/llm.py:_define_ontology() (lines 26–233) is 200+ lines describing File/Class/Function/Interface entities with rich attributes and DEFINES/CALLS/EXTENDS relations. Do NOT replace it with Ontology.from_kg_graph() — the auto-extracted version loses descriptions and attribute hints that the LLM uses to generate good Cypher.

Scope

In:

  • Refactor api/llm.py: rename _define_ontology → define_ontology (drop the leading underscore so it's importable). Update the module-level ontology = _define_ontology() call and any other internal callers (likely _create_kg_agent at api/llm.py:238-258, and any caller in api/index.py).
  • New api/mcp/graphrag_init.py exposing:
    def get_or_create_kg(project_name: str, branch: str = "_default") -> KnowledgeGraph: ...
    • Caches KnowledgeGraph instances per (project, branch) in a module-level dict.
    • Constructs LiteModel(model_name=os.getenv("MODEL_NAME", "gemini/gemini-flash-lite-latest")).
    • Wraps in KnowledgeGraphModelConfig.with_model(model).
    • Imports define_ontology from api.llm (the renamed function).
    • Imports prompts from T10's api/mcp/code_prompts.py. Until T10 lands, import directly from api/prompts.py and add a TODO(T10) comment to swap in the new module once it exists.
    • Uses graph name code:{project_name}:{branch} to match the per-branch convention from T17.
  • Tests in tests/mcp/test_graphrag_init.py:
    • Unit test: get_or_create_kg("sample", "_default") returns a KnowledgeGraph configured with the existing ontology and a LiteModel (mocked).
    • Cache test: second call with the same args returns the same instance (identity check).
    • Different (project, branch) yields different instances.
    • Existing api/llm.py chat path still works after the rename — verified by running existing tests in tests/.
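The factory described above can be sketched as follows. This is a minimal sketch, not the implementation: the SDK types (KnowledgeGraph, KnowledgeGraphModelConfig, LiteModel) and define_ontology are replaced by stand-ins here so the snippet runs standalone; the real api/mcp/graphrag_init.py would import them from the graphrag_sdk package and from api.llm respectively.

```python
import os

# --- Stand-ins for the real imports, so this sketch is self-contained. ---
class LiteModel:
    def __init__(self, model_name: str):
        self.model_name = model_name

class KnowledgeGraphModelConfig:
    def __init__(self, model: LiteModel):
        self.model = model

    @classmethod
    def with_model(cls, model: LiteModel) -> "KnowledgeGraphModelConfig":
        return cls(model)

class KnowledgeGraph:
    def __init__(self, name: str, model_config: KnowledgeGraphModelConfig, ontology):
        self.name = name
        self.model_config = model_config
        self.ontology = ontology

def define_ontology():
    # Placeholder for the hand-coded ontology exported by api.llm after the rename.
    return {"entities": ["File", "Class", "Function", "Interface"]}

# Module-level cache keyed by (project, branch).
_kg_cache: dict[tuple[str, str], KnowledgeGraph] = {}

def get_or_create_kg(project_name: str, branch: str = "_default") -> KnowledgeGraph:
    key = (project_name, branch)
    if key not in _kg_cache:
        model = LiteModel(
            model_name=os.getenv("MODEL_NAME", "gemini/gemini-flash-lite-latest")
        )
        _kg_cache[key] = KnowledgeGraph(
            name=f"code:{project_name}:{branch}",  # per-branch convention (T17)
            model_config=KnowledgeGraphModelConfig.with_model(model),
            ontology=define_ontology(),
        )
    return _kg_cache[key]
```

Note the cache key is the full (project, branch) tuple, so the identity and cross-branch tests fall out directly from dict semantics.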

Out:

  • The ask tool itself (T11).
  • Prompt overrides (T10).
  • Per-branch awareness in the existing FastAPI /api/chat endpoint (out of scope; that's a separate refactor).

Files to create / modify

  • api/llm.py — rename _define_ontology → define_ontology; update internal callers
  • api/index.py — update any callers of _define_ontology if present
  • new api/mcp/graphrag_init.py
  • new tests/mcp/test_graphrag_init.py

Acceptance criteria

  • api/llm.py exports define_ontology (no leading underscore).
  • All existing tests in tests/ pass after the rename — no regression to the existing chat endpoint.
  • get_or_create_kg("foo", "_default") returns a KnowledgeGraph constructed with the hand-coded ontology and a LiteModel.
  • Second call with the same (project, branch) returns the cached instance (identity check passes).
  • Different (project, branch) arguments yield different cached instances.
  • Unit tests run with mocked LiteModel so no network calls in CI.
  • The CI workflow from #649 ([MCP T2] CI workflow with FalkorDB service for MCP tests) is green.
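The "no network calls in CI" criterion comes down to patching LiteModel where the factory looks it up. A sketch of the pattern — the patch target here is this snippet's own namespace so it runs standalone; the real test would patch "api.mcp.graphrag_init.LiteModel" (the name as used by the factory, not where the class is defined):

```python
from unittest import mock

class LiteModel:
    """Stand-in for the SDK model class; the real one talks to an LLM provider."""
    def __init__(self, model_name: str):
        self.model_name = model_name

def build_model() -> LiteModel:
    # Stands in for the LiteModel(...) call inside get_or_create_kg.
    return LiteModel(model_name="gemini/gemini-flash-lite-latest")

# Patch where the name is *used*: in the real test this would be
# mock.patch("api.mcp.graphrag_init.LiteModel").
with mock.patch(f"{__name__}.LiteModel") as fake_model:
    model = build_model()

assert model is fake_model.return_value  # factory received the mock, no network touched
fake_model.assert_called_once_with(model_name="gemini/gemini-flash-lite-latest")
```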

Dependencies

  • #648 (T1 scaffold), as noted in the Context above.

Out of scope (do NOT do in this PR)

  • The ask MCP tool itself (T11).
  • Prompt overrides (T10).
  • Refactoring api/llm.py's existing chat endpoint to use the new factory.
  • Real-LLM smoke test (Phase 1.5 nightly).

Notes for the implementer

  • After the rename, grep for _define_ontology repo-wide and update every caller. The only constraint being relaxed is visibility (the leading underscore); the function body must not change.
  • The cache key is (project_name, branch) — a tuple. Use dict[tuple[str, str], KnowledgeGraph].
  • KnowledgeGraph construction requires passing the FalkorDB connection details (host/port/user/pass) via env vars — the SDK reads them automatically from the same env vars api/llm.py uses, so no extra wiring needed. Verify by tracing how _create_kg_agent does it today.
  • The name parameter passed to KnowledgeGraph(name=...) should be code:{project}:{branch} to match T17.
  • T10 hasn't landed yet when this ticket is implemented — that's fine. Import prompts directly from api/prompts.py for now and leave a TODO(T10): swap in api.mcp.code_prompts comment. T10 will replace the import.
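If explicit connection wiring ever turns out to be needed (rather than the SDK's automatic env pickup described above), it might look like the sketch below. The variable names here are hypothetical and must be checked against what _create_kg_agent in api/llm.py actually reads:

```python
import os

# Hypothetical env var names — trace _create_kg_agent before relying on these.
falkor_conn = {
    "host": os.getenv("FALKORDB_HOST", "localhost"),
    "port": int(os.getenv("FALKORDB_PORT", "6379")),
    "username": os.getenv("FALKORDB_USERNAME"),  # None when unset
    "password": os.getenv("FALKORDB_PASSWORD"),
}
```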

Metadata

Assignees: none

Labels: enhancement (New feature or request), mcp (MCP server (model context protocol) work)
