4 changes: 3 additions & 1 deletion .github/workflows/ci-main.yml
@@ -228,7 +228,9 @@ jobs:
run: |
# Run unit and integration tests (exclude integration-signing which requires zitsign CLI)
# zlob feature required: fff-search build.rs panics when CI=true without zlob.
/home/alex/.local/bin/rch exec -- cargo test --release --target ${{ matrix.target }} --workspace --features "self_update/signatures,zlob"
# terraphim_automata/medical required: SHA-256 checksum verification tests for the
# deserialize_unchecked safety precondition (security finding #1313 P1-1).
/home/alex/.local/bin/rch exec -- cargo test --release --target ${{ matrix.target }} --workspace --features "self_update/signatures,zlob,terraphim_automata/medical"

- name: sccache stats
if: always()
15 changes: 14 additions & 1 deletion CHANGELOG.md
@@ -9,6 +9,16 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

### Added

- **Intra-doc link fixes** resolved broken rustdoc links and unclosed HTML tag warnings across `terraphim_orchestrator`, `terraphim_types`, `terraphim_tracker` — cargo doc now produces zero warnings on all core crates
- **Unique tempdir** in `test_tool_index_save_and_load` to eliminate cross-run state pollution (Refs #1340)
- **Module-level rustdoc** added to `terraphim_dsm` and `terraphim_github_runner_server` — the final two binary crates lacking a crate-level `//!` comment
- **Module-level rustdoc** added to 17 previously undocumented crates: `terraphim_service`, `terraphim_settings`, `terraphim_agent`, `terraphim_file_search`, `terraphim_kg_linter`, `terraphim_ccusage`, `terraphim_usage`, `terraphim_build_args`, `terraphim_lsp`, `terraphim_automata_py`, `terraphim_rolegraph_py`, `terraphim-markdown-parser`, `haystack_core`, `haystack_atlassian`, `haystack_discourse`, `haystack_grepapp`, `haystack_jmap` — all workspace crates now have crate-level `//!` documentation
- **Module-level rustdoc** added to five previously undocumented crates: `terraphim_persistence`, `terraphim_mcp_server`, `terraphim_config`, `terraphim_rolegraph`, `terraphim_middleware`
- **`DeviceStorage` struct doc** explaining singleton pattern, operator ordering, and cache write-back target
- **`TerraphimMcpError` enum doc** describing the four failure domains covered by MCP server errors
- **Security checklist** shard checksum verification before `deserialize_unchecked` (Refs #1313)
- **ADR-0001** Ollama trust boundary decision documented (Refs #1313, #1318)
- **CI** `terraphim_automata` medical feature added to workspace test run (Refs #1313)
- **Session debouncing** for `SessionConnector::watch()` to eliminate duplicate emissions (Refs #815)
- **LLM pre/post hooks** wired in agent command handlers for multi-agent coordination (Refs #451)
- **Self-Documentation API** exposed via robot CLI subcommand (Refs #1011)
@@ -30,7 +40,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

### Fixed

- **RUSTSEC-2026-0049** eliminated by switching serenity to native-tls (Refs #418)
- **World-readable sensitive config files** now emit tracing error/warn at load time via `warn_if_world_readable()` in orchestrator config and all `conf.d` include files (Refs #826)
- **RUSTSEC-2026-0049** eliminated by switching serenity to native-tls (Refs #418)
- **Spec gaps** addressed and resolved across ADF orchestrator templates (Refs #1040)
- **Global concurrency limits** enforced in orchestrator to prevent task/memory exhaustion (Refs #664)
- **listen_mode test assertion** updated to match clap error output (Refs #1044)
@@ -60,6 +70,9 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
- **Test ranking knowledge graph fixture** added for agent testing
- **LLM cost tracking** foundation with genai fork integration (Refs #1075)
- **Spec validation** report for 2026-04-29 documenting 3 fixed, 5 remaining gaps
- **Spec validation** report for 2026-05-07 -- FAIL: 2 persistent gaps (single-agent-listener operational gap, meta-coordinator spec coverage gap); 0 new gaps; provider_probe.rs hardening assessed as in-scope for #1233
- **Documentation gap report v2** generated for 2026-05-07 extending scan to include `pub fn` -- 395 total gaps (agent: 189, orchestrator: 61, types: 48, service: 39, automata: 23, persistence: 12, config: 11, middleware: 8, rolegraph: 4); service and persistence API entry points identified as critical
- **Documentation gap report** generated for 2026-05-07 identifying 307 missing docs across 9 crates (agent: 139, types: 76, orchestrator: 54, automata: 23, config: 15) -- 45% reduction from 2026-04-29 baseline of 564
- **Documentation gap report** generated for 2026-05-05 identifying 1,058 missing docs across 12 crates (orchestrator: 445, server: 138, service: 114, agent: 99, types: 98)
- **GITEA_URL injection** from project config into agent spawn context for orchestrator
- **Streaming output log drain** for reliable agent output capture (Refs #1219)
1 change: 1 addition & 0 deletions Cargo.lock

Some generated files are not rendered by default.

4 changes: 4 additions & 0 deletions crates/haystack_atlassian/src/lib.rs
@@ -1,3 +1,7 @@
//! Haystack integration for Atlassian products (Confluence, Jira).
//!
//! Implements [`HaystackProvider`] over the Confluence REST API, enabling
//! full-text search of Confluence spaces as a Terraphim haystack source.
use anyhow::Result;
use haystack_core::HaystackProvider;
use terraphim_types::{Document, SearchQuery};
5 changes: 5 additions & 0 deletions crates/haystack_core/src/lib.rs
@@ -1,3 +1,8 @@
//! Core abstraction for haystack search providers.
//!
//! Defines the [`HaystackProvider`] trait that all data-source integrations
//! (Ripgrep, Atlassian, Discourse, JMAP, …) implement to expose a uniform
//! async search interface over heterogeneous backends.
use terraphim_types::{Document, SearchQuery};

pub trait HaystackProvider {
4 changes: 4 additions & 0 deletions crates/haystack_discourse/src/lib.rs
@@ -1,3 +1,7 @@
//! Haystack integration for Discourse forums.
//!
//! Implements [`haystack_core::HaystackProvider`] over the Discourse search
//! API, allowing forum topics and posts to be indexed as Terraphim documents.
mod client;
mod models;

4 changes: 4 additions & 0 deletions crates/haystack_grepapp/src/lib.rs
@@ -1,3 +1,7 @@
//! Haystack integration for grep.app code-search.
//!
//! Implements [`HaystackProvider`] over the grep.app API, exposing public
//! code-search results as Terraphim documents.
use anyhow::Result;
use haystack_core::HaystackProvider;
use terraphim_types::{Document, SearchQuery};
4 changes: 4 additions & 0 deletions crates/haystack_jmap/src/lib.rs
@@ -1,3 +1,7 @@
//! Haystack integration for email via JMAP.
//!
//! Implements [`HaystackProvider`] over a JMAP mail server, allowing email
//! messages and threads to be searched as Terraphim haystack documents.
use anyhow::{Context, Result};
use serde::{Deserialize, Serialize};
use std::collections::HashMap;
4 changes: 4 additions & 0 deletions crates/terraphim-markdown-parser/src/lib.rs
@@ -1,3 +1,7 @@
//! Markdown parser for Terraphim knowledge-graph documents.
//!
//! Converts markdown files into typed [`terraphim_types::Document`] values,
//! extracting titles, tags, and body text for indexing and search.
use std::collections::HashSet;
use std::ops::Range;
use std::str::FromStr;
5 changes: 5 additions & 0 deletions crates/terraphim_agent/src/lib.rs
@@ -1,3 +1,8 @@
//! Terraphim agent library — TUI, robot mode, and multi-agent coordination.
//!
//! Bundles the interactive REPL, robot-mode JSON output, forgiving CLI parser,
//! MCP tool index, onboarding workflows, and optional shared-learning store.
//! Feature flags gate heavier subsystems: `server`, `repl`, `shared-learning`.
#[cfg(feature = "server")]
pub mod client;
pub mod onboarding;
6 changes: 5 additions & 1 deletion crates/terraphim_agent/src/mcp_tool_index.rs
@@ -280,7 +280,11 @@ mod tests {
#[test]
fn test_tool_index_save_and_load() {
let temp_dir = std::env::temp_dir();
let index_path = temp_dir.join("test-mcp-index.json");
let unique = std::time::SystemTime::now()
.duration_since(std::time::UNIX_EPOCH)
.unwrap()
.subsec_nanos();
let index_path = temp_dir.join(format!("test-mcp-index-{unique}.json"));

// Create and save
{
3 changes: 2 additions & 1 deletion crates/terraphim_automata/Cargo.toml
@@ -42,6 +42,7 @@ walkdir = "2.5"
daachorse = { version = "1.0", optional = true }
zstd = { version = "0.13", optional = true }
anyhow = { workspace = true, optional = true }
sha2 = { version = "0.10", optional = true }


# WASM-specific dependencies
@@ -58,7 +59,7 @@ remote-loading = ["tokio", "reqwest"]
tokio-runtime = ["tokio"]
typescript = ["tsify", "dep:wasm-bindgen"]
wasm = ["typescript", "dep:wasm-bindgen", "dep:wasm-bindgen-futures"]
medical = ["daachorse", "zstd", "anyhow"]
medical = ["daachorse", "zstd", "anyhow", "sha2"]

[dev-dependencies]
criterion = "0.8"
89 changes: 80 additions & 9 deletions crates/terraphim_automata/src/medical_artifact.rs
@@ -8,6 +8,12 @@
//! [header_bytes: bincode(ArtifactHeader)]
//! for each shard in header.shard_byte_lengths:
//! [shard_bytes: raw daachorse bytes]
//!
//! Integrity: `ArtifactHeader.shard_checksums` holds one SHA-256 digest per
//! shard. `load_umls_artifact` verifies every shard before returning it, so
//! callers of `deserialize_unchecked` can rely on byte provenance.

use sha2::{Digest, Sha256};

use serde::{Deserialize, Serialize};
use std::collections::HashMap;
@@ -38,9 +44,15 @@ pub struct ArtifactHeader {
pub total_patterns: usize,
/// Raw byte length of each daachorse shard (order matches shard_metadata)
pub shard_byte_lengths: Vec<usize>,
/// SHA-256 digest of each shard's raw bytes; verified before
/// `deserialize_unchecked` is called on the bytes.
pub shard_checksums: Vec<[u8; 32]>,
}

/// Save a UMLS artifact: header (bincode) + shard bytes, compressed with zstd
/// Save a UMLS artifact: header (bincode) + shard bytes, compressed with zstd.
///
/// Computes SHA-256 of each shard and stores digests in the header so that
/// `load_umls_artifact` can verify integrity before any unsafe deserialization.
pub fn save_umls_artifact(
header: &ArtifactHeader,
shard_bytes: &[Vec<u8>],
@@ -51,6 +63,11 @@ pub fn save_umls_artifact(
shard_bytes.len(),
"shard_byte_lengths must match shard_bytes count"
);
assert_eq!(
header.shard_checksums.len(),
shard_bytes.len(),
"shard_checksums must match shard_bytes count"
);

// Encode header with bincode
let header_encoded = bincode::serialize(header)?;
@@ -107,10 +124,27 @@ pub fn load_umls_artifact(path: &Path) -> anyhow::Result<(ArtifactHeader, Vec<Ve
// Deserialize header
let header: ArtifactHeader = bincode::deserialize(&raw[8..8 + header_len])?;

// Read each shard's raw bytes
// Validate checksum count matches shard count
if header.shard_checksums.len() != header.shard_byte_lengths.len() {
anyhow::bail!(
"Artifact corrupt: {} checksums for {} shards",
header.shard_checksums.len(),
header.shard_byte_lengths.len()
);
}

// Read each shard's raw bytes and verify SHA-256 integrity before returning.
// This establishes the safety precondition for the caller's
// `deserialize_unchecked`: bytes that pass verification were produced by
// `serialize()` on the same machine and have not been tampered with.
let mut offset = 8 + header_len;
let mut shard_bytes = Vec::with_capacity(header.shard_byte_lengths.len());
for (i, &shard_len) in header.shard_byte_lengths.iter().enumerate() {
for (i, (&shard_len, expected_checksum)) in header
.shard_byte_lengths
.iter()
.zip(header.shard_checksums.iter())
.enumerate()
{
if offset + shard_len > raw.len() {
anyhow::bail!(
"Shard {} truncated: expected {} bytes at offset {}, have {}",
Expand All @@ -120,7 +154,15 @@ pub fn load_umls_artifact(path: &Path) -> anyhow::Result<(ArtifactHeader, Vec<Ve
raw.len() - offset
);
}
shard_bytes.push(raw[offset..offset + shard_len].to_vec());
let shard_slice = &raw[offset..offset + shard_len];
let actual_checksum: [u8; 32] = Sha256::digest(shard_slice).into();
if &actual_checksum != expected_checksum {
anyhow::bail!(
"Shard {} checksum mismatch: artifact may be corrupt or tampered with",
i
);
}
shard_bytes.push(shard_slice.to_vec());
offset += shard_len;
}

@@ -144,7 +186,7 @@ mod tests {
use super::*;
use tempfile::tempdir;

fn make_test_header() -> ArtifactHeader {
fn make_test_header(shard_bytes: &[Vec<u8>]) -> ArtifactHeader {
ArtifactHeader {
shard_metadata: vec![
vec![
Expand Down Expand Up @@ -175,7 +217,11 @@ mod tests {
m
},
total_patterns: 3,
shard_byte_lengths: vec![10, 8],
shard_byte_lengths: shard_bytes.iter().map(|b| b.len()).collect(),
shard_checksums: shard_bytes
.iter()
.map(|b| Sha256::digest(b).into())
.collect(),
}
}

@@ -184,8 +230,8 @@
let dir = tempdir().unwrap();
let path = dir.path().join("umls.bin.zst");

let header = make_test_header();
let shard_bytes = vec![vec![1u8; 10], vec![2u8; 8]];
let header = make_test_header(&shard_bytes);

save_umls_artifact(&header, &shard_bytes, &path).unwrap();
assert!(path.exists());
@@ -199,14 +245,39 @@
assert!(loaded_header.concept_index.contains_key("C0000001"));
}

#[test]
fn test_artifact_checksum_mismatch_rejected() {
let dir = tempdir().unwrap();
let path = dir.path().join("tampered.bin.zst");

let shard_bytes = vec![vec![1u8; 10], vec![2u8; 8]];
let header = make_test_header(&shard_bytes);
save_umls_artifact(&header, &shard_bytes, &path).unwrap();

// Load, tamper with shard bytes in the decompressed payload, recompress
let compressed = std::fs::read(&path).unwrap();
let mut raw = zstd::decode_all(&compressed[..]).unwrap();
// Flip one byte in the first shard (after header)
let header_len = u64::from_le_bytes(raw[..8].try_into().unwrap()) as usize;
raw[8 + header_len] ^= 0xFF;
let recompressed = zstd::encode_all(&raw[..], 3).unwrap();
std::fs::write(&path, recompressed).unwrap();

let result = load_umls_artifact(&path);
assert!(result.is_err(), "tampered artifact must be rejected");
let msg = result.err().unwrap().to_string();
assert!(msg.contains("checksum mismatch"), "error: {}", msg);
}

#[test]
fn test_artifact_exists() {
let dir = tempdir().unwrap();
let path = dir.path().join("test.bin.zst");
assert!(!artifact_exists(&path));

let header = make_test_header();
save_umls_artifact(&header, &[vec![0u8; 10], vec![0u8; 8]], &path).unwrap();
let shard_bytes = vec![vec![0u8; 10], vec![0u8; 8]];
let header = make_test_header(&shard_bytes);
save_umls_artifact(&header, &shard_bytes, &path).unwrap();
assert!(artifact_exists(&path));
}
}
7 changes: 7 additions & 0 deletions crates/terraphim_automata/src/sharded_extractor.rs
@@ -8,6 +8,7 @@
//! that takes <100ms vs ~842s build time from raw TSV.

use daachorse::DoubleArrayAhoCorasick;
use sha2::{Digest, Sha256};
use std::collections::HashMap;

use crate::medical_artifact::{
@@ -186,11 +187,17 @@ impl ShardedUmlsExtractor {
})
.collect();

let shard_checksums: Vec<[u8; 32]> = shard_bytes
.iter()
.map(|b| Sha256::digest(b).into())
.collect();

let header = ArtifactHeader {
shard_metadata,
concept_index: self.concept_index.clone(),
total_patterns: self.total_patterns,
shard_byte_lengths: shard_bytes.iter().map(|b: &Vec<u8>| b.len()).collect(),
shard_checksums,
};

save_umls_artifact(&header, &shard_bytes, path)
5 changes: 5 additions & 0 deletions crates/terraphim_automata_py/src/lib.rs
@@ -1,3 +1,8 @@
//! Python bindings for `terraphim_automata` via PyO3.
//!
//! Exposes autocomplete, fuzzy search, and Aho-Corasick text-matching
//! functions to Python, enabling use of Terraphim's automata engine
//! from Python scripts and notebooks.
use ::terraphim_automata::autocomplete::{
AutocompleteConfig, AutocompleteIndex, AutocompleteResult, autocomplete_search,
build_autocomplete_index, deserialize_autocomplete_index, fuzzy_autocomplete_search,
4 changes: 4 additions & 0 deletions crates/terraphim_build_args/src/lib.rs
@@ -1,3 +1,7 @@
//! Build argument management for Terraphim AI.
//!
//! Centralises configuration of build features, targets, and deployment
//! options so they can be shared across binaries and integration scripts.
pub mod cli;
/// Terraphim Build Argument Management
///
5 changes: 5 additions & 0 deletions crates/terraphim_ccusage/src/lib.rs
@@ -1,3 +1,8 @@
//! Claude Code usage tracking and cost reporting for Terraphim AI.
//!
//! Parses Claude Code session JSONL files, aggregates token counts and
//! costs by project and session, and formats reports for the terminal or
//! robot-mode JSON output.
use serde::{Deserialize, Serialize};
use std::collections::HashMap;
use std::path::PathBuf;
18 changes: 18 additions & 0 deletions crates/terraphim_config/src/lib.rs
@@ -1,3 +1,21 @@
//! Configuration management for Terraphim AI.
//!
//! Provides role-based configuration where each [`Role`] describes a user profile with
//! a set of [`Haystack`]s (data sources), a relevance function, and optional LLM settings.
//!
//! # Loading Priority
//!
//! 1. Explicit path via `TERRAPHIM_CONFIG` environment variable
//! 2. Saved config retrieved from the persistence layer
//! 3. Hard-coded defaults in `terraphim_server/default/`
//!
//! # Key Types
//!
//! - [`Config`] -- top-level configuration holding all roles
//! - [`Role`] -- user profile with haystacks, relevance function, and theme
//! - [`Haystack`] -- a data source descriptor (path, service type, extra parameters)
//! - [`ServiceType`] -- enum of supported haystack backends

use std::{path::PathBuf, sync::Arc};

use terraphim_automata::{
2 changes: 2 additions & 0 deletions crates/terraphim_dsm/src/main.rs
@@ -1,3 +1,5 @@
//! CLI tool that groups Rust module paths into semantic categories using the Terraphim knowledge graph.

mod knowledge;
mod models;

2 changes: 1 addition & 1 deletion crates/terraphim_file_search/src/kg_scorer.rs
@@ -10,7 +10,7 @@ use fff_search::types::FileItem;
/// Scores files by counting knowledge-graph concept matches in their path.
///
/// Implements [`ExternalScorer`] so it can be plugged directly into a
/// `fff-search` [`ScoringContext`]. The scorer reads the file's
/// `fff-search` `ScoringContext`. The scorer reads the file's
/// `relative_path`, runs it through the Aho-Corasick automata built from
/// the thesaurus, and returns `min(unique_matches * weight_per_term,
/// max_boost)`.