4 changes: 3 additions & 1 deletion .github/workflows/ci-main.yml
@@ -228,7 +228,9 @@ jobs:
run: |
# Run unit and integration tests (exclude integration-signing which requires zitsign CLI)
# zlob feature required: fff-search build.rs panics when CI=true without zlob.
/home/alex/.local/bin/rch exec -- cargo test --release --target ${{ matrix.target }} --workspace --features "self_update/signatures,zlob"
# terraphim_automata/medical required: SHA-256 checksum verification tests for the
# deserialize_unchecked safety precondition (security finding #1313 P1-1).
/home/alex/.local/bin/rch exec -- cargo test --release --target ${{ matrix.target }} --workspace --features "self_update/signatures,zlob,terraphim_automata/medical"

- name: sccache stats
if: always()
15 changes: 14 additions & 1 deletion CHANGELOG.md
@@ -9,6 +9,16 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

### Added

- **Intra-doc link fixes** resolved broken rustdoc links and unclosed HTML tag warnings across `terraphim_orchestrator`, `terraphim_types`, `terraphim_tracker` — cargo doc now produces zero warnings on all core crates
- **Unique tempdir** in `test_tool_index_save_and_load` to eliminate cross-run state pollution (Refs #1340)
- **Module-level rustdoc** added to `terraphim_dsm` and `terraphim_github_runner_server` — the final two binary crates lacking a crate-level `//!` comment
- **Module-level rustdoc** added to 17 previously undocumented crates: `terraphim_service`, `terraphim_settings`, `terraphim_agent`, `terraphim_file_search`, `terraphim_kg_linter`, `terraphim_ccusage`, `terraphim_usage`, `terraphim_build_args`, `terraphim_lsp`, `terraphim_automata_py`, `terraphim_rolegraph_py`, `terraphim-markdown-parser`, `haystack_core`, `haystack_atlassian`, `haystack_discourse`, `haystack_grepapp`, `haystack_jmap` — all workspace crates now have crate-level `//!` documentation
- **Module-level rustdoc** added to five previously undocumented crates: `terraphim_persistence`, `terraphim_mcp_server`, `terraphim_config`, `terraphim_rolegraph`, `terraphim_middleware`
- **`DeviceStorage` struct doc** explaining singleton pattern, operator ordering, and cache write-back target
- **`TerraphimMcpError` enum doc** describing the four failure domains covered by MCP server errors
- **Security checklist** shard checksum verification before `deserialize_unchecked` (Refs #1313)
- **ADR-0001** Ollama trust boundary decision documented (Refs #1313, #1318)
- **CI** `terraphim_automata` medical feature added to workspace test run (Refs #1313)
- **Session debouncing** for `SessionConnector::watch()` to eliminate duplicate emissions (Refs #815)
- **LLM pre/post hooks** wired in agent command handlers for multi-agent coordination (Refs #451)
- **Self-Documentation API** exposed via robot CLI subcommand (Refs #1011)
@@ -30,7 +40,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

### Fixed

- **RUSTSEC-2026-0049** eliminated by switching serenity to native-tls (Refs #418)
- **World-readable sensitive config files** now emit tracing error/warn at load time via `warn_if_world_readable()` in orchestrator config and all `conf.d` include files (Refs #826)
- **RUSTSEC-2026-0049** eliminated by switching serenity to native-tls (Refs #418)
- **Spec gaps** addressed and resolved across ADF orchestrator templates (Refs #1040)
- **Global concurrency limits** enforced in orchestrator to prevent task/memory exhaustion (Refs #664)
- **listen_mode test assertion** updated to match clap error output (Refs #1044)
@@ -60,6 +70,9 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
- **Test ranking knowledge graph fixture** added for agent testing
- **LLM cost tracking** foundation with genai fork integration (Refs #1075)
- **Spec validation** report for 2026-04-29 documenting 3 fixed, 5 remaining gaps
- **Spec validation** report for 2026-05-07 -- FAIL: 2 persistent gaps (single-agent-listener operational gap, meta-coordinator spec coverage gap); 0 new gaps; provider_probe.rs hardening assessed as in-scope for #1233
- **Documentation gap report v2** generated for 2026-05-07 extending scan to include `pub fn` -- 395 total gaps (agent: 189, orchestrator: 61, types: 48, service: 39, automata: 23, persistence: 12, config: 11, middleware: 8, rolegraph: 4); service and persistence API entry points identified as critical
- **Documentation gap report** generated for 2026-05-07 identifying 307 missing docs across 9 crates (agent: 139, types: 76, orchestrator: 54, automata: 23, config: 15) -- 45% reduction from 2026-04-29 baseline of 564
- **Documentation gap report** generated for 2026-05-05 identifying 1,058 missing docs across 12 crates (orchestrator: 445, server: 138, service: 114, agent: 99, types: 98)
- **GITEA_URL injection** from project config into agent spawn context for orchestrator
- **Streaming output log drain** for reliable agent output capture (Refs #1219)
1 change: 1 addition & 0 deletions Cargo.lock

Some generated files are not rendered by default.

4 changes: 4 additions & 0 deletions crates/haystack_atlassian/src/lib.rs
@@ -1,3 +1,7 @@
//! Haystack integration for Atlassian products (Confluence, Jira).
//!
//! Implements [`HaystackProvider`] over the Confluence REST API, enabling
//! full-text search of Confluence spaces as a Terraphim haystack source.
use anyhow::Result;
use haystack_core::HaystackProvider;
use terraphim_types::{Document, SearchQuery};
5 changes: 5 additions & 0 deletions crates/haystack_core/src/lib.rs
@@ -1,3 +1,8 @@
//! Core abstraction for haystack search providers.
//!
//! Defines the [`HaystackProvider`] trait that all data-source integrations
//! (Ripgrep, Atlassian, Discourse, JMAP, …) implement to expose a uniform
//! async search interface over heterogeneous backends.
use terraphim_types::{Document, SearchQuery};

pub trait HaystackProvider {
4 changes: 4 additions & 0 deletions crates/haystack_discourse/src/lib.rs
@@ -1,3 +1,7 @@
//! Haystack integration for Discourse forums.
//!
//! Implements [`haystack_core::HaystackProvider`] over the Discourse search
//! API, allowing forum topics and posts to be indexed as Terraphim documents.
mod client;
mod models;

4 changes: 4 additions & 0 deletions crates/haystack_grepapp/src/lib.rs
@@ -1,3 +1,7 @@
//! Haystack integration for grep.app code-search.
//!
//! Implements [`HaystackProvider`] over the grep.app API, exposing public
//! code-search results as Terraphim documents.
use anyhow::Result;
use haystack_core::HaystackProvider;
use terraphim_types::{Document, SearchQuery};
4 changes: 4 additions & 0 deletions crates/haystack_jmap/src/lib.rs
@@ -1,3 +1,7 @@
//! Haystack integration for email via JMAP.
//!
//! Implements [`HaystackProvider`] over a JMAP mail server, allowing email
//! messages and threads to be searched as Terraphim haystack documents.
use anyhow::{Context, Result};
use serde::{Deserialize, Serialize};
use std::collections::HashMap;
4 changes: 4 additions & 0 deletions crates/terraphim-markdown-parser/src/lib.rs
@@ -1,3 +1,7 @@
//! Markdown parser for Terraphim knowledge-graph documents.
//!
//! Converts markdown files into typed [`terraphim_types::Document`] values,
//! extracting titles, tags, and body text for indexing and search.
use std::collections::HashSet;
use std::ops::Range;
use std::str::FromStr;
5 changes: 5 additions & 0 deletions crates/terraphim_agent/src/lib.rs
@@ -1,3 +1,8 @@
//! Terraphim agent library — TUI, robot mode, and multi-agent coordination.
//!
//! Bundles the interactive REPL, robot-mode JSON output, forgiving CLI parser,
//! MCP tool index, onboarding workflows, and optional shared-learning store.
//! Feature flags gate heavier subsystems: `server`, `repl`, `shared-learning`.
#[cfg(feature = "server")]
pub mod client;
pub mod onboarding;
6 changes: 5 additions & 1 deletion crates/terraphim_agent/src/mcp_tool_index.rs
@@ -280,7 +280,11 @@ mod tests {
#[test]
fn test_tool_index_save_and_load() {
let temp_dir = std::env::temp_dir();
let index_path = temp_dir.join("test-mcp-index.json");
let unique = std::time::SystemTime::now()
.duration_since(std::time::UNIX_EPOCH)
.unwrap()
.subsec_nanos();
let index_path = temp_dir.join(format!("test-mcp-index-{unique}.json"));

// Create and save
{
3 changes: 2 additions & 1 deletion crates/terraphim_automata/Cargo.toml
@@ -42,6 +42,7 @@ walkdir = "2.5"
daachorse = { version = "1.0", optional = true }
zstd = { version = "0.13", optional = true }
anyhow = { workspace = true, optional = true }
sha2 = { version = "0.10", optional = true }


# WASM-specific dependencies
@@ -58,7 +59,7 @@ remote-loading = ["tokio", "reqwest"]
tokio-runtime = ["tokio"]
typescript = ["tsify", "dep:wasm-bindgen"]
wasm = ["typescript", "dep:wasm-bindgen", "dep:wasm-bindgen-futures"]
medical = ["daachorse", "zstd", "anyhow"]
medical = ["daachorse", "zstd", "anyhow", "sha2"]

[dev-dependencies]
criterion = "0.8"
89 changes: 80 additions & 9 deletions crates/terraphim_automata/src/medical_artifact.rs
@@ -8,6 +8,12 @@
//! [header_bytes: bincode(ArtifactHeader)]
//! for each shard in header.shard_byte_lengths:
//! [shard_bytes: raw daachorse bytes]
//!
//! Integrity: `ArtifactHeader.shard_checksums` holds one SHA-256 digest per
//! shard. `load_umls_artifact` verifies every shard before returning it, so
//! callers of `deserialize_unchecked` can rely on byte provenance.

use sha2::{Digest, Sha256};

use serde::{Deserialize, Serialize};
use std::collections::HashMap;
@@ -38,9 +44,15 @@ pub struct ArtifactHeader {
pub total_patterns: usize,
/// Raw byte length of each daachorse shard (order matches shard_metadata)
pub shard_byte_lengths: Vec<usize>,
/// SHA-256 digest of each shard's raw bytes; verified before
/// `deserialize_unchecked` is called on the bytes.
pub shard_checksums: Vec<[u8; 32]>,
}

/// Save a UMLS artifact: header (bincode) + shard bytes, compressed with zstd
/// Save a UMLS artifact: header (bincode) + shard bytes, compressed with zstd.
///
/// Computes SHA-256 of each shard and stores digests in the header so that
/// `load_umls_artifact` can verify integrity before any unsafe deserialization.
pub fn save_umls_artifact(
header: &ArtifactHeader,
shard_bytes: &[Vec<u8>],
@@ -51,6 +63,11 @@ pub fn save_umls_artifact(
shard_bytes.len(),
"shard_byte_lengths must match shard_bytes count"
);
assert_eq!(
header.shard_checksums.len(),
shard_bytes.len(),
"shard_checksums must match shard_bytes count"
);

// Encode header with bincode
let header_encoded = bincode::serialize(header)?;
@@ -107,10 +124,27 @@ pub fn load_umls_artifact(path: &Path) -> anyhow::Result<(ArtifactHeader, Vec<Ve
// Deserialize header
let header: ArtifactHeader = bincode::deserialize(&raw[8..8 + header_len])?;

// Read each shard's raw bytes
// Validate checksum count matches shard count
if header.shard_checksums.len() != header.shard_byte_lengths.len() {
anyhow::bail!(
"Artifact corrupt: {} checksums for {} shards",
header.shard_checksums.len(),
header.shard_byte_lengths.len()
);
}

// Read each shard's raw bytes and verify SHA-256 integrity before returning.
// This establishes the safety precondition for the caller's
// `deserialize_unchecked`: bytes that pass verification were produced by
// `serialize()` on the same machine and have not been tampered with.
let mut offset = 8 + header_len;
let mut shard_bytes = Vec::with_capacity(header.shard_byte_lengths.len());
for (i, &shard_len) in header.shard_byte_lengths.iter().enumerate() {
for (i, (&shard_len, expected_checksum)) in header
.shard_byte_lengths
.iter()
.zip(header.shard_checksums.iter())
.enumerate()
{
if offset + shard_len > raw.len() {
anyhow::bail!(
"Shard {} truncated: expected {} bytes at offset {}, have {}",
Expand All @@ -120,7 +154,15 @@ pub fn load_umls_artifact(path: &Path) -> anyhow::Result<(ArtifactHeader, Vec<Ve
raw.len() - offset
);
}
shard_bytes.push(raw[offset..offset + shard_len].to_vec());
let shard_slice = &raw[offset..offset + shard_len];
let actual_checksum: [u8; 32] = Sha256::digest(shard_slice).into();
if &actual_checksum != expected_checksum {
anyhow::bail!(
"Shard {} checksum mismatch: artifact may be corrupt or tampered with",
i
);
}
shard_bytes.push(shard_slice.to_vec());
offset += shard_len;
}

@@ -144,7 +186,7 @@ mod tests {
use super::*;
use tempfile::tempdir;

fn make_test_header() -> ArtifactHeader {
fn make_test_header(shard_bytes: &[Vec<u8>]) -> ArtifactHeader {
ArtifactHeader {
shard_metadata: vec![
vec![
Expand Down Expand Up @@ -175,7 +217,11 @@ mod tests {
m
},
total_patterns: 3,
shard_byte_lengths: vec![10, 8],
shard_byte_lengths: shard_bytes.iter().map(|b| b.len()).collect(),
shard_checksums: shard_bytes
.iter()
.map(|b| Sha256::digest(b).into())
.collect(),
}
}

@@ -184,8 +230,8 @@
let dir = tempdir().unwrap();
let path = dir.path().join("umls.bin.zst");

let header = make_test_header();
let shard_bytes = vec![vec![1u8; 10], vec![2u8; 8]];
let header = make_test_header(&shard_bytes);

save_umls_artifact(&header, &shard_bytes, &path).unwrap();
assert!(path.exists());
@@ -199,14 +245,39 @@
assert!(loaded_header.concept_index.contains_key("C0000001"));
}

#[test]
fn test_artifact_checksum_mismatch_rejected() {
let dir = tempdir().unwrap();
let path = dir.path().join("tampered.bin.zst");

let shard_bytes = vec![vec![1u8; 10], vec![2u8; 8]];
let header = make_test_header(&shard_bytes);
save_umls_artifact(&header, &shard_bytes, &path).unwrap();

// Load, tamper with shard bytes in the decompressed payload, recompress
let compressed = std::fs::read(&path).unwrap();
let mut raw = zstd::decode_all(&compressed[..]).unwrap();
// Flip one byte in the first shard (after header)
let header_len = u64::from_le_bytes(raw[..8].try_into().unwrap()) as usize;
raw[8 + header_len] ^= 0xFF;
let recompressed = zstd::encode_all(&raw[..], 3).unwrap();
std::fs::write(&path, recompressed).unwrap();

let result = load_umls_artifact(&path);
assert!(result.is_err(), "tampered artifact must be rejected");
let msg = result.err().unwrap().to_string();
assert!(msg.contains("checksum mismatch"), "error: {}", msg);
}

#[test]
fn test_artifact_exists() {
let dir = tempdir().unwrap();
let path = dir.path().join("test.bin.zst");
assert!(!artifact_exists(&path));

let header = make_test_header();
save_umls_artifact(&header, &[vec![0u8; 10], vec![0u8; 8]], &path).unwrap();
let shard_bytes = vec![vec![0u8; 10], vec![0u8; 8]];
let header = make_test_header(&shard_bytes);
save_umls_artifact(&header, &shard_bytes, &path).unwrap();
assert!(artifact_exists(&path));
}
}
7 changes: 7 additions & 0 deletions crates/terraphim_automata/src/sharded_extractor.rs
@@ -8,6 +8,7 @@
//! that takes <100ms vs ~842s build time from raw TSV.

use daachorse::DoubleArrayAhoCorasick;
use sha2::{Digest, Sha256};
use std::collections::HashMap;

use crate::medical_artifact::{
@@ -186,11 +187,17 @@ impl ShardedUmlsExtractor {
})
.collect();

let shard_checksums: Vec<[u8; 32]> = shard_bytes
.iter()
.map(|b| Sha256::digest(b).into())
.collect();

let header = ArtifactHeader {
shard_metadata,
concept_index: self.concept_index.clone(),
total_patterns: self.total_patterns,
shard_byte_lengths: shard_bytes.iter().map(|b: &Vec<u8>| b.len()).collect(),
shard_checksums,
};

save_umls_artifact(&header, &shard_bytes, path)
5 changes: 5 additions & 0 deletions crates/terraphim_automata_py/src/lib.rs
@@ -1,3 +1,8 @@
//! Python bindings for `terraphim_automata` via PyO3.
//!
//! Exposes autocomplete, fuzzy search, and Aho-Corasick text-matching
//! functions to Python, enabling use of Terraphim's automata engine
//! from Python scripts and notebooks.
use ::terraphim_automata::autocomplete::{
AutocompleteConfig, AutocompleteIndex, AutocompleteResult, autocomplete_search,
build_autocomplete_index, deserialize_autocomplete_index, fuzzy_autocomplete_search,
4 changes: 4 additions & 0 deletions crates/terraphim_build_args/src/lib.rs
@@ -1,3 +1,7 @@
//! Build argument management for Terraphim AI.
//!
//! Centralises configuration of build features, targets, and deployment
//! options so they can be shared across binaries and integration scripts.
pub mod cli;
/// Terraphim Build Argument Management
///
5 changes: 5 additions & 0 deletions crates/terraphim_ccusage/src/lib.rs
@@ -1,3 +1,8 @@
//! Claude Code usage tracking and cost reporting for Terraphim AI.
//!
//! Parses Claude Code session JSONL files, aggregates token counts and
//! costs by project and session, and formats reports for the terminal or
//! robot-mode JSON output.
use serde::{Deserialize, Serialize};
use std::collections::HashMap;
use std::path::PathBuf;
18 changes: 18 additions & 0 deletions crates/terraphim_config/src/lib.rs
@@ -1,3 +1,21 @@
//! Configuration management for Terraphim AI.
//!
//! Provides role-based configuration where each [`Role`] describes a user profile with
//! a set of [`Haystack`]s (data sources), a relevance function, and optional LLM settings.
//!
//! # Loading Priority
//!
//! 1. Explicit path via `TERRAPHIM_CONFIG` environment variable
//! 2. Saved config retrieved from the persistence layer
//! 3. Hard-coded defaults in `terraphim_server/default/`
//!
//! # Key Types
//!
//! - [`Config`] -- top-level configuration holding all roles
//! - [`Role`] -- user profile with haystacks, relevance function, and theme
//! - [`Haystack`] -- a data source descriptor (path, service type, extra parameters)
//! - [`ServiceType`] -- enum of supported haystack backends

use std::{path::PathBuf, sync::Arc};

use terraphim_automata::{
2 changes: 2 additions & 0 deletions crates/terraphim_dsm/src/main.rs
@@ -1,3 +1,5 @@
//! CLI tool that groups Rust module paths into semantic categories using the Terraphim knowledge graph.

mod knowledge;
mod models;

2 changes: 1 addition & 1 deletion crates/terraphim_file_search/src/kg_scorer.rs
@@ -10,7 +10,7 @@ use fff_search::types::FileItem;
/// Scores files by counting knowledge-graph concept matches in their path.
///
/// Implements [`ExternalScorer`] so it can be plugged directly into a
/// `fff-search` [`ScoringContext`]. The scorer reads the file's
/// `fff-search` `ScoringContext`. The scorer reads the file's
/// `relative_path`, runs it through the Aho-Corasick automata built from
/// the thesaurus, and returns `min(unique_matches * weight_per_term,
/// max_boost)`.