feat(loops): add observe and substrate loop proofs #194
Merged
…loop

The connective tissue that turns a one-way driver→worker pipe into a feedback loop. A worker can't see itself; observe() reads its TRACE and produces:
- findings + an operator report (what to fix, split agent vs operator), fed back DOWN as a steer and OUT to the operator
- durable corpus facts the NEXT run reads back (continuous self-improvement)

Findings are trace-derived, never judge-derived (derived_from_judge:false): the selector≠judge firewall. Harness-agnostic: reads a trace + output, so it watches opencode/codex/hermes/BYO identically. Built on agent-eval's ChatClient + AnalystFinding; persists to the existing Corpus.

bench/src/fleet.mts: the whole vision end to end, runnable from a laptop. A thin local driver fans out N workers to CLOUD sandboxes, observes each trace, reports what to fix, banks learnings; run twice and the second run injects the first's learnings into the workers. Proven live (opencode × 2 cloud workers): the observer caught a real inefficiency (unbatched bash calls) and banked it.
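A minimal sketch of what a trace-derived finding could look like, using the unbatched-bash case mentioned above. The `TraceEvent` and `Finding` shapes and the `findUnbatchedBash` helper are hypothetical illustrations; the real trace format and agent-eval's `AnalystFinding` type are not shown in this commit message. Only the `derived_from_judge: false` flag is taken from the text.

```typescript
// Hypothetical shapes for illustration only; not the package's actual types.
interface TraceEvent {
  tool: string;  // e.g. "bash", "edit"
  input: string; // raw tool input
}

interface Finding {
  id: string;
  audience: "agent" | "operator"; // who should act on the finding
  message: string;
  derived_from_judge: false;      // findings come from the trace, never from a judge
}

// Flag runs of consecutive bash calls that could have been batched into one call.
function findUnbatchedBash(trace: TraceEvent[], minRun = 3): Finding[] {
  const findings: Finding[] = [];
  let run = 0;
  // Close the current run of bash calls; record a finding if it was long enough.
  const flush = (endIndex: number): void => {
    if (run >= minRun) {
      findings.push({
        id: `unbatched-bash@${endIndex - run}`,
        audience: "agent",
        message: `${run} consecutive bash calls; consider batching them`,
        derived_from_judge: false,
      });
    }
    run = 0;
  };
  trace.forEach((event, i) => {
    if (event.tool === "bash") run += 1;
    else flush(i);
  });
  flush(trace.length);
  return findings;
}

const sampleTrace: TraceEvent[] = [
  { tool: "bash", input: "ls" },
  { tool: "bash", input: "cat a.py" },
  { tool: "bash", input: "cat b.py" },
  { tool: "edit", input: "a.py" },
  { tool: "bash", input: "pytest" },
];
const findings = findUnbatchedBash(sampleTrace);
```

The detector only looks at the trace, so under these assumptions it would behave the same for any harness that can emit a list of tool calls.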
The red-team flagged two FATAL design flaws: (1) parallel cloud workers share no filesystem, so accumulating loops (migration) integrate to nothing; (2) resume restores decisions, not the mutated workspace. This proves the fix, a git-backed durable workspace, on 3 dependency-ordered modules:
- durable workspace = a bare git repo (models a remote branch); each worker is a FRESH clone (a fresh box's empty FS), torn down after commit+push.
- PROVEN (a): worker b's fresh clone finds a.py ON DISK (git carried it, not a string of names); c finds b.py; the integration test imports a<-b<-c, links.
- PROVEN (b): KILL after b, RESUME → a+b skipped (durable git has them), only c re-runs against a clone that already contains a+b's committed code, links.

Verdict shift: migration moves from 'cut from the pitch' to 'buildable on a ctx.workspace handle'. The seam is git; the durable layer survives box teardown by construction. Next: a cloud variant uses a GitHub branch as the workspace.
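The kill-and-resume behavior in PROVEN (b) can be sketched with an in-memory stand-in for the durable layer. This is an assumption-laden model, not the proof itself: a `Map` plays the role of the bare git repo, copying the map plays the role of a fresh clone, and `runMigration`, `Module`, and `Files` are hypothetical names.

```typescript
// In-memory stand-in for the bare git repo: file path -> contents.
// Assumption for illustration; the proof described above uses a real git repo.
type Files = Map<string, string>;

interface Module {
  name: string;        // e.g. "a"
  dependsOn: string[]; // modules whose files must already be present in the clone
}

// Run modules in dependency order. Each worker gets a fresh copy (the "clone"),
// writes its file, and "pushes" it back; the copy is discarded afterwards.
function runMigration(durable: Files, modules: Module[], killAfter?: string): string[] {
  const executed: string[] = [];
  for (const mod of modules) {
    const file = `${mod.name}.py`;
    if (durable.has(file)) continue;       // resume: already committed, skip
    const clone: Files = new Map(durable); // a fresh box sees only committed state
    for (const dep of mod.dependsOn) {
      if (!clone.has(`${dep}.py`)) throw new Error(`${file}: missing ${dep}.py in clone`);
    }
    const contents = `# module ${mod.name}`;
    clone.set(file, contents);
    durable.set(file, contents);           // stands in for commit + push
    executed.push(mod.name);
    if (mod.name === killAfter) break;     // simulate the run being killed
  }
  return executed;
}

const modules: Module[] = [
  { name: "a", dependsOn: [] },
  { name: "b", dependsOn: ["a"] },
  { name: "c", dependsOn: ["b"] },
];
const durable: Files = new Map();
const firstRun = runMigration(durable, modules, "b"); // killed after b
const resumed = runMigration(durable, modules);       // only c re-runs
```

The point of the model is that resume needs no separate decision log: what has been pushed to the durable layer is the record of what is done.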
tangletools
approved these changes
Jun 8, 2026
Approved after local release-gate verification: typecheck, tests, build, lint, package export verification, and merge-tree against origin/main. Runtime hook surface is additive; delegate harness/model support is covered by tests.
Summary
- `observe()` as the trace-derived third-person watcher that turns worker behavior into findings, reports, and optional corpus facts
- `gitWorkspace` / `Shell` port for clone/commit/push durable workspace loops
- `bench/src/observe-steer-workspace-loop.mts`, the local substrate proof for: Supervisor/Scope -> coordination MCP tools -> git workspace -> `observe()` finding -> `steer_worker` / `Scope.send` -> corrective worker -> fresh-clone integration pass
- Scope, Supervisor, `runLoop`, validators, journals, MCP coordination, git workspace, and `observe`
- `defineLoop` facade/protocol, its exports, docs, example, and tests
- `Question`, `QuestionDecision`, `QuestionPolicy`, `CoordinationEvent`; no facade-era `LoopQuestion` types
- `AGENTS.md` / `CLAUDE.md` now point to canonical `docs/BUILDING.md` and `docs/ANTI_PATTERNS.md`
- loop-writer, the facade postmortem, and the docs index with the proof command and the "do not relocate protocol and call it simplification" guardrail

Design Notes
The post-audit API stance is deliberate: loops are not a new runtime grammar. They are ordinary agent code over the existing substrate.
`observe()` is the new load-bearing primitive; `gitWorkspace` is the durable workspace seam; MCP coordination is the sandbox binding for the same `Scope` verbs.

The decisive local join is now proven by:

`pnpm exec tsx bench/src/observe-steer-workspace-loop.mts`

That proof uses a mock `ChatClient` transport for the observer model call and local BYO worker executors so it is reproducible without cloud credentials. Honest remaining proof: run the same shape with `openSandboxRun` workers and a remote branch that a sandbox can clone and push.

Docs placement is now explicit:
`AGENTS.md` and `CLAUDE.md` are bootloaders; durable build rules live in `docs/BUILDING.md`; named failure modes live in `docs/ANTI_PATTERNS.md`; evidence and postmortems stay under `docs/research/*` / memory.

Validation
- `pnpm lint`
- `pnpm typecheck`
- `pnpm test -- --runInBand` (67 files / 678 tests)
- `pnpm exec vitest run tests/loops/workspace.test.ts tests/loops/coordination.test.ts --reporter=dot`
- `pnpm exec vitest run tests/loops/workspace.test.ts --reporter=dot`
- `pnpm exec tsx bench/src/observe-steer-workspace-loop.mts` (INTEGRATION OK)
- `pnpm build`
- `pnpm verify:package`
- `python3 /home/drew/.codex/skills/.system/skill-creator/scripts/quick_validate.py skills/loop-writer`
- `git diff --check`
- `git fetch origin main && git merge-tree --write-tree origin/main HEAD`

Notes
This PR remains a draft. The next proof should be the cloud variant: `openSandboxRun` worker plus remote git branch, without adding a new loop facade first.
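The stance in the Design Notes, that a loop is ordinary agent code rather than a new runtime grammar, could look roughly like the sketch below. Every name here (`Worker`, `Observe`, `runWithFeedback`) is a hypothetical stand-in and not the package's API; the real proof goes through Scope, MCP coordination, and `observe()`.

```typescript
// Hypothetical stand-ins; the real substrate types are not shown in this PR text.
type Worker = (steer?: string) => string[];        // returns a trace of tool calls
type Observe = (trace: string[]) => string | null; // returns a steer, or null when clean

// A feedback loop written as a plain function: run, observe, steer, re-run.
function runWithFeedback(
  worker: Worker,
  observe: Observe,
  maxRounds = 3,
): { rounds: number; clean: boolean } {
  let steer: string | undefined;
  for (let round = 1; round <= maxRounds; round++) {
    const trace = worker(steer);
    const finding = observe(trace);
    if (finding === null) return { rounds: round, clean: true };
    steer = finding; // feed the finding back down as a steer for the next round
  }
  return { rounds: maxRounds, clean: false };
}

// Demo worker: issues two separate bash calls until it is steered to batch them.
const demoWorker: Worker = (steer) =>
  steer ? ["bash: ls && cat a.py"] : ["bash: ls", "bash: cat a.py"];
// Demo observer: any trace with more than one call yields a steer.
const demoObserve: Observe = (trace) =>
  trace.length > 1 ? "batch your bash calls" : null;

const outcome = runWithFeedback(demoWorker, demoObserve);
```

Nothing in the sketch needs a dedicated loop protocol: the control flow is a `for` loop, and the observer and worker are plain functions that can be swapped for cloud-backed ones.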