
feat(loops): add observe and substrate loop proofs#194

Merged: drewstone merged 11 commits into main from feat/observe-closed-loop on Jun 8, 2026

drewstone commented on Jun 8, 2026


Summary

  • add observe() as the trace-derived third-person watcher that turns worker behavior into findings, reports, and optional corpus facts
  • add the narrow gitWorkspace/Shell port for durable-workspace loops (clone, commit, push)
  • add bench/src/observe-steer-workspace-loop.mts, the local substrate proof for: Supervisor/Scope -> coordination MCP tools -> git workspace -> observe() finding -> steer_worker/Scope.send -> corrective worker -> fresh-clone integration pass
  • keep loop authoring substrate-first: Scope, Supervisor, runLoop, validators, journals, MCP coordination, git workspace, and observe
  • remove the experimental defineLoop facade/protocol, its exports, docs, example, and tests
  • tighten MCP coordination back to substrate names: Question, QuestionDecision, QuestionPolicy, CoordinationEvent; no facade-era LoopQuestion types
  • split process rules out of agent bootloaders: AGENTS.md/CLAUDE.md now point to canonical docs/BUILDING.md and docs/ANTI_PATTERNS.md
  • update loop-writer, the facade postmortem, and the docs index with the proof command and the "do not relocate protocol and call it simplification" guardrail
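The chain in the third bullet is ordinary control flow rather than a new runtime grammar. The sketch below is a hedged, self-contained illustration only: `worker`, `observe`, `integrationPass`, and `closedLoop` are hypothetical stand-ins (no model, MCP server, or git repository is involved), not the package's exported API, and the observer is a trivial rule where the real one makes a model call.

```typescript
// Hypothetical stand-ins for the real substrate; none of these names are the package API.
type TraceEvent = { tool: string; input: string };
type WorkerResult = { trace: TraceEvent[]; output: string };
type Finding = { audience: "agent" | "operator"; message: string; derived_from_judge: false };

// Stand-in worker: without a steer it leaves a TODO behind; with one it finishes.
function worker(task: string, steer?: string): WorkerResult {
  const output = steer ? `${task}: done` : `${task}: TODO`;
  return { trace: [{ tool: "write_file", input: output }], output };
}

// Stand-in observer: reads only the trace (a trivial rule here, not a model call).
function observe(trace: TraceEvent[]): Finding[] {
  return trace
    .filter((e) => e.input.includes("TODO"))
    .map((): Finding => ({
      audience: "agent",
      message: "finish the TODO before committing",
      derived_from_judge: false,
    }));
}

// Stand-in for the fresh-clone integration pass.
function integrationPass(output: string): boolean {
  return !output.includes("TODO");
}

function closedLoop(task: string): { output: string; steers: string[] } {
  let result = worker(task);
  // Agent-facing findings become steers (in the PR: steer_worker / Scope.send).
  const steers = observe(result.trace)
    .filter((finding) => finding.audience === "agent")
    .map((finding) => finding.message);
  if (steers.length > 0) {
    // Corrective worker run, given the steer.
    result = worker(task, steers.join("\n"));
  }
  return { output: result.output, steers };
}

const run = closedLoop("module a");
console.log(run.output, integrationPass(run.output));
```

The point of the shape is that every arrow in the bullet is a plain function call or message, so the same loop can later swap in sandboxed workers without changing its structure.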

Design Notes

The post-audit API stance is deliberate: loops are not a new runtime grammar. They are ordinary agent code over the existing substrate. observe() is the new load-bearing primitive; gitWorkspace is the durable workspace seam; MCP coordination is the sandbox binding for the same Scope verbs.
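To make "the sandbox binding for the same Scope verbs" concrete, here is a minimal sketch, assuming a `Scope` with a single `send` verb and a tool table keyed by tool name. `Scope`, `ToolHandler`, and `coordinationTools` are hypothetical; only the `steer_worker` tool name comes from this PR, and no real MCP SDK is used.

```typescript
// Hypothetical shapes; only the steer_worker tool name comes from the PR.
class Scope {
  readonly inbox = new Map<string, string[]>();

  // The single coordination verb both bindings share.
  send(to: string, text: string): void {
    const queue = this.inbox.get(to) ?? [];
    queue.push(text);
    this.inbox.set(to, queue);
  }
}

type ToolResult = { ok: boolean; error?: string };
type ToolHandler = (args: Record<string, unknown>) => ToolResult;

// MCP-style binding: validate untyped tool arguments, then call the same verb.
function coordinationTools(scope: Scope): Record<string, ToolHandler> {
  return {
    steer_worker: (args) => {
      const { worker, message } = args;
      if (typeof worker !== "string" || typeof message !== "string") {
        return { ok: false, error: "steer_worker needs string 'worker' and 'message'" };
      }
      scope.send(worker, message);
      return { ok: true };
    },
  };
}

const scope = new Scope();
scope.send("w1", "batch your bash calls"); // in-process driver
const tools = coordinationTools(scope);
const ok = tools.steer_worker({ worker: "w1", message: "run the tests" }); // sandboxed agent
const bad = tools.steer_worker({ worker: 42 });
console.log(scope.inbox.get("w1"), ok, bad);
```

Both callers land in the same inbox; the tool layer only adds argument validation, which is the "binding, not a new grammar" stance in miniature.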

The decisive local join is now proven by:

pnpm exec tsx bench/src/observe-steer-workspace-loop.mts

That proof uses a mock ChatClient transport for the observer model call and local BYO worker executors, so it is reproducible without cloud credentials. Still unproven: the same shape with openSandboxRun workers and a remote branch that a sandbox can clone and push to.
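Reproducibility follows from the observer depending only on a transport. A sketch, assuming a synchronous `Transport` function type for brevity (the real ChatClient is probably async and takes a richer request); `mockTransport` and `observeWith` are hypothetical names, not the package API.

```typescript
// Hypothetical: a synchronous prompt -> text transport. A real ChatClient is likely async.
type Transport = (prompt: string) => string;
type Finding = { audience: "agent" | "operator"; message: string; derived_from_judge: false };

// Deterministic transport: always returns the canned reply and records each prompt.
function mockTransport(canned: string, calls: string[] = []): Transport {
  return (prompt) => {
    calls.push(prompt);
    return canned;
  };
}

// Observer logic is unchanged whether the transport is mocked or real.
function observeWith(transport: Transport, trace: string): Finding[] {
  const raw = JSON.parse(transport(`Review this worker trace:\n${trace}`)) as Array<{
    audience: string;
    message: string;
  }>;
  return raw
    .filter((r) => r.audience === "agent" || r.audience === "operator")
    .map((r): Finding => ({
      audience: r.audience as Finding["audience"],
      message: r.message,
      derived_from_judge: false,
    }));
}

const calls: string[] = [];
const transport = mockTransport(
  '[{"audience":"agent","message":"batch bash calls"},{"audience":"operator","message":"raise the timeout"}]',
  calls,
);
const first = observeWith(transport, "bash ls\nbash cat a.py");
const second = observeWith(transport, "bash ls\nbash cat a.py");
console.log(first);
```

Because the canned reply is fixed, two runs produce identical findings, and the recorded prompts let a test confirm the trace actually reached the observer.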

Docs placement is now explicit: AGENTS.md and CLAUDE.md are bootloaders; durable build rules live in docs/BUILDING.md; named failure modes live in docs/ANTI_PATTERNS.md; evidence and postmortems stay under docs/research/* / memory.

Validation

  • pnpm lint
  • pnpm typecheck
  • pnpm test -- --runInBand (67 files / 678 tests)
  • pnpm exec vitest run tests/loops/workspace.test.ts tests/loops/coordination.test.ts --reporter=dot
  • pnpm exec vitest run tests/loops/workspace.test.ts --reporter=dot
  • pnpm exec tsx bench/src/observe-steer-workspace-loop.mts (INTEGRATION OK)
  • pnpm build
  • pnpm verify:package
  • python3 /home/drew/.codex/skills/.system/skill-creator/scripts/quick_validate.py skills/loop-writer
  • git diff --check
  • git fetch origin main && git merge-tree --write-tree origin/main HEAD

Notes

This PR remains a draft. The next proof should be the cloud variant: an openSandboxRun worker plus a remote git branch, without adding a new loop facade first.

drewstone added 5 commits June 7, 2026 16:55
…loop

The connective tissue that turns a one-way driver→worker pipe into a feedback
loop. A worker can't see itself; observe() reads its TRACE and produces:
- findings + an operator report (what to fix — split agent vs operator), fed
  back DOWN as a steer and OUT to the operator
- durable corpus facts the NEXT run reads back (continuous self-improvement)

Findings are trace-derived, never judge-derived (derived_from_judge:false) —
the selector≠judge firewall. Harness-agnostic: reads a trace + output, so it
watches opencode/codex/hermes/BYO identically. Built on agent-eval's ChatClient
+ AnalystFinding; persists to the existing Corpus.
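One way to enforce the selector≠judge firewall is at write time, so judge output can never reach the corpus that later runs read. This is a hedged sketch; `Corpus.bank`, `Corpus.read`, and `Fact` here are illustrative and are not the existing Corpus API.

```typescript
// Hypothetical: Corpus and Fact are illustrative, not the package's Corpus API.
type Fact = { text: string; derived_from_judge: boolean };

class Corpus {
  private readonly facts: Fact[] = [];

  // selector ≠ judge: anything the judge produced must not steer future runs.
  bank(fact: Fact): void {
    if (fact.derived_from_judge) {
      throw new Error(`refusing judge-derived fact: ${fact.text}`);
    }
    this.facts.push(fact);
  }

  read(): readonly Fact[] {
    return this.facts;
  }
}

const corpus = new Corpus();
corpus.bank({ text: "batch bash calls", derived_from_judge: false });
let refused = false;
try {
  corpus.bank({ text: "judge scored run 2 higher", derived_from_judge: true });
} catch {
  refused = true;
}
console.log(corpus.read().length, refused);
```

Failing loudly rather than silently dropping the fact makes a wiring mistake (judge output routed into observe) visible in tests.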

bench/src/fleet.mts: the whole vision end to end, runnable from a laptop —
a thin local driver fans out N workers to CLOUD sandboxes, observes each
trace, reports what to fix, banks learnings; run twice and the second run
injects the first's learnings into the workers. Proven live (opencode × 2
cloud workers): the observer caught a real inefficiency (unbatched bash calls)
and banked it.
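The two-run behavior amounts to: bank learnings after run one, then prepend them to every worker prompt in run two. A minimal sketch with an in-memory array standing in for the corpus; `buildPrompt` and `fleetRun` are hypothetical, and the learned fact is hard-coded rather than produced by an observer.

```typescript
// Hypothetical: an in-memory array stands in for the persistent corpus.
function buildPrompt(task: string, banked: readonly string[]): string {
  if (banked.length === 0) return task;
  const notes = banked.map((learning) => `- ${learning}`).join("\n");
  return `Learnings from earlier runs:\n${notes}\n\nTask: ${task}`;
}

// One fleet run: every worker gets the same prompt, then new learnings are banked.
function fleetRun(task: string, workers: number, corpus: string[]): string[] {
  const prompts = Array.from({ length: workers }, () => buildPrompt(task, corpus));
  // Hard-coded stand-in for what observing the traces might produce.
  const learned = "batch independent bash calls into one invocation";
  if (!corpus.includes(learned)) corpus.push(learned);
  return prompts;
}

const corpus: string[] = [];
const firstRun = fleetRun("port module a", 2, corpus);
const secondRun = fleetRun("port module a", 2, corpus);
console.log(secondRun[0]);
```

Prompts are built before banking, so run one sees no learnings and run two sees run one's, which matches the "run twice" description.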

The red-team flagged two FATAL design flaws: (1) parallel cloud workers share
no filesystem, so accumulating loops (migration) integrate to nothing; (2)
resume restores decisions, not the mutated workspace. This proves the fix —
a git-backed durable workspace — on 3 dependency-ordered modules:

- durable workspace = a bare git repo (models a remote branch); each worker is
  a FRESH clone (a fresh box's empty FS), torn down after commit+push.
- PROVEN (a): worker b's fresh clone finds a.py ON DISK (git carried it, not a
  string of names); c finds b.py; the integration test imports a<-b<-c, links.
- PROVEN (b): KILL after b, RESUME → a+b skipped (durable git has them), only c
  re-runs against a clone that already contains a+b's committed code, links.
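The resume rule in proof (b) reduces to: a module whose file already exists in a fresh clone of the durable branch is done, and everything else re-runs in dependency order. A sketch under that assumption; `Module` and `planResume` are hypothetical and operate on a set of file names rather than a real clone.

```typescript
// Hypothetical: `committed` stands in for the file listing of a fresh clone of the durable branch.
type Module = { name: string; file: string; deps: string[] };

// Modules must be listed in dependency order (a before b before c).
function planResume(modules: Module[], committed: ReadonlySet<string>): string[] {
  const done = new Set(modules.filter((m) => committed.has(m.file)).map((m) => m.name));
  const toRun: string[] = [];
  for (const m of modules) {
    if (done.has(m.name)) continue; // git already carries this module: skip it
    const unmet = m.deps.filter((d) => !done.has(d) && !toRun.includes(d));
    if (unmet.length > 0) {
      throw new Error(`${m.name} depends on ${unmet.join(", ")}, which is neither committed nor scheduled`);
    }
    toRun.push(m.name);
  }
  return toRun;
}

const modules: Module[] = [
  { name: "a", file: "a.py", deps: [] },
  { name: "b", file: "b.py", deps: ["a"] },
  { name: "c", file: "c.py", deps: ["b"] },
];

console.log(planResume(modules, new Set())); // fresh start
console.log(planResume(modules, new Set(["a.py", "b.py"]))); // killed after b: → [ 'c' ]
```

Deriving "done" from what the durable repo contains, rather than from a decision journal alone, is what lets resume survive box teardown.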

Verdict shift: migration moves from 'cut from the pitch' to 'buildable on a
ctx.workspace handle'. The seam is git; the durable layer survives box teardown
by construction. Next: a cloud variant uses a GitHub branch as the workspace.
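The narrow Shell port from the summary might look like this: the workspace only needs clone, commit, and push, expressed as git invocations through an injected shell. `Shell`, `gitWorkspace`, and `RecordingShell` are assumptions for illustration, not the shipped signatures; the git arguments themselves are standard.

```typescript
// Hypothetical port: the only capability a workspace needs is "run this command".
interface Shell {
  run(cmd: string, args: string[], cwd?: string): string;
}

// Durable workspace over git: clone the branch, then commit and push work back.
function gitWorkspace(shell: Shell, remote: string, dir: string) {
  return {
    clone(): void {
      shell.run("git", ["clone", remote, dir]);
    },
    commitAndPush(message: string, branch = "main"): void {
      shell.run("git", ["add", "-A"], dir);
      shell.run("git", ["commit", "-m", message], dir);
      shell.run("git", ["push", "origin", `HEAD:${branch}`], dir);
    },
  };
}

// A recording shell lets a test check the command sequence without touching git.
class RecordingShell implements Shell {
  readonly calls: string[] = [];
  run(cmd: string, args: string[], cwd = "."): string {
    this.calls.push(`${cwd}$ ${cmd} ${args.join(" ")}`);
    return "";
  }
}

const shell = new RecordingShell();
const ws = gitWorkspace(shell, "/tmp/durable.git", "/tmp/worker-b");
ws.clone();
ws.commitAndPush("add b.py");
console.log(shell.calls);
```

A production Shell could wrap `execFileSync` from `node:child_process`; keeping the port this narrow is what makes a bare local repo and a remote GitHub branch interchangeable.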
drewstone changed the title from "feat(loops): add defineLoop authoring surface" to "feat(loops): add observe and substrate loop proofs" on Jun 8, 2026
drewstone marked this pull request as ready for review on June 8, 2026 12:42

tangletools left a comment


Approved after local release-gate verification: typecheck, tests, build, lint, package export verification, and merge-tree against origin/main. Runtime hook surface is additive; delegate harness/model support is covered by tests.

drewstone merged commit 9c371b8 into main on Jun 8, 2026
1 check passed