Skip to content

refactor(bugfix): convert commands to skills, add CLAUDE.md, clean up#42

Open
jwm4 wants to merge 28 commits intoambient-code:mainfrom
jwm4:feature/bugfix-workflow-skills-refactor
Open

refactor(bugfix): convert commands to skills, add CLAUDE.md, clean up#42
jwm4 wants to merge 28 commits intoambient-code:mainfrom
jwm4:feature/bugfix-workflow-skills-refactor

Conversation

@jwm4
Copy link
Contributor

@jwm4 jwm4 commented Feb 10, 2026

Bugfix workflow: skill-based architecture with workflow controller

Problem

The bugfix workflow had all process logic inline in command files. This meant users only got the workflow's carefully designed process if they typed the exact slash command — saying "fix this bug" in natural language would bypass all of it. Phase transitions were unreliable, the agent would auto-advance without pausing, and the PR creation process failed in predictable ways that weren't handled.

Changes

Architecture: commands → skills → controller

Refactored from a command-heavy design to a three-layer architecture:

  • Controller skill (.claude/skills/controller/SKILL.md) — owns all phase transitions, recommendations, and flow logic. The single source of truth for "what happens next."
  • Phase skills (.claude/skills/{name}/SKILL.md) — contain the full multi-step process for each phase. Report findings only; never recommend next steps.
  • Command wrappers (.claude/commands/*.md) — thin pointers (~5 lines each) that route to the controller.

This means users get the workflow's benefit whether they type /fix or "fix this bug."

New phases:

  • /assess — initial phase that reads the bug report, summarizes understanding, identifies gaps, and proposes a plan before jumping into reproduction
  • /review — optional phase after /test that critically evaluates the fix and tests, looking for gaps and missed edge cases
  • /pr — systematic, failure-resistant pull request creation with fork workflows, bot identity handling, and a 4-rung fallback ladder

New files:

  • workflows/bugfix/CLAUDE.md — behavioral guidelines (principles, hard limits, safety, quality, escalation)
  • docs/adr/2026-02-12-bugfix-lessons-learned.md — 16 lessons learned about prompt engineering for multi-phase AI workflows

Deleted files:

  • .claude/agents/amber.md — redundant with systemPrompt; useful content moved to CLAUDE.md
  • AMBER_RESEARCH.md, FIELD_REFERENCE.md, TEMPLATE_FEEDBACK.md — supplementary docs not used by the workflow
  • .ambient/ambient.clean.json — unused duplicate

Simplified systemPrompt:

Reduced from ~100 lines to ~10 lines. The systemPrompt now just identifies the agent, points at the controller, and lists the workspace layout. All behavioral rules live in CLAUDE.md; all process logic lives in skills.

Key design decisions

  • Skills trigger controller re-reads. Each skill ends with "Then re-read the controller for next-step guidance." This is the mechanism that prevents the agent from getting stuck in the previous skill's context after a phase completes.
  • Controller owns all recommendations. Skills report findings; the controller decides what to suggest next. This prevents skills from recommending skipping phases (e.g., fix → PR, skipping test and review).
  • Flexible recommendations over rigid routing. The controller has guidance for skipping forward, going back, and ending early — not a fixed phase-to-phase table.
  • PR skill handles bot identity. When running as a GitHub App, the skill resolves the real user via /installation/repositories, uses the correct jq fields for fork detection, and treats gh pr create 403 as an expected outcome (provides a GitHub compare URL) rather than an error to debug.

Testing

Iteratively tested on the Ambient Code Platform across multiple sessions. The ADR documents all issues encountered and how they were resolved.

jwm4 added 28 commits February 10, 2026 15:36
- Convert all 5 commands (reproduce, diagnose, fix, test, document) into
  detailed skills in .claude/skills/{name}/SKILL.md
- Rewrite commands as thin wrappers that delegate to skills, passing
  arguments and session context
- Add CLAUDE.md with behavioral guidelines extracted from amber.md
  (engineering discipline, safety guardrails, quality standards)
- Remove .claude/agents/amber.md (redundant with systemPrompt persona)
- Remove supplementary files (AMBER_RESEARCH.md, FIELD_REFERENCE.md,
  TEMPLATE_FEEDBACK.md)
- Update systemPrompt to reference skills alongside commands
- Update README to reflect new structure
- Add .claude/skills/pr/SKILL.md with systematic fork-based PR workflow
- Add .claude/commands/pr.md thin wrapper
- Handles pre-flight checks, fork setup, branch/commit/push, and PR creation
- Includes error recovery for common failures (no push access, sandbox
  restrictions, missing fork)
- Update systemPrompt and startupPrompt to include /pr as phase 6
- Update README with PR phase documentation
Make CLAUDE.md self-contained by including workflow phases, commands,
skill locations, and artifact path directly instead of referencing
ambient.json (which is platform plumbing, not model-visible).

Also fix markdown lint warnings (emphasis-as-heading, code block language).
- Add .claude/skills/review/SKILL.md — critically evaluates fix and tests,
  issues verdict (inadequate fix / incomplete tests / solid), recommends
  next steps
- Add .claude/commands/review.md thin wrapper
- Update /test skill to report verification.md path and summarize results
  inline to the user
- Update systemPrompt, startupPrompt, CLAUDE.md, README with /review as
  optional phase 5 between test and document
- systemPrompt: add PHASE TRANSITIONS section requiring agent to stop
  and summarize at end of each phase instead of auto-advancing
- CLAUDE.md: add Flow Control section reinforcing pause behavior,
  trim to ~70 lines per best practices
- pr/SKILL.md: add placeholders table defining GH_USER, FORK_OWNER,
  UPSTREAM_OWNER/REPO, REPO, and BRANCH_NAME with sources
- pr/SKILL.md: rewrite Step 2 to ask user before creating fork,
  wait for confirmation, never silently skip ahead
- pr/SKILL.md: replace flat fallback with 4-rung fallback ladder,
  patch file is now absolute last resort
- ambient.json: remove rubric (deferred to future PR),
  remove startupPrompt and results fields
- Delete ambient.clean.json (no longer needed)
- AGENTS.md: add sandbox restrictions table
- All command wrappers now say 'Read the file ... now and follow it step
  by step' instead of 'Using the X skill from ...' which the agent was
  interpreting as conceptual guidance rather than a file-read instruction
- systemPrompt: add CRITICAL prefix to skill-reading instruction, explicitly
  say 'use the Read tool to open the referenced SKILL.md file'
- pr/SKILL.md: add 'IMPORTANT: Follow This Skill Exactly' section at top
  with anti-improvisation rules
- pr/SKILL.md: add pre-flight summary checkpoint between Steps 1 and 2
- pr/SKILL.md: upgrade 'no fork' path from 'STOP' to 'HARD STOP' with
  explicit instructions to wait for user twice (before fork attempt,
  after fork attempt fails)
- pr/SKILL.md: add fork verification command before proceeding to Step 3
- Add critical rules: never attempt gh repo fork without asking user first,
  never fall back to patch files without exhausting all other options
- systemPrompt: add step to tell user which skill is being invoked
  before reading it, so user can correct if wrong phase was picked
- CLAUDE.md: reinforce skill announcement policy
- Step 1a: don't dead-end on missing gh auth — continue pre-flight to
  gather info from git, then present options
- Step 1b: add fallback git config when gh is unavailable
- Step 1d: add git-remote-based fallback for upstream repo identification
- Add decision point after pre-flight: present 3 clear options to user
  (set up auth, provide fork URL, or prepare for manual push) and WAIT
  for response instead of dumping all manual instructions at once
Every skill now ends with a 'When This Phase Is Done' section that:
- Tells the agent to summarize findings to the user
- Recommends the natural next step in the workflow
- Explicitly says to stop and wait for the user

This reinforces the phase-transition pause from the systemPrompt and
CLAUDE.md at the skill level, so the agent gets the instruction to stop
regardless of whether it's following the systemPrompt guidance or
reading the skill directly.

Next-step recommendations follow the natural flow:
reproduce → diagnose → fix → test → review → document → pr
- New /assess phase (phase 1): reads the bug report, presents understanding,
  identifies gaps, proposes reproduction plan, and waits for user confirmation
  before taking any action. No code execution or cloning in this phase.
- systemPrompt: updated to 8-phase flow starting with ASSESS, added explicit
  instruction to start with assess when user provides a bug report
- CLAUDE.md: updated phase list to include assess
- pr/SKILL.md: clarified that 'never push to upstream' applies even to org
  bots and GitHub Apps — no exceptions based on account type
- pr/SKILL.md: added reminder at pre-flight summary not to skip Step 2
- systemPrompt: scope 'start with ASSESS' to initial bug report only
- systemPrompt: add 'INTERPRETING USER RESPONSES' section so agent
  maps phase names and confirmations to the correct next phase
- assess skill: allow cloning and reading code (but not executing)
- assess skill: add Step 2 to check for repo and clone if missing,
  read referenced PRs/files to inform assessment
- assess skill: update output to note repo is cloned for later phases
- Add awareness of remaining phases (review, document, pr) so agent
  recommends the full path instead of jumping to PR
… via Task tool

Instead of the orchestrator reading and executing skill files directly
(which caused context bleed and skill-selection failures), Amber now
dispatches each phase to a subagent using the Task tool.

Changes:
- systemPrompt declares Amber as an orchestrator with dispatch protocol
- Command wrappers instruct dispatch via Task tool (not direct execution)
- Skill 'When This Phase Is Done' sections return results (not user-facing)
- CLAUDE.md reinforces orchestrator role and subagent dispatch pattern
Replace systemPrompt-based orchestration with a dedicated controller skill
(.claude/skills/controller/SKILL.md) that the model reads and follows.

The controller has:
- Phase table mapping commands to skill paths
- Explicit dispatch protocol via Task tool
- User response interpretation table (yes → last recommendation, not repeat)
- Critical rule: never re-dispatch the phase that just finished

systemPrompt is now minimal: identity + pointer to controller.
Command wrappers all route through the controller.
CLAUDE.md references the controller instead of restating flow rules.
…subagents

Switch from Task tool dispatch to direct execution so users see full
progress in the platform UI. The controller re-reads itself after each
phase to keep transition rules fresh and prevent getting stuck.

This preserves the correct transition behavior while restoring visibility.
Move all next-step logic out of individual skills and into the controller.
Skills now only report findings; the controller decides what to recommend.

Recommendations are flexible: the controller considers skipping forward,
going back, or ending early based on what actually happened, and presents
multiple options instead of a single rigid next step.
… descriptions

- Remove next-step recommendations from all 8 skills (they just report findings)
- Controller owns all recommendation logic with flexible guidance
- Add one-sentence descriptions to each phase in the controller
- Remove response-interpretation table (model handles this naturally)
…ncements

- systemPrompt and CLAUDE.md: re-read controller when choosing/starting a phase
- Controller: explicit phase announcement with example before execution
- Avoids re-reading on every message (context budget concern)
- pr/SKILL.md: use .parent.owner.login and .parent.name instead of
  non-existent .parent.nameWithOwner in fork detection query
- ADR: add lesson 13 (test API responses against real data), lesson 5
  (controller re-read timing), lesson 6 (phase descriptions improve
  recommendations); update lesson 2 (drop response-interpretation table);
  renumber to 15 total lessons
…ontroller

- pr/SKILL.md: recognize bot 403 as expected, use GitHub compare URL for
  manual PR creation instead of bare fork URL, prevent patch-file spiral
- review/SKILL.md: remove A/B/C verdict labels (meaningless to users)
- controller/SKILL.md: add 'continue to next step' as default recommendation,
  clarify recommendation timing
- ADR: add lesson 14 (GitHub App on user ≠ permission on upstream), now 16 total
- All 8 skills now end with 'Then re-read the controller for next-step guidance'
- Removed redundant re-read instructions from controller, systemPrompt, CLAUDE.md
- Controller Step 4 now acknowledges the skill-triggered re-read instead of duplicating it
- Net effect: one place triggers the re-read (end of skill), one place handles it (controller)
@jwm4 jwm4 marked this pull request as ready for review February 14, 2026 18:58
@jeremyeder
Copy link
Contributor

This is solid, I'll load it as a custom workflow and see how we do. Thanks!

@jwm4
Copy link
Contributor Author

jwm4 commented Feb 15, 2026

Oh, I forgot to mention this in the PR description. Here are three things that I think should be in this workflow but are not in this PR because I wanted to break up the work a little:

  1. YOLO mode, i.e., some way to tell the model to just run the entire workflow and without stopping and asking questions. I don't think this is hard to do, but I do think it will take some iterating to get it to work well because the current workflow has pretty strong guidance to not YOLO, getting it to YOLO only when it's supposed to (and making sure it doesn't forget whether it's in YOLO mode or not) might be a little tricky. I have some thoughts on how to do it, but I want to take the time to get it right.
  2. Context7 and/or DeepWiki MCP servers: Since these are free hosted servers that don't require any credentials or set up, it would be super easy to include them in the workflow I think. I haven't tried that out, but I think it would work. I would be inclined to start with just Context7 to start because I've gotten a lot of good results from that one and furthermore it's usually better to limit how many different options you give a model. The workflow is fairly complex already so if we're going to add an extra tools I would want to do so sparingly.
  3. The Ambient-specific rubric mechanism: Right now the workflow has a skill for assessing the quality of a fix. That skill could probably be complimented by having a formal rubric schema in ambient.json and a corresponding rubric.md, but I don't really have good intuition as to what guidance would go in the skill versus what guidance would go in the rubric.md versus what would be implicit in the schema. I also don't really understand what's being done with either the schema or the rubric.md. So I think a lot more work would be needed to figure out how to make this skill and the Ambient-specific rubric mechanism work well together. I'm also a little concerned that if we do this and we wind up limiting the scope of the skill because it's covered well through the Ambient-specific rubric mechanism then the workflow becomes less portable, i.e., you can't easily just drop it into Claude Code because the skill is specifically designed to complement the Ambient-specific rubric mechanism and thus not to work well as a standalone skill. So I'm a little on the fence about whether this is even a good idea.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants

Comments