Skip to content

Consolidate MCPRegistry status to match MCPServer pattern#4643

Merged
ChrisJBurns merged 4 commits intomainfrom
chrisburns/mcpregistry-status-consolidation
Apr 8, 2026
Merged

Consolidate MCPRegistry status to match MCPServer pattern#4643
ChrisJBurns merged 4 commits intomainfrom
chrisburns/mcpregistry-status-consolidation

Conversation

@ChrisJBurns
Copy link
Copy Markdown
Collaborator

@ChrisJBurns ChrisJBurns commented Apr 7, 2026

Summary

MCPRegistry had an over-engineered three-phase status model with separate SyncStatus and APIStatus sub-objects, a DeriveOverallPhase() method, and a StatusCollector batching pattern. With sync operations moved to the registry server itself, only deployment readiness remains as a status source — making this complexity unnecessary and coupling the operator to internal registry server state.

This flattens MCPRegistryStatus to the standard Kubernetes workload pattern used by MCPServer: Phase + Ready condition + ReadyReplicas + URL. This enables kubectl wait --for=condition=Ready and consistent tooling across all ToolHive workloads.

Closes #4251
Closes #4269
Closes #4270
Closes #4510

Type of change

  • Refactoring (no behavior change)

Test plan

  • Unit tests (task test)
  • Linting (task lint-fix)

Changes

File Change
api/v1alpha1/mcpregistry_types.go Flatten MCPRegistryStatus to Phase/Message/URL/ReadyReplicas/Conditions/ObservedGeneration. Remove SyncStatus, APIStatus, StorageReference, SyncPhase, APIPhase types. Simplify MCPRegistryPhase to Pending/Running/Failed/Terminating. Update printer columns to match MCPServer.
controllers/mcpregistry_controller.go Rewrite status logic to set fields directly — no StatusManager/StatusDeriver. Add updateRegistryStatus() with refetch-before-update. Add setRegistryReadyCondition() using shared ConditionTypeReady. Set ObservedGeneration on conditions.
pkg/registryapi/types.go Relocate Error type from deleted mcpregistrystatus package. Add GetReadyReplicas() to Manager interface.
pkg/registryapi/manager.go Update Error references, replace ConditionAPIReady with ConditionTypeReady, add GetReadyReplicas implementation.
pkg/mcpregistrystatus/ Deleted — entire package (7 files). StatusCollector, StatusDeriver, APIStatusCollector patterns removed.
pkg/validation/image_validation.go MCPRegistryPhaseReadyMCPRegistryPhaseRunning
test-integration/mcp-registry/ Remove sync-related status helpers. Update all phase/condition references across 6 test files.
DESIGN.md Document simplified status model with history of removed StatusCollector pattern.
CRD YAMLs Regenerated to reflect new status schema and printer columns.

Does this introduce a user-facing change?

Yes — MCPRegistry status is simplified:

  • .status.phase values change: Ready is now Running, Syncing is removed
  • .status.syncStatus and .status.apiStatus fields are removed
  • New fields: .status.url and .status.readyReplicas
  • kubectl get mcpregistries columns change from Phase/Sync/API/Servers/Last Sync/Age to Status/Ready/Replicas/URL/Age
  • kubectl wait --for=condition=Ready now works

This is a breaking API change to v1alpha1 which carries no stability guarantee.

Large PR justification

This PR touches 26 files (+325/-2270) but the actual review surface is small:

  • Production code additions: 170 lines across 5 modified files — well under the 400-line / 10-file limits
  • ~1,005 lines are deletions from removing the entire mcpregistrystatus/ package and its tests
  • ~350 lines are regenerated CRD YAML and zz_generated.deepcopy.go
  • ~140 lines are mechanical test updates (phase/condition renames)

This is an atomic refactoring that cannot be split: the type changes in mcpregistry_types.go cascade to the controller, registryapi, validation, and CRDs. Splitting would create intermediate states that don't compile.

Key files for review (the substantive changes):

File Lines What to review
api/v1alpha1/mcpregistry_types.go +22/-201 New status shape: Phase/Message/URL/ReadyReplicas/Conditions
controllers/mcpregistry_controller.go +95/-148 updateRegistryStatus() and setRegistryReadyCondition() replacing StatusCollector
pkg/registryapi/manager.go +32/-12 GetReadyReplicas() addition and Error type relocation

Special notes for reviewers

  • Breaking CRD change: v1alpha1 has no stability guarantee, no conversion webhook needed. Orphaned etcd fields will be overwritten on next reconciliation.
  • Stopped phase intentionally omitted — MCPServer has it for scale-to-zero but MCPRegistry has no replicas spec field.
  • Pre-existing test failures: TestMCPRemoteProxyStatusProgression has a data race unrelated to this PR. The vmcp task gen mockgen failures are a Go version mismatch (go1.24 mockgen vs go1.26 runtime) also pre-existing.

Generated with Claude Code

MCPRegistry had an over-engineered three-phase status model with separate
SyncStatus and APIStatus sub-objects, a DeriveOverallPhase() method, and
a StatusCollector batching pattern. With sync operations moved to the
registry server itself, only deployment readiness remains as a status
source, making this complexity unnecessary.

Flatten MCPRegistryStatus to the standard Kubernetes workload pattern
used by MCPServer: Phase + Ready condition + ReadyReplicas + URL. This
enables kubectl wait --for=condition=Ready and consistent tooling across
all ToolHive workloads.

Closes #4251
Closes #4269
Closes #4270
Closes #4510

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@github-actions github-actions bot added the size/XL Extra large PR: 1000+ lines changed label Apr 7, 2026
Copy link
Copy Markdown
Contributor

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Large PR Detected

This PR exceeds 1000 lines of changes and requires justification before it can be reviewed.

How to unblock this PR:

Add a section to your PR description with the following format:

## Large PR Justification

[Explain why this PR must be large, such as:]
- Generated code that cannot be split
- Large refactoring that must be atomic
- Multiple related changes that would break if separated
- Migration or data transformation

Alternative:

Consider splitting this PR into smaller, focused changes (< 1000 lines each) for easier review and reduced risk.

See our Contributing Guidelines for more details.


This review will be automatically dismissed once you add the justification section.

@codecov
Copy link
Copy Markdown

codecov bot commented Apr 7, 2026

Codecov Report

❌ Patch coverage is 62.09677% with 47 lines in your changes missing coverage. Please review.
✅ Project coverage is 68.84%. Comparing base (f8b6051) to head (cf86298).
⚠️ Report is 20 commits behind head on main.

Files with missing lines Patch % Lines
cmd/thv-operator/pkg/registryapi/manager.go 6.66% 28 Missing ⚠️
...thv-operator/controllers/mcpregistry_controller.go 80.89% 11 Missing and 6 partials ⚠️
cmd/thv-operator/pkg/registryapi/types.go 50.00% 2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #4643      +/-   ##
==========================================
- Coverage   68.89%   68.84%   -0.06%     
==========================================
  Files         505      506       +1     
  Lines       52425    52597     +172     
==========================================
+ Hits        36118    36210      +92     
- Misses      13514    13585      +71     
- Partials     2793     2802       +9     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@ChrisJBurns
Copy link
Copy Markdown
Collaborator Author

@claude Please review this using the agents

@claude
Copy link
Copy Markdown
Contributor

claude bot commented Apr 7, 2026

Claude finished @ChrisJBurns's task in 3m 15s —— View job


PR Review: Consolidate MCPRegistry status to match MCPServer pattern

  • Gather context and understand the PR
  • Read key changed files
  • Run code-reviewer agent for Go/conventions analysis
  • Run kubernetes-expert agent for operator patterns
  • Synthesize findings and post review

Overview

The refactoring goal is sound and the overall design is correct. The MCPRegistryStatus struct now properly matches the MCPServer pattern (Phase + Ready condition + ReadyReplicas + URL), kubectl wait --for=condition=Ready works, list markers are correct, and the CRD change is handled appropriately for v1alpha1. Several concrete issues are worth addressing before merge.


Issues to Fix

1. Stale URL not cleared on Failed/Pending transitions
cmd/thv-operator/controllers/mcpregistry_controller.go — error branch (~line 217) and not-ready branch (~line 242)

When a registry transitions from RunningFailed or RunningPending, latest.Status.URL is never cleared. A kubectl get mcpregistry would show Phase: Failed alongside a valid-looking URL, misleading users into thinking the registry is accessible.

// In the not-ready branch:
latest.Status.Phase = mcpv1alpha1.MCPRegistryPhasePending
latest.Status.URL = ""   // add this
// In the error branch:
latest.Status.Phase = mcpv1alpha1.MCPRegistryPhaseFailed
latest.Status.URL = ""   // add this

Fix this →

2. IsAPIReady called twice per reconcile — TOCTOU + wasted API call
cmd/thv-operator/controllers/mcpregistry_controller.go — called inside updateRegistryStatus (~line 230) and again in the requeue check (~line 173)

Each call makes a live client.Get against the Kubernetes API. If the deployment state changes between the two calls (TOCTOU), the status could say Phase: Pending but the requeue skips the 30s wait (or vice versa). The .Times(2) expectations in tests document the duplication rather than fixing it.

The fix is to return the readiness result from updateRegistryStatus and reuse it:

func (r *MCPRegistryReconciler) updateRegistryStatus(...) (bool, error) { // return isReady
    ...
    isReady := r.registryAPIManager.IsAPIReady(ctx, mcpRegistry)
    ...
    return isReady, r.Status().Update(ctx, latest)
}

Fix this →

3. Redundant ReadyReplicas assignment in not-ready branch
cmd/thv-operator/controllers/mcpregistry_controller.go — not-ready branch

latest.Status.ReadyReplicas = readyReplicas is assigned before the if isReady branch and then redundantly re-assigned inside the else branch. Remove the second assignment.

4. Status update failure silently ignored on valid PodTemplateSpec path
cmd/thv-operator/controllers/mcpregistry_controller.govalidateAndUpdatePodTemplateStatus (~line 326)

The invalid-spec path returns false on status update failure, but the valid-spec path logs the error and returns true anyway — silently swallowing the condition. The PodTemplateValid=True condition is lost if the write fails. Align the valid-spec path to also return false on update failure.

5. Use registryapi.RegistryAPIPort instead of magic number 8080
cmd/thv-operator/controllers/mcpregistry_controller.go — ~line 235

The constant RegistryAPIPort = 8080 already exists in cmd/thv-operator/pkg/registryapi/types.go. Using the magic number creates a silent drift risk if the port ever changes.

endpoint := fmt.Sprintf("http://%s.%s:%d",
    mcpRegistry.GetAPIResourceName(), mcpRegistry.Namespace, registryapi.RegistryAPIPort)

Suggestions (non-blocking)

6. Missing tests for IsAPIReady and GetReadyReplicas in manager.go

Codecov reports manager.go at 13% patch coverage. The two new methods IsAPIReady and GetReadyReplicas are untested — both make API calls and have distinct failure modes (deployment not found, found-but-not-ready). Worth covering at minimum: (a) deployment not found → 0/false, (b) deployment found and ready → true/N.

7. Duplicate condition reason values
cmd/thv-operator/api/v1alpha1/mcpregistry_types.goConditionReasonRegistryReady = "Ready" and ConditionReasonRegistryNotReady = "NotReady"

These values are identical to ConditionReasonReady / ConditionReasonNotReady already defined in mcpserver_types.go:27-30. Consider either reusing the existing constants or adding a comment explaining why registry has its own names.

8. Unconditional Status().Update() in updateRegistryStatus

The status is always written even when nothing changed. This causes a minor reconcile-loop amplification (triggers a new watch event → re-enqueues). This is pre-existing debt shared with the MCPServer controller, so not a regression — worth a follow-up issue.

9. Large PR justification section needed

The automated review bot (github-actions) requested a ## Large PR Justification section in the PR description to dismiss its review. The PR description already contains the motivation clearly (large because it's an atomic refactor deleting 7 files of an entire package), just needs the explicit section:

## Large PR Justification
- Deletes the entire `pkg/mcpregistrystatus/` package (7 files, ~1100 lines) atomically — splitting would leave the codebase in an inconsistent state between old and new status models
- CRD and deepcopy files are generated and cannot be partially split

Kubernetes Operator Pattern Verification

The kubernetes-expert agent confirmed:

  • MCPRegistryStatus fields now match MCPServer exactly (Conditions, ObservedGeneration, Phase, Message, URL, ReadyReplicas)
  • kubectl wait --for=condition=Ready correctly supported — ConditionTypeReady = "Ready" with metav1.ConditionTrue/False values
  • +listType=map / +listMapKey=type markers on Conditions are correct
  • MCPRegistryPhase enum constraint correctly placed on type definition
  • No conversion webhook needed for v1alpha1 — correct call; orphaned etcd fields will be pruned on next write
  • Shared ConditionPodTemplateValid constants correctly defined once in mcpserver_types.go and reused

@github-actions github-actions bot dismissed their stale review April 8, 2026 15:51

Large PR justification has been provided. Thank you!

@github-actions github-actions bot removed the size/XL Extra large PR: 1000+ lines changed label Apr 8, 2026
@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Apr 8, 2026

✅ Large PR justification has been provided. The size review has been dismissed and this PR can now proceed with normal review.

@github-actions github-actions bot added the size/XL Extra large PR: 1000+ lines changed label Apr 8, 2026
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@github-actions github-actions bot added size/XL Extra large PR: 1000+ lines changed and removed size/XL Extra large PR: 1000+ lines changed labels Apr 8, 2026
Copy link
Copy Markdown
Collaborator Author

@ChrisJBurns ChrisJBurns left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Multi-Agent Consensus Review

Agents consulted: kubernetes-expert, code-reviewer, toolhive-expert

Consensus Summary

# Finding Consensus Severity Action
1 Architecture docs have stale refs to deleted package 10/10 HIGH Fix
2 ObservedGeneration uses stale object 8/10 MEDIUM Fix
3 3 redundant Deployment GETs per reconcile 10/10 MEDIUM Fix
4 Double status update from PodTemplate validation 10/10 MEDIUM Discuss
5 Error.ConditionType field is dead code 7/10 MEDIUM Fix
6 Redundant ReadyReplicas assignment 7/10 LOW Fix

Overall

This is a well-executed simplification that removes ~2,000 lines of unnecessary abstraction. The core design is sound — with sync operations moved to the registry server, the StatusCollector/Deriver pattern was genuinely over-engineered. The new code is cleaner, consistent with MCPServer, and enables standard kubectl wait --for=condition=Ready.

The findings are about polishing the new implementation, not questioning the approach.

Documentation (Finding 1 — HIGH)

Per CLAUDE.md: "When making changes that affect architecture, update relevant docs in docs/arch/."

Two architecture docs still reference the deleted mcpregistrystatus package, old status fields, and removed phases:

  • docs/arch/06-registry-system.md lines 636-654: Shows syncStatus/apiStatus YAML, lists Syncing/Ready phases, references cmd/thv-operator/pkg/mcpregistrystatus/
  • docs/arch/09-operator-architecture.md lines 485-549: References "Batched updates via StatusCollector", cmd/thv-operator/pkg/mcpregistrystatus/, and SyncStatus field (twice)

These need to be updated to reflect the new flat status model (Phase/Ready condition/ReadyReplicas/URL).


Generated with Claude Code

- Use latest.Generation instead of stale mcpRegistry.Generation for
  ObservedGeneration (Finding 2)
- Add GetAPIStatus() combining IsAPIReady + GetReadyReplicas into a
  single Deployment fetch, and return isReady from updateRegistryStatus
  to eliminate 3 redundant GETs per reconcile (Finding 3)
- Defer PodTemplate condition to updateRegistryStatus instead of writing
  status twice per reconciliation on the valid path (Finding 4)
- Remove dead ConditionType field from registryapi.Error (Finding 5)
- Remove redundant ReadyReplicas assignment (Finding 6)
- Update architecture docs to reflect simplified status model (Finding 1)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@github-actions github-actions bot added size/XL Extra large PR: 1000+ lines changed and removed size/XL Extra large PR: 1000+ lines changed labels Apr 8, 2026
@ChrisJBurns
Copy link
Copy Markdown
Collaborator Author

Finding 1 (docs) addressed in 3cd01b4 — updated docs/arch/06-registry-system.md and docs/arch/09-operator-architecture.md to reflect the simplified status model. Removed references to deleted mcpregistrystatus/ package, StatusCollector pattern, SyncStatus field, and old phases.

@ChrisJBurns ChrisJBurns marked this pull request as ready for review April 8, 2026 19:03
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@github-actions github-actions bot added size/XL Extra large PR: 1000+ lines changed and removed size/XL Extra large PR: 1000+ lines changed labels Apr 8, 2026
@ChrisJBurns ChrisJBurns merged commit d68ea47 into main Apr 8, 2026
40 checks passed
@ChrisJBurns ChrisJBurns deleted the chrisburns/mcpregistry-status-consolidation branch April 8, 2026 19:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

size/XL Extra large PR: 1000+ lines changed

Projects

None yet

2 participants