Skip to content

Align MCPRegistry CRD with registry server v2 config format#4653

Merged
rdimitrov merged 13 commits intomainfrom
rdimitrov/mcpregistry-v2-crd-types
Apr 8, 2026
Merged

Align MCPRegistry CRD with registry server v2 config format#4653
rdimitrov merged 13 commits intomainfrom
rdimitrov/mcpregistry-v2-crd-types

Conversation

@rdimitrov
Copy link
Copy Markdown
Member

@rdimitrov rdimitrov commented Apr 7, 2026

Summary

The registry server (toolhive-registry-server) moved to a v2 config format that separates data sources from registry views and adds claims-based authorization. The operator's MCPRegistry CRD was still generating v1 config (flat registries[] with inline source configs), so operators deploying via the CRD got none of the v2 authorization features.

This updates the MCPRegistry CRD and config generation to produce v2-compatible config with full parity:

  • Split spec.registries into spec.sources (data source definitions) and spec.registries (lightweight views referencing sources by name)
  • Add claims support on both sources and registries via apiextensionsv1.JSON
  • Add ManagedSource, KubernetesSource (with namespace selectors), and URLSource as new source types
  • Add authz (role-based access control) and publicPaths to auth config
  • Add maxMetaSize and dynamicAuth (AWS RDS IAM) to database config
  • Add telemetryConfig with tracing (sampling) and metrics sub-configs
  • Remove PVC source type (not a supported use case)

Closes #4572

Type of change

  • New feature

Test plan

  • Unit tests (task test)
  • Linting (task lint-fix)
  • Manual testing (describe below)

Deployed updated CRDs to a Kind cluster, applied a v2 MCPRegistry resource with sources + registries + claims, ran the operator locally, and verified the generated ConfigMap produces correct v2 YAML with sources[] and registries[].

Changes

File Change
cmd/thv-operator/api/v1alpha1/mcpregistry_types.go New MCPRegistrySourceConfig, MCPRegistryViewConfig, ManagedSource, KubernetesSource, URLSource, MCPRegistryAuthzConfig, MCPRegistryRolesConfig, MCPRegistryTelemetryConfig, MCPRegistryDynamicAuthConfig; claims via apiextensionsv1.JSON; PublicPaths on auth; MaxMetaSize/DynamicAuth on database; removed PVCSource
cmd/thv-operator/api/v1alpha1/zz_generated.deepcopy.go Regenerated
cmd/thv-operator/pkg/registryapi/config/config.go v2 SourceConfig/RegistryConfig structs, claims deserialization, authz mapping, telemetry/database config mapping, URL source support, removed PVC config building
cmd/thv-operator/pkg/registryapi/config/config_test.go All tests migrated to v2 types, removed PVC tests
cmd/thv-operator/pkg/registryapi/deployment.go Iterate Spec.Sources for git auth mounts
cmd/thv-operator/pkg/registryapi/podtemplatespec.go WithRegistrySourceMounts takes []MCPRegistrySourceConfig, removed PVC volume mounting
cmd/thv-operator/pkg/registryapi/{*_test.go} All tests migrated to v2 types
cmd/thv-operator/test-integration/mcp-registry/*.go Test helpers and integration tests migrated; builder syncs source names; removed PVC test file
deploy/charts/operator-crds/** Regenerated CRD manifests
docs/operator/crd-api.md Regenerated CRD API reference docs
examples/operator/mcp-registries/*.yaml Example manifests updated to v2 format; removed PVC and multi-source PVC examples

Does this introduce a user-facing change?

Yes. This is a breaking CRD changespec.registries changes meaning from source definitions to registry views. Since this is v1alpha1 (explicitly unstable), an in-place update is acceptable. Existing MCPRegistry resources must be updated to the new format.

Migration example:

Before:

spec:
  registries:
    - name: production
      format: toolhive
      configMapRef: { name: prod-registry, key: registry.json }

After:

spec:
  sources:
    - name: production
      format: toolhive
      configMapRef: { name: prod-registry, key: registry.json }
  registries:
    - name: default
      sources: ["production"]

Special notes for reviewers

  • Claims typing: Uses apiextensionsv1.JSON for claims fields on sources and registries, and for role entries in authz config. The config generator deserializes and validates that values are string or []string before emitting map[string]any in the YAML output.
  • No auto-injection: Users must explicitly define all sources including Kubernetes discovery sources. This avoids implicit behavior that would be surprising with claims-based authorization.
  • PVC source removed: The PVC source type was removed as it is not a supported use case.
  • PR size: The diff is large but most of it is generated code (CRD manifests, deepcopy), test migrations, and example YAML updates. The core logic changes are in mcpregistry_types.go and config.go.

Generated with Claude Code

@github-actions github-actions bot added the size/XL Extra large PR: 1000+ lines changed label Apr 7, 2026
Copy link
Copy Markdown
Contributor

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Large PR Detected

This PR exceeds 1000 lines of changes and requires justification before it can be reviewed.

How to unblock this PR:

Add a section to your PR description with the following format:

## Large PR Justification

[Explain why this PR must be large, such as:]
- Generated code that cannot be split
- Large refactoring that must be atomic
- Multiple related changes that would break if separated
- Migration or data transformation

Alternative:

Consider splitting this PR into smaller, focused changes (< 1000 lines each) for easier review and reduced risk.

See our Contributing Guidelines for more details.


This review will be automatically dismissed once you add the justification section.

@codecov
Copy link
Copy Markdown

codecov bot commented Apr 7, 2026

Codecov Report

❌ Patch coverage is 35.24229% with 147 lines in your changes missing coverage. Please review.
✅ Project coverage is 68.70%. Comparing base (d68ea47) to head (2741ab3).
⚠️ Report is 5 commits behind head on main.

Files with missing lines Patch % Lines
cmd/thv-operator/pkg/registryapi/config/config.go 31.94% 127 Missing and 20 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #4653      +/-   ##
==========================================
- Coverage   68.80%   68.70%   -0.11%     
==========================================
  Files         506      507       +1     
  Lines       52597    52837     +240     
==========================================
+ Hits        36189    36301     +112     
- Misses      13613    13720     +107     
- Partials     2795     2816      +21     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@rdimitrov rdimitrov force-pushed the rdimitrov/mcpregistry-v2-crd-types branch from fee3776 to 821ccda Compare April 7, 2026 22:44
@github-actions github-actions bot added size/XL Extra large PR: 1000+ lines changed and removed size/XL Extra large PR: 1000+ lines changed labels Apr 7, 2026
rdimitrov and others added 3 commits April 8, 2026 12:09
The registry server has moved to a v2 config format that separates data
sources from registry views, adds claims-based authorization, and
supports managed and Kubernetes source types. The operator's MCPRegistry
CRD was still generating v1 config (flat registries[] with inline source
configs), meaning operators deploying via the CRD got none of the v2
authorization features.

This updates the CRD spec to use separate sources[] and registries[]
fields, adds support for claims (apiextensionsv1.JSON), authz roles,
managed sources, Kubernetes sources with namespace selectors, and public
paths on auth config. The config generator now produces v2-compatible
YAML that the registry server can consume directly.

A default Kubernetes source is auto-injected when the user defines none,
preserving backward compatibility for MCP server discovery.

Closes #4572

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Users should explicitly define all sources in their MCPRegistry spec,
including Kubernetes discovery sources. The auto-injection was adding
implicit behavior that would be surprising when combined with
claims-based authorization.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@rdimitrov rdimitrov force-pushed the rdimitrov/mcpregistry-v2-crd-types branch from 99e64c0 to d2d6430 Compare April 8, 2026 09:09
@github-actions github-actions bot added size/XL Extra large PR: 1000+ lines changed and removed size/XL Extra large PR: 1000+ lines changed labels Apr 8, 2026
ChrisJBurns

This comment was marked as duplicate.

Copy link
Copy Markdown
Collaborator

@ChrisJBurns ChrisJBurns left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Multi-Agent Code Review

Reviewed by three specialized agents: Code Reviewer (Go style/conventions), Kubernetes Expert (CRD/operator patterns), and Security Advisor (auth, claims, threat modeling). See inline comments.

Legend: 🔴 Must Fix | 🟡 Strongly Recommended | 🟢 Nice to Have

Confirmed Positive

  • Sources/registries split is well-designed (K8s Expert)
  • apiextensionsv1.JSON is the correct type choice for claims (K8s Expert)
  • DeepCopy generation is correct for all apiextensionsv1.JSON fields (K8s Expert)
  • deserializeClaims is safe and well-implemented (Security Advisor)
  • PVC removal is clean across all files (Code Reviewer)
  • Test coverage is thorough (all agents)

Review performed by Claude Code using code-reviewer, kubernetes-expert, and security-advisor agents.

Fix misplaced godoc for buildCACertFilePath that was attached to
buildTelemetryConfig. Return error from buildTelemetryConfig instead of
silently swallowing ParseFloat failures on sampling values. Add value
type validation to deserializeRoleEntry matching the existing claims
validation contract.

Add CEL XValidation rules: exactly-one-source-type per source config,
registry-to-source cross-reference validation, at-most-one managed
source, authz rejected with anonymous auth mode. Remove redundant
MinItems XValidation rules now covered by field-level markers.

Add kubebuilder validation on publicPaths requiring "/" prefix. Update
REGISTRY.md examples to v2 format with sources/registries split.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@github-actions github-actions bot added size/XL Extra large PR: 1000+ lines changed and removed size/XL Extra large PR: 1000+ lines changed labels Apr 8, 2026
Update the Examples section (Production, Development, Private Git, and
Multiple Sources) to use the v2 sources/registries split format.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@github-actions github-actions bot added size/XL Extra large PR: 1000+ lines changed and removed size/XL Extra large PR: 1000+ lines changed labels Apr 8, 2026
The nested all/all/exists CEL rule for validating registry-to-source
references exceeded the Kubernetes CEL cost budget by over 100x. Remove
it and keep validation at reconciliation time only (which already
works). Add MaxItems=20 on sources and registries to bound CEL cost
for the remaining rules.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@github-actions github-actions bot added size/XL Extra large PR: 1000+ lines changed and removed size/XL Extra large PR: 1000+ lines changed labels Apr 8, 2026
Resolve merge conflict: keep pvc_source_test.go deleted since PVC
source support was removed. Fix WithUpstreamFormat() which was setting
RegistryFormatToolHive instead of RegistryFormatUpstream. Add missing
doc comment on RegistryFormatUpstream constant.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@github-actions github-actions bot added size/XL Extra large PR: 1000+ lines changed and removed size/XL Extra large PR: 1000+ lines changed labels Apr 8, 2026
Update the default registry API image from v0.6.6 to v1.0.0 which
supports the v2 config format with sources/registries split.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@github-actions github-actions bot added size/XL Extra large PR: 1000+ lines changed and removed size/XL Extra large PR: 1000+ lines changed labels Apr 8, 2026
@rdimitrov
Copy link
Copy Markdown
Member Author

Local Validation Results

Tested end-to-end on a Kind cluster with thv-registry-api:v1.0.0 (the new release that supports v2 config format).

Setup

  • Kind cluster with CNPG PostgreSQL
  • Operator built from this branch with task operator-deploy-local
  • Registry API image: ghcr.io/stacklok/thv-registry-api:v1.0.0

MCPRegistry Applied

sources:
  - name: toolhive-catalog
    format: toolhive
    configMapRef: { name: toolhive-registry-data, key: registry.json }
    syncPolicy: { interval: "1h" }
    filter:
      tags:
        include: ["production"]
  - name: k8s-discovery
    kubernetes:
      namespaces: [test-registry]
registries:
  - name: default
    sources: [toolhive-catalog, k8s-discovery]

Config Generation

Generated v2 config.yaml with correct structure:

  • sources[] with file path for ConfigMap, kubernetes with namespace selector
  • registries[] as view referencing both sources
  • Tag filter correctly included
  • Database and auth config passed through

Registry Server

"Loaded configuration","source_count":2,"registry_count":1,"auth_mode":"anonymous"
  • DB migrations ran successfully
  • ConfigMap source synced 2 servers with tag filtering (both matched production tag)
  • Kubernetes source initialized (RBAC errors for cluster-scope listing are pre-existing)
  • /health{"status":"healthy"}
  • /v1/sources → both sources with sync=complete
  • /v1/registriesdefault view with both source references
  • MCPRegistry status: Phase=Running, Condition Ready=True

CEL Validation Rules

All admission-time validations working:

Test Expected Result
No source type on a source Rejected exactly one source type must be specified
Two source types on one source Rejected exactly one source type must be specified
Two managed sources Rejected at most one managed source is allowed
Authz with anonymous auth Rejected authz configuration has no effect when auth mode is anonymous
pvcRef field (removed) Rejected unknown field "spec.sources[0].pvcRef"
Valid v2 MCPRegistry Accepted Reconciled to Ready

Generated with Claude Code

@rdimitrov rdimitrov merged commit a443469 into main Apr 8, 2026
40 checks passed
@rdimitrov rdimitrov deleted the rdimitrov/mcpregistry-v2-crd-types branch April 8, 2026 21:30
@rdimitrov rdimitrov self-assigned this Apr 8, 2026
ChrisJBurns added a commit that referenced this pull request Apr 9, 2026
The MCPRegistry CRD currently mirrors the registry server's config.yaml
structure field-by-field, coupling the operator to registry server config
changes. PR #4653 demonstrated this cost: 2,993+/2,458- lines to align
the CRD with a v2 config format.

Add a new decoupled config path alongside the existing typed fields:

- configYAML: raw YAML string passed through to the registry server as
  config.yaml, with no operator parsing or transformation
- volumes/volumeMounts: standard corev1 types for user-managed volume
  wiring (secrets, ConfigMaps, PVCs)
- pgpassSecretRef: operator-managed pgpass mount with init container
  for chmod 0600 (the one Kubernetes plumbing detail users cannot
  express through volumes alone)

The two paths are mutually exclusive (CEL + reconciler validation).
The existing typed fields are deprecated but fully functional, allowing
gradual migration. A future release will remove the legacy path.

Includes RFC document, validation with reserved name/mount path
collision detection, 27 new unit tests, and 6 example manifests.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
ChrisJBurns added a commit that referenced this pull request Apr 9, 2026
* Add decoupled configYAML path to MCPRegistry CRD

The MCPRegistry CRD currently mirrors the registry server's config.yaml
structure field-by-field, coupling the operator to registry server config
changes. PR #4653 demonstrated this cost: 2,993+/2,458- lines to align
the CRD with a v2 config format.

Add a new decoupled config path alongside the existing typed fields:

- configYAML: raw YAML string passed through to the registry server as
  config.yaml, with no operator parsing or transformation
- volumes/volumeMounts: standard corev1 types for user-managed volume
  wiring (secrets, ConfigMaps, PVCs)
- pgpassSecretRef: operator-managed pgpass mount with init container
  for chmod 0600 (the one Kubernetes plumbing detail users cannot
  express through volumes alone)

The two paths are mutually exclusive (CEL + reconciler validation).
The existing typed fields are deprecated but fully functional, allowing
gradual migration. A future release will remove the legacy path.

Includes RFC document, validation with reserved name/mount path
collision detection, 27 new unit tests, and 6 example manifests.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Use raw JSON for volumes/volumeMounts to reduce CRD schema size

Change Volumes and VolumeMounts fields from typed []corev1.Volume /
[]corev1.VolumeMount to []apiextensionsv1.JSON with
x-kubernetes-preserve-unknown-fields. This avoids expanding the full
Kubernetes volume OpenAPI schema into the CRD manifest, reducing the
CRD diff by ~3,900 lines.

Add ParseVolumes() and ParseVolumeMounts() helper methods on
MCPRegistrySpec that deserialize the raw JSON into typed objects at
reconciliation time. Validation of volume structure happens at pod
creation time rather than CRD admission time.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Remove RFC document from tracked files

The RFC is a design document used to guide implementation, not a
permanent repo artifact.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Address review findings from multi-agent consensus review

Fix correctness bugs, validation gaps, code duplication, and test
coverage identified during the review of the decoupled configYAML path.

Correctness:
- Config hash now covers full spec (not just ConfigYAML), so changes
  to volumes/volumeMounts/pgpassSecretRef trigger pod rollouts
- Parse errors for volumes/volumeMounts are returned as errors instead
  of being silently swallowed in the deployment builder

Validation:
- CEL mutual exclusivity rule now covers all legacy fields (registries,
  databaseConfig, authConfig, telemetryConfig), not just sources
- TelemetryConfig added to reconciler-side validatePathExclusivity
- MaxLength=131072 on configYAML, MaxItems=50 on volumes/volumeMounts

Duplication:
- Extract upsertDeployment to share create-or-update logic between
  ensureDeployment and ensureDeploymentNewPath
- Extract withPGPassMountFromVolume to share init container logic
  between WithPGPassMount and WithPGPassSecretRefMount

Tests:
- Add ParseVolumes/ParseVolumeMounts unit tests (8 cases)
- Add PodTemplateSpec reserved name/mount path collision tests (3 cases)
- Extract pgpassEnvVar constant to fix goconst lint

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Simplify CEL rules and fix has() guards for absent fields

Remove the overly complex mutual exclusivity CEL rule — the reconciler
handles this as defense-in-depth, and the registry server validates its
own config. Keep only the essential admission-time rules:

- Either configYAML or sources must be set (operator path selection)
- At most one managed source (legacy path constraint)

Guard all CEL rules with has() checks so they handle absent optional
fields (e.g., sources is absent when using the new configYAML path).

Verified end-to-end with Kind cluster + CNPG PostgreSQL + registry
server v1.0.0: MCPRegistry accepted, ConfigMap created with raw YAML,
registry-api pod running and serving /health, /readiness, /v1/registries.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Validate pgpassSecretRef name and key are non-empty

WithPGPassSecretRefMount silently no-ops when name or key is empty,
which would produce a deployment without pgpass mounting and a confusing
database connection error at runtime. Catch this early in reconciler
validation with clear error messages.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Remove MaxLength and MaxItems constraints from new config fields

These constraints couple the operator to opinions about size limits
that Kubernetes and etcd already enforce naturally (1MB ConfigMap limit,
~1.5MB etcd object limit). Removing them keeps the decoupled path
truly pass-through.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Extract validatePGPassSecretRef to fix gocyclo lint

Split pgpassSecretRef name/key validation into its own function to
bring validatePathExclusivity below the cyclomatic complexity threshold.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Fix all CI lint failures

Add nolint:staticcheck directives to legacy code that intentionally
references deprecated fields (config.go, pgpass_test.go). Move inline
nolint comments above the line to stay within the 130-char line limit.

Fix unparam lint on ensureService by removing the unused *corev1.Service
return value — both callers discarded it.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Regenerate CRD API reference docs

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

size/XL Extra large PR: 1000+ lines changed

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Align MCPRegistry CRD with registry server v2 config format

2 participants