Skip to content

Add decoupled configYAML path to MCPRegistry CRD#4693

Merged
ChrisJBurns merged 11 commits intomainfrom
worktree-registry-crd-refactor
Apr 9, 2026
Merged

Add decoupled configYAML path to MCPRegistry CRD#4693
ChrisJBurns merged 11 commits intomainfrom
worktree-registry-crd-refactor

Conversation

@ChrisJBurns
Copy link
Copy Markdown
Collaborator

@ChrisJBurns ChrisJBurns commented Apr 9, 2026

Summary

The MCPRegistry CRD currently mirrors the registry server's config.yaml structure field-by-field, coupling the operator to registry server config changes. PR #4653 demonstrated this cost: 2,993+/2,458- lines to align the CRD with a v2 config format. Every time the registry server's config evolves, the operator must update its CRD types, config generation logic, and tests — often as a breaking CRD change.

This adds a new decoupled config path alongside the existing typed fields, allowing gradual migration:

  • configYAML: raw YAML string passed through to the registry server as config.yaml — the operator does not parse, validate, or transform it
  • volumes / volumeMounts: standard corev1.Volume and corev1.VolumeMount types for user-managed volume wiring (secrets, ConfigMaps, PVCs)
  • pgpassSecretRef: operator-managed pgpass mount with init container for chmod 0600 — the one piece of Kubernetes plumbing users cannot express through volumes alone (PostgreSQL's libpq rejects pgpass files that aren't mode 0600, and Kubernetes secret volumes mount as root-owned)

The two paths are mutually exclusive (CEL admission rules + reconciler defense-in-depth validation). The existing typed fields (sources, registries, databaseConfig, authConfig) are deprecated but fully functional. A future release will remove the legacy path after users have migrated.

Resolves: #4702

Type of change

  • New feature

Test plan

  • Unit tests (task test) — 27 new tests covering RawConfigToConfigMap, WithPGPassSecretRefMount, and validateNewPathSpec (mutual exclusivity, reserved names, mount path collisions)
  • Linting (task lint-fix) — no new lint issues (pre-existing deprecation warnings expected)
  • Build verification (go build ./cmd/thv-operator/...)

Changes

File Change
mcpregistry_types.go Add ConfigYAML, Volumes, VolumeMounts, PGPassSecretRef fields; deprecate Sources, Registries, DatabaseConfig, AuthConfig; add CEL mutual exclusivity rules
zz_generated.deepcopy.go Regenerated for new fields
mcpregistry_controller.go Add validateNewPathSpec with 4 focused sub-functions (path exclusivity, reserved names, mount path collisions); call before reconciliation
mcpregistry_controller_test.go 15 new validation test cases + fixture updates for relaxed Sources requirement
config/raw_config.go New RawConfigToConfigMap() — creates ConfigMap from raw YAML with content checksum
config/raw_config_test.go 4 test cases
deployment.go New buildRegistryAPIDeploymentNewPath + ensureDeploymentNewPath — new path deployment builder with user volume/mount injection
manager.go New reconcileNewPath; refactor ReconcileAPIService to branch new/legacy path
podtemplatespec.go New WithPGPassSecretRefMount — takes user SecretKeySelector instead of generated secret name
podtemplatespec_test.go 8 test cases for WithPGPassSecretRefMount
types.go Export 5 constants for cross-package validation access
CRD manifests Regenerated
6 example YAMLs New configYAML examples: minimal, ConfigMap, git auth, API, pgpass, OAuth
RFC document docs/proposals/001-mcpregistry-config-decoupling.md

Does this introduce a user-facing change?

Yes. Users can now choose between two config paths:

New path (recommended for new deployments):

spec:
  configYAML: |
    sources:
      - name: k8s
        format: upstream
        kubernetes: {}
    registries:
      - name: default
        sources: ["k8s"]
    database:
      host: postgres
      port: 5432
      user: db_app
      database: registry
    auth:
      mode: anonymous

Legacy path (existing, deprecated): unchanged, fully functional.

Special notes for reviewers

  • No legacy breakage: all existing types, functions, tests, and examples are untouched. The legacy path is exercised by the same tests as before.
  • RFC included: docs/proposals/001-mcpregistry-config-decoupling.md documents the full design, all migration scenarios, PodTemplateSpec merge conflict analysis, pgpass rationale, and a Phase 2 deprecation removal plan.
  • PR size: the diff is large but ~5,300 lines are regenerated CRD manifests. Core logic is ~1,100 lines of new code.
  • pgpassSecretRef rationale: exists because PostgreSQL's libpq rejects pgpass files without 0600 permissions, Kubernetes mounts secrets as root-owned, and the container runs as non-root (UID 65532). The init container pattern cannot be expressed through volumes/volumeMounts alone. See the RFC's PGPass Discussion section.

Large PR Justification

  • is a bit tricky to do all these changes in separate PRs
  • we're also pressed for time

Generated with Claude Code

The MCPRegistry CRD currently mirrors the registry server's config.yaml
structure field-by-field, coupling the operator to registry server config
changes. PR #4653 demonstrated this cost: 2,993+/2,458- lines to align
the CRD with a v2 config format.

Add a new decoupled config path alongside the existing typed fields:

- configYAML: raw YAML string passed through to the registry server as
  config.yaml, with no operator parsing or transformation
- volumes/volumeMounts: standard corev1 types for user-managed volume
  wiring (secrets, ConfigMaps, PVCs)
- pgpassSecretRef: operator-managed pgpass mount with init container
  for chmod 0600 (the one Kubernetes plumbing detail users cannot
  express through volumes alone)

The two paths are mutually exclusive (CEL + reconciler validation).
The existing typed fields are deprecated but fully functional, allowing
gradual migration. A future release will remove the legacy path.

Includes RFC document, validation with reserved name/mount path
collision detection, 27 new unit tests, and 6 example manifests.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@github-actions github-actions bot added the size/XL Extra large PR: 1000+ lines changed label Apr 9, 2026
Copy link
Copy Markdown
Contributor

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Large PR Detected

This PR exceeds 1000 lines of changes and requires justification before it can be reviewed.

How to unblock this PR:

Add a section to your PR description with the following format:

## Large PR Justification

[Explain why this PR must be large, such as:]
- Generated code that cannot be split
- Large refactoring that must be atomic
- Multiple related changes that would break if separated
- Migration or data transformation

Alternative:

Consider splitting this PR into smaller, focused changes (< 1000 lines each) for easier review and reduced risk.

See our Contributing Guidelines for more details.


This review will be automatically dismissed once you add the justification section.

@github-actions github-actions bot added size/XL Extra large PR: 1000+ lines changed and removed size/XL Extra large PR: 1000+ lines changed labels Apr 9, 2026
@codecov
Copy link
Copy Markdown

codecov bot commented Apr 9, 2026

Codecov Report

❌ Patch coverage is 54.13793% with 133 lines in your changes missing coverage. Please review.
✅ Project coverage is 68.61%. Comparing base (3c5da31) to head (691e8b3).
⚠️ Report is 11 commits behind head on main.

Files with missing lines Patch % Lines
cmd/thv-operator/pkg/registryapi/deployment.go 9.72% 64 Missing and 1 partial ⚠️
cmd/thv-operator/pkg/registryapi/manager.go 6.25% 44 Missing and 1 partial ⚠️
...thv-operator/controllers/mcpregistry_controller.go 82.41% 12 Missing and 4 partials ⚠️
cmd/thv-operator/pkg/registryapi/service.go 22.22% 7 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #4693      +/-   ##
==========================================
- Coverage   68.66%   68.61%   -0.06%     
==========================================
  Files         509      516       +7     
  Lines       52987    53841     +854     
==========================================
+ Hits        36384    36942     +558     
- Misses      13782    14051     +269     
- Partials     2821     2848      +27     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

ChrisJBurns and others added 2 commits April 9, 2026 01:16
Change Volumes and VolumeMounts fields from typed []corev1.Volume /
[]corev1.VolumeMount to []apiextensionsv1.JSON with
x-kubernetes-preserve-unknown-fields. This avoids expanding the full
Kubernetes volume OpenAPI schema into the CRD manifest, reducing the
CRD diff by ~3,900 lines.

Add ParseVolumes() and ParseVolumeMounts() helper methods on
MCPRegistrySpec that deserialize the raw JSON into typed objects at
reconciliation time. Validation of volume structure happens at pod
creation time rather than CRD admission time.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The RFC is a design document used to guide implementation, not a
permanent repo artifact.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@github-actions github-actions bot added size/XL Extra large PR: 1000+ lines changed and removed size/XL Extra large PR: 1000+ lines changed labels Apr 9, 2026
Fix correctness bugs, validation gaps, code duplication, and test
coverage identified during the review of the decoupled configYAML path.

Correctness:
- Config hash now covers full spec (not just ConfigYAML), so changes
  to volumes/volumeMounts/pgpassSecretRef trigger pod rollouts
- Parse errors for volumes/volumeMounts are returned as errors instead
  of being silently swallowed in the deployment builder

Validation:
- CEL mutual exclusivity rule now covers all legacy fields (registries,
  databaseConfig, authConfig, telemetryConfig), not just sources
- TelemetryConfig added to reconciler-side validatePathExclusivity
- MaxLength=131072 on configYAML, MaxItems=50 on volumes/volumeMounts

Duplication:
- Extract upsertDeployment to share create-or-update logic between
  ensureDeployment and ensureDeploymentNewPath
- Extract withPGPassMountFromVolume to share init container logic
  between WithPGPassMount and WithPGPassSecretRefMount

Tests:
- Add ParseVolumes/ParseVolumeMounts unit tests (8 cases)
- Add PodTemplateSpec reserved name/mount path collision tests (3 cases)
- Extract pgpassEnvVar constant to fix goconst lint

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@github-actions github-actions bot added size/XL Extra large PR: 1000+ lines changed and removed size/XL Extra large PR: 1000+ lines changed labels Apr 9, 2026
Remove the overly complex mutual exclusivity CEL rule — the reconciler
handles this as defense-in-depth, and the registry server validates its
own config. Keep only the essential admission-time rules:

- Either configYAML or sources must be set (operator path selection)
- At most one managed source (legacy path constraint)

Guard all CEL rules with has() checks so they handle absent optional
fields (e.g., sources is absent when using the new configYAML path).

Verified end-to-end with Kind cluster + CNPG PostgreSQL + registry
server v1.0.0: MCPRegistry accepted, ConfigMap created with raw YAML,
registry-api pod running and serving /health, /readiness, /v1/registries.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@github-actions github-actions bot added size/XL Extra large PR: 1000+ lines changed and removed size/XL Extra large PR: 1000+ lines changed labels Apr 9, 2026
WithPGPassSecretRefMount silently no-ops when name or key is empty,
which would produce a deployment without pgpass mounting and a confusing
database connection error at runtime. Catch this early in reconciler
validation with clear error messages.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@github-actions github-actions bot added size/XL Extra large PR: 1000+ lines changed and removed size/XL Extra large PR: 1000+ lines changed labels Apr 9, 2026
These constraints couple the operator to opinions about size limits
that Kubernetes and etcd already enforce naturally (1MB ConfigMap limit,
~1.5MB etcd object limit). Removing them keeps the decoupled path
truly pass-through.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@github-actions github-actions bot removed the size/XL Extra large PR: 1000+ lines changed label Apr 9, 2026
@ChrisJBurns ChrisJBurns marked this pull request as ready for review April 9, 2026 13:13
@github-actions github-actions bot added the size/XL Extra large PR: 1000+ lines changed label Apr 9, 2026
@github-actions github-actions bot removed the size/XL Extra large PR: 1000+ lines changed label Apr 9, 2026
@github-actions github-actions bot dismissed their stale review April 9, 2026 13:16

Large PR justification has been provided. Thank you!

@github-actions github-actions bot added the size/XL Extra large PR: 1000+ lines changed label Apr 9, 2026
@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Apr 9, 2026

✅ Large PR justification has been provided. The size review has been dismissed and this PR can now proceed with normal review.

Split pgpassSecretRef name/key validation into its own function to
bring validatePathExclusivity below the cyclomatic complexity threshold.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@github-actions github-actions bot added size/XL Extra large PR: 1000+ lines changed and removed size/XL Extra large PR: 1000+ lines changed labels Apr 9, 2026
Add nolint:staticcheck directives to legacy code that intentionally
references deprecated fields (config.go, pgpass_test.go). Move inline
nolint comments above the line to stay within the 130-char line limit.

Fix unparam lint on ensureService by removing the unused *corev1.Service
return value — both callers discarded it.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@github-actions github-actions bot added size/XL Extra large PR: 1000+ lines changed and removed size/XL Extra large PR: 1000+ lines changed labels Apr 9, 2026
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@github-actions github-actions bot added size/XL Extra large PR: 1000+ lines changed and removed size/XL Extra large PR: 1000+ lines changed labels Apr 9, 2026
@ChrisJBurns ChrisJBurns merged commit ed7e41b into main Apr 9, 2026
41 checks passed
@ChrisJBurns ChrisJBurns deleted the worktree-registry-crd-refactor branch April 9, 2026 14:43
ChrisJBurns added a commit that referenced this pull request Apr 9, 2026
Remove all legacy typed config fields, config generation code, and
associated tests after the decoupled configYAML path was added in
#4693. The configYAML field is now the only way to configure the
registry server.

CRD changes:
- Remove Sources, Registries, DatabaseConfig, AuthConfig,
  TelemetryConfig fields and all 25+ associated types
- Make configYAML required
- Remove CEL validation rules (configYAML is required via markers)
- Delete legacy helper methods (HasDatabaseConfig, GetDatabaseConfig,
  GetDatabasePort, BuildPGPassSecretName)

Operator code removal:
- Delete config/config.go legacy ConfigManager and all build* functions
  (~1,100 lines)
- Delete config/config_test.go (~2,170 lines)
- Delete pgpass.go and pgpass_test.go (~420 lines)
- Delete WithRegistrySourceMounts, WithGitAuthMount, WithPGPassMount,
  WithRegistryStorageMount from podtemplatespec.go
- Delete reconcileLegacyPath, ensurePGPassSecret from manager.go
- Delete legacy buildRegistryAPIDeployment and ensureDeployment
- Rename NewPath variants to be the primary functions
- Simplify controller validation (remove mutual exclusivity checks)

Tests and examples:
- Rewrite integration test RegistryBuilder to generate configYAML
- Delete all legacy unit tests
- Delete 14 legacy example YAML files
- Regenerate CRD manifests and API docs

Closes #4704

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
ChrisJBurns added a commit that referenced this pull request Apr 9, 2026
Remove all legacy typed config fields, config generation code, and
associated tests after the decoupled configYAML path was added in
#4693. The configYAML field is now the only way to configure the
registry server.

CRD changes:
- Remove Sources, Registries, DatabaseConfig, AuthConfig,
  TelemetryConfig fields and all 25+ associated types
- Make configYAML required
- Remove CEL validation rules (configYAML is required via markers)
- Delete legacy helper methods (HasDatabaseConfig, GetDatabaseConfig,
  GetDatabasePort, BuildPGPassSecretName)

Operator code removal:
- Delete config/config.go legacy ConfigManager and all build* functions
  (~1,100 lines)
- Delete config/config_test.go (~2,170 lines)
- Delete pgpass.go and pgpass_test.go (~420 lines)
- Delete WithRegistrySourceMounts, WithGitAuthMount, WithPGPassMount,
  WithRegistryStorageMount from podtemplatespec.go
- Delete reconcileLegacyPath, ensurePGPassSecret from manager.go
- Delete legacy buildRegistryAPIDeployment and ensureDeployment
- Rename NewPath variants to be the primary functions
- Simplify controller validation (remove mutual exclusivity checks)

Tests and examples:
- Rewrite integration test RegistryBuilder to generate configYAML
- Delete all legacy unit tests
- Delete 14 legacy example YAML files
- Regenerate CRD manifests and API docs

Closes #4704

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

size/XL Extra large PR: 1000+ lines changed

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add decoupled configYAML path to MCPRegistry CRD

2 participants