Skip to content

refactor(seinode): drop sidecar TLS management; keep kube-rbac-proxy#267

Merged
bdchatham merged 3 commits into
mainfrom
refactor/seinode-drop-sidecar-tls
May 18, 2026
Merged

refactor(seinode): drop sidecar TLS management; keep kube-rbac-proxy#267
bdchatham merged 3 commits into
mainfrom
refactor/seinode-drop-sidecar-tls

Conversation

@bdchatham
Copy link
Copy Markdown
Collaborator

@bdchatham bdchatham commented May 18, 2026

Summary

  • Removes the controller's entire sidecar-TLS-management surface. The sidecar API runs on the private K8s cluster network; TLS termination doesn't earn its keep at this layer.
  • kube-rbac-proxy stays in front of the sidecar (always-on per SeiNode) for its SAR-based authz value, now in --insecure-listen-address HTTP mode.
  • Net diff: −1538 lines across 27 files after fixups.

What disappears

  • spec.sidecar.tls field + SidecarTLSSpec type + CEL immutability rule
  • Status.SidecarTLS field + SidecarTLSStatus type + ConditionSidecarTLSSecretReady + reason constants
  • internal/controller/node/preflight_sidecar_tls.go (reconcileSidecarTLSReady, validateTLSSecret, requiredDNSNames, missingSANs) + its test file
  • emitSidecarTLSSecretEvent, TLS-gated periodic requeue in the reconciler
  • Planner-side sidecarTLSSecretReady gate; init plans always emit ApplyRBACProxyConfig (no longer conditional)
  • SidecarTLSEnabled, SidecarTLSSecretName, sidecarTLSVolumes helpers, sidecarTLSVolumeName/sidecarTLSMountPath constants
  • internal/sidecartransport/tls.go (insecureTLSClientConfig)
  • LLD docs/design-seinode-sidecar-tls-toggle-lld.md

What stays

  • kube-rbac-proxy native sidecar container — now on every SeiNode pod (including bootstrap pods)
  • ApplyRBACProxyConfig task generates the SAR-policy ConfigMap (group=sei.io, resource=seinodetasks, namespace+name scoped)
  • sidecartransport bearer-token injection — kube-rbac-proxy still needs the controller SA token for TokenReview
  • All other validator-keyring/signing-key/node-key surface unchanged

What changed in the pod model

  • Proxy listens on :8443 plaintext (--insecure-listen-address); SAR authz unchanged
  • Sidecar binds loopback unconditionally (SEI_SIDECAR_AUTHN_MODE=trusted-header)
  • Sidecar liveness/readiness probes are nil — proxy readiness gates pod readiness via upstream forwarding
  • Bootstrap pod also gets the proxy (extends the uniform "every sidecar fronted by SAR authz" invariant)
  • SidecarURLForNode always returns http://...:8443
  • cmd/main.go newSidecarClient always attaches the bearer-token transport (no more spec-derived branching)

Cross-review

Trio (platform / kubernetes / security) ran against the implementation commit. Two commits address findings:

  • 2076a80 — initial refactor (−1576 lines)
  • c51075b — round-1 fixups (+38/−16)

Blockers addressed:

  • K8s: bootstrap-pod sidecar was reachable on :7777 while SidecarURLForNode had moved to :8443 → Phase-2 sidecar tasks would hang for snapshot-bootstrap nodes. Fixed by extending the proxy to the bootstrap pod uniformly + moving ApplyRBACProxyConfig to Phase 0 so the ConfigMap exists when the bootstrap Job pod schedules.
  • Platform: platform.Validate() didn't require SEI_KUBE_RBAC_PROXY_IMAGE. Misconfigured deployments would fail at first reconcile instead of manager startup. Added to the required env map.

Non-blockers applied:

  • Dropped dead desired == nil branch in apply_rbac_proxy_config.go (GenerateRBACProxyConfigMap no longer returns nil)
  • sidecartransport: clone http.DefaultTransport for process-isolation hygiene
  • Cleaned up bootstrap-pod test assertions for the new init-container set

Deferred (tracked):

Operator upgrade note

Per the ~10-node internal-only fleet posture: any SeiNode with spec.sidecar.tls set today should have that field removed before the controller image bumps. Apiserver will reject writes that include the removed field once the new CRD applies.

kubectl get seinodes -A -o json | jq -r '.items[] | select(.spec.sidecar.tls != null) | "\(.metadata.namespace) \(.metadata.name)"' \
  | xargs -L1 sh -c 'kubectl -n $0 patch seinode $1 --type=json -p='\''[{"op":"remove","path":"/spec/sidecar/tls"}]'\'''

(Per project memory, no production SeiNode has TLS in spec yet, so this is most likely a no-op.)

Test plan

  • make test — full suite green
  • golangci-lint run --new-from-rev=origin/main ./... — 0 issues
  • make manifests generate — CRD diff is a clean removal of the TLS fields
  • Unit: always-on proxy invariants (sidecar_proxy_test.go in noderesource + planner)
  • Unit: plan composition includes ApplyRBACProxyConfig before ApplyStatefulSet in all four call sites
  • Unit: pod composition (3 init containers, 8 service ports, probes on the proxy)
  • End-to-end on harbor: create a SeiNode, verify pod boots cleanly with the proxy in HTTP-only mode and the controller's MarkReady succeeds via bearer-token authn

🤖 Generated with Claude Code

bdchatham and others added 2 commits May 18, 2026 10:31
The sidecar API runs on the private K8s cluster network. TLS termination
doesn't earn its keep at this layer; kube-rbac-proxy's SAR-based authz
does. This refactor removes the controller's entire TLS-management
surface and switches the proxy to --insecure-listen-address (HTTP) with
SAR authz unchanged.

API:
- Delete SidecarTLSSpec, SidecarTLSStatus, Status.SidecarTLS field
- Delete ConditionSidecarTLSSecretReady + ReasonTLSSecret* constants
- Delete CEL immutability rule on spec.sidecar.tls

Controller:
- Delete preflight_sidecar_tls.go + tests (validateTLSSecret,
  reconcileSidecarTLSReady, requiredDNSNames, missingSANs)
- Drop emitSidecarTLSSecretEvent and the periodic TLS-gated requeue
- SeiNodeReconciler no longer reads or writes TLS state

Planner:
- Drop sidecarTLSSecretReady gate + helper
- buildBasePlan / buildBootstrapPlan / buildGenesisPlan always emit
  ApplyRBACProxyConfig (no conditional)

Noderesource:
- Delete SidecarTLSEnabled, SidecarTLSSecretName, sidecarTLSVolumes
- Delete sidecarTLSVolumeName, sidecarTLSMountPath constants
- kube-rbac-proxy container is always present; switches to
  --insecure-listen-address=0.0.0.0:8443; drops --tls-cert-file,
  --tls-private-key-file, --tls-reload-interval, and the TLS volume mount
- Probes on the proxy use HTTP (no Scheme=HTTPS)
- Sidecar binds loopback always (SEI_SIDECAR_AUTHN_MODE=trusted-header
  unconditionally); sidecar liveness/readiness probes are nil — proxy
  probes gate pod readiness via upstream forwarding
- Service always publishes :8443
- SidecarURLForNode always returns http://...:8443

sidecartransport:
- Drop tls.go (insecureTLSClientConfig); base RoundTripper is now
  http.DefaultTransport. Bearer-token injection unchanged (kube-rbac-
  proxy still needs the SA token for TokenReview).

Tests:
- Replace sidecar_tls_test.go (noderesource + planner) with focused
  sidecar_proxy_test.go covering always-on proxy invariants
- Update assertions across noderesource_test.go +
  plan_execution_test.go + reconciler_test.go for the new pod
  composition (3 init containers, 8 service ports, probes on the
  proxy not on seid)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
K8s blocker: bootstrap-pod sidecar reachability. SidecarURLForNode now
always returns :8443 (proxy port), but the bootstrap pod only listened
on :7777 → Phase-2 sidecar tasks (snapshot-restore, configure-genesis,
config-apply, mark-ready) would hang against a non-existent listener
for any node with a snapshot bootstrap source. Fix by extending the
"every sidecar fronted by kube-rbac-proxy" invariant to the bootstrap
pod too:
- Bootstrap pod gains a kube-rbac-proxy native sidecar container
- Bootstrap Service exposes RBACProxyPort (:8443)
- Bootstrap sidecar gets SEI_SIDECAR_AUTHN_MODE=trusted-header
- RBAC proxy ConfigMap is mounted via the same rbac-proxy-config
  volume; bootstrap pod no longer advertises its sidecar port directly
- ApplyRBACProxyConfig moves to Phase 0 of buildBootstrapPlan so the
  ConfigMap exists before the bootstrap Job pod schedules

Platform blocker: platform.Validate() did not require
SEI_KUBE_RBAC_PROXY_IMAGE. Misconfigured deployments started cleanly
and failed at first reconcile inside GenerateStatefulSet rather than
at manager startup. Added to the required env map.

Non-blockers:
- Drop dead `desired == nil` branch in apply_rbac_proxy_config.go;
  GenerateRBACProxyConfigMap no longer returns nil.
- sidecartransport: clone http.DefaultTransport so process-global
  mutations don't leak into the controller's HTTP path.

Deferred (tracked):
- NetworkPolicy restricting :8443 ingress to the controller SA — the
  highest-ROI defense-in-depth the refactor leaves on the table.
- Audit /v0/metrics content given it is now unauthenticated cluster-wide
  via --ignore-paths.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@cursor
Copy link
Copy Markdown

cursor Bot commented May 18, 2026

You have used all Bugbot PR reviews included in your free trial for your GitHub account on this workspace.

To continue using Bugbot reviews, enable Bugbot for your team in the Cursor dashboard.

Comment thread internal/noderesource/noderesource.go Outdated
Comment thread internal/noderesource/noderesource.go Outdated
Both sidecarLivenessProbe and sidecarReadinessProbe returned nil
unconditionally. Inline the nils at the single call site instead of
keeping vestigial builders.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@cursor
Copy link
Copy Markdown

cursor Bot commented May 18, 2026

You have used all Bugbot PR reviews included in your free trial for your GitHub account on this workspace.

To continue using Bugbot reviews, enable Bugbot for your team in the Cursor dashboard.

@bdchatham bdchatham merged commit b2b4742 into main May 18, 2026
2 checks passed
bdchatham added a commit that referenced this pull request May 18, 2026
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant