refactor(seinode): drop sidecar TLS management; keep kube-rbac-proxy#267
Merged
Conversation
The sidecar API runs on the private K8s cluster network. TLS termination doesn't earn its keep at this layer; kube-rbac-proxy's SAR-based authz does. This refactor removes the controller's entire TLS-management surface and switches the proxy to --insecure-listen-address (HTTP) with SAR authz unchanged. API: - Delete SidecarTLSSpec, SidecarTLSStatus, Status.SidecarTLS field - Delete ConditionSidecarTLSSecretReady + ReasonTLSSecret* constants - Delete CEL immutability rule on spec.sidecar.tls Controller: - Delete preflight_sidecar_tls.go + tests (validateTLSSecret, reconcileSidecarTLSReady, requiredDNSNames, missingSANs) - Drop emitSidecarTLSSecretEvent and the periodic TLS-gated requeue - SeiNodeReconciler no longer reads or writes TLS state Planner: - Drop sidecarTLSSecretReady gate + helper - buildBasePlan / buildBootstrapPlan / buildGenesisPlan always emit ApplyRBACProxyConfig (no conditional) Noderesource: - Delete SidecarTLSEnabled, SidecarTLSSecretName, sidecarTLSVolumes - Delete sidecarTLSVolumeName, sidecarTLSMountPath constants - kube-rbac-proxy container is always present; switches to --insecure-listen-address=0.0.0.0:8443; drops --tls-cert-file, --tls-private-key-file, --tls-reload-interval, and the TLS volume mount - Probes on the proxy use HTTP (no Scheme=HTTPS) - Sidecar binds loopback always (SEI_SIDECAR_AUTHN_MODE=trusted-header unconditionally); sidecar liveness/readiness probes are nil — proxy probes gate pod readiness via upstream forwarding - Service always publishes :8443 - SidecarURLForNode always returns http://...:8443 sidecartransport: - Drop tls.go (insecureTLSClientConfig); base RoundTripper is now http.DefaultTransport. Bearer-token injection unchanged (kube-rbac- proxy still needs the SA token for TokenReview). Tests: - Replace sidecar_tls_test.go (noderesource + planner) with focused sidecar_proxy_test.go covering always-on proxy invariants - Update assertions across noderesource_test.go + plan_execution_test.go + reconciler_test.go for the new pod composition (3 init containers, 8 service ports, probes on the proxy not on seid) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
K8s blocker: bootstrap-pod sidecar reachability. SidecarURLForNode now always returns :8443 (proxy port), but the bootstrap pod only listened on :7777 → Phase-2 sidecar tasks (snapshot-restore, configure-genesis, config-apply, mark-ready) would hang against a non-existent listener for any node with a snapshot bootstrap source. Fix by extending the "every sidecar fronted by kube-rbac-proxy" invariant to the bootstrap pod too: - Bootstrap pod gains a kube-rbac-proxy native sidecar container - Bootstrap Service exposes RBACProxyPort (:8443) - Bootstrap sidecar gets SEI_SIDECAR_AUTHN_MODE=trusted-header - RBAC proxy ConfigMap is mounted via the same rbac-proxy-config volume; bootstrap pod no longer advertises its sidecar port directly - ApplyRBACProxyConfig moves to Phase 0 of buildBootstrapPlan so the ConfigMap exists before the bootstrap Job pod schedules Platform blocker: platform.Validate() did not require SEI_KUBE_RBAC_PROXY_IMAGE. Misconfigured deployments started cleanly and failed at first reconcile inside GenerateStatefulSet rather than at manager startup. Added to the required env map. Non-blockers: - Drop dead `desired == nil` branch in apply_rbac_proxy_config.go; GenerateRBACProxyConfigMap no longer returns nil. - sidecartransport: clone http.DefaultTransport so process-global mutations don't leak into the controller's HTTP path. Deferred (tracked): - NetworkPolicy restricting :8443 ingress to the controller SA — the highest-ROI defense-in-depth the refactor leaves on the table. - Audit /v0/metrics content given it is now unauthenticated cluster-wide via --ignore-paths. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
You have used all Bugbot PR reviews included in your free trial for your GitHub account on this workspace. To continue using Bugbot reviews, enable Bugbot for your team in the Cursor dashboard. |
bdchatham
commented
May 18, 2026
Both sidecarLivenessProbe and sidecarReadinessProbe returned nil unconditionally. Inline the nils at the single call site instead of keeping vestigial builders. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
You have used all Bugbot PR reviews included in your free trial for your GitHub account on this workspace. To continue using Bugbot reviews, enable Bugbot for your team in the Cursor dashboard. |
This was referenced May 18, 2026
bdchatham
added a commit
that referenced
this pull request
May 18, 2026
This was referenced May 18, 2026
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
--insecure-listen-addressHTTP mode.What disappears
spec.sidecar.tlsfield +SidecarTLSSpectype + CEL immutability ruleStatus.SidecarTLSfield +SidecarTLSStatustype +ConditionSidecarTLSSecretReady+ reason constantsinternal/controller/node/preflight_sidecar_tls.go(reconcileSidecarTLSReady,validateTLSSecret,requiredDNSNames,missingSANs) + its test fileemitSidecarTLSSecretEvent, TLS-gated periodic requeue in the reconcilersidecarTLSSecretReadygate; init plans always emitApplyRBACProxyConfig(no longer conditional)SidecarTLSEnabled,SidecarTLSSecretName,sidecarTLSVolumeshelpers,sidecarTLSVolumeName/sidecarTLSMountPathconstantsinternal/sidecartransport/tls.go(insecureTLSClientConfig)docs/design-seinode-sidecar-tls-toggle-lld.mdWhat stays
kube-rbac-proxynative sidecar container — now on every SeiNode pod (including bootstrap pods)ApplyRBACProxyConfigtask generates the SAR-policy ConfigMap (group=sei.io, resource=seinodetasks, namespace+name scoped)sidecartransportbearer-token injection — kube-rbac-proxy still needs the controller SA token for TokenReviewWhat changed in the pod model
:8443plaintext (--insecure-listen-address); SAR authz unchangedSEI_SIDECAR_AUTHN_MODE=trusted-header)SidecarURLForNodealways returnshttp://...:8443cmd/main.go newSidecarClientalways attaches the bearer-token transport (no more spec-derived branching)Cross-review
Trio (platform / kubernetes / security) ran against the implementation commit. Two commits address findings:
2076a80— initial refactor (−1576 lines)c51075b— round-1 fixups (+38/−16)Blockers addressed:
:7777whileSidecarURLForNodehad moved to:8443→ Phase-2 sidecar tasks would hang for snapshot-bootstrap nodes. Fixed by extending the proxy to the bootstrap pod uniformly + movingApplyRBACProxyConfigto Phase 0 so the ConfigMap exists when the bootstrap Job pod schedules.platform.Validate()didn't requireSEI_KUBE_RBAC_PROXY_IMAGE. Misconfigured deployments would fail at first reconcile instead of manager startup. Added to the required env map.Non-blockers applied:
desired == nilbranch inapply_rbac_proxy_config.go(GenerateRBACProxyConfigMapno longer returns nil)sidecartransport: clonehttp.DefaultTransportfor process-isolation hygieneDeferred (tracked):
:8443ingress to the controller SA (highest-ROI defense-in-depth)/v0/metricscontent; now reachable unauthenticated cluster-wide via--ignore-pathsOperator upgrade note
Per the ~10-node internal-only fleet posture: any SeiNode with
spec.sidecar.tlsset today should have that field removed before the controller image bumps. Apiserver will reject writes that include the removed field once the new CRD applies.(Per project memory, no production SeiNode has TLS in spec yet, so this is most likely a no-op.)
Test plan
make test— full suite greengolangci-lint run --new-from-rev=origin/main ./...— 0 issuesmake manifests generate— CRD diff is a clean removal of the TLS fieldssidecar_proxy_test.goin noderesource + planner)ApplyRBACProxyConfigbeforeApplyStatefulSetin all four call sites🤖 Generated with Claude Code