diff --git a/.agents/skills/helm-dev-environment/SKILL.md b/.agents/skills/helm-dev-environment/SKILL.md new file mode 100644 index 000000000..986ce1490 --- /dev/null +++ b/.agents/skills/helm-dev-environment/SKILL.md @@ -0,0 +1,200 @@ +--- +name: helm-dev-environment +description: Start up, tear down, and configure the local Kubernetes development environment for OpenShell. Uses k3d (Docker-backed k3s) + Skaffold + Helm. Covers cluster lifecycle, optional add-ons (Keycloak OIDC, Envoy Gateway), and port mappings. Trigger keywords - local k8s, local cluster, k3d, skaffold, helm dev, start cluster, stop cluster, tear down cluster, delete cluster, create cluster, helm:k3s, helm:skaffold, local dev environment, dev cluster, k8s dev, envoy gateway local, keycloak local. +--- + +# Helm Dev Environment + +Set up, run, and tear down the local Kubernetes development environment for OpenShell. +The stack is: **k3d** (Docker-backed k3s) for the cluster, **Skaffold** for image builds and Helm deploys, and the **OpenShell Helm chart** (`deploy/helm/openshell/`). + +--- + +## Prerequisites + +- Docker Desktop (macOS) or Docker Engine (Linux) running +- `mise install` completed (provides `k3d`, `kubectl`, `skaffold`, `helm`) + +--- + +## Startup + +### 1. Create the cluster + +```bash +mise run helm:k3s:create +``` + +Creates a k3d cluster and merges its kubeconfig into the worktree-local `kubeconfig` file. +Also applies base manifests (`deploy/kube/manifests/agent-sandbox.yaml`). Traefik is +disabled at cluster creation time. + +**Multi-worktree support:** the cluster name is derived from the last component of the +current git branch (e.g. branch `kube-support/local-dev/tmutch` → cluster +`openshell-dev-tmutch`). Each worktree therefore gets its own isolated cluster and its +own `kubeconfig` file. Override with `HELM_K3S_CLUSTER_NAME` to force a specific name +or share one cluster across worktrees. 
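The per-worktree naming rule above can be sketched in a few lines of shell. This is a hypothetical reconstruction, not the code in `tasks/scripts/helm-k3s-local.sh`. It assumes the name is `openshell-dev-` followed by the text after the last `/` in the branch name, and that an explicit `HELM_K3S_CLUSTER_NAME` wins. The real script may also sanitize characters or handle a detached HEAD differently. `cluster_name_for_branch` is an illustrative helper, not part of the repo.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the cluster-naming rule; the real
# tasks/scripts/helm-k3s-local.sh may differ (e.g. character sanitizing).

# Print "openshell-dev-" followed by the last '/'-separated component
# of the given branch name.
cluster_name_for_branch() {
  local branch="$1"
  # ${branch##*/} strips the longest prefix that ends in '/'.
  printf 'openshell-dev-%s\n' "${branch##*/}"
}

# Current branch; falls back to "main" outside a git checkout (assumption).
branch="$(git rev-parse --abbrev-ref HEAD 2>/dev/null || echo main)"

# A non-empty HELM_K3S_CLUSTER_NAME overrides the derived name.
cluster="${HELM_K3S_CLUSTER_NAME:-$(cluster_name_for_branch "$branch")}"
echo "cluster: ${cluster}"
```

With this rule, branch `kube-support/local-dev/tmutch` yields `openshell-dev-tmutch`. Exporting the same `HELM_K3S_CLUSTER_NAME` in two worktrees before `helm:k3s:create` points both at one shared cluster.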
+ +Port mappings created at cluster time (cannot be changed without recreating): + +| Host port | Target | Used by | +|-----------|--------|---------| +| `8080` | Port `80` via k3d load balancer | Envoy Gateway LoadBalancer service (`values-gateway.yaml`) | + +Override with env vars before running `helm:k3s:create`: +- `HELM_K3S_LB_HOST_PORT` (default: `8080`) + +### 2. Deploy OpenShell + +**Iterative dev** (rebuilds on file changes, recommended during active development): +```bash +mise run helm:skaffold:dev +``` + +**One-shot deploy** (build once and leave running): +```bash +mise run helm:skaffold:run +``` + +Both commands build the `gateway` and `supervisor` images and deploy the OpenShell Helm +chart. The `pkiInitJob` hook generates the mTLS secrets on first install and skips +generation on later upgrades when both secrets already exist. Envoy Gateway is opt-in; see the Optional Add-ons section below. + +The gateway Service uses ClusterIP. Access is via Envoy Gateway (port `8080`) or `kubectl port-forward`. + +### TLS behaviour + +`values-skaffold.yaml` sets `server.disableTls: true`, so Skaffold-based deploys run +plaintext by default. To test with TLS enabled, comment out that line and redeploy. + +| Mode | `server.disableTls` | Gateway scheme | +|------|---------------------|----------------| +| Skaffold dev (default) | `true` | `http://` | +| TLS enabled | `false` (or omitted) | `https://` | + +### Connecting via port-forward + +The k3d load balancer already binds host port `8080` (see the port mappings above), so +the port-forward uses local port `8090` to avoid a collision: + +```bash +KUBECONFIG=kubeconfig kubectl port-forward -n openshell svc/openshell 8090:8080 +``` + +**Plaintext (default Skaffold deploy):** + +```bash +openshell sandbox list --gateway-endpoint http://localhost:8090 +``` + +**With mTLS enabled** — extract the client cert the PKI hook wrote to the cluster, +then place it where the CLI expects it.
+Run once after each fresh install: + +```bash +mkdir -p ~/.config/openshell/gateways/openshell/mtls +KUBECONFIG=kubeconfig kubectl get secret openshell-client-tls -n openshell \ + -o jsonpath='{.data.ca\.crt}' | base64 -d > ~/.config/openshell/gateways/openshell/mtls/ca.crt +KUBECONFIG=kubeconfig kubectl get secret openshell-client-tls -n openshell \ + -o jsonpath='{.data.tls\.crt}' | base64 -d > ~/.config/openshell/gateways/openshell/mtls/tls.crt +KUBECONFIG=kubeconfig kubectl get secret openshell-client-tls -n openshell \ + -o jsonpath='{.data.tls\.key}' | base64 -d > ~/.config/openshell/gateways/openshell/mtls/tls.key +``` + +The server cert SANs include `localhost` and `127.0.0.1`, so hostname verification +passes over a port-forward without any extra flags: + +```bash +openshell sandbox list --gateway-endpoint https://localhost:8090 +``` + +--- + +## Teardown + +### Remove the Helm releases (keep cluster) + +```bash +mise run helm:skaffold:delete +``` + +### Delete the cluster entirely + +```bash +mise run helm:k3s:delete +``` + +This removes the k3d cluster and all resources. Kubeconfig context is left behind +but will point to a deleted cluster — safe to ignore or clean up manually. + +--- + +## Optional Add-ons + +Each add-on requires uncommenting the corresponding `valuesFiles` entry in +`deploy/helm/openshell/skaffold.yaml` before running `helm:skaffold:dev` or `helm:skaffold:run`. + +### Envoy Gateway (Gateway API / GRPCRoute) + +Envoy Gateway is installed by Skaffold through the `envoy-gateway` Helm release in +`skaffold.yaml`, which is commented out by default. To activate routing: + +1. Uncomment the `envoy-gateway` release and `#- values-gateway.yaml` in `skaffold.yaml` +2. Redeploy: `mise run helm:skaffold:run` +3. Apply the GatewayClass: `mise run helm:gateway:apply` +4. Access: `http://127.0.0.1:8080` + +`values-gateway.yaml` creates a `Gateway` (listener on port 80, class `eg`) and a +`GRPCRoute` in the `openshell` namespace.
+Envoy Gateway provisions a LoadBalancer +service for the proxy; klipper-lb binds it to hostPort 80, reachable via the +`8080:80` load balancer port mapping. + +### Keycloak OIDC + +One-time setup per cluster: + +```bash +mise run keycloak:k8s:setup +``` + +This deploys Keycloak (`quay.io/keycloak/keycloak:24.0`) into the `keycloak` namespace, +imports the openshell realm from `scripts/keycloak-realm.json`, and prints a port-forward +command for acquiring tokens from the CLI. + +Then activate OIDC in the OpenShell Helm chart: +1. Uncomment `#- values-keycloak.yaml` in `skaffold.yaml` +2. Redeploy: `mise run helm:skaffold:run` + +To remove Keycloak: +```bash +mise run keycloak:k8s:teardown +``` + +--- + +## Cluster Lifecycle (suspend/resume) + +Stop and restart the cluster without losing state (faster than deleting and recreating it): +```bash +mise run helm:k3s:stop +mise run helm:k3s:start +``` + +Check cluster status: +```bash +mise run helm:k3s:status +``` + +--- + +## Key Files + +| Path | Purpose | +|------|---------| +| `deploy/helm/openshell/skaffold.yaml` | Skaffold config — images, Helm releases, values overlays | +| `deploy/helm/openshell/values.yaml` | Default Helm values | +| `deploy/helm/openshell/values-skaffold.yaml` | Dev overrides (`IfNotPresent` pull policies, `server.disableTls: true`) | +| `deploy/helm/openshell/values-cert-manager.yaml` | cert-manager TLS overlay (opt-in; disables pkiInitJob) | +| `deploy/helm/openshell/values-gateway.yaml` | Envoy Gateway GRPCRoute + Gateway overlay | +| `deploy/helm/openshell/values-keycloak.yaml` | Keycloak OIDC overlay | +| `deploy/kube/manifests/envoy-gateway-openshell.yaml` | GatewayClass for Envoy Gateway (`mise run helm:gateway:apply`) | +| `tasks/scripts/helm-k3s-local.sh` | k3d cluster create/delete/start/stop/status | +| `tasks/scripts/keycloak-k8s-setup.sh` | Keycloak deploy + realm import | diff --git a/deploy/docker/Dockerfile.images b/deploy/docker/Dockerfile.images index 16fe08ecb..359789ffd 100644 ---
a/deploy/docker/Dockerfile.images +++ b/deploy/docker/Dockerfile.images @@ -13,6 +13,10 @@ # # Rust binaries are built natively before the image build and staged at: # deploy/docker/.build/prebuilt-binaries//openshell-{gateway,sandbox} +# +# For local dev (Skaffold), pass --build-arg BUILD_FROM_SOURCE=1 to compile +# binaries inside Docker instead. BuildKit only executes the selected binary +# staging stage, so missing prebuilt files do not cause a build failure. # Pin by tag AND manifest-list digest to prevent silent upstream republishes # from breaking the build. Update both when bumping k3s versions. @@ -22,22 +26,67 @@ ARG K3S_DIGEST=sha256:4607083d3cac07e1ccde7317297271d13ed5f60f35a78f33fcef84858a ARG K9S_VERSION=v0.50.18 ARG HELM_VERSION=v3.17.3 ARG NVIDIA_CONTAINER_TOOLKIT_VERSION=1.18.2-1 +# Controls binary source: 0 = prebuilt (release), 1 = compile in Docker (local dev). +# Must be declared here (global scope) so it can be used in FROM instructions below. +ARG BUILD_FROM_SOURCE=0 + +# --------------------------------------------------------------------------- +# Optional in-Docker Rust build (BUILD_FROM_SOURCE=1, local dev only) +# --------------------------------------------------------------------------- +FROM rust:1.95.0-slim-bookworm AS rust-builder + +RUN apt-get update && apt-get install -y --no-install-recommends \ + build-essential \ + cmake \ + pkg-config \ + libssl-dev \ + ca-certificates \ + && rm -rf /var/lib/apt/lists/* + +WORKDIR /build + +COPY Cargo.toml Cargo.lock ./ +COPY crates/ crates/ +COPY proto/ proto/ + +RUN --mount=type=cache,target=/usr/local/cargo/registry \ + --mount=type=cache,target=/build/target \ + cargo build --release \ + --features "openshell-core/dev-settings" \ + --bin openshell-gateway \ + --bin openshell-sandbox \ + && mkdir -p /build/out \ + && install -m 0755 target/release/openshell-gateway /build/out/openshell-gateway \ + && install -m 0755 target/release/openshell-sandbox /build/out/openshell-sandbox # 
--------------------------------------------------------------------------- # Per-arch binary stages # --------------------------------------------------------------------------- -FROM scratch AS gateway-binary + +# Prebuilt path (release default, BUILD_FROM_SOURCE=0) +FROM scratch AS gateway-binary-0 ARG TARGETARCH # --chmod=755 preserves the executable bit through actions/upload-artifact + # download-artifact, which strip exec perms during the roundtrip. COPY --chmod=755 deploy/docker/.build/prebuilt-binaries/${TARGETARCH}/openshell-gateway /build/out/openshell-gateway -FROM scratch AS supervisor-binary +# Source-built path (local dev, BUILD_FROM_SOURCE=1) +FROM rust-builder AS gateway-binary-1 + +FROM gateway-binary-${BUILD_FROM_SOURCE} AS gateway-binary + +# Prebuilt path (release default, BUILD_FROM_SOURCE=0) +FROM scratch AS supervisor-binary-0 ARG TARGETARCH # --chmod=755 preserves the executable bit through actions/upload-artifact + # download-artifact, which strip exec perms during the roundtrip. COPY --chmod=755 deploy/docker/.build/prebuilt-binaries/${TARGETARCH}/openshell-sandbox /build/out/openshell-sandbox +# Source-built path (local dev, BUILD_FROM_SOURCE=1) +FROM rust-builder AS supervisor-binary-1 + +FROM supervisor-binary-${BUILD_FROM_SOURCE} AS supervisor-binary + # Minimal extraction stage for fast-deploy: exports only the supervisor # binary (~20-40 MB) instead of the entire build environment (~968 MB). 
FROM scratch AS supervisor-output diff --git a/deploy/helm/openshell/.helmignore b/deploy/helm/openshell/.helmignore index 414bb6e8a..798d0e7c8 100644 --- a/deploy/helm/openshell/.helmignore +++ b/deploy/helm/openshell/.helmignore @@ -16,3 +16,11 @@ .idea/ *.tmproj .vscode/ + +# Ignore development files +skaffold.yaml +values-keycloak.yaml +values-ingress.yaml +values-gateway.yaml +values-cert-manager.yaml +values-skaffold.yaml diff --git a/deploy/helm/openshell/Chart.yaml b/deploy/helm/openshell/Chart.yaml index f97e316cf..b66bb6d01 100644 --- a/deploy/helm/openshell/Chart.yaml +++ b/deploy/helm/openshell/Chart.yaml @@ -5,5 +5,8 @@ apiVersion: v2 name: openshell description: runtime environment for autonomous agents type: application -version: 0.1.0 -appVersion: "0.1.0" +# Updated to the release version by CI. The appVersion doubles as the default +# image tag (image.tag defaults to appVersion when empty), so a released chart +# automatically pulls the matching gateway and supervisor images. +version: 0.0.0 +appVersion: "0.0.0" diff --git a/deploy/helm/openshell/skaffold.yaml b/deploy/helm/openshell/skaffold.yaml new file mode 100644 index 000000000..fe7b96cf2 --- /dev/null +++ b/deploy/helm/openshell/skaffold.yaml @@ -0,0 +1,104 @@ +# SPDX-FileCopyrightText: Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved. +# SPDX-License-Identifier: Apache-2.0 + +# Local dev: builds gateway + supervisor images using Dockerfile.images with +# BUILD_FROM_SOURCE=1, which compiles Rust binaries inside Docker without +# requiring pre-staged artifacts. +# +# Run from repo root: +# skaffold dev -f deploy/helm/openshell/skaffold.yaml +# +# See https://skaffold.dev/docs/deployers/helm/ (setValueTemplates, IMAGE_* fields). +apiVersion: skaffold/v4beta14 +kind: Config +metadata: + name: openshell +build: + local: + push: false + tagPolicy: + gitCommit: {} + artifacts: + - image: openshell/gateway + context: ../../.. 
+ custom: + buildCommand: | + docker buildx build \ + --build-arg BUILD_FROM_SOURCE=1 \ + --target gateway \ + --tag "$IMAGE" \ + --load \ + --file deploy/docker/Dockerfile.images \ + . + dependencies: + paths: + - Cargo.toml + - Cargo.lock + - crates/** + - proto/** + - deploy/docker/Dockerfile.images + - crates/openshell-server/migrations/** + - image: openshell/supervisor + context: ../../.. + custom: + buildCommand: | + docker buildx build \ + --build-arg BUILD_FROM_SOURCE=1 \ + --target supervisor \ + --tag "$IMAGE" \ + --load \ + --file deploy/docker/Dockerfile.images \ + . + dependencies: + paths: + - Cargo.toml + - Cargo.lock + - crates/** + - proto/** + - deploy/docker/Dockerfile.images +deploy: + helm: + releases: + # cert-manager — comment this in and add values-cert-manager.yaml below + # when you want cert-manager to manage the PKI instead of pkiInitJob. + # Requires cert-manager CRDs to be installed in the cluster first. + #- name: cert-manager + # repo: https://charts.jetstack.io + # remoteChart: cert-manager + # version: v1.20.2 + # namespace: cert-manager + # createNamespace: true + # setValues: + # crds.enabled: true + # Envoy Gateway — Kubernetes Gateway API implementation. + # Installs the Gateway API CRDs and the Envoy Gateway controller; the "eg" + # GatewayClass is applied separately (mise run helm:gateway:apply). + # Required when grpcRoute.enabled is true in the openshell release. + #- name: envoy-gateway + # remoteChart: oci://docker.io/envoyproxy/gateway-helm + # version: v1.7.2 + # namespace: envoy-gateway-system + # createNamespace: true + # # wait ensures Gateway API CRDs are registered before the openshell + # # release attempts to create Gateway and GRPCRoute resources. + # wait: true + - name: openshell + chartPath: . + namespace: openshell + createNamespace: true + valuesFiles: + - values.yaml + - values-skaffold.yaml + # Add values-cert-manager.yaml here (and uncomment the cert-manager + # release above) to switch from pkiInitJob to cert-manager for PKI.
+ #- values-cert-manager.yaml + # To enable OIDC with a local Keycloak instance, run the one-time + # setup task first, then uncomment the line below: + # mise run keycloak:k8s:setup + #- values-keycloak.yaml + # To enable the Gateway API GRPCRoute (requires the envoy-gateway release above): + #- values-gateway.yaml + setValueTemplates: + image.repository: '{{.IMAGE_REPO_openshell_gateway}}' + image.tag: '{{.IMAGE_TAG_openshell_gateway}}' + supervisor.image.repository: '{{.IMAGE_REPO_openshell_supervisor}}' + supervisor.image.tag: '{{.IMAGE_TAG_openshell_supervisor}}' diff --git a/deploy/helm/openshell/templates/_helpers.tpl b/deploy/helm/openshell/templates/_helpers.tpl index 10fd3de0d..09159340d 100644 --- a/deploy/helm/openshell/templates/_helpers.tpl +++ b/deploy/helm/openshell/templates/_helpers.tpl @@ -58,3 +58,26 @@ Create the name of the service account to use {{- default "default" .Values.serviceAccount.name }} {{- end }} {{- end }} + +{{/* +Gateway image reference. Uses image.tag when set; falls back to .Chart.AppVersion +so a released chart automatically pulls the matching image without extra overrides. +*/}} +{{- define "openshell.image" -}} +{{- printf "%s:%s" .Values.image.repository (.Values.image.tag | default .Chart.AppVersion) }} +{{- end }} + +{{/* +Supervisor image reference. Same appVersion fallback as openshell.image so +the supervisor and gateway images stay in sync across releases. +*/}} +{{- define "openshell.supervisorImage" -}} +{{- printf "%s:%s" .Values.supervisor.image.repository (.Values.supervisor.image.tag | default .Chart.AppVersion) }} +{{- end }} + +{{/* +Namespaced Issuer (selfSigned) for cert-manager CA bootstrap. +*/}} +{{- define "openshell.issuerSelfSigned" -}} +{{- printf "%s-selfsigned" (include "openshell.fullname" .)
| trunc 63 | trimSuffix "-" }} +{{- end }} diff --git a/deploy/helm/openshell/templates/cert-manager-pki.yaml b/deploy/helm/openshell/templates/cert-manager-pki.yaml new file mode 100644 index 000000000..43a19d5f5 --- /dev/null +++ b/deploy/helm/openshell/templates/cert-manager-pki.yaml @@ -0,0 +1,98 @@ +# SPDX-FileCopyrightText: Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved. +# SPDX-License-Identifier: Apache-2.0 + +{{- if .Values.certManager.enabled }} +apiVersion: cert-manager.io/v1 +kind: Issuer +metadata: + name: {{ include "openshell.issuerSelfSigned" . }} + namespace: {{ .Release.Namespace }} + labels: + {{- include "openshell.labels" . | nindent 4 }} +spec: + selfSigned: {} +--- +apiVersion: cert-manager.io/v1 +kind: Certificate +metadata: + name: {{ include "openshell.fullname" . }}-ca + namespace: {{ .Release.Namespace }} + labels: + {{- include "openshell.labels" . | nindent 4 }} +spec: + isCA: true + commonName: openshell-ca + secretName: {{ .Values.certManager.caSecretName | quote }} + privateKey: + algorithm: ECDSA + size: 256 + issuerRef: + name: {{ include "openshell.issuerSelfSigned" . }} + kind: Issuer + group: cert-manager.io +--- +apiVersion: cert-manager.io/v1 +kind: Issuer +metadata: + name: {{ include "openshell.fullname" . }}-ca-issuer + namespace: {{ .Release.Namespace }} + labels: + {{- include "openshell.labels" . | nindent 4 }} +spec: + ca: + secretName: {{ .Values.certManager.caSecretName | quote }} +--- +apiVersion: cert-manager.io/v1 +kind: Certificate +metadata: + name: {{ include "openshell.fullname" . }}-server + namespace: {{ .Release.Namespace }} + labels: + {{- include "openshell.labels" . 
| nindent 4 }} +spec: + secretName: {{ .Values.server.tls.certSecretName | quote }} + duration: {{ .Values.certManager.certificateDuration | quote }} + renewBefore: {{ .Values.certManager.certificateRenewBefore | quote }} + commonName: openshell-server + dnsNames: + {{- toYaml .Values.certManager.serverDnsNames | nindent 4 }} + {{- if .Values.certManager.serverIpAddresses }} + ipAddresses: + {{- toYaml .Values.certManager.serverIpAddresses | nindent 4 }} + {{- end }} + privateKey: + algorithm: ECDSA + size: 256 + usages: + - server auth + - digital signature + - key encipherment + issuerRef: + name: {{ include "openshell.fullname" . }}-ca-issuer + kind: Issuer + group: cert-manager.io +--- +apiVersion: cert-manager.io/v1 +kind: Certificate +metadata: + name: {{ include "openshell.fullname" . }}-client + namespace: {{ .Release.Namespace }} + labels: + {{- include "openshell.labels" . | nindent 4 }} +spec: + secretName: {{ .Values.server.tls.clientTlsSecretName | quote }} + duration: {{ .Values.certManager.certificateDuration | quote }} + renewBefore: {{ .Values.certManager.certificateRenewBefore | quote }} + commonName: openshell-client + privateKey: + algorithm: ECDSA + size: 256 + usages: + - client auth + - digital signature + - key encipherment + issuerRef: + name: {{ include "openshell.fullname" . }}-ca-issuer + kind: Issuer + group: cert-manager.io +{{- end }} diff --git a/deploy/helm/openshell/templates/gateway.yaml b/deploy/helm/openshell/templates/gateway.yaml new file mode 100644 index 000000000..3fd81345a --- /dev/null +++ b/deploy/helm/openshell/templates/gateway.yaml @@ -0,0 +1,21 @@ +# SPDX-FileCopyrightText: Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved. +# SPDX-License-Identifier: Apache-2.0 + +{{- if and .Values.grpcRoute.enabled .Values.grpcRoute.gateway.create }} +apiVersion: gateway.networking.k8s.io/v1 +kind: Gateway +metadata: + name: {{ default (include "openshell.fullname" .) 
.Values.grpcRoute.gateway.name }} + namespace: {{ .Release.Namespace }} + labels: + {{- include "openshell.labels" . | nindent 4 }} +spec: + gatewayClassName: {{ .Values.grpcRoute.gateway.className }} + listeners: + - name: http + port: {{ .Values.grpcRoute.gateway.listener.port }} + protocol: {{ .Values.grpcRoute.gateway.listener.protocol }} + allowedRoutes: + namespaces: + from: {{ .Values.grpcRoute.gateway.listener.allowedRoutes }} +{{- end }} diff --git a/deploy/helm/openshell/templates/grpcroute.yaml b/deploy/helm/openshell/templates/grpcroute.yaml new file mode 100644 index 000000000..8fde5458c --- /dev/null +++ b/deploy/helm/openshell/templates/grpcroute.yaml @@ -0,0 +1,24 @@ +# SPDX-FileCopyrightText: Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved. +# SPDX-License-Identifier: Apache-2.0 + +{{- if .Values.grpcRoute.enabled }} +apiVersion: gateway.networking.k8s.io/v1 +kind: GRPCRoute +metadata: + name: {{ include "openshell.fullname" . }} + namespace: {{ .Release.Namespace }} + labels: + {{- include "openshell.labels" . | nindent 4 }} +spec: + parentRefs: + - name: {{ default (include "openshell.fullname" .) .Values.grpcRoute.gateway.name }} + namespace: {{ default .Release.Namespace .Values.grpcRoute.gateway.namespace }} + {{- if .Values.grpcRoute.hostnames }} + hostnames: + {{- toYaml .Values.grpcRoute.hostnames | nindent 4 }} + {{- end }} + rules: + - backendRefs: + - name: {{ include "openshell.fullname" . }} + port: {{ .Values.service.port }} +{{- end }} diff --git a/deploy/helm/openshell/templates/pki-hook.yaml b/deploy/helm/openshell/templates/pki-hook.yaml new file mode 100644 index 000000000..c5e83c734 --- /dev/null +++ b/deploy/helm/openshell/templates/pki-hook.yaml @@ -0,0 +1,191 @@ +# SPDX-FileCopyrightText: Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved. 
+# SPDX-License-Identifier: Apache-2.0 + +{{- if and .Values.pkiInitJob.enabled .Values.certManager.enabled }} +{{- fail "pkiInitJob.enabled and certManager.enabled cannot both be true; disable one to avoid conflicting PKI sources." }} +{{- end }} +{{- if .Values.pkiInitJob.enabled }} +{{- $hookName := printf "%s-pki-hook" (include "openshell.fullname" .) }} +{{- $ns := .Release.Namespace }} +{{- $serverSecret := .Values.server.tls.certSecretName }} +{{- $clientSecret := .Values.server.tls.clientTlsSecretName }} +{{- $sanParts := list }} +{{- range .Values.pkiInitJob.serverDnsNames }}{{- $sanParts = append $sanParts (printf "DNS:%s" .) }}{{- end }} +{{- range .Values.pkiInitJob.serverIpAddresses }}{{- $sanParts = append $sanParts (printf "IP:%s" .) }}{{- end }} +{{- $serverSans := join "," $sanParts }} +apiVersion: v1 +kind: ServiceAccount +metadata: + name: {{ $hookName }} + namespace: {{ $ns }} + labels: + {{- include "openshell.labels" . | nindent 4 }} + annotations: + helm.sh/hook: pre-install,pre-upgrade + helm.sh/hook-weight: "-30" + helm.sh/hook-delete-policy: before-hook-creation +--- +apiVersion: rbac.authorization.k8s.io/v1 +kind: Role +metadata: + name: {{ $hookName }} + namespace: {{ $ns }} + labels: + {{- include "openshell.labels" . | nindent 4 }} + annotations: + helm.sh/hook: pre-install,pre-upgrade + helm.sh/hook-weight: "-30" + helm.sh/hook-delete-policy: before-hook-creation +rules: + - apiGroups: [""] + resources: ["secrets"] + verbs: ["get", "create"] +--- +apiVersion: rbac.authorization.k8s.io/v1 +kind: RoleBinding +metadata: + name: {{ $hookName }} + namespace: {{ $ns }} + labels: + {{- include "openshell.labels" . 
| nindent 4 }} + annotations: + helm.sh/hook: pre-install,pre-upgrade + helm.sh/hook-weight: "-30" + helm.sh/hook-delete-policy: before-hook-creation +roleRef: + apiGroup: rbac.authorization.k8s.io + kind: Role + name: {{ $hookName }} +subjects: + - kind: ServiceAccount + name: {{ $hookName }} + namespace: {{ $ns }} +--- +apiVersion: batch/v1 +kind: Job +metadata: + name: {{ $hookName }} + namespace: {{ $ns }} + labels: + {{- include "openshell.labels" . | nindent 4 }} + annotations: + helm.sh/hook: pre-install,pre-upgrade + helm.sh/hook-weight: "-20" + helm.sh/hook-delete-policy: before-hook-creation,hook-succeeded +spec: + backoffLimit: 3 + activeDeadlineSeconds: 120 + ttlSecondsAfterFinished: 300 + template: + metadata: + labels: + {{- include "openshell.selectorLabels" . | nindent 8 }} + spec: + restartPolicy: OnFailure + serviceAccountName: {{ $hookName }} + containers: + - name: pki-gen + image: {{ .Values.pkiInitJob.image.repository }}:{{ .Values.pkiInitJob.image.tag }} + imagePullPolicy: {{ .Values.pkiInitJob.image.pullPolicy }} + securityContext: + allowPrivilegeEscalation: false + capabilities: + drop: + - ALL + env: + - name: NAMESPACE + valueFrom: + fieldRef: + fieldPath: metadata.namespace + - name: SERVER_SECRET + value: {{ $serverSecret | quote }} + - name: CLIENT_SECRET + value: {{ $clientSecret | quote }} + - name: CA_DAYS + value: {{ .Values.pkiInitJob.caValidityDays | quote }} + - name: CERT_DAYS + value: {{ .Values.pkiInitJob.certValidityDays | quote }} + - name: SERVER_SANS + value: {{ $serverSans | quote }} + command: + - /bin/sh + - -c + - | + set -eu + apk add --no-cache openssl curl >/dev/null 2>&1 + + TOKEN=$(cat /var/run/secrets/kubernetes.io/serviceaccount/token) + K8S_CA=/var/run/secrets/kubernetes.io/serviceaccount/ca.crt + API=https://kubernetes.default.svc + + # Idempotency: skip only when both TLS secrets already exist. 
+ # Checking one is insufficient — a partial cleanup can leave one half + # of the pair behind, which would cause mTLS to fail at runtime. + HTTP_SERVER=$(curl -s -o /dev/null -w "%{http_code}" \ + -H "Authorization: Bearer $TOKEN" --cacert "$K8S_CA" \ + "$API/api/v1/namespaces/$NAMESPACE/secrets/$SERVER_SECRET") + HTTP_CLIENT=$(curl -s -o /dev/null -w "%{http_code}" \ + -H "Authorization: Bearer $TOKEN" --cacert "$K8S_CA" \ + "$API/api/v1/namespaces/$NAMESPACE/secrets/$CLIENT_SECRET") + if [ "$HTTP_SERVER" = "200" ] && [ "$HTTP_CLIENT" = "200" ]; then + echo "PKI secrets already exist, skipping." + exit 0 + fi + if [ "$HTTP_SERVER" = "200" ] || [ "$HTTP_CLIENT" = "200" ]; then + echo "ERROR: partial PKI state — one secret exists but not both." >&2 + echo "To recover: kubectl delete secret -n $NAMESPACE $SERVER_SECRET $CLIENT_SECRET" >&2 + exit 1 + fi + + cd /tmp + + # CA (ECDSA P-256) + openssl genpkey -algorithm EC -pkeyopt ec_paramgen_curve:P-256 -out ca.key 2>/dev/null + openssl req -new -x509 -sha256 -key ca.key -out ca.crt \ + -days "$CA_DAYS" -subj "/O=openshell/CN=openshell-ca" \ + -addext "basicConstraints=critical,CA:TRUE,pathlen:0" \ + -addext "keyUsage=critical,keyCertSign,cRLSign" + + # Server cert (ECDSA P-256) + printf "[ext]\nsubjectAltName=%s\nextendedKeyUsage=serverAuth\nkeyUsage=digitalSignature,keyEncipherment\n" \ + "$SERVER_SANS" > server.ext + openssl genpkey -algorithm EC -pkeyopt ec_paramgen_curve:P-256 -out server.key 2>/dev/null + openssl req -new -sha256 -key server.key -out server.csr -subj "/CN=openshell-server" + openssl x509 -req -sha256 -in server.csr -CA ca.crt -CAkey ca.key \ + -CAcreateserial -days "$CERT_DAYS" -extensions ext -extfile server.ext -out server.crt + + # Client cert (ECDSA P-256) + printf "[ext]\nextendedKeyUsage=clientAuth\nkeyUsage=digitalSignature,keyEncipherment\n" \ + > client.ext + openssl genpkey -algorithm EC -pkeyopt ec_paramgen_curve:P-256 -out client.key 2>/dev/null + openssl req -new -sha256 -key 
client.key -out client.csr -subj "/CN=openshell-client" + openssl x509 -req -sha256 -in client.csr -CA ca.crt -CAkey ca.key \ + -CAcreateserial -days "$CERT_DAYS" -extensions ext -extfile client.ext -out client.crt + + CA_B64=$(base64 -w0 ca.crt) + SERVER_CRT_B64=$(base64 -w0 server.crt) + SERVER_KEY_B64=$(base64 -w0 server.key) + CLIENT_CRT_B64=$(base64 -w0 client.crt) + CLIENT_KEY_B64=$(base64 -w0 client.key) + + # Create server TLS secret + printf '{"apiVersion":"v1","kind":"Secret","metadata":{"name":"%s","namespace":"%s"},"type":"kubernetes.io/tls","data":{"tls.crt":"%s","tls.key":"%s","ca.crt":"%s"}}\n' \ + "$SERVER_SECRET" "$NAMESPACE" \ + "$SERVER_CRT_B64" "$SERVER_KEY_B64" "$CA_B64" > server-secret.json + curl -sf -X POST \ + -H "Authorization: Bearer $TOKEN" -H "Content-Type: application/json" \ + --cacert "$K8S_CA" "$API/api/v1/namespaces/$NAMESPACE/secrets" \ + -d @server-secret.json + + # Create client TLS secret + printf '{"apiVersion":"v1","kind":"Secret","metadata":{"name":"%s","namespace":"%s"},"type":"kubernetes.io/tls","data":{"tls.crt":"%s","tls.key":"%s","ca.crt":"%s"}}\n' \ + "$CLIENT_SECRET" "$NAMESPACE" \ + "$CLIENT_CRT_B64" "$CLIENT_KEY_B64" "$CA_B64" > client-secret.json + curl -sf -X POST \ + -H "Authorization: Bearer $TOKEN" -H "Content-Type: application/json" \ + --cacert "$K8S_CA" "$API/api/v1/namespaces/$NAMESPACE/secrets" \ + -d @client-secret.json + + rm -f *.key *.csr *.crt *.ext *.srl *.json + echo "PKI secrets created." +{{- end }} diff --git a/deploy/helm/openshell/templates/ssh-handshake-secret-hook.yaml b/deploy/helm/openshell/templates/ssh-handshake-secret-hook.yaml new file mode 100644 index 000000000..ad444847b --- /dev/null +++ b/deploy/helm/openshell/templates/ssh-handshake-secret-hook.yaml @@ -0,0 +1,27 @@ +# SPDX-FileCopyrightText: Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved. 
+# SPDX-License-Identifier: Apache-2.0 + +{{- if .Values.sshHandshake.hook.enabled }} +{{- $name := .Values.server.sshHandshakeSecretName }} +{{- $ns := .Release.Namespace }} +{{- $existing := lookup "v1" "Secret" $ns $name }} +{{- if not $existing }} +{{- $hex := .Values.sshHandshake.value }} +{{- if not $hex }} +{{- $hex = printf "%s%s" (uuidv4 | replace "-" "") (uuidv4 | replace "-" "") }} +{{- end }} +apiVersion: v1 +kind: Secret +metadata: + name: {{ $name }} + namespace: {{ $ns }} + labels: + {{- include "openshell.labels" . | nindent 4 }} + annotations: + helm.sh/hook: pre-install,pre-upgrade + helm.sh/hook-weight: "-20" +type: Opaque +stringData: + secret: {{ $hex | quote }} +{{- end }} +{{- end }} diff --git a/deploy/helm/openshell/templates/statefulset.yaml b/deploy/helm/openshell/templates/statefulset.yaml index d4c7ee606..2db3a0c5f 100644 --- a/deploy/helm/openshell/templates/statefulset.yaml +++ b/deploy/helm/openshell/templates/statefulset.yaml @@ -44,7 +44,7 @@ spec: - name: {{ .Chart.Name }} securityContext: {{- toYaml .Values.securityContext | nindent 12 }} - image: "{{ .Values.image.repository }}:{{ .Values.image.tag | default .Chart.AppVersion }}" + image: {{ include "openshell.image" . | quote }} imagePullPolicy: {{ .Values.image.pullPolicy }} args: - --bind-address @@ -71,10 +71,10 @@ spec: value: {{ .Values.server.sandboxImagePullPolicy | quote }} {{- end }} - name: OPENSHELL_SUPERVISOR_IMAGE - value: {{ .Values.server.supervisorImage | quote }} - {{- if .Values.server.supervisorImagePullPolicy }} + value: {{ include "openshell.supervisorImage" . 
| quote }} + {{- if .Values.supervisor.image.pullPolicy }} - name: OPENSHELL_SUPERVISOR_IMAGE_PULL_POLICY - value: {{ .Values.server.supervisorImagePullPolicy | quote }} + value: {{ .Values.supervisor.image.pullPolicy | quote }} {{- end }} - name: OPENSHELL_GRPC_ENDPOINT value: {{ if .Values.server.disableTls }}{{ .Values.server.grpcEndpoint | replace "https://" "http://" | quote }}{{ else }}{{ .Values.server.grpcEndpoint | quote }}{{ end }} @@ -191,7 +191,14 @@ spec: secretName: {{ .Values.server.tls.certSecretName }} - name: tls-client-ca secret: + {{- if or .Values.pkiInitJob.enabled (and .Values.certManager.enabled .Values.certManager.clientCaFromServerTlsSecret) }} + secretName: {{ .Values.server.tls.certSecretName }} + items: + - key: ca.crt + path: ca.crt + {{- else }} secretName: {{ .Values.server.tls.clientCaSecretName }} + {{- end }} {{- end }} {{- with .Values.nodeSelector }} nodeSelector: diff --git a/deploy/helm/openshell/values-cert-manager.yaml b/deploy/helm/openshell/values-cert-manager.yaml new file mode 100644 index 000000000..bb024d716 --- /dev/null +++ b/deploy/helm/openshell/values-cert-manager.yaml @@ -0,0 +1,14 @@ +# SPDX-FileCopyrightText: Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved. +# SPDX-License-Identifier: Apache-2.0 + +# Merge after values.yaml when cert-manager CRDs are installed, e.g.: +# helm install ... -f values.yaml -f values-cert-manager.yaml +# Or add this file to deploy.helm.releases[].valuesFiles in skaffold.yaml. +server: + disableTls: false + +pkiInitJob: + enabled: false + +certManager: + enabled: true diff --git a/deploy/helm/openshell/values-gateway.yaml b/deploy/helm/openshell/values-gateway.yaml new file mode 100644 index 000000000..c43a4cd45 --- /dev/null +++ b/deploy/helm/openshell/values-gateway.yaml @@ -0,0 +1,26 @@ +# SPDX-FileCopyrightText: Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# SPDX-License-Identifier: Apache-2.0 + +# Gateway API overlay — enables a Gateway and GRPCRoute for external access. +# +# Requires Envoy Gateway in the cluster (installed via skaffold.yaml). +# Add this file to the openshell release valuesFiles to activate: +# uncomment values-gateway.yaml in deploy/helm/openshell/skaffold.yaml +# +# Envoy Gateway will create an Envoy proxy Deployment and a LoadBalancer +# Service (named envoy---*) in the openshell namespace. +# +# To reach the gateway from outside a k3d cluster, port-forward to that service: +# kubectl -n openshell get svc -l gateway.envoyproxy.io/owning-gateway-name=openshell +# kubectl -n openshell port-forward svc/ 8080:80 +# # then: grpcurl -plaintext localhost:8080 ... + +grpcRoute: + enabled: true + gateway: + create: true + className: "eg" + # Set one or more hostnames to scope the route, e.g.: + # hostnames: + # - openshell.example.com + hostnames: [] diff --git a/deploy/helm/openshell/values-keycloak.yaml b/deploy/helm/openshell/values-keycloak.yaml new file mode 100644 index 000000000..42bb2ad4e --- /dev/null +++ b/deploy/helm/openshell/values-keycloak.yaml @@ -0,0 +1,36 @@ +# SPDX-FileCopyrightText: Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved. +# SPDX-License-Identifier: Apache-2.0 + +# OIDC configuration overlay for a local Keycloak instance in k3s. +# +# Run the one-time setup task first: +# mise run keycloak:k8s:setup +# +# Then layer this file on top of values.yaml when deploying: +# helm upgrade --install openshell . \ +# -f values.yaml -f values-skaffold.yaml -f values-keycloak.yaml +# +# Or add this file to skaffold.yaml valuesFiles for iterative dev. +# +# Issuer note: the setup task configures Keycloak with KC_HOSTNAME set to the +# in-cluster service hostname, so tokens always carry that hostname as `iss` +# regardless of how they were obtained (e.g. via a localhost port-forward). +# The gateway fetches JWKS from this URL inside the cluster. 
+# +# CLI token acquisition: keep a port-forward running while using openshell login: +# kubectl -n keycloak port-forward svc/keycloak 9090:80 + +server: + oidc: + # Must match KC_HOSTNAME set by keycloak:k8s:setup (in-cluster service hostname). + issuer: "http://keycloak.keycloak.svc.cluster.local/realms/openshell" + # Must match the client ID in the imported realm (openshell-cli). + audience: "openshell-cli" + # Short TTL for dev so JWKS key rotation is picked up quickly. + # Use 3600 (default) in production. + jwksTtl: 60 + # Keycloak puts realm roles at realm_access.roles in the JWT. + rolesClaim: "realm_access.roles" + # Leave both empty for authentication-only mode (any valid token is accepted). + adminRole: "openshell-admin" + userRole: "openshell-user" diff --git a/deploy/helm/openshell/values-skaffold.yaml b/deploy/helm/openshell/values-skaffold.yaml new file mode 100644 index 000000000..24b60e1c6 --- /dev/null +++ b/deploy/helm/openshell/values-skaffold.yaml @@ -0,0 +1,12 @@ +# SPDX-FileCopyrightText: Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved. +# SPDX-License-Identifier: Apache-2.0 + +# Merge with values.yaml for Skaffold-driven local image builds (see skaffold.yaml). +server: + sandboxImagePullPolicy: IfNotPresent + # Comment out to enforce mTLS (uses PKI secrets generated by pkiInitJob). + disableTls: true + +supervisor: + image: + pullPolicy: IfNotPresent diff --git a/deploy/helm/openshell/values.yaml b/deploy/helm/openshell/values.yaml index 1c820c8b1..ca30fa371 100644 --- a/deploy/helm/openshell/values.yaml +++ b/deploy/helm/openshell/values.yaml @@ -7,8 +7,17 @@ replicaCount: 1 image: repository: ghcr.io/nvidia/openshell/gateway - pullPolicy: Always - tag: "latest" + pullPolicy: IfNotPresent + tag: "" + +# Supervisor image — provides the openshell-sandbox binary that is copied into +# sandbox pods via an init container. 
tag defaults to appVersion (same as the +# gateway image) so both stay in sync when the chart is released. +supervisor: + image: + repository: ghcr.io/nvidia/openshell/supervisor + pullPolicy: "" + tag: "" imagePullSecrets: [] nameOverride: "" @@ -34,9 +43,8 @@ securityContext: - ALL service: - type: NodePort + type: ClusterIP port: 8080 - nodePort: 30051 healthPort: 8081 metricsPort: 9090 @@ -78,14 +86,6 @@ server: # (Always for :latest, IfNotPresent otherwise). Set to "Always" for dev # clusters so new images are picked up without manual eviction. sandboxImagePullPolicy: "" - # Image that provides the openshell-sandbox supervisor binary. An init - # container copies the binary from this image into a shared emptyDir volume - # before the sandbox container starts. Should match the gateway image tag - # so the supervisor and gateway versions stay in sync. - supervisorImage: "ghcr.io/nvidia/openshell/supervisor:latest" - # Kubernetes imagePullPolicy for the supervisor init container. - # Empty = Kubernetes default. - supervisorImagePullPolicy: "" # gRPC endpoint for sandboxes to callback to OpenShell (must be reachable from pods) grpcEndpoint: "https://openshell.openshell.svc.cluster.local:8080" # Public host/port returned to CLI clients for SSH proxy CONNECT requests. @@ -97,7 +97,7 @@ server: # directly and requires client certificates. # Name of the Kubernetes Secret holding the NSSH1 HMAC handshake key. # The secret must contain a `secret` key with the hex-encoded HMAC key. - # For cluster deployments this is auto-created by the bootstrap process. + # By default a pre-install/pre-upgrade hook creates it when missing (see sshHandshake). sshHandshakeSecretName: "openshell-ssh-handshake" # Host gateway IP for sandbox pod hostAliases. When set, sandbox pods get # hostAliases entries mapping host.docker.internal and host.openshell.internal @@ -142,3 +142,89 @@ server: # NetworkPolicy restricting SSH ingress on sandbox pods to the gateway only. 
networkPolicy: enabled: true + +# NSSH1 SSH gateway handshake Secret (`server.sshHandshakeSecretName`). +# Helm hook creates it only when the Secret does not already exist (safe upgrades). +# Set sshHandshake.value from a gitignored values file for a stable dev secret. +sshHandshake: + hook: + enabled: true + # 64 hex chars (32 bytes), matching openshell-bootstrap. If empty, Helm generates + # a random value at install template time (two UUIDs, dashes stripped). + value: "" + +# PKI bootstrap via a pre-install/pre-upgrade hook Job. +# Generates a self-signed CA, server TLS secret, and client TLS secret using +# openssl (ECDSA P-256) inside the cluster. Key material is written directly to +# K8s Secrets and never appears in Helm release history. Idempotent: existing +# secrets are left untouched on upgrade. +# Air-gapped environments should override pkiInitJob.image with an image that has +# openssl and curl pre-installed (the default alpine image fetches them at runtime). +pkiInitJob: + enabled: true + image: + repository: alpine + tag: "3" + pullPolicy: IfNotPresent + # Days until the CA certificate expires. + caValidityDays: 3650 + # Days until server and client certificates expire. + certValidityDays: 3650 + # DNS SANs for the server certificate. + serverDnsNames: + - openshell + - openshell.openshell.svc + - openshell.openshell.svc.cluster.local + - localhost + - host.docker.internal + # IP SANs for the server certificate. + serverIpAddresses: + - 127.0.0.1 + +# cert-manager Certificate/Issuer resources (requires cert-manager CRDs in-cluster). +# Uses namespaced Issuers only (no ClusterIssuer). Does not install cert-manager itself. +certManager: + enabled: false + # Secret created for the intermediate CA (Certificate with isCA: true). + caSecretName: openshell-ca-tls + # Mount gateway client CA from the server TLS secret's ca.crt (populated by + # cert-manager for certs issued by a CA Issuer). Avoids a separate + # openshell-server-client-ca Secret. 
+ clientCaFromServerTlsSecret: true + certificateDuration: 8760h + certificateRenewBefore: 720h + serverDnsNames: + - openshell + - openshell.openshell.svc + - openshell.openshell.svc.cluster.local + - localhost + - host.docker.internal + serverIpAddresses: + - 127.0.0.1 + +# Kubernetes Gateway API: GRPCRoute and Gateway resources. +# Requires a Gateway API controller in the cluster. Install Envoy Gateway via +# the skaffold.yaml releases or independently: +# helm install eg oci://docker.io/envoyproxy/gateway-helm \ +# --version v1.4.1 -n envoy-gateway-system --create-namespace +grpcRoute: + enabled: false + # Hostnames the GRPCRoute matches on. Leave empty to match all hosts. + hostnames: [] + gateway: + # When true, a Gateway resource is created in the release namespace. + # Set to false and provide name/namespace to attach to a pre-existing Gateway. + create: false + # GatewayClass to reference. "eg" is created by `mise run helm:gateway:apply`, not by the Envoy Gateway chart. + className: "eg" + # Name of the Gateway resource. Defaults to the chart fullname. + name: "" + # Namespace of the Gateway referenced by the GRPCRoute parentRef. + # Defaults to the release namespace. + namespace: "" + # Listener settings (only used when gateway.create is true). + listener: + port: 80 + protocol: HTTP + # "Same" restricts attached routes to the release namespace; "All" allows any namespace. + allowedRoutes: Same diff --git a/deploy/kube/manifests/envoy-gateway-openshell.yaml b/deploy/kube/manifests/envoy-gateway-openshell.yaml new file mode 100644 index 000000000..583f2b41b --- /dev/null +++ b/deploy/kube/manifests/envoy-gateway-openshell.yaml @@ -0,0 +1,17 @@ +# SPDX-FileCopyrightText: Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved. +# SPDX-License-Identifier: Apache-2.0 + +# Envoy GatewayClass for OpenShell.
+# +# Apply after a successful Skaffold deploy when gateway routing is enabled: +# mise run helm:gateway:apply + +--- +# The Envoy Gateway Helm chart does not create a GatewayClass — we manage it here. +# The proxy Service defaults to LoadBalancer, which klipper-lb handles in k3d. +apiVersion: gateway.networking.k8s.io/v1 +kind: GatewayClass +metadata: + name: eg +spec: + controllerName: gateway.envoyproxy.io/gatewayclass-controller diff --git a/mise.lock b/mise.lock index 8ee91dcff..cd45a0894 100644 --- a/mise.lock +++ b/mise.lock @@ -120,6 +120,22 @@ url = "https://get.helm.sh/helm-v4.1.4-linux-amd64.tar.gz" checksum = "sha256:7c2eca678e8001fa863cdf8cbf6ac1b3799f9404a89eb55c08260ef5732e658d" url = "https://get.helm.sh/helm-v4.1.4-darwin-arm64.tar.gz" +[[tools.k3d]] +version = "5.8.3" +backend = "aqua:k3d-io/k3d" + +[tools.k3d."platforms.linux-arm64"] +checksum = "sha256:0b8110f2229631af7402fb828259330985918b08fefd38b7f1b788a1c8687216" +url = "https://github.com/k3d-io/k3d/releases/download/v5.8.3/k3d-linux-arm64" + +[tools.k3d."platforms.linux-x64"] +checksum = "sha256:dbaa79a76ace7f4ca230a1ff41dc7d8a5036a8ad0309e9c54f9bf3836dbe853e" +url = "https://github.com/k3d-io/k3d/releases/download/v5.8.3/k3d-linux-amd64" + +[tools.k3d."platforms.macos-arm64"] +checksum = "sha256:8da468daa7dc7cf7cdd4735f90a9bb05179fa27858250f62e3d8cdf5b5ca0698" +url = "https://github.com/k3d-io/k3d/releases/download/v5.8.3/k3d-darwin-arm64" + [[tools.kubectl]] version = "1.35.4" backend = "aqua:kubernetes/kubernetes/kubectl" @@ -250,6 +266,19 @@ provenance = "github-attestations" version = "1.95.0" backend = "core:rust" +[[tools.skaffold]] +version = "2.19.0" +backend = "aqua:GoogleContainerTools/skaffold" + +[tools.skaffold."platforms.linux-arm64"] +url = "https://storage.googleapis.com/skaffold/releases/v2.19.0/skaffold-linux-arm64" + +[tools.skaffold."platforms.linux-x64"] +url = "https://storage.googleapis.com/skaffold/releases/v2.19.0/skaffold-linux-amd64" + 
+[tools.skaffold."platforms.macos-arm64"] +url = "https://storage.googleapis.com/skaffold/releases/v2.19.0/skaffold-darwin-arm64" + [[tools.uv]] version = "0.10.12" backend = "aqua:astral-sh/uv" diff --git a/mise.toml b/mise.toml index fc4961db8..c994d2c8f 100644 --- a/mise.toml +++ b/mise.toml @@ -26,6 +26,8 @@ kubectl = "1.35.4" uv = "0.10.12" protoc = "29.6" helm = "4.1.4" +skaffold = "2.19.0" +k3d = "5.8.3" "github:anchore/syft" = { version = "1.43.0" } "github:EmbarkStudios/cargo-about" = { version = "0.8.4", version_prefix = "" } zig = "0.14.1" diff --git a/scripts/bin/k9s b/scripts/bin/k9s deleted file mode 100755 index 4b0b6e12d..000000000 --- a/scripts/bin/k9s +++ /dev/null @@ -1,15 +0,0 @@ -#!/usr/bin/env bash - -# SPDX-FileCopyrightText: Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved. -# SPDX-License-Identifier: Apache-2.0 - -# Gateway-aware k9s wrapper. -# -# Runs k9s inside the active gateway's k3s container via -# `openshell doctor exec`. All arguments are forwarded to k9s. -# -# Usage: -# k9s -# k9s -n openshell - -exec openshell doctor exec -- k9s "$@" diff --git a/scripts/bin/kubectl b/scripts/bin/kubectl deleted file mode 100755 index 09ced971b..000000000 --- a/scripts/bin/kubectl +++ /dev/null @@ -1,15 +0,0 @@ -#!/usr/bin/env bash - -# SPDX-FileCopyrightText: Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved. -# SPDX-License-Identifier: Apache-2.0 - -# Gateway-aware kubectl wrapper. -# -# Runs kubectl inside the active gateway's k3s container via -# `openshell doctor exec`. All arguments are forwarded to kubectl. 
-# -# Usage: -# kubectl get pods -A -# kubectl logs -n openshell statefulset/openshell - -exec openshell doctor exec -- kubectl "$@" diff --git a/tasks/helm.toml b/tasks/helm.toml index 65305af39..c7949865b 100644 --- a/tasks/helm.toml +++ b/tasks/helm.toml @@ -7,3 +7,55 @@ description = "Lint the openshell helm chart" run = "helm lint deploy/helm/openshell" hide = true + +["helm:skaffold:dev"] +description = "Run skaffold dev for deploy/helm/openshell (iterative deploy)" +dir = "deploy/helm/openshell" +run = "skaffold dev" + +["helm:skaffold:run"] +description = "Run skaffold run for deploy/helm/openshell (one-shot deploy)" +dir = "deploy/helm/openshell" +run = "skaffold run" + +["helm:skaffold:delete"] +description = "Run skaffold delete for deploy/helm/openshell" +dir = "deploy/helm/openshell" +run = "skaffold delete" + +["helm:skaffold:diagnose"] +description = "Run skaffold diagnose for deploy/helm/openshell" +dir = "deploy/helm/openshell" +run = "skaffold diagnose" +hide = true + +# Local k3s via k3d (Docker required). On macOS this is the supported path for a lightweight cluster to pair with helm:skaffold:* . 
+ +["helm:k3s:create"] +description = "Create a local k3s cluster with k3d and merge kubeconfig (macOS/Linux + Docker; use with helm:skaffold:dev)" +run = "tasks/scripts/helm-k3s-local.sh create" + +["helm:k3s:delete"] +description = "Delete the local k3d cluster created by helm:k3s:create" +run = "tasks/scripts/helm-k3s-local.sh delete" + +["helm:k3s:start"] +description = "Start the local k3d cluster (after helm:k3s:stop)" +run = "tasks/scripts/helm-k3s-local.sh start" +hide = true + +["helm:k3s:stop"] +description = "Stop the local k3d cluster without deleting it" +run = "tasks/scripts/helm-k3s-local.sh stop" +hide = true + +["helm:k3s:status"] +description = "List local k3d clusters" +run = "tasks/scripts/helm-k3s-local.sh status" +hide = true + +# Install Envoy Gateway's GatewayClass into the cluster + +["helm:gateway:apply"] +description = "Apply the Envoy GatewayClass manifest (run after helm:skaffold:run when gateway routing is enabled)" +run = "kubectl apply -f deploy/kube/manifests/envoy-gateway-openshell.yaml" diff --git a/tasks/keycloak.toml b/tasks/keycloak.toml index fc058f0ba..4d35e0dd0 100644 --- a/tasks/keycloak.toml +++ b/tasks/keycloak.toml @@ -14,3 +14,11 @@ run = "scripts/keycloak-dev.sh stop" ["keycloak:status"] description = "Check if the local Keycloak instance is running" run = "scripts/keycloak-dev.sh status" + +["keycloak:k8s:setup"] +description = "Install Keycloak in the local k3s cluster and import the openshell realm (one-time setup)" +run = "tasks/scripts/keycloak-k8s-setup.sh" + +["keycloak:k8s:teardown"] +description = "Remove Keycloak from the local k3s cluster" +run = "kubectl delete namespace keycloak --ignore-not-found" diff --git a/tasks/scripts/helm-k3s-local.sh b/tasks/scripts/helm-k3s-local.sh new file mode 100755 index 000000000..3f268c2dc --- /dev/null +++ b/tasks/scripts/helm-k3s-local.sh @@ -0,0 +1,224 @@ +#!/usr/bin/env bash + +# SPDX-FileCopyrightText: Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. 
All rights reserved. +# SPDX-License-Identifier: Apache-2.0 + +# Local k3s for Helm / Skaffold workflows using k3d (macOS primary; Linux also supported). +# Requires Docker running. Writes merged kubeconfig to HELM_K3S_KUBECONFIG or $KUBECONFIG or ./kubeconfig. +# +# Multi-worktree: the cluster name is derived from the last component of the current +# git branch (e.g. branch "kube-support/local-dev/tmutch" → cluster "openshell-dev-tmutch"). +# Each worktree therefore gets its own isolated cluster and per-worktree kubeconfig. +# Override with HELM_K3S_CLUSTER_NAME to force a specific name. + +set -euo pipefail + +SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)" +ROOT="$(cd "${SCRIPT_DIR}/../.." && pwd)" + +# Derive a DNS-safe suffix from the last component of the current branch name. +_branch="$(git -C "${ROOT}" rev-parse --abbrev-ref HEAD 2>/dev/null)" || _branch="" +_suffix="$(printf '%s' "${_branch##*/}" | tr '[:upper:]' '[:lower:]' | tr -cs 'a-z0-9' '-' | sed 's/-*$//')" +CLUSTER_NAME="${HELM_K3S_CLUSTER_NAME:-openshell-dev${_suffix:+-${_suffix}}}" +# Host port forwarded to port 80 via the k3d load balancer. +# Used by Envoy Gateway's LoadBalancer service (values-gateway.yaml). +HOST_LB_PORT="${HELM_K3S_LB_HOST_PORT:-8080}" + +default_kubeconfig="${ROOT}/kubeconfig" +if [[ -n "${HELM_K3S_KUBECONFIG:-}" ]]; then + KUBECONFIG_TARGET="${HELM_K3S_KUBECONFIG}" +elif [[ -n "${KUBECONFIG:-}" ]]; then + # mise sets KUBECONFIG to a single file — use it when unambiguous + if [[ "${KUBECONFIG}" != *:* ]]; then + KUBECONFIG_TARGET="${KUBECONFIG}" + else + KUBECONFIG_TARGET="${default_kubeconfig}" + fi +else + KUBECONFIG_TARGET="${default_kubeconfig}" +fi + +usage() { + cat >&2 < + +Environment: + HELM_K3S_CLUSTER_NAME k3d cluster name (default: openshell-dev-) + Each git worktree gets its own cluster derived from its branch name. + Override to share a single cluster across worktrees. 
+ HELM_K3S_KUBECONFIG kubeconfig file to write/merge (default: repo kubeconfig or \$KUBECONFIG) + HELM_K3S_LB_HOST_PORT Host port mapped to load balancer port 80 (default: 8080) + +macOS uses k3d (Docker required). Linux uses the same k3d flow when Docker is available. +Pair with: mise run helm:skaffold:dev +EOF +} + +require_supported_os() { + case "$(uname -s)" in + Darwin | Linux) ;; + *) + echo "error: local k3s tasks are only supported on macOS and Linux." >&2 + exit 1 + ;; + esac +} + +require_docker() { + if ! command -v docker >/dev/null 2>&1; then + echo "error: Docker is required for k3d. Install Docker Desktop (macOS) or Docker Engine (Linux)." >&2 + exit 1 + fi + if ! docker info >/dev/null 2>&1; then + echo "error: Docker does not appear to be running." >&2 + exit 1 + fi +} + +require_k3d() { + if ! command -v k3d >/dev/null 2>&1; then + echo "error: k3d not found. Run: mise install" >&2 + exit 1 + fi +} + +require_kubectl() { + if ! command -v kubectl >/dev/null 2>&1; then + echo "error: kubectl not found. Run: mise install" >&2 + exit 1 + fi +} + +k3d_context_name() { + echo "k3d-${CLUSTER_NAME}" +} + +k3d_cluster_exists() { + k3d cluster list "${CLUSTER_NAME}" >/dev/null 2>&1 +} + +merge_kubeconfig() { + require_kubectl + local tmp k3d_cfg merged_dir + tmp="$(mktemp)" + k3d kubeconfig get "${CLUSTER_NAME}" >"${tmp}" + + if [[ -s "${KUBECONFIG_TARGET}" ]]; then + KUBECONFIG="${KUBECONFIG_TARGET}:${tmp}" kubectl config view --flatten >"${tmp}.out" + mv "${tmp}.out" "${KUBECONFIG_TARGET}" + else + merged_dir="$(dirname "${KUBECONFIG_TARGET}")" + mkdir -p "${merged_dir}" + mv "${tmp}" "${KUBECONFIG_TARGET}" + fi + rm -f "${tmp}" + + kubectl --kubeconfig="${KUBECONFIG_TARGET}" config use-context "$(k3d_context_name)" +} + +apply_base_manifests() { + require_kubectl + local manifest="${ROOT}/deploy/kube/manifests/agent-sandbox.yaml" + echo "Applying agent-sandbox manifests..." 
+ kubectl --kubeconfig="${KUBECONFIG_TARGET}" apply -f "${manifest}" +} + +configure_ghcr_credentials() { + [[ -n "${GITHUB_PAT:-}" && -n "${GITHUB_USERNAME:-}" ]] || return 0 + + echo "Configuring ghcr.io credentials on cluster nodes..." + + local registries_content + registries_content="$(printf 'configs:\n "ghcr.io":\n auth:\n username: %s\n password: %s\n' \ + "${GITHUB_USERNAME}" "${GITHUB_PAT}")" + + local -a nodes=() # read loop instead of mapfile: stock macOS /bin/bash is 3.2 + while IFS= read -r node; do [[ -z "${node}" ]] || nodes+=("${node}"); done < <(docker ps --format '{{.Names}}' \ + --filter "name=k3d-${CLUSTER_NAME}-server" 2>/dev/null || true) + + if [[ ${#nodes[@]} -eq 0 ]]; then + echo "warning: no server nodes found for cluster '${CLUSTER_NAME}', skipping ghcr.io credential setup." >&2 + return 0 + fi + + for node in "${nodes[@]}"; do + printf '%s\n' "${registries_content}" \ + | docker exec -i "${node}" sh -c 'mkdir -p /etc/rancher/k3s && cat > /etc/rancher/k3s/registries.yaml' + docker exec "${node}" kill -SIGHUP 1 + echo " Configured ghcr.io credentials on ${node}" + done +} + +cmd_create() { + require_supported_os + require_docker + require_k3d + + local lb_port_map="${HOST_LB_PORT}:80@loadbalancer" + + if k3d_cluster_exists; then + echo "k3d cluster '${CLUSTER_NAME}' already exists; merging kubeconfig." + else + echo "Creating k3d cluster '${CLUSTER_NAME}'..." + k3d cluster create "${CLUSTER_NAME}" \ + --wait \ + --kubeconfig-update-default=false \ + --kubeconfig-switch-context=false \ + --port "${lb_port_map}" \ + --k3s-arg "--disable=traefik@server:0" + fi + merge_kubeconfig + apply_base_manifests + configure_ghcr_credentials + echo "Active context: $(k3d_context_name)" + echo "Kubeconfig: ${KUBECONFIG_TARGET}" + echo "Envoy Gateway LoadBalancer (port 80): http://127.0.0.1:${HOST_LB_PORT}" +} + +cmd_delete() { + require_supported_os + require_k3d + if k3d_cluster_exists; then + k3d cluster delete "${CLUSTER_NAME}" + echo "Deleted k3d cluster '${CLUSTER_NAME}'." + else + echo "No k3d cluster named '${CLUSTER_NAME}'."
+ fi +} + +cmd_start() { + require_supported_os + require_k3d + k3d cluster start "${CLUSTER_NAME}" +} + +cmd_stop() { + require_supported_os + require_k3d + k3d cluster stop "${CLUSTER_NAME}" +} + +cmd_status() { + require_supported_os + require_k3d + k3d cluster list +} + +main() { + local sub="${1:-}" + case "${sub}" in + create) cmd_create ;; + delete) cmd_delete ;; + start) cmd_start ;; + stop) cmd_stop ;; + status) cmd_status ;; + -h | --help | help | "") usage ; [[ -n "${sub}" ]] || exit 1 ;; + *) + echo "error: unknown command '${sub}'" >&2 + usage + exit 1 + ;; + esac +} + +main "$@" diff --git a/tasks/scripts/keycloak-k8s-setup.sh b/tasks/scripts/keycloak-k8s-setup.sh new file mode 100755 index 000000000..808219709 --- /dev/null +++ b/tasks/scripts/keycloak-k8s-setup.sh @@ -0,0 +1,172 @@ +#!/usr/bin/env bash +# SPDX-FileCopyrightText: Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved. +# SPDX-License-Identifier: Apache-2.0 +# +# One-time Keycloak setup for the local k3s cluster. +# Uses the same quay.io/keycloak/keycloak image and realm JSON as the local +# Docker dev setup (scripts/keycloak-dev.sh), deploying via kubectl manifests. +# +# Idempotent: safe to re-run. The Deployment and ConfigMap are applied with +# kubectl apply, and Keycloak's --import-realm flag skips the realm if it +# already exists. 
+# +# Usage: +# mise run keycloak:k8s:setup +# +# After setup, add deploy/helm/openshell/values-keycloak.yaml to your Helm +# release and redeploy: +# skaffold dev -f deploy/helm/openshell/skaffold.yaml +# (uncomment values-keycloak.yaml in skaffold.yaml valuesFiles first) +# +# To get tokens for the CLI while the cluster is running: +# kubectl -n keycloak port-forward svc/keycloak 9090:80 +# curl -s -X POST http://localhost:9090/realms/openshell/protocol/openid-connect/token \ +# -d 'grant_type=password&client_id=openshell-cli&username=admin@test&password=admin' \ +# | jq -r .access_token + +set -euo pipefail + +SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)" +ROOT="$(cd "${SCRIPT_DIR}/../.." && pwd)" + +NAMESPACE="keycloak" +KEYCLOAK_IMAGE="${KEYCLOAK_IMAGE:-quay.io/keycloak/keycloak:24.0}" +ADMIN_USER="${KEYCLOAK_ADMIN_USER:-admin}" +ADMIN_PASSWORD="${KEYCLOAK_ADMIN_PASSWORD:-admin}" +SETUP_PORT="${KEYCLOAK_SETUP_PORT:-9090}" +REALM_FILE="${ROOT}/scripts/keycloak-realm.json" +HEALTH_TIMEOUT="${KEYCLOAK_HEALTH_TIMEOUT:-120}" + +# Keycloak's in-cluster service hostname, used as the forced KC_HOSTNAME so +# that the iss claim in tokens is consistent regardless of how they were +# obtained (e.g. via a localhost port-forward). The gateway fetches JWKS from +# this URL inside the cluster. See values-keycloak.yaml. +SVC_HOSTNAME="keycloak.${NAMESPACE}.svc.cluster.local" + +if [[ ! -f "${REALM_FILE}" ]]; then + echo "error: realm file not found: ${REALM_FILE}" >&2 + exit 1 +fi + +# --------------------------------------------------------------------------- +# Namespace + ConfigMap +# --------------------------------------------------------------------------- + +echo "Creating namespace '${NAMESPACE}'..." +kubectl create namespace "${NAMESPACE}" --dry-run=client -o yaml | kubectl apply -f - + +echo "Applying realm ConfigMap..." 
+kubectl -n "${NAMESPACE}" create configmap openshell-realm \ + --from-file=realm.json="${REALM_FILE}" \ + --dry-run=client -o yaml | kubectl apply -f - + +# --------------------------------------------------------------------------- +# Deployment + Service +# --------------------------------------------------------------------------- + +echo "Applying Keycloak Deployment and Service..." +kubectl apply -f - <