35 commits
a4ee752 Remove v3io from MLRUN_MODEL_ENDPOINT_MONITORING__STORE_PREFIXES__USE… (royischoss, Feb 24, 2026)
ce7601e fix formating (royischoss, Feb 25, 2026)
061e2a4 fix version and remove deprecated dsn (royischoss, Feb 25, 2026)
0b2243f adding otel to ce (royischoss, Mar 25, 2026)
08c0be1 Merge remote-tracking branch 'origin/development' into ceml-641 (royischoss, Mar 25, 2026)
40baa80 fix requirements.lock (royischoss, Mar 25, 2026)
ff914ac fixes (royischoss, Mar 26, 2026)
f121089 Merge remote-tracking branch 'origin/development' into ceml-641 (royischoss, Mar 29, 2026)
78b175a works (royischoss, Mar 30, 2026)
73ba287 Otel with collector works well (royischoss, Apr 5, 2026)
d71d893 bump chart version (royischoss, Apr 5, 2026)
6889bb2 Merge remote-tracking branch 'origin/development' into ceml-641 (royischoss, Apr 5, 2026)
8b0be84 bump chart version (royischoss, Apr 5, 2026)
aedbbcf fix lint (royischoss, Apr 5, 2026)
3cc4d46 documentation fixes (royischoss, Apr 9, 2026)
f7cca6a fixes (royischoss, Apr 9, 2026)
6f6193c fixes (royischoss, Apr 9, 2026)
4f56e6d change method to push to prometheus (royischoss, Apr 12, 2026)
3aa7420 change method to push to prometheus (royischoss, Apr 14, 2026)
36c4b3f remove labeling s3 and TimescaleDB fix jupyter bug. update documentat… (royischoss, Apr 15, 2026)
a5e71ef another jupyter timing fix (royischoss, Apr 15, 2026)
b15ee61 remove redundant loop for crds check (royischoss, Apr 15, 2026)
a76e23e Merge remote-tracking branch 'origin/development' into ceml-641 (royischoss, Apr 15, 2026)
4f2bda3 fix requirements.lock (royischoss, Apr 15, 2026)
657cab3 fix rc version (royischoss, Apr 15, 2026)
73c6aa1 fix pin kubectl version in jobs, fix documentation for crds readiness… (royischoss, Apr 15, 2026)
28cf7ba fix pin kubectl version in jobs, fix documentation for crds readiness… (royischoss, Apr 15, 2026)
f46c5d7 fix OTel chart design issues: collector config, namespace-label job,… (royischoss, Apr 20, 2026)
7c29f3c fix chart version (royischoss, Apr 20, 2026)
35bbe7d Merge branch 'development' into ceml-641 (royischoss, Apr 20, 2026)
5c040f6 rename crd-readiness-job.yaml to otel-cr-installer.yaml (royischoss, Apr 20, 2026)
2c05daf Merge remote-tracking branch 'roy/ceml-641' into ceml-641 (royischoss, Apr 20, 2026)
8268cbb fix explosion naming per pod to metrics (royischoss, Apr 20, 2026)
00840a9 fix crds creation in otel cr installer (royischoss, Apr 26, 2026)
a3dadba fix: address OTel PR review issues — RBAC lifecycle, CR installer res… (royischoss, Apr 26, 2026)
1 change: 1 addition & 0 deletions .github/workflows/release.yml
@@ -46,6 +46,7 @@ jobs:
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo add strimzi https://strimzi.io/charts/
helm repo add seaweedfs https://seaweedfs.github.io/seaweedfs/helm
helm repo add open-telemetry https://open-telemetry.github.io/opentelemetry-helm-charts

- name: Run chart-releaser
uses: helm/chart-releaser-action@cae68fefc6b5f367a0275617c9f83181ba54714f
4 changes: 4 additions & 0 deletions .gitignore
@@ -4,3 +4,7 @@ charts/mlrun-ce/charts/*
**/.DS_Store
*.DS_Store
**/__pycache__
# Packaged chart tarballs (generated by make package)
charts/mlrun-ce/mlrun-ce-*.tgz
# MLRun project directories created by test scripts
otlp-pro/
2 changes: 1 addition & 1 deletion charts/mlrun-ce/Chart.yaml
@@ -1,6 +1,6 @@
apiVersion: v1
name: mlrun-ce
version: 0.11.0-rc.32
version: 0.11.0-rc.33
description: MLRun Open Source Stack
home: https://iguazio.com
icon: https://www.iguazio.com/wp-content/uploads/2019/10/Iguazio-Logo.png
53 changes: 50 additions & 3 deletions charts/mlrun-ce/README.md
@@ -14,6 +14,7 @@ The Open source MLRun ce chart includes the following stack:
* Spark Operator - https://github.com/GoogleCloudPlatform/spark-on-k8s-operator
* Pipelines - https://github.com/kubeflow/pipelines
* Prometheus stack - https://github.com/prometheus-community/helm-charts
* OpenTelemetry Operator - https://github.com/open-telemetry/opentelemetry-operator (observability)

## Prerequisites

@@ -64,6 +65,33 @@ helm --namespace mlrun \
mlrun/mlrun-ce
```

### Installing with OpenTelemetry Enabled

> **Note:** OpenTelemetry is **disabled by default**. Follow the standard [Installing the Chart](#installing-the-chart) steps, adding the OTel flags below.

To install with OpenTelemetry enabled, add the `opentelemetry-operator.enabled` and `opentelemetry.*` flags to the helm install command, for example:

```bash
helm --namespace mlrun \
install my-mlrun \
--wait \
--set global.registry.url=<registry URL e.g. index.docker.io/iguazio> \
--set global.registry.secretName=registry-credentials \
--set opentelemetry-operator.enabled=true \
--set opentelemetry.namespaceLabel.enabled=true \
--set opentelemetry.collector.enabled=true \
--set opentelemetry.instrumentation.enabled=true \
mlrun/mlrun-ce
```

To verify the OpenTelemetry resources were created:

```bash
kubectl -n mlrun get opentelemetrycollectors
kubectl -n mlrun get instrumentations
kubectl -n mlrun get pods | grep -E 'opentelemetry|otel'
```

### Installing MLRun-ce on minikube

The Open source MLRun ce uses node ports for simplicity. If your kubernetes cluster is running inside a VM,
@@ -89,6 +117,25 @@ following values:
Additional configurable values are documented in the `values.yaml`, and the `values.yaml` of all sub charts.
Override those [in the normal methods](https://helm.sh/docs/chart_template_guide/values_files/).

### Configuring OpenTelemetry (Observability)

MLRun CE includes the OpenTelemetry Operator for collecting metrics and traces. When enabled, it deploys a single collector per namespace (deployment mode): instrumented pods push OTLP data to the collector, which forwards metrics to Prometheus through Prometheus's OTLP endpoint (`/api/v1/otlp/v1/metrics`). Python auto-instrumentation is applied namespace-wide by the operator's admission webhook. The `mlrun.io/otel: "true"` label is applied to Jupyter and Nuclio function pods to mark them for metric enrichment and to trigger OTel injection when they restart.
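As a rough sketch of this data flow, the snippet below derives the endpoints involved from the collector CR's name. The `<name>-collector` Service name and the `/api/v1/otlp/v1/metrics` path follow the chart's NOTES.txt and Instrumentation template. The ports are an assumption: the standard OTLP defaults (4317 for gRPC, 4318 for HTTP) stand in for `opentelemetry.collector.otlp.grpcPort`/`httpPort`, whose chart defaults are not shown here, and the Prometheus base URL is a hypothetical placeholder.

```python
from dataclasses import dataclass

# Assumed ports: the standard OTLP defaults. The chart reads the real values
# from opentelemetry.collector.otlp.grpcPort / httpPort in values.yaml.
OTLP_GRPC_PORT = 4317
OTLP_HTTP_PORT = 4318
# Path the collector pushes metrics to, as printed by the chart's NOTES.txt.
PROMETHEUS_OTLP_PATH = "/api/v1/otlp/v1/metrics"


@dataclass(frozen=True)
class OtelEndpoints:
    grpc: str        # OTLP/gRPC target inside the namespace (host:port)
    http: str        # OTLP/HTTP URL, used as the Instrumentation exporter endpoint
    prometheus: str  # URL the collector forwards metrics to


def otel_endpoints(collector_fullname: str, prometheus_base_url: str,
                   grpc_port: int = OTLP_GRPC_PORT,
                   http_port: int = OTLP_HTTP_PORT) -> OtelEndpoints:
    # The chart addresses a collector CR named X through a Service named X-collector.
    service = f"{collector_fullname}-collector"
    return OtelEndpoints(
        grpc=f"{service}:{grpc_port}",
        http=f"http://{service}:{http_port}",
        prometheus=prometheus_base_url.rstrip("/") + PROMETHEUS_OTLP_PATH,
    )


# "my-mlrun-otel" is the fullname for release "my-mlrun" with the default name "otel";
# the Prometheus URL is a hypothetical placeholder.
eps = otel_endpoints("my-mlrun-otel", "http://prometheus.example:9090/")
print(eps.grpc)        # my-mlrun-otel-collector:4317
print(eps.http)        # http://my-mlrun-otel-collector:4318
print(eps.prometheus)  # http://prometheus.example:9090/api/v1/otlp/v1/metrics
```

Pods only ever talk to the in-namespace collector Service; only the collector needs to know where Prometheus lives.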

For a fresh install with OTel, see [Installing with OpenTelemetry Enabled](#installing-with-opentelemetry-enabled).

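To enable OTel on an existing installation, run an upgrade with the OTel flags. Also pass the values you set at install time (such as the registry settings), because `helm upgrade` drops earlier `--set` overrides when new ones are given:

```bash
helm --namespace mlrun upgrade my-mlrun \
--set global.registry.url=<registry URL e.g. index.docker.io/iguazio> \
--set global.registry.secretName=registry-credentials \
--set opentelemetry-operator.enabled=true \
--set opentelemetry.namespaceLabel.enabled=true \
--set opentelemetry.collector.enabled=true \
--set opentelemetry.instrumentation.enabled=true \
mlrun/mlrun-ce
```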

> **Note:** The above assumes a single-namespace installation. For multi-namespace (admin/non-admin) deployments, refer to the MLRun documentation.

### Working with ECR

To work with ECR, you must create a secret with your AWS credentials and a secret with ECR Token while providing both secret names to the helm install command.
@@ -282,6 +329,6 @@ Refer to the [**Kubeflow documentation**](https://www.kubeflow.org/docs/started/

This table shows the versions of the main components in the MLRun CE chart:

| MLRun CE | MLRun | Nuclio | Jupyter | MPI Operator | SeaweedFS | Spark Operator | Pipelines | Kube-Prometheus-Stack |
|------------|--------|--------|---------|--------------|-----------|----------------|-----------|-----------------------|
| **0.11.0** | 1.11.0 | 1.15.9 | 4.5.0 | 0.2.3 | 4.17.0 | 2.1.0 | 2.15.0 | 72.1.1 |
| MLRun CE | MLRun | Nuclio | Jupyter | MPI Operator | SeaweedFS | Spark Operator | Pipelines | Kube-Prometheus-Stack | OpenTelemetry Operator |
|------------|--------|--------|---------|--------------|-----------|----------------|-----------|-----------------------|------------------------|
| **0.11.0** | 1.11.0 | 1.15.9 | 4.5.0 | 0.2.3 | 4.17.0 | 2.1.0 | 2.15.0 | 72.1.1 | 0.78.1 |
25 changes: 25 additions & 0 deletions charts/mlrun-ce/admin_installation_values.yaml
@@ -77,3 +77,28 @@ strimzi-kafka-operator:

kafka:
enabled: false

# OpenTelemetry Operator - enabled for CRD installation at cluster level
opentelemetry-operator:
enabled: true
admissionWebhooks:
certManager:
enabled: false
autoGenerateCert:
enabled: true
# Only apply webhooks to namespaces with the opentelemetry label
namespaceSelector:
matchLabels:
opentelemetry.io/inject: "enabled"

# OpenTelemetry CRs - disabled at admin level, enabled in user namespaces
# Note: Controller namespace does NOT need the opentelemetry label since
# no workloads are instrumented here - only the operator runs here
opentelemetry:
namespaceLabel:
enabled: false
collector:
enabled: false
instrumentation:
enabled: false

16 changes: 16 additions & 0 deletions charts/mlrun-ce/non_admin_cluster_ip_installation_values.yaml
@@ -96,3 +96,19 @@ kafka:

kube-prometheus-stack:
enabled: false

# OpenTelemetry Operator - disabled, CRDs installed at controller level
opentelemetry-operator:
enabled: false

# OpenTelemetry CRs - enabled for user namespace
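# The namespace will be labeled with opentelemetry.io/inject=enabled, which the
# operator's webhook namespaceSelector matches, and annotated for namespace-wide
# Python auto-instrumentation (collection uses a deployment-mode collector, not sidecars)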
opentelemetry:
namespaceLabel:
enabled: true
collector:
enabled: true
instrumentation:
enabled: true

16 changes: 16 additions & 0 deletions charts/mlrun-ce/non_admin_installation_values.yaml
@@ -82,3 +82,19 @@ kafka:

kube-prometheus-stack:
enabled: false

# OpenTelemetry Operator - disabled, CRDs installed at controller level
opentelemetry-operator:
enabled: false

# OpenTelemetry CRs - enabled for user namespace
# The namespace will be labeled and annotated for OTel deployment-mode collection
# and namespace-wide Python auto-instrumentation.
opentelemetry:
namespaceLabel:
enabled: true
collector:
enabled: true
instrumentation:
enabled: true
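The admin profile restricts the operator's admission webhooks to namespaces labeled `opentelemetry.io/inject: "enabled"`, while the non-admin profiles set `opentelemetry.namespaceLabel.enabled: true` so that the user namespace receives that label. A minimal sketch of the Kubernetes `matchLabels` rule involved (every listed key must be present with an equal value); the namespace names and label sets are hypothetical examples.

```python
from typing import Mapping

# namespaceSelector.matchLabels from admin_installation_values.yaml
WEBHOOK_SELECTOR = {"opentelemetry.io/inject": "enabled"}


def matches_labels(match_labels: Mapping[str, str], labels: Mapping[str, str]) -> bool:
    """Kubernetes matchLabels semantics: every key must exist with an equal value.

    An empty selector matches everything.
    """
    return all(labels.get(key) == value for key, value in match_labels.items())


# Hypothetical namespaces: the controller namespace is left unlabeled, while a
# user namespace installed with a non-admin profile carries the label.
namespaces = {
    "mlrun-controller": {"kubernetes.io/metadata.name": "mlrun-controller"},
    "mlrun-user": {"kubernetes.io/metadata.name": "mlrun-user",
                   "opentelemetry.io/inject": "enabled"},
}

for ns, labels in namespaces.items():
    # Only namespaces the selector matches are handled by the operator's webhooks.
    print(ns, matches_labels(WEBHOOK_SELECTOR, labels))
# mlrun-controller False
# mlrun-user True
```

This is why the controller namespace does not need the label: no workloads there are meant to be instrumented.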

7 changes: 5 additions & 2 deletions charts/mlrun-ce/requirements.lock
@@ -20,5 +20,8 @@ dependencies:
- name: strimzi-kafka-operator
repository: https://strimzi.io/charts/
version: 0.48.0
digest: sha256:e2b2d1b7531c4829aa25c8ce8d95506642ab59d0cb692a343d2e508a71525374
generated: "2026-03-31T17:13:31.403112322Z"
- name: opentelemetry-operator
repository: https://open-telemetry.github.io/opentelemetry-helm-charts
version: 0.78.1
digest: sha256:50ed77fd11e450e243c05eadac99857b4b0aae92ae73ca9a6c00fc1cdc726f70
generated: "2026-04-15T11:23:19.249332+03:00"
4 changes: 4 additions & 0 deletions charts/mlrun-ce/requirements.yaml
@@ -25,3 +25,7 @@ dependencies:
repository: "https://strimzi.io/charts/"
version: "0.48.0"
condition: strimzi-kafka-operator.enabled
- name: opentelemetry-operator
repository: "https://open-telemetry.github.io/opentelemetry-helm-charts"
version: "0.78.1"
condition: opentelemetry-operator.enabled
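Helm includes this subchart only when the `condition` path resolves to true. The path is split on dots, so the hyphenated key `opentelemetry-operator` works here, whereas templates must reach it with `index .Values "opentelemetry-operator"` because Go template field names cannot contain hyphens. A simplified sketch of the evaluation, ignoring `tags` and value imports; the function names are illustrative, not Helm's API.

```python
from typing import Any, Mapping


def resolve_path(values: Mapping[str, Any], path: str) -> Any:
    """Walk a dotted YAML path such as "opentelemetry-operator.enabled".

    Returns None when any segment is missing.
    """
    node: Any = values
    for key in path.split("."):
        if not isinstance(node, Mapping) or key not in node:
            return None
        node = node[key]
    return node


def dependency_enabled(values: Mapping[str, Any], condition: str) -> bool:
    """Simplified sketch of Helm's `condition` handling (tags ignored).

    The condition may list several comma-separated paths; the first one that
    resolves to a boolean decides. If none does, the dependency stays enabled.
    """
    for path in condition.split(","):
        path = path.strip()
        if not path:
            continue
        value = resolve_path(values, path)
        if isinstance(value, bool):
            return value
    return True


admin = {"opentelemetry-operator": {"enabled": True}}
non_admin = {"opentelemetry-operator": {"enabled": False}}
print(dependency_enabled(admin, "opentelemetry-operator.enabled"))      # True
print(dependency_enabled(non_admin, "opentelemetry-operator.enabled"))  # False
```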
39 changes: 39 additions & 0 deletions charts/mlrun-ce/templates/NOTES.txt
@@ -127,5 +127,44 @@ TimescaleDB is available at:
{{- end }}
{{- end }}

{{- if index .Values "opentelemetry-operator" "enabled" }}
{{- "\n" }}
OpenTelemetry Operator is enabled!
- Operator manages OpenTelemetryCollector and Instrumentation CRs
- Namespace selector: opentelemetry.io/inject=enabled
{{- if .Values.opentelemetry.collector.enabled }}
{{- "\n" }}
OpenTelemetry Collector (deployment mode):
- Collector CR: {{ include "mlrun-ce.otel.collector.fullname" . }}
- OTLP gRPC endpoint: {{ include "mlrun-ce.otel.collector.fullname" . }}-collector:{{ .Values.opentelemetry.collector.otlp.grpcPort }}
- OTLP HTTP endpoint: {{ include "mlrun-ce.otel.collector.fullname" . }}-collector:{{ .Values.opentelemetry.collector.otlp.httpPort }}
- Metrics export: collector pushes via OTLP to Prometheus at /api/v1/otlp/v1/metrics
{{- end }}
{{- if .Values.opentelemetry.instrumentation.enabled }}
{{- "\n" }}
OpenTelemetry Auto-Instrumentation:
- Instrumentation CR: {{ include "mlrun-ce.otel.instrumentation.fullname" . }}
{{- if .Values.opentelemetry.instrumentation.python.enabled }}
- Python auto-instrumentation: enabled (namespace-wide via namespace annotation)
{{- end }}
{{- if .Values.opentelemetry.instrumentation.java.enabled }}
- Java auto-instrumentation: enabled
{{- end }}
{{- end }}
{{- if .Values.opentelemetry.namespaceLabel.enabled }}
{{- "\n" }}
Namespace OTel configuration:
- Label: {{ .Values.opentelemetry.namespaceLabel.key }}={{ .Values.opentelemetry.namespaceLabel.value }}
{{- if .Values.opentelemetry.instrumentation.enabled }}
- Python instrumentation annotation applied to all pods in namespace {{ .Release.Namespace }}
{{- end }}
{{- end }}
{{- if or .Values.opentelemetry.collector.enabled .Values.opentelemetry.instrumentation.enabled }}
{{- "\n" }}
Pods labeled with mlrun.io/otel=true: Jupyter and Nuclio function pods (via functionDefaults).
These Python-based pods receive OTel auto-instrumentation (runtime metrics, traces, HTTP metrics for Nuclio functions).
{{- end }}
{{- end }}

Happy MLOPSing!!! :]
{{- end }}
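The OpenTelemetry part of NOTES.txt is nested under the operator check, so a release that enables the collector and instrumentation CRs with the operator disabled (as the non-admin profiles do) prints none of it. A small sketch of which blocks render for a given set of values; the block names are shortened labels, not the exact NOTES text, and the `profile` helper is hypothetical.

```python
from typing import Any, Dict, List


def otel_notes_blocks(values: Dict[str, Any]) -> List[str]:
    """Return the OTel blocks NOTES.txt would print, in template order."""
    blocks: List[str] = []
    # Outer guard: everything sits inside `if index .Values "opentelemetry-operator" "enabled"`.
    if not values["opentelemetry-operator"]["enabled"]:
        return blocks
    blocks.append("operator")
    otel = values["opentelemetry"]
    collector = bool(otel["collector"]["enabled"])
    instrumentation = bool(otel["instrumentation"]["enabled"])
    if collector:
        blocks.append("collector endpoints")
    if instrumentation:
        blocks.append("auto-instrumentation")
    if otel["namespaceLabel"]["enabled"]:
        blocks.append("namespace label")
    if collector or instrumentation:
        blocks.append("mlrun.io/otel pod labels")
    return blocks


def profile(operator: bool, crs: bool) -> Dict[str, Any]:
    """Hypothetical helper: set namespaceLabel/collector/instrumentation all to `crs`."""
    return {
        "opentelemetry-operator": {"enabled": operator},
        "opentelemetry": {key: {"enabled": crs}
                          for key in ("namespaceLabel", "collector", "instrumentation")},
    }


print(otel_notes_blocks(profile(operator=True, crs=True)))
# ['operator', 'collector endpoints', 'auto-instrumentation', 'namespace label', 'mlrun.io/otel pod labels']
print(otel_notes_blocks(profile(operator=False, crs=True)))  # [] (non-admin profiles)
print(otel_notes_blocks(profile(operator=True, crs=False)))  # ['operator'] (admin profile)
```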
151 changes: 151 additions & 0 deletions charts/mlrun-ce/templates/_helpers.tpl
@@ -413,3 +413,154 @@ TimescaleDB connection string for MLRun model monitoring
postgresql://{{ .Values.timescaledb.auth.username | urlquery }}:{{ .Values.timescaledb.auth.password | urlquery }}@{{ include "mlrun-ce.timescaledb.fullname" . }}:{{ .Values.timescaledb.service.port }}/{{ .Values.timescaledb.auth.database }}
{{- end }}

{{/*
=============================================================================
OpenTelemetry helpers
=============================================================================
*/}}

{{/*
OpenTelemetry Collector name
*/}}
{{- define "mlrun-ce.otel.collector.name" -}}
{{- default "otel" .Values.opentelemetry.collector.nameOverride | trunc 63 | trimSuffix "-" }}
{{- end }}

{{/*
OpenTelemetry Collector fullname
*/}}
{{- define "mlrun-ce.otel.collector.fullname" -}}
{{- if .Values.opentelemetry.collector.fullnameOverride }}
{{- .Values.opentelemetry.collector.fullnameOverride | trunc 63 | trimSuffix "-" }}
{{- else }}
{{- $name := default "otel" .Values.opentelemetry.collector.nameOverride }}
{{- if contains $name .Release.Name }}
{{- .Release.Name | trunc 63 | trimSuffix "-" }}
{{- else }}
{{- printf "%s-%s" .Release.Name $name | trunc 63 | trimSuffix "-" }}
{{- end }}
{{- end }}
{{- end }}

{{/*
OpenTelemetry Instrumentation name
*/}}
{{- define "mlrun-ce.otel.instrumentation.name" -}}
{{- default "otel-instrumentation" .Values.opentelemetry.instrumentation.nameOverride | trunc 63 | trimSuffix "-" }}
{{- end }}

{{/*
OpenTelemetry Instrumentation fullname
*/}}
{{- define "mlrun-ce.otel.instrumentation.fullname" -}}
{{- if .Values.opentelemetry.instrumentation.fullnameOverride }}
{{- .Values.opentelemetry.instrumentation.fullnameOverride | trunc 63 | trimSuffix "-" }}
{{- else }}
{{- $name := default "otel-instrumentation" .Values.opentelemetry.instrumentation.nameOverride }}
{{- if contains $name .Release.Name }}
{{- .Release.Name | trunc 63 | trimSuffix "-" }}
{{- else }}
{{- printf "%s-%s" .Release.Name $name | trunc 63 | trimSuffix "-" }}
{{- end }}
{{- end }}
{{- end }}
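The collector and instrumentation fullname helpers above share one rule. A Python transcription of it, useful for predicting resource names; `default`, `contains`, `trunc` and `trimSuffix` are modeled on their Sprig behavior, and the release names are hypothetical.

```python
def trunc_name(name: str, limit: int = 63) -> str:
    """`trunc 63 | trimSuffix "-"`: keep at most 63 characters, then drop one trailing "-"."""
    name = name[:limit]
    return name[:-1] if name.endswith("-") else name


def otel_fullname(release_name: str, default_name: str,
                  name_override: str = "", fullname_override: str = "") -> str:
    """Mirror of mlrun-ce.otel.collector.fullname / mlrun-ce.otel.instrumentation.fullname."""
    if fullname_override:
        return trunc_name(fullname_override)
    name = name_override or default_name  # Sprig `default` falls back on empty values
    if name in release_name:               # `contains $name .Release.Name` is a substring test
        return trunc_name(release_name)
    return trunc_name(f"{release_name}-{name}")


print(otel_fullname("my-mlrun", "otel"))                  # my-mlrun-otel
print(otel_fullname("my-mlrun", "otel-instrumentation"))  # my-mlrun-otel-instrumentation
print(otel_fullname("otel-demo", "otel"))                 # otel-demo
print(otel_fullname("hotel", "otel"))                     # hotel
```

Because `contains` is a substring check rather than a prefix or word match, any release name containing `otel` (even `hotel`) is used as-is without the suffix.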

{{/*
OpenTelemetry common labels
*/}}
{{- define "mlrun-ce.otel.labels" -}}
{{ include "mlrun-ce.common.labels" . }}
{{ include "mlrun-ce.otel.selectorLabels" . }}
{{- end }}

{{/*
OpenTelemetry selector labels
*/}}
{{- define "mlrun-ce.otel.selectorLabels" -}}
{{ include "mlrun-ce.common.selectorLabels" . }}
app.kubernetes.io/component: opentelemetry
{{- end }}

{{/*
OpenTelemetryCollector CR manifest for use in the OTel CR installer job (otel-cr-installer.yaml)
*/}}
{{- define "mlrun-ce.otel.collector.manifest" -}}
apiVersion: opentelemetry.io/v1beta1
kind: OpenTelemetryCollector
metadata:
name: {{ include "mlrun-ce.otel.collector.fullname" . }}
namespace: {{ .Release.Namespace }}
labels:
{{- include "mlrun-ce.otel.labels" . | nindent 4 }}
spec:
mode: {{ .Values.opentelemetry.collector.mode }}
upgradeStrategy: {{ .Values.opentelemetry.collector.upgradeStrategy }}
managementState: managed
image: {{ (index .Values "opentelemetry-operator").manager.collectorImage.repository }}:{{ (index .Values "opentelemetry-operator").manager.collectorImage.tag }}
resources:
{{- toYaml .Values.opentelemetry.collector.resources | nindent 4 }}
config:
{{- toYaml .Values.opentelemetry.collector.config | nindent 4 }}
{{- end }}

{{/*
Instrumentation CR manifest for use in the OTel CR installer job (otel-cr-installer.yaml)
*/}}
{{- define "mlrun-ce.otel.instrumentation.manifest" -}}
apiVersion: opentelemetry.io/v1alpha1
kind: Instrumentation
metadata:
name: {{ include "mlrun-ce.otel.instrumentation.fullname" . }}
namespace: {{ .Release.Namespace }}
labels:
{{- include "mlrun-ce.otel.labels" . | nindent 4 }}
spec:
exporter:
endpoint: http://{{ include "mlrun-ce.otel.collector.fullname" . }}-collector:{{ .Values.opentelemetry.collector.otlp.httpPort }}
propagators:
{{- toYaml .Values.opentelemetry.instrumentation.propagators | nindent 4 }}
sampler:
type: {{ .Values.opentelemetry.instrumentation.sampler.type }}
argument: {{ .Values.opentelemetry.instrumentation.sampler.argument | quote }}
env:
- name: OTEL_SERVICE_NAME
valueFrom:
fieldRef:
fieldPath: metadata.name
- name: OTEL_METRICS_EXPORTER
value: otlp
- name: OTEL_TRACES_EXPORTER
value: otlp
- name: OTEL_LOGS_EXPORTER
value: none
{{- if .Values.opentelemetry.instrumentation.python.enabled }}
python:
image: {{ .Values.opentelemetry.instrumentation.python.image.repository }}:{{ .Values.opentelemetry.instrumentation.python.image.tag }}
resourceRequirements:
{{- toYaml .Values.opentelemetry.instrumentation.python.resources | nindent 6 }}
env:
- name: OTEL_PYTHON_LOG_CORRELATION
value: "true"
- name: OTEL_PYTHON_LOGGING_AUTO_INSTRUMENTATION_ENABLED
value: "false"
- name: OTEL_PYTHON_DISABLED_INSTRUMENTATIONS
value: "aws_lambda"
{{- end }}
{{- if .Values.opentelemetry.instrumentation.java.enabled }}
java:
image: {{ .Values.opentelemetry.instrumentation.java.image.repository }}:{{ .Values.opentelemetry.instrumentation.java.image.tag }}
resourceRequirements:
{{- toYaml .Values.opentelemetry.instrumentation.java.resources | nindent 6 }}
env:
- name: OTEL_INSTRUMENTATION_COMMON_DEFAULT_ENABLED
value: "true"
{{- end }}
{{- end }}
{{/*
OTel pod label: marks a pod as OTel-monitored for metric enrichment and discovery.
The namespace-level instrumentation annotation (set by the namespace-label job) handles Python auto-instrumentation.
Wrap usage with: {{- if and .Values.opentelemetry.collector.enabled .Values.opentelemetry.instrumentation.enabled }}
*/}}
{{- define "mlrun-ce.otel.podLabels" -}}
mlrun.io/otel: "true"
{{- end }}
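Putting the Instrumentation manifest and the pod-label helper together: a sketch of the label a Jupyter or Nuclio pod receives and of the environment variables the Instrumentation CR declares for a Python pod. It covers only the variables listed explicitly in the CR (the operator adds further ones derived from `spec.exporter` and the pod), and the pod name is a hypothetical example.

```python
from typing import Dict


def otel_pod_labels(collector_enabled: bool, instrumentation_enabled: bool) -> Dict[str, str]:
    """Label from mlrun-ce.otel.podLabels, emitted only when both flags are on."""
    if collector_enabled and instrumentation_enabled:
        return {"mlrun.io/otel": "true"}
    return {}


def declared_python_env(pod_name: str) -> Dict[str, str]:
    """Env vars listed in the Instrumentation CR (spec.env plus spec.python.env)."""
    env = {
        "OTEL_SERVICE_NAME": pod_name,  # valueFrom.fieldRef: metadata.name
        "OTEL_METRICS_EXPORTER": "otlp",
        "OTEL_TRACES_EXPORTER": "otlp",
        "OTEL_LOGS_EXPORTER": "none",
    }
    # Python-specific settings from spec.python.env
    env.update({
        "OTEL_PYTHON_LOG_CORRELATION": "true",
        "OTEL_PYTHON_LOGGING_AUTO_INSTRUMENTATION_ENABLED": "false",
        "OTEL_PYTHON_DISABLED_INSTRUMENTATIONS": "aws_lambda",
    })
    return env


labels = otel_pod_labels(collector_enabled=True, instrumentation_enabled=True)
env = declared_python_env("jupyter-abc123")  # hypothetical pod name
print(labels)                    # {'mlrun.io/otel': 'true'}
print(env["OTEL_SERVICE_NAME"])  # jupyter-abc123
```

Since `OTEL_SERVICE_NAME` is taken from `metadata.name`, each pod reports under its own service name.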