Skip to content

Comments

fix: update oauth-proxy image to resolve ImagePullBackOff#663

Open
jeremyeder wants to merge 1 commit intoambient-code:mainfrom
jeremyeder:fix/update-oauth-proxy-image
Open

fix: update oauth-proxy image to resolve ImagePullBackOff#663
jeremyeder wants to merge 1 commit intoambient-code:mainfrom
jeremyeder:fix/update-oauth-proxy-image

Conversation

@jeremyeder
Copy link
Collaborator

Summary

  • Update oauth-proxy sidecar image from quay.io/openshift/origin-oauth-proxy:4.14 to registry.redhat.io/openshift4/ose-oauth-proxy-rhel9:v4.21
  • The old image is no longer pullable, causing ImagePullBackOff on frontend pods
  • New image uses the official Red Hat registry and matches our OpenShift 4.21 cluster

Test plan

  • Verify frontend pod starts without ImagePullBackOff
  • Verify OAuth login flow still works through the proxy
  • Confirm oc get pods shows frontend pod Running with both containers ready

🤖 Generated with Claude Code

@github-actions

This comment has been minimized.

@github-actions
Copy link
Contributor

github-actions bot commented Feb 21, 2026

Claude Code Review

Summary

This PR resolves a production ImagePullBackOff caused by an unreachable quay.io/openshift/origin-oauth-proxy:4.14 image, and bundles updates to Grafana (11→12 major bump) and OTel Collector (0.94→0.146, 52 minor versions). The oauth-proxy fix is well-motivated and the two patch files are updated consistently. However, the OTel Collector jump has a high probability of breaking the observability stack due to a removed exporter, and the Grafana major version bump lacks any test coverage.

Issues by Severity

🚫 Blocker Issues

OTel Collector logging exporter removed in v0.146.0

components/manifests/observability/otel-collector.yaml lines 36–39 and 46 still reference the logging exporter:

exporters:
  logging:
    loglevel: info
    sampling_initial: 5
    sampling_thereafter: 200
...
exporters: [prometheus, logging]

The OTel Collector logging exporter was deprecated in favour of the debug exporter starting around v0.96 and was removed in later releases. Jumping from 0.94.0 to 0.146.0 will almost certainly cause the collector pod to fail on startup with a configuration error (unknown exporter type: logging), breaking the entire observability pipeline.

Required fix: Replace the logging exporter block with debug:

exporters:
  debug:
    verbosity: normal
...
exporters: [prometheus, debug]

Note: debug does not accept loglevel, sampling_initial, or sampling_thereafter — those fields must be removed.


🔴 Critical Issues

registry.redhat.io requires authenticated pull access

Unlike quay.io/openshift, registry.redhat.io requires a Red Hat pull secret. On a production OCP cluster the global pull secret (openshift-config/pull-secret) typically covers this automatically, but:

  1. The manifests contain no imagePullSecrets entry and no documentation of this requirement.
  2. Fresh clusters, kind/minikube dev clusters, or clusters with expired/missing pull secrets will reproduce the original ImagePullBackOff on registry.redhat.io.
  3. The project docs for kind-based development would need updating to note this dependency.

Recommended: Add a note in the manifests README or deployment docs about the pull-secret requirement, and confirm whether the existing cluster pull secret covers registry.redhat.io.


🟡 Major Issues

Grafana 11 → 12 is a major version bump with no test coverage

Grafana 12 introduced breaking changes: removed legacy APIs, changed default visualization behaviours, and updated the provisioning schema. The test plan only covers the OAuth login flow — Grafana is not mentioned. The dashboard JSON uses "schemaVersion": 39 which should be forward-compatible, but datasource provisioning and the admin credential setup (GF_SECURITY_ADMIN_USER/PASSWORD) should be verified against Grafana 12.

Recommended: Add a test plan item for Grafana: confirm the pod starts, the datasource provisions correctly, and at least one panel renders data.


PR scope creep — two unrelated concerns in one fix

The PR title and description describe fixing the oauth-proxy ImagePullBackOff. The second commit silently adds Grafana and OTel Collector upgrades. These are legitimate improvements but bundling them:

  • Makes the PR harder to revert cleanly if only one change causes a regression
  • Leaves the test plan incomplete (no coverage for observability)
  • Obscures what changed in the git log for operators triaging incidents

Recommended: Separate the observability image updates into their own PR, or expand the PR description and test plan to explicitly cover Grafana and OTel.


🔵 Minor Issues

Mutable image tags (no digest pinning)

All three images use floating version tags:

  • registry.redhat.io/openshift4/ose-oauth-proxy-rhel9:v4.21
  • grafana/grafana:12.3.3
  • otel/opentelemetry-collector-contrib:0.146.0

These tags can be silently overwritten by a registry push, making deployments non-reproducible and harder to audit. Using immutable SHA digests (image@sha256:...) is a security and reproducibility best practice.


Duplicate oauth-proxy spec across two patch files

frontend-oauth-deployment-patch.yaml and frontend-oauth-patch.yaml both define the full oauth-proxy container spec (image, args, probes, resources, volumeMounts). This is a pre-existing issue but this PR updates both files identically, confirming the duplication. Any future change to the proxy config will require a two-file edit.

This is not a blocker for this PR but worth tracking as a maintenance debt item.


Positive Highlights

  • Root cause is correctly diagnosed — switching away from an unavailable community image to the official Red Hat registry image that matches the cluster version (OCP 4.21) is the right fix.
  • Both patch files updated consistently — the image reference is changed in all three places where it appears (two YAML files + README example).
  • Clear, informative commit messages — both commits explain why the change was made, not just what changed.
  • Appropriate OTel version target — 0.146.0 is current; catching up 52 minor versions is the right direction, the issue is only the configuration compatibility gap.

Recommendations

Prioritised action items before merging:

  1. Fix the OTel loggingdebug exporter (blocker — prevents OTel pod from starting)
  2. Document the registry.redhat.io pull secret requirement (critical — will bite fresh clusters)
  3. Add Grafana to the test plan and verify it starts cleanly on 12.x
  4. Consider splitting the observability bump into a follow-up PR or expand the PR description to cover it

🤖 Generated with Claude Code


🔍 View AI decision process (logs available for 90 days)

📋 View memory system files loaded (click to expand)

What Amber Loaded for Code Review

Amber automatically loaded these repository standards from the memory system:

  1. CLAUDE.md - Master project instructions, development standards
  2. backend-development.md - Go backend, K8s integration patterns
  3. frontend-development.md - NextJS, Shadcn UI, React Query patterns
  4. security-standards.md - Auth, RBAC, token handling
  5. k8s-client-usage.md - User token vs service account patterns
  6. error-handling.md - Consistent error patterns
  7. react-query-usage.md - Data fetching patterns

Impact: This review used your repository's specific code quality standards, security patterns, and best practices from the memory system (PRs #359, #360) - not just generic code review guidelines.

@jeremyeder jeremyeder force-pushed the fix/update-oauth-proxy-image branch from c513ba3 to 6a17f74 Compare February 21, 2026 04:59
@github-actions
Copy link
Contributor

github-actions bot commented Feb 21, 2026

Claude Code Review

Summary

This PR addresses a production ImagePullBackOff on the frontend pod by updating the oauth-proxy image, and bundles in observability stack updates (Grafana 11→12, OTel 0.94→0.146). The oauth-proxy fix is correct and necessary. However, the OTel version jump introduces a breaking configuration change that will prevent the collector from starting, and the Grafana major version bump warrants explicit compatibility validation.

Issues by Severity

Blocker Issues

OTel Collector: logging exporter removed in v0.107.0

The otel-collector.yaml config references the logging exporter, which was deprecated in v0.102.0 and removed in v0.107.0. The jump from 0.94.0 -> 0.146.0 crosses this removal boundary. The collector pod will fail to start with an error like unknown exporter "logging".

The logging exporter config in the current file must be replaced with the debug exporter before this version bump lands. The debug exporter uses verbosity: normal instead of loglevel: info, and does not support sampling_initial or sampling_thereafter — those fields must be removed.

Verify quay.io/openshift/origin-oauth-proxy:4.21 tag exists

The commit history shows the first fix used registry.redhat.io/openshift4/ose-oauth-proxy-rhel9:v4.21 (requires pull secret), then reverted to quay.io. The origin-oauth-proxy images on quay.io are community/CI builds following OCP release cadence and may not have a :4.21 tag if OCP 4.21 hasn't produced a corresponding community build. The original :4.14 became unpullable — the same risk exists for :4.21 if the tag doesn't exist or hasn't been published.

Verify before merging: skopeo inspect docker://quay.io/openshift/origin-oauth-proxy:4.21

If the tag doesn't exist, consider a known-good quay.io tag (e.g., :4.16, :4.17) or switch to the RHEL image with a pull secret configured.

Critical Issues

Grafana 11 -> 12 is a major version bump with no observability test coverage

Grafana 12.x contains breaking changes relative to 11.x (plugin API, authentication, dashboard rendering). The test plan only covers the oauth-proxy flow — no Grafana validation steps. At minimum verify: pod starts and is healthy, the ambient-operator dashboard (schemaVersion: 39 in the ConfigMap) renders correctly, and all panels load without errors.

Major Issues

Scope creep couples an urgent fix to two unvalidated observability updates

The PR bundles three independent changes: oauth-proxy 4.14 -> 4.21 (urgent production fix), Grafana 11.4.0 -> 12.3.3 (major version bump), and OTel Collector 0.94.0 -> 0.146.0 (52 minor versions, crosses a breaking change boundary). If OTel or Grafana regressions appear post-merge, rollback also reverts the oauth-proxy fix that unblocked production. These should be decoupled.

Minor Issues

No digest pinning for production images — mutable version tags were the root cause of the original ImagePullBackOff. Digest-pinned references (image@sha256:<digest>) prevent this class of failure from recurring.

Duplicate oauth-proxy patch files (pre-existing) — both frontend-oauth-patch.yaml and frontend-oauth-deployment-patch.yaml define the full oauth-proxy container spec, requiring every future change to be made in two places.

Positive Highlights

  • Correct registry choice: Reverting from registry.redhat.io (requires pull secret) to quay.io is right. The commit history shows good iterative reasoning — attempted the Red Hat registry, identified the pull secret requirement, corrected course.
  • Tight diff: 5 files, +5/-5 lines — surgical and easy to review.
  • Commit messages explain the why: Each commit body documents the decision rationale, not just the change.
  • README kept in sync: The doc example was updated to match the production patch.

Recommendations

  1. Block on OTel config fix: Migrate logging exporter to debug exporter (remove loglevel, sampling_initial, sampling_thereafter; add verbosity: normal) before the 0.94 -> 0.146 bump can safely land.
  2. Verify the quay.io tag is pullable: Run skopeo inspect and add the result as evidence.
  3. Consider splitting the PR: The oauth-proxy change is 3 files, 3 lines and can merge immediately to unblock production. OTel + Grafana follow in a separate PR with a proper observability test plan.
  4. Add digest pinning to production overlays to prevent future tag-gone failures.

Generated with Claude Code


🔍 View AI decision process (logs available for 90 days)

📋 View memory system files loaded (click to expand)

What Amber Loaded for Code Review

Amber automatically loaded these repository standards from the memory system:

  1. CLAUDE.md - Master project instructions, development standards
  2. backend-development.md - Go backend, K8s integration patterns
  3. frontend-development.md - NextJS, Shadcn UI, React Query patterns
  4. security-standards.md - Auth, RBAC, token handling
  5. k8s-client-usage.md - User token vs service account patterns
  6. error-handling.md - Consistent error patterns
  7. react-query-usage.md - Data fetching patterns

Impact: This review used your repository's specific code quality standards, security patterns, and best practices from the memory system (PRs #359, #360) - not just generic code review guidelines.

@github-actions
Copy link
Contributor

github-actions bot commented Feb 21, 2026

Claude Code Review

Summary

PR #663 fixes an ImagePullBackOff on frontend pods by bumping the oauth-proxy image from :4.14 to :4.21. The diff is clean and minimal, but there are a few issues worth addressing — notably a significant discrepancy between what the PR description claims to do versus what the code actually does, and two bundled unrelated dependency bumps that deserve separate treatment.


Issues by Severity

🚫 Blocker Issues

None.


🔴 Critical Issues

1. PR description claims a registry migration that didn't happen

The PR body states:

"New image uses the official Red Hat registry"
registry.redhat.io/openshift4/ose-oauth-proxy-rhel9:v4.21

But both patch files show the image stayed on the same quay.io/openshift community registry, just with the version bumped:

# What the diff actually shows:
image: quay.io/openshift/origin-oauth-proxy:4.21

This matters for two reasons:

  • If the root cause was the quay.io/openshift registry becoming unreliable for that tag, staying on the same registry may be fragile.
  • If the cluster requires the official registry.redhat.io image (which requires a Red Hat pull secret), the described fix was never actually applied.

Action needed: Clarify whether the intent was to stay on quay.io/openshift (simpler, no pull secret needed) or migrate to registry.redhat.io (supported, requires pull secret). If staying on quay.io, update the PR description. If migrating, apply the described change.


2. Grafana 11 → 12 major version bump with no migration notes

grafana/grafana:11.4.0grafana/grafana:12.3.3 is a major version jump. Grafana 12 introduced breaking changes including:

  • New default authentication system changes
  • Plugin API compatibility breaks
  • Dashboard schema changes in some cases

This bump has no mention in the PR title, description, or test plan, and is not related to the stated oauth-proxy fix.

Action needed: Either split this into a separate PR with its own test plan verifying Grafana dashboards and datasources still function, or add explicit validation steps.


🟡 Major Issues

3. Unrelated changes bundled into a focused fix PR

Three of the five changed files address components not mentioned in the PR title or description:

  • observability/grafana.yaml: Grafana 11.4.0 → 12.3.3
  • observability/otel-collector.yaml: OTel Collector 0.94.0 → 0.146.0

The OTel Collector jump spans 52 minor versions (0.94 → 0.146). While likely safe given semver conventions at the 0.x level, it warrants its own review and test plan.

Recommendation: Keep infra dependency bumps in a separate PR from targeted fixes. This keeps rollback scope narrow and CI signal meaningful.


🔵 Minor Issues

4. Image tags use floating version strings, not digests

All images in this PR use mutable tags (:4.21, :12.3.3, :0.146.0). Tags can be overwritten. For production manifests, pinning by digest provides reproducibility and prevents supply-chain tag mutation:

# Example with digest pinning:
image: quay.io/openshift/origin-oauth-proxy:4.21@sha256:<digest>

This is a pre-existing pattern in the repo, so not blocking, but worth addressing incrementally.

5. --https-address= flag in oauth-proxy config is empty (pre-existing)

In both patch files, the oauth-proxy is configured with:

- --http-address=:8443
- --https-address=

This runs HTTP (not HTTPS) on port 8443, which is unconventional. TLS termination appears to happen at the Route level, so this may be intentional, but it's worth a comment in the YAML to document the intent. This is pre-existing and not introduced by this PR.


Positive Highlights

  • Consistent update: Both frontend-oauth-patch.yaml and frontend-oauth-deployment-patch.yaml were updated together — no partial update drift.
  • README kept in sync: manifests/README.md was updated to match the actual image tag, maintaining documentation accuracy.
  • Minimal diff: The core fix (oauth-proxy version bump) is exactly 2 lines changed in the right places — easy to review and revert if needed.
  • No security regressions: The oauth-proxy configuration itself (scopes, secrets, delegate URLs) is unchanged, preserving the existing auth posture.

Recommendations

  1. Resolve the registry discrepancy — confirm quay.io/openshift/origin-oauth-proxy:4.21 is the intended final image, and update the PR description to match reality.
  2. Split Grafana/OTel updates into a separate PR — with its own test plan covering dashboard rendering and metric pipeline continuity.
  3. Verify Grafana 12 compatibility before merging the observability changes — check plugins, datasource config format, and any existing dashboard JSONs for schema compatibility.
  4. Consider adding the actual fix validation from the test plan (oc get pods showing both containers Ready) as a CI step or at minimum a screenshot in the PR.

🔍 View AI decision process (logs available for 90 days)

📋 View memory system files loaded (click to expand)

What Amber Loaded for Code Review

Amber automatically loaded these repository standards from the memory system:

  1. CLAUDE.md - Master project instructions, development standards
  2. backend-development.md - Go backend, K8s integration patterns
  3. frontend-development.md - NextJS, Shadcn UI, React Query patterns
  4. security-standards.md - Auth, RBAC, token handling
  5. k8s-client-usage.md - User token vs service account patterns
  6. error-handling.md - Consistent error patterns
  7. react-query-usage.md - Data fetching patterns

Impact: This review used your repository's specific code quality standards, security patterns, and best practices from the memory system (PRs #359, #360) - not just generic code review guidelines.

@jeremyeder jeremyeder force-pushed the fix/update-oauth-proxy-image branch 3 times, most recently from 04f921d to 23e3495 Compare February 21, 2026 05:03
@github-actions
Copy link
Contributor

github-actions bot commented Feb 21, 2026

Claude Code Review

Summary

PR #663 fixes a critical ImagePullBackOff issue by updating the oauth-proxy sidecar image. It also bundles Grafana and OTel Collector version bumps. The change is small (5 files, net 0 lines) but contains significant issues that need to be addressed before merging.

Issues by Severity

🚫 Blocker Issues

1. PR description and commit message do not match the actual code changes

The PR body and commit message both claim:

Update oauth-proxy sidecar image from quay.io/openshift/origin-oauth-proxy:4.14 to registry.redhat.io/openshift4/ose-oauth-proxy-rhel9:v4.21

The actual diff shows:

# What the diff actually contains:
image: image-registry.openshift-image-registry.svc:5000/openshift/oauth-proxy:v4.4

The image registry, image name, and version are all different from what was described. This makes it impossible to verify intent without additional context. Reviewers and operators cannot trust that what was merged matches what was tested or described.

2. OTel Collector config is likely broken with v0.146.0

The config in observability/otel-collector.yaml uses the logging exporter:

exporters:
  logging:
    loglevel: info
    sampling_initial: 5
    sampling_thereafter: 200

The logging exporter was deprecated in favor of debug around v0.92 and removed from opentelemetry-collector-contrib in v0.104+. Jumping from v0.94.0 to v0.146.0 with this config means the OTel Collector pod will fail to start due to an unrecognized exporter. If this PR is merged and deployed, observability will break entirely.

Required fix: Replace the logging exporter with the debug exporter:

exporters:
  debug:
    verbosity: normal
    sampling_initial: 5
    sampling_thereafter: 200

service:
  pipelines:
    metrics:
      receivers: [otlp]
      processors: [memory_limiter, batch, resource]
      exporters: [prometheus, debug]  # was: logging

🔴 Critical Issues

3. Internal cluster registry creates a hard deployment dependency

image: image-registry.openshift-image-registry.svc:5000/openshift/oauth-proxy:v4.4

The image-registry.openshift-image-registry.svc:5000 address is the internal OpenShift image registry, accessible only from within the cluster. This:

  • Breaks kind local developmentmake kind-up deploys to a plain Kubernetes cluster where this address does not resolve and the openshift/oauth-proxy imagestream does not exist
  • Breaks any fresh OpenShift cluster deployment — the image must already be present in the cluster's internal registry; it will not be pulled from an external source
  • Breaks CI/CD pipelines running outside the cluster
  • Not portable — conflicts with CLAUDE.md's explicit support for both kind and OpenShift CRC environments

The openshift namespace imagestream oauth-proxy is available in OpenShift clusters by default, but the v4.4 tag is very old (OAuth Proxy 4.4 corresponds to OpenShift 4.4, released 2020). Using this on a 4.21 cluster introduces a significant version gap.

🟡 Major Issues

4. Grafana major version bump (11.4.0 → 12.3.3) is undocumented and untested

A major version bump (11 → 12) typically includes breaking changes. Grafana 12 introduced changes to the Angular plugin deprecation enforcement, auth configuration, and alerting. The test plan only mentions verifying the OAuth login flow — there is no mention of verifying:

  • Grafana dashboard rendering still works
  • Datasource configuration is still valid
  • Admin credentials still work (GF_SECURITY_ADMIN_PASSWORD: admin hardcoded in the manifest is a separate concern)

5. Bundling unrelated changes in a focused fix PR

The PR title and description are scoped to the oauth-proxy ImagePullBackOff fix. However, the commit also bumps:

  • Grafana: 11.4.0 → 12.3.3
  • OTel Collector: 0.94.0 → 0.146.0

These are large, independent changes with their own risk profiles. Bundling them makes the fix harder to revert if one of the observability changes causes a regression. The OTel version in particular appears to have a breaking config change (see blocker #2 above), which was likely not caught because the OTel Collector is optional/supplementary to the core platform.

🔵 Minor Issues

6. README documentation now references an internal registry URL

# components/manifests/README.md (updated)
image: image-registry.openshift-image-registry.svc:5000/openshift/oauth-proxy:v4.4

The README is a developer reference document. Using an internal, non-portable registry URL in documentation will mislead developers setting up non-OpenShift environments. The README should show a portable image reference (e.g., from quay.io or registry.redhat.io) and note that the production overlay uses the internal imagestream.

7. GF_SECURITY_ADMIN_PASSWORD: admin in Grafana manifest

Not introduced by this PR, but worth flagging since it's visible: grafana.yaml hardcodes admin/admin credentials with GF_SECURITY_ADMIN_USER and GF_SECURITY_ADMIN_PASSWORD. This is acceptable for a development/observability sidecar but should be tracked as a known risk.

Positive Highlights

  • The change correctly identifies and targets a real ImagePullBackOff failure
  • The use of kustomize overlays is correct — only the production overlay is modified, preserving local dev behavior (or would, if the image were portable)
  • The diff is clean and minimal — five files, five changed lines
  • PR uses the conventional commit format and has a structured description with a test plan checklist

Recommendations

Prioritized action items:

  1. Fix the OTel Collector config — change logging exporter to debug before this PR can be safely deployed (blocker)
  2. Clarify the oauth-proxy image — update the PR description to accurately reflect what image was chosen and why. If using the internal OpenShift imagestream is intentional (avoiding Red Hat registry pull secrets), document this explicitly and note it will not work with kind
  3. Verify and document Grafana 12 compatibility — add Grafana validation to the test plan and confirm dashboards render correctly
  4. Consider splitting the observability bumps into a separate PR to isolate risk from the urgent oauth-proxy fix
  5. Update the README to use a portable image reference with a note about the production overlay

🔍 View AI decision process (logs available for 90 days)

📋 View memory system files loaded (click to expand)

What Amber Loaded for Code Review

Amber automatically loaded these repository standards from the memory system:

  1. CLAUDE.md - Master project instructions, development standards
  2. backend-development.md - Go backend, K8s integration patterns
  3. frontend-development.md - NextJS, Shadcn UI, React Query patterns
  4. security-standards.md - Auth, RBAC, token handling
  5. k8s-client-usage.md - User token vs service account patterns
  6. error-handling.md - Consistent error patterns
  7. react-query-usage.md - Data fetching patterns

Impact: This review used your repository's specific code quality standards, security patterns, and best practices from the memory system (PRs #359, #360) - not just generic code review guidelines.

@github-actions
Copy link
Contributor

github-actions bot commented Feb 21, 2026

Claude Code Review

Summary

PR #663 updates container image versions across production manifests to resolve an ImagePullBackOff on the OAuth proxy sidecar. The change is manifest-only (no application code), touching 5 files across the oauth-proxy, Grafana, and OTel Collector configurations. The fix is operationally necessary but has a critical discrepancy between the PR description and the actual diff, plus unrelated scope bundled in.


Issues by Severity

🚫 Blocker Issues

PR description does not match the actual changes

The PR description claims:

Update oauth-proxy sidecar image from `quay.io/openshift/origin-oauth-proxy:4.14` to `registry.redhat.io/openshift4/ose-oauth-proxy-rhel9:v4.21`
New image uses the official Red Hat registry and matches our OpenShift 4.21 cluster

The actual diff applies a different change:

-  image: quay.io/openshift/origin-oauth-proxy:4.14
+  image: quay.io/openshift/origin-oauth-proxy:4.21

The registry stays the same (quay.io/openshift), only the version tag is bumped. This matters because:

  1. If the root cause is that quay.io/openshift/origin-oauth-proxy images are no longer published/maintained (a known issue with community OpenShift images), then 4.21 may also fail to pull.
  2. The test plan asks to "verify OAuth login flow still works" but the actual change (same registry, new version) has different risk than described (new registry with auth requirements).

Before merging, confirm that quay.io/openshift/origin-oauth-proxy:4.21 actually exists and is pullable from your cluster. The previous commit 23e3495 already fixed other stale images — validate this one resolves the same way. If the intent was to migrate to registry.redhat.io/openshift4/ose-oauth-proxy-rhel9:v4.21, that requires a pull secret (registry.redhat.io is authenticated), which is a separate concern not addressed here.


🔴 Critical Issues

Grafana 11.4.0 → 12.3.3 is a major version jump with no rationale

components/manifests/observability/grafana.yaml upgrades Grafana across a full major version boundary (v11 → v12). Major Grafana releases can include:

  • Breaking changes to dashboard JSON schema
  • Plugin API incompatibilities
  • Database migration requirements
  • Changed default configurations

This upgrade is not mentioned anywhere in the PR title, description, summary, or test plan. It appears to be an opportunistic bump bundled into a hotfix. A major-version Grafana upgrade should:

  1. Be a separate PR with its own test plan
  2. Explicitly verify existing dashboards still render correctly
  3. Account for any Grafana provisioning or datasource config changes

OTel Collector 0.94.0 → 0.146.0 (52 minor versions) with no rationale

components/manifests/observability/otel-collector.yaml jumps 52 minor versions. Collector releases can deprecate or remove receivers, processors, and exporters between versions. The logging exporter used in the config was deprecated in favor of debug in recent OTel releases — this may cause a startup failure. The test plan has no validation step for observability stack health.


🟡 Major Issues

Scope creep in a hotfix PR

This is described as a targeted fix for ImagePullBackOff on the frontend pod, but it bundles Grafana and OTel image bumps that are completely unrelated to the stated problem. Hotfix PRs should be minimal to reduce blast radius and make rollback straightforward. If any of the observability changes causes an issue, it complicates attribution.

Liveness/readiness probe scheme vs. port naming is confusing

In both frontend-oauth-deployment-patch.yaml and frontend-oauth-patch.yaml:

ports:
- containerPort: 8443
  name: dashboard-ui
livenessProbe:
  httpGet:
    path: /oauth/healthz
    port: dashboard-ui
    scheme: HTTP   # HTTP probe on port 8443

Port 8443 strongly implies HTTPS by convention. It works here because --https-address= is explicitly left empty (HTTPS disabled on the proxy), and --http-address=:8443 puts HTTP on that port. This is correct but misleading. Adding a comment or renaming the port to oauth-http would prevent future confusion, especially if TLS is ever enabled on this container.


🔵 Minor Issues

README code example updated but inconsistently (components/manifests/README.md:134)

The README example was updated from 4.144.21, which is good. However, if the intent described in the PR was to switch to registry.redhat.io, the README example would need a more substantial update. As-is the README is now consistent with the actual manifests, which is fine assuming quay.io/openshift is the correct registry.

No securityContext on observability containers

Neither grafana.yaml nor otel-collector.yaml define a securityContext. Per project standards (CLAUDE.md), container security contexts with AllowPrivilegeEscalation: false and Capabilities.Drop: [\"ALL\"] should be set on all pods. This is pre-existing, not introduced by this PR, but worth tracking.

Hardcoded Grafana admin credentials (grafana.yaml:66-67)

- name: GF_SECURITY_ADMIN_PASSWORD
  value: admin

Pre-existing issue, not introduced here. Credentials should come from a Secret reference. Flagging for awareness since the Grafana upgrade makes it more visible.


Positive Highlights

  • The core change (bumping the OAuth proxy version) is minimal and low-risk for that specific component
  • All three production manifests that reference the oauth-proxy image were updated consistently (deployment patch, oauth patch, README)
  • Resource limits and probes on the oauth-proxy sidecar are well-configured with appropriate values
  • The PR links a clear test plan for verifying the OAuth flow

Recommendations

  1. Verify image pullability before merge: Run podman pull quay.io/openshift/origin-oauth-proxy:4.21 or check the quay.io tag list to confirm the tag exists. If it does not, the PR description's suggested registry.redhat.io image may be the correct path — but requires a pull secret setup.

  2. Split the Grafana and OTel bumps into a separate PR with a proper test plan that includes: verifying dashboards render, checking OTel pipeline logs for deprecation warnings, and validating the logging exporter is still supported in 0.146.0.

  3. Add a comment to the probe/port configuration in the oauth patch files to clarify that HTTP runs on 8443 intentionally.

  4. Update the PR description to accurately reflect the actual change (same registry, new version) if quay.io/openshift/origin-oauth-proxy:4.21 is confirmed to exist and resolve the pull issue.


🤖 Generated with Claude Code


🔍 View AI decision process (logs available for 90 days)

📋 View memory system files loaded (click to expand)

What Amber Loaded for Code Review

Amber automatically loaded these repository standards from the memory system:

  1. CLAUDE.md - Master project instructions, development standards
  2. backend-development.md - Go backend, K8s integration patterns
  3. frontend-development.md - NextJS, Shadcn UI, React Query patterns
  4. security-standards.md - Auth, RBAC, token handling
  5. k8s-client-usage.md - User token vs service account patterns
  6. error-handling.md - Consistent error patterns
  7. react-query-usage.md - Data fetching patterns

Impact: This review used your repository's specific code quality standards, security patterns, and best practices from the memory system (PRs #359, #360) - not just generic code review guidelines.

@github-actions
Copy link
Contributor

github-actions bot commented Feb 21, 2026

Claude Code Review

Summary

This PR fixes an ImagePullBackOff issue on frontend pods by bumping the oauth-proxy image tag from 4.14 to 4.21 on quay.io/openshift. It also bundles unrelated observability upgrades: Grafana 11.4.0 → 12.3.3 (major version) and OTel Collector 0.94.0 → 0.146.0 (52 minor versions). The core oauth-proxy fix is straightforward, but there are significant concerns with the bundled changes.

Issues by Severity

🚫 Blocker Issues

OTel Collector config incompatible with v0.146.0

The otel-collector.yaml config references the logging exporter, which was deprecated and subsequently removed in the OTel Collector Contrib project (removed around v0.113.0). At v0.146.0, the collector will fail to start with an unknown exporter error.

# components/manifests/observability/otel-collector.yaml
exporters:
  logging:           # <-- This exporter no longer exists in v0.146.0
    loglevel: info
    sampling_initial: 5
    sampling_thereafter: 200
service:
  pipelines:
    metrics:
      exporters: [prometheus, logging]   # <-- Will cause startup failure

Fix: Replace logging with the debug exporter:

exporters:
  debug:
    verbosity: normal
    sampling_initial: 5
    sampling_thereafter: 200
service:
  pipelines:
    metrics:
      exporters: [prometheus, debug]

🔴 Critical Issues

PR description does not match actual diff

The PR body states:

"New image uses the official Red Hat registry" and shows: registry.redhat.io/openshift4/ose-oauth-proxy-rhel9:v4.21

The actual diff changes to: quay.io/openshift/origin-oauth-proxy:4.21

These are different images from different registries with different support commitments. quay.io/openshift/origin-oauth-proxy is an upstream/community image; registry.redhat.io/openshift4/ose-oauth-proxy-rhel9 is the Red Hat supported image. The PR description and the code are inconsistent — reviewers and future operators cannot know which was intended.

Image availability not confirmed

The original problem is that quay.io/openshift/origin-oauth-proxy:4.14 is no longer pullable. Before merging, confirm that quay.io/openshift/origin-oauth-proxy:4.21 is actually pullable:

docker pull quay.io/openshift/origin-oauth-proxy:4.21

If the quay.io/openshift namespace is being deprecated or rate-limited (which is likely if 4.14 became unavailable), 4.21 may suffer the same fate. If so, the description's intent to use registry.redhat.io was the correct long-term fix, and the PR should implement what was described.

🟡 Major Issues

Grafana major version bump (11 → 12) bundled without dedicated testing

Grafana 12 contains breaking changes from v11, including plugin API changes and revised visualization options. The test plan covers only the oauth-proxy flow (oc get pods, OAuth login) — it does not include:

  • Verifying Grafana dashboards load correctly at v12
  • Confirming the Prometheus datasource overlay patches are still compatible
  • Checking that installed plugins (if any) are compatible with Grafana 12

This should either be a separate PR with its own test plan or the test plan should be expanded to cover Grafana.

OTel Collector version jump spans 52 minor releases without changelog review

The jump from 0.94.0 to 0.146.0 crosses many breaking config changes and exporter/receiver restructuring. The config should be validated against the v0.146.0 changelog before merging. Beyond the logging exporter issue above, there may be additional configuration keys that have been renamed or removed.

Bundled unrelated changes

The oauth-proxy fix is an urgent production fix (ImagePullBackOff). Bundling Grafana 11→12 and OTel 0.94→0.146 with it means: (a) the urgent fix can't be merged independently if the observability upgrades have issues, and (b) rollback becomes more complex. These should be split into separate PRs.

🔵 Minor Issues

PR title does not reflect full scope of changes

Title: fix: update oauth-proxy image to resolve ImagePullBackOff

The PR also updates Grafana and OTel Collector. Consider: fix: update stale container images (oauth-proxy, grafana, otel-collector) — or split the PR.

No image digest pinning for production

All three image references use mutable version tags. Production deployments benefit from pinning to SHA digests to prevent silent image substitution:

# Current (mutable tag):
image: quay.io/openshift/origin-oauth-proxy:4.21
# Preferred (immutable digest):
image: quay.io/openshift/origin-oauth-proxy@sha256:<digest>

This is a low-priority concern but worth tracking as a follow-up.

Positive Highlights

  • The oauth-proxy fix correctly updates both production overlay files (frontend-oauth-deployment-patch.yaml and frontend-oauth-patch.yaml) and the README documentation — no drift between the three locations.
  • The PR description clearly identifies the root cause (ImagePullBackOff) and links it to the image registry change.
  • The test plan for the oauth-proxy fix is concrete and actionable.

Recommendations

  1. Before merging: Fix the loggingdebug exporter in otel-collector.yaml — this will cause an outage if deployed as-is.
  2. Clarify intent: Decide whether the oauth-proxy fix should use quay.io/openshift/origin-oauth-proxy:4.21 (as coded) or registry.redhat.io/openshift4/ose-oauth-proxy-rhel9:v4.21 (as described). Update the PR body or the YAML to match.
  3. Confirm image availability: Run docker pull quay.io/openshift/origin-oauth-proxy:4.21 before merging to verify the image exists.
  4. Split the PR: Merge the oauth-proxy fix urgently (it unblocks production). Open a separate PR for the Grafana and OTel collector upgrades with an appropriate test plan covering observability stack validation.

Review generated by Claude Code (claude-sonnet-4-6)


🔍 View AI decision process (logs available for 90 days)

📋 View memory system files loaded (click to expand)

What Amber Loaded for Code Review

Amber automatically loaded these repository standards from the memory system:

  1. CLAUDE.md - Master project instructions, development standards
  2. backend-development.md - Go backend, K8s integration patterns
  3. frontend-development.md - NextJS, Shadcn UI, React Query patterns
  4. security-standards.md - Auth, RBAC, token handling
  5. k8s-client-usage.md - User token vs service account patterns
  6. error-handling.md - Consistent error patterns
  7. react-query-usage.md - Data fetching patterns

Impact: This review used your repository's specific code quality standards, security patterns, and best practices from the memory system (PRs #359, #360) - not just generic code review guidelines.

- oauth-proxy: 4.14 → 4.21 (was failing to pull)
- OTel Collector: 0.94.0 → 0.146.0
- Grafana: 11.4.0 → 12.3.3

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@jeremyeder jeremyeder force-pushed the fix/update-oauth-proxy-image branch from 23e3495 to f842165 Compare February 21, 2026 05:10
@github-actions
Copy link
Contributor

github-actions bot commented Feb 21, 2026

Claude Code Review

Summary

This PR bundles four distinct changes: (1) the stated fix — updating the oauth-proxy sidecar image to resolve ImagePullBackOff; (2) migrating Python package management from pip to uv across CI workflows and the runner Dockerfile; (3) bumping Grafana from 11.4.0 to 12.3.3; and (4) bumping the OTel Collector from 0.94.0 to 0.146.0. The pip→uv migration is well-aligned with project standards, but there are concerns with the primary fix and some of the bundled changes.

Issues by Severity

🚫 Blocker Issues

1. PR description contradicts the actual code change (oauth-proxy)

The PR summary states:

New image uses the official Red Hat registry and matches our OpenShift 4.21 cluster
registry.redhat.io/openshift4/ose-oauth-proxy-rhel9:v4.21

But the actual diff changes:

- quay.io/openshift/origin-oauth-proxy:4.14
+ quay.io/openshift/origin-oauth-proxy:4.21

These are different images from different registries:

  • quay.io/openshift/origin-oauth-proxy — upstream community build
  • registry.redhat.io/openshift4/ose-oauth-proxy-rhel9 — official Red Hat enterprise image

If the root cause of ImagePullBackOff was quay.io availability or the community image being discontinued, bumping the tag to :4.21 on the same registry may not resolve the issue. The test plan should explicitly verify that quay.io/openshift/origin-oauth-proxy:4.21 is actually pullable before merge, or the code should be updated to use registry.redhat.io as described. (Note: registry.redhat.io requires a Red Hat pull secret to be configured in the cluster.)

🔴 Critical Issues

2. Grafana 11 → 12 is a breaking major version upgrade

grafana/grafana:12.3.3 is a major version jump from 11.4.0. Grafana 12 includes breaking changes in the dashboard JSON schema, plugin API changes, and deprecated legacy alerting paths. Without an explicit migration validation, this risks breaking existing dashboards and alert rules. The PR description does not mention this change at all. This should either be a separate PR with explicit migration testing, or justified with evidence that the existing configuration is compatible.

3. OTel Collector jump of 52 minor versions (0.94.0 → 0.146.0)

The collector configuration format, receiver names, exporter names, and pipeline schema have all evolved across these versions. A jump this large has a high probability of config breakage at runtime. The test plan should include verifying the collector starts successfully and traces flow end-to-end, or this should be a separate PR.

🟡 Major Issues

4. Inconsistent uv installation in backend-unit-tests.yml

amber-dependency-sync.yml and runner-tests.yml correctly use the official astral-sh/setup-uv@v5 GitHub Action. However, backend-unit-tests.yml uses:

pip install uv
uv pip install --system junit2html

This is inconsistent, installs potentially older uv builds, and misses GHA caching. It should use astral-sh/setup-uv@v5 like the other workflows.

5. PR scope creep makes rollback harder

The PR mixes an urgent production fix (oauth-proxy) with tooling migration (pip→uv across 4 files) and observability upgrades (Grafana, OTel). If the Grafana or OTel change causes a production issue, rolling back will revert the oauth-proxy fix. These should ideally be separate PRs for clean git history and independent rollback capability.

🔵 Minor Issues

6. Missing --no-cache in Dockerfile uv install

# Before (correctly avoids layer bloat)
RUN pip install --no-cache-dir '/app/claude-runner[all]'

# After (uv's cache stored in /root/.cache/uv, baked into the layer)
RUN uv pip install --system '/app/claude-runner[all]'

Add --no-cache to avoid baking uv's cache into the image layer:

RUN uv pip install --system --no-cache '/app/claude-runner[all]'

7. README.md example not updated to match stated intent

If the true fix is registry.redhat.io/openshift4/ose-oauth-proxy-rhel9:v4.21 (as described), the README example in components/manifests/README.md should reflect that. It currently also shows quay.io/openshift/origin-oauth-proxy:4.21.

Positive Highlights

  • uv adoption is correct: CLAUDE.md standards explicitly state "Package manager: Prefer uv over pip". This PR correctly moves CI and the Dockerfile toward that standard.
  • astral-sh/setup-uv@v5 usage: The amber-dependency-sync.yml and runner-tests.yml changes use the proper GitHub Action with caching enabled — the right approach.
  • --system flag is correct: Using uv pip install --system is the right flag for installing into the system Python (not a venv), consistent with the Dockerfile's non-venv context.
  • Fixes a real production blocker: The intent to resolve ImagePullBackOff is valid and necessary.

Recommendations

  1. [Blocker] Verify quay.io/openshift/origin-oauth-proxy:4.21 is actually pullable from your cluster, or update the code to use registry.redhat.io/openshift4/ose-oauth-proxy-rhel9:v4.21 as described (with pull secret configured). Align the description and README with the actual image used.
  2. [Critical] Either validate Grafana 12 compatibility with the existing dashboard/alert configs, or move the Grafana bump to its own PR.
  3. [Critical] Either validate OTel Collector 0.146.0 config compatibility against otel-collector-config.yaml, or move that bump to its own PR.
  4. [Major] Update backend-unit-tests.yml to use astral-sh/setup-uv@v5 instead of pip install uv.
  5. [Minor] Add --no-cache to the Dockerfile's uv pip install --system line.

🤖 Generated with Claude Code


🔍 View AI decision process (logs available for 90 days)

📋 View memory system files loaded (click to expand)

What Amber Loaded for Code Review

Amber automatically loaded these repository standards from the memory system:

  1. CLAUDE.md - Master project instructions, development standards
  2. backend-development.md - Go backend, K8s integration patterns
  3. frontend-development.md - NextJS, Shadcn UI, React Query patterns
  4. security-standards.md - Auth, RBAC, token handling
  5. k8s-client-usage.md - User token vs service account patterns
  6. error-handling.md - Consistent error patterns
  7. react-query-usage.md - Data fetching patterns

Impact: This review used your repository's specific code quality standards, security patterns, and best practices from the memory system (PRs #359, #360) - not just generic code review guidelines.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant