
feat: add automated model discovery with feature-flag-gated availability#692

Draft
maskarb wants to merge 5 commits into ambient-code:main from maskarb:001-model-discovery

Conversation


maskarb commented Feb 25, 2026

No description provided.

maskarb and others added 3 commits February 25, 2026 09:27
Replace hardcoded model arrays across backend, frontend, operator, and
runner with a single model manifest (models.json) deployed as a ConfigMap.
Backend serves models via a new API endpoint filtered by Unleash feature
flags. Operator resolves Vertex AI model IDs from the manifest and injects
them as env vars for runners. A daily GitHub Action probes Vertex AI
endpoints and auto-creates PRs for manifest updates with corresponding
disabled Unleash flags.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add missing Next.js API route handler for /projects/[name]/models that
proxies to the backend, fixing 404 in the UI. Defer the models query
until the create-session dialog is opened and invalidate the cache on
each open to ensure fresh data after feature flag changes.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
github-actions bot commented Feb 25, 2026

Claude Code Review

Summary

PR #692 adds automated model discovery via a scheduled GHA workflow, a backend /models endpoint backed by a Kubernetes ConfigMap, feature-flag-gated model availability via Unleash, operator-level Vertex AI model ID resolution, and frontend React Query integration. The layered fail-open/fallback strategy is well thought out. Auth is correctly enforced via ValidateProjectContext() middleware. Test coverage spans all three components. A few issues need attention before merging.


Issues by Severity

🚫 Blocker Issues

None (PR is in DRAFT).


🔴 Critical Issues

1. Non-existent GitHub Actions versions — workflow will fail immediately

.github/workflows/model-discovery.yml references:

uses: actions/checkout@v6       # ❌ latest is @v4
uses: actions/setup-python@v6   # ❌ latest is @v5

These produce an Unable to resolve action error and the workflow will never run. Fix:

uses: actions/checkout@v4
uses: actions/setup-python@v5

🟡 Major Issues

2. Form defaultValues is static — dynamic defaultModel from API is silently dropped

In create-session-dialog.tsx, useForm defaultValues are set once at mount. Since modelsData is undefined on first render (query enabled: open — runs only after dialog opens), defaultModel evaluates to the hardcoded "claude-sonnet-4-5" fallback and never updates when the API responds.

If the manifest's defaultModel ever differs from "claude-sonnet-4-5", the form will always submit the wrong default silently. Fix:

useEffect(() => {
  if (modelsData?.defaultModel) {
    form.setValue("model", modelsData.defaultModel, { shouldDirty: false });
  }
}, [modelsData?.defaultModel, form]);

3. loadManifest() makes an uncached K8s API call on every request

loadManifest() is called on every GET /models AND on every POST /agentic-sessions (via isModelAvailable). The manifest changes only when the CI workflow merges a PR. A short TTL in-process cache (e.g. 60 s) or a ConfigMap watch would eliminate this latency from the session-creation hot path.

4. ResolveVertexID in the operator ignores the Available flag

operator/internal/models/models.go — ResolveVertexID returns the vertexId even when a model has available: false. The backend isModelAvailable check prevents session creation in the first place, but the operator lacks defence-in-depth:

// Fix: add Available guard
if m.ID == modelID && m.Available {
    return m.VertexID
}

5. context.Background() used in request-scoped K8s calls

Both handlers/models.go and operator/internal/models/models.go call K8s with context.Background(). Client disconnects or request timeouts won't cancel in-flight K8s calls. Accept ctx context.Context in loadManifest/LoadManifest and pass c.Request.Context() from the handler.


🔵 Minor Issues

6. invalidateQueries on every dialog open bypasses the 60 s staleTime

const handleTriggerClick = () => {
  queryClient.invalidateQueries({ queryKey: modelKeys.all }); // ⚠️ unconditional
  setOpen(true);
};

This triggers a network request every time the dialog opens, even when data is 2 seconds old. Remove the invalidation — staleTime: 60_000 plus enabled: open already handles freshness correctly.

7. No UI error feedback when models API fails

The component checks isLoading but not isError. On API failure it silently falls back to fallbackModels with no indication to the user. Consider a subtle warning when the fallback is active (e.g. a tooltip on the model selector).

8. Python urllib imports inside functions

In scripts/create-model-flags.py, urllib.request and urllib.error are imported inside flag_exists and create_flag. Move them to module level per PEP 8 and to avoid repeated import overhead in loops.

9. /models route is project-scoped but models are platform-global

The handler ignores c.Param("projectName") — the same list is returned for every project. The route would be cleaner at /api/models. If per-project filtering is a future goal, document it; otherwise consider moving the route.

10. claude-opus-4-1 in CANDIDATE_MODELS is absent from models.json and defaultModels()

The discovery script will add it when the Vertex endpoint responds, but keep the frontend fallback list and Go defaultModels() in sync as new models are discovered.


Positive Highlights

  • Auth correctly enforced: ValidateProjectContext() validates the bearer token before the handler runs. Using the backend SA to read a non-sensitive platform ConfigMap is acceptable here.
  • Fail-open strategy is consistent and documented at every layer (backend, operator, frontend) with clear log warnings at each level.
  • Clean type separation: ModelEntry/ModelManifest are intentionally duplicated between backend and operator to avoid cross-module coupling — the package comment explains this clearly.
  • Feature-flag-gated with safe defaults: new models start disabled in Unleash; only the designated default model bypasses flag checks — correct approach for gradual rollout.
  • disableNameSuffixHash: true on the ConfigMap generator ensures a stable, predictable name for K8s client reads.
  • Good test coverage: backend Ginkgo tests cover nil client, missing ConfigMap, malformed JSON, and unavailable model scenarios; operator Go tests cover all LoadManifest branches; runner tests cover all three LLM_MODEL_VERTEX_ID auth-path combinations.
  • encodeURIComponent applied to the project name in the Next.js route handler.
  • Idempotent flag creation: create-model-flags.py checks existence before creating — safe to re-run.

Recommendations

Prioritised action items before promoting from DRAFT:

  1. [Must fix] Pin GHA to actions/checkout@v4 and actions/setup-python@v5.
  2. [Must fix] Add useEffect + form.setValue to apply the API-returned defaultModel after load.
  3. [Should fix] Guard ResolveVertexID with m.Available check for defence-in-depth.
  4. [Should fix] Pass request context into loadManifest/LoadManifest.
  5. [Should fix] Remove unconditional invalidateQueries from handleTriggerClick.
  6. [Nice to have] Surface an error state in the UI when the models API call fails.
  7. [Nice to have] Add a short TTL cache in loadManifest() to avoid K8s round-trips on every session creation.


What Amber Loaded for Code Review

Amber automatically loaded these repository standards from the memory system:

  1. CLAUDE.md - Master project instructions, development standards
  2. backend-development.md - Go backend, K8s integration patterns
  3. frontend-development.md - NextJS, Shadcn UI, React Query patterns
  4. security-standards.md - Auth, RBAC, token handling
  5. k8s-client-usage.md - User token vs service account patterns
  6. error-handling.md - Consistent error patterns
  7. react-query-usage.md - Data fetching patterns

Impact: This review used your repository's specific code quality standards, security patterns, and best practices from the memory system (PRs #359, #360) - not just generic code review guidelines.

…l discovery

Remove the unused `default` boolean from ModelEntry (defaultModel string
is the source of truth). Fix a pre-existing Go slice aliasing bug in
models_test.go where mutating a shallow-copied struct's slice corrupted
the shared test fixture. Rewrite model-discovery.py to use a curated
KNOWN_MODELS list with automatic Vertex AI version resolution via the
Model Garden get API, falling back gracefully to the manifest's existing
vertexId or @default. Fix model_id_to_label() capitalization bug.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

github-actions bot commented Feb 25, 2026

Claude Code Review

Summary

PR #692 introduces automated model discovery with feature-flag-gated availability. The implementation spans all five components (backend, frontend, operator, runner, CI) and is architecturally coherent. The fail-open fallback strategy for model availability is well thought out. A few security pattern violations, a reactive form bug, and some missing safeguards need attention before this is ready to merge.


Issues by Severity

🚫 Blocker Issues

None — the feature is functional, but the critical issues below should be resolved before marking ready for review.


🔴 Critical Issues

1. ListModels uses the backend service account instead of a user-scoped client

components/backend/handlers/models.go:74

func loadManifest() (*types.ModelManifest, error) {
    if K8sClient == nil { ... }
    cm, err := K8sClient.CoreV1().ConfigMaps(Namespace).Get(...)  // ← service account

The documented security rule is unambiguous: "ALWAYS use user-scoped clients for API operations." Even though models.json is global config with no user-specific data, the handler never calls GetK8sClientsForRequest(c). If middleware auth ever has a gap, this handler will silently serve unauthenticated requests using the backend's elevated credentials.

Fix: Either (a) call GetK8sClientsForRequest(c) in ListModels and pass the user client to loadManifest, or (b) explicitly document with a comment why the service account is appropriate here (it reads only global config, never user data) and file a follow-up to make the exception reviewable.


2. Form default model is not reactive — API default can be silently ignored

components/frontend/src/components/create-session-dialog.tsx:89

const defaultModel = modelsData?.defaultModel ?? "claude-sonnet-4-5";

const form = useForm({
  defaultValues: {
    model: defaultModel,  // ← evaluated once at mount, before API responds
  },
})

react-hook-form defaultValues are captured on the first render. Because useModels is only enabled when open === true, the form is initialized before the fetch completes and will always use the hardcoded fallback "claude-sonnet-4-5", ignoring whatever defaultModel the server returns.

Fix:

useEffect(() => {
  if (modelsData?.defaultModel) {
    form.setValue('model', modelsData.defaultModel, { shouldValidate: false });
  }
}, [modelsData?.defaultModel]);

🟡 Major Issues

3. Cache invalidation is too broad — invalidates all projects' model caches

components/frontend/src/components/create-session-dialog.tsx:137

const handleTriggerClick = () => {
  queryClient.invalidateQueries({ queryKey: modelKeys.all });  // ← invalidates every project
  setOpen(true);
};

modelKeys.all is ['models'], which is the prefix for every project's model cache. For a user who has multiple projects open in tabs, this silently evicts all of them.

Fix: Invalidate only the current project:

queryClient.invalidateQueries({ queryKey: modelKeys.list(projectName) });

4. isModelAvailable is fail-open for unknown models

components/backend/handlers/models.go:113

// Model not found in manifest — allow it (may be an older model still in use)
return true

This means any arbitrary string (e.g. "gpt-4", "my-custom-model") passes validation and gets forwarded to the runner. The actual API will reject it, but the error surface is pushed downstream and the validation provides no real guard.

Fix: Return false for models not in the manifest, or at minimum log the unknown model ID at WARNING level so it is visible in production. If backward compatibility with pre-manifest sessions is a concern, document the specific case and add a metrics increment.
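A fail-closed variant could look like the following self-contained sketch; the type and function shapes are simplified stand-ins for the PR's actual definitions, not the real code.

```go
package main

import (
	"fmt"
	"log"
)

// Assumed, simplified manifest shapes for illustration only.
type ModelEntry struct {
	ID        string
	Available bool
}

type ModelManifest struct {
	DefaultModel string
	Models       []ModelEntry
}

// isModelAvailable shows the fail-closed behaviour the review suggests:
// models not present in the manifest are logged and rejected, instead of
// being forwarded to the runner.
func isModelAvailable(m *ModelManifest, modelID string) bool {
	if modelID == "" || modelID == m.DefaultModel {
		return true // default model bypasses flag checks
	}
	for _, e := range m.Models {
		if e.ID == modelID {
			return e.Available
		}
	}
	log.Printf("WARNING: model %q not in manifest; rejecting", modelID)
	return false
}

func main() {
	m := &ModelManifest{
		DefaultModel: "claude-sonnet-4-5",
		Models:       []ModelEntry{{ID: "claude-opus-4-1", Available: false}},
	}
	fmt.Println(isModelAvailable(m, "gpt-4")) // false: unknown ID is rejected
}
```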


5. loadManifest is called on every request with no caching

components/backend/handlers/models.go — loadManifest is called by both ListModels (every GET) and isModelAvailable (every session creation). Each call hits the Kubernetes API server.

At moderate load this will produce measurable latency on session creation. A simple TTL cache (30–60 s) would eliminate the cost in the happy path:

var (
  manifestCache   *types.ModelManifest
  manifestCacheAt time.Time
  manifestTTL     = 60 * time.Second
)

This is not blocking but will become a problem at scale.
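Completing that sketch, a minimal TTL-cached loader might look like this; the names and the fetch indirection are illustrative rather than the PR's code, and a stale copy is served on fetch errors to preserve the layered fail-open behaviour.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// Manifest stands in for types.ModelManifest; fetch stands in for the
// ConfigMap read. Names are illustrative, not taken from the PR.
type Manifest struct{ DefaultModel string }

var (
	cacheMu         sync.Mutex
	manifestCache   *Manifest
	manifestCacheAt time.Time
	manifestTTL     = 60 * time.Second
)

// loadManifestCached returns the cached manifest while it is fresh,
// otherwise refreshes it. If the refresh fails and a stale copy exists,
// the stale copy is returned instead of an error.
func loadManifestCached(fetch func() (*Manifest, error)) (*Manifest, error) {
	cacheMu.Lock()
	defer cacheMu.Unlock()
	if manifestCache != nil && time.Since(manifestCacheAt) < manifestTTL {
		return manifestCache, nil
	}
	m, err := fetch()
	if err != nil {
		if manifestCache != nil {
			return manifestCache, nil // serve stale on error (fail-open)
		}
		return nil, err
	}
	manifestCache, manifestCacheAt = m, time.Now()
	return m, nil
}

func main() {
	calls := 0
	fetch := func() (*Manifest, error) {
		calls++
		return &Manifest{DefaultModel: "claude-sonnet-4-5"}, nil
	}
	loadManifestCached(fetch)
	loadManifestCached(fetch) // second call is served from the cache
	fmt.Println(calls)        // 1
}
```

A sync.Mutex (or singleflight) around the refresh also prevents a thundering herd of ConfigMap reads when the cache expires under concurrent session creations.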


6. GitHub Actions uses action versions that likely do not exist

.github/workflows/model-discovery.yml:30,33

uses: actions/checkout@v6
uses: actions/setup-python@v6

As of the last stable release cycle, actions/checkout is at v4 and actions/setup-python is at v5. Version @v6 for these actions does not exist and will fail at runtime. Verify the correct pinned versions and prefer SHA pinning for security:

uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683  # v4.2.2

🔵 Minor Issues

7. IsModelEnabledWithContext is defined but never called

components/backend/featureflags/featureflags.go — The context-aware variant exists but ListModels uses IsModelEnabled (no context). If per-user model flags are intended, wire in the user context from the Gin request. If not yet needed, remove to keep the surface small.

8. Leftover context import suppressed with var _ = context.Background

components/backend/handlers/models_test.go:223

// Suppress unused import warning for context package
var _ = context.Background

This indicates "context" was imported but is not actually used in the test. Remove the import and the suppression line.

9. isModelAvailable has no unit test

The function has multiple branches (empty ID, default model, available=false, flag disabled, not found) but is not directly exercised. The ListModels tests cover the HTTP layer; isModelAvailable should have its own table-driven tests since it gates session creation.
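Such a table-driven test could be sketched as follows; the inlined isModelAvailable mirrors the review's description of the branches, including the current fail-open handling of unknown IDs, and the real signature may differ.

```go
package main

import "fmt"

type ModelEntry struct {
	ID        string
	Available bool
}

// isModelAvailable is a stand-in that encodes the branches the review
// describes: empty ID and the default model pass, listed models follow
// their Available flag, and unknown IDs currently fail open.
func isModelAvailable(defaultModel string, models []ModelEntry, id string) bool {
	if id == "" || id == defaultModel {
		return true
	}
	for _, m := range models {
		if m.ID == id {
			return m.Available
		}
	}
	return true // current fail-open behaviour per the review
}

func main() {
	models := []ModelEntry{
		{ID: "claude-sonnet-4-5", Available: true},
		{ID: "claude-opus-4-1", Available: false},
	}
	cases := []struct {
		name string
		id   string
		want bool
	}{
		{"empty ID allowed", "", true},
		{"default model bypasses flags", "claude-sonnet-4-5", true},
		{"unavailable model rejected", "claude-opus-4-1", false},
		{"unknown model (current fail-open)", "gpt-4", true},
	}
	for _, c := range cases {
		if got := isModelAvailable("claude-sonnet-4-5", models, c.id); got != c.want {
			panic(c.name)
		}
	}
	fmt.Println("all cases pass")
}
```

In the real test file the same table would drive t.Run subtests, so each branch of the gate that blocks session creation is exercised by name.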

10. model-discovery.py has no tests

The script performs non-trivial logic (resolve_version, probe_model, model_id_to_label, manifest merge). At minimum, unit tests for model_id_to_label and the manifest merge loop would catch regressions when new model naming conventions appear.


Positive Highlights

  • Fail-open strategy is correct for availability: returning defaults when the ConfigMap is missing or malformed means a ConfigMap outage cannot block session creation. The layered fallback (ConfigMap → hardcoded defaults) is well implemented.
  • Operator Vertex ID injection is clean: injecting LLM_MODEL_VERTEX_ID as an env var and having the runner prefer it over the static map is a sound, backward-compatible approach.
  • Type definitions follow project conventions: models.ts uses type (not interface), separates API types from internal types, and the API client layer is pure functions. All follow the documented patterns.
  • Feature flag semantics are correct: IsModelEnabled defaults to true when Unleash is unconfigured (models available by default), in contrast to the existing IsEnabled which defaults to false (features opt-in). The asymmetry is intentional and documented.
  • ConfigMap approach as single source of truth: having models.json live in the manifest repo and be distributed as a ConfigMap gives a clear audit trail and separates model configuration from code releases.
  • Test coverage is solid: operator LoadManifest/ResolveVertexID tests, backend handler unit tests with fake K8s client, and runner auth tests all cover the new paths well.

Recommendations

Priority order:

  1. Fix critical issue #1 — add GetK8sClientsForRequest to ListModels or provide a documented justification for the exception.
  2. Fix critical issue #2 — add form.setValue effect to sync API default model into the form.
  3. Fix major issue #6 — verify GitHub Actions version tags before the workflow is enabled; a broken workflow will silently stop model discovery.
  4. Fix major issue #3 — narrow the invalidation to modelKeys.list(projectName).
  5. Address major issue #4 — return false for unknown model IDs or at least log + add metrics.
  6. Consider caching loadManifest (issue #5) — not urgent for the initial rollout but should be addressed before GA.
  7. Minor issues can be addressed in follow-up PRs.


What Amber Loaded for Code Review

Amber automatically loaded these repository standards from the memory system:

  1. CLAUDE.md - Master project instructions, development standards
  2. backend-development.md - Go backend, K8s integration patterns
  3. frontend-development.md - NextJS, Shadcn UI, React Query patterns
  4. security-standards.md - Auth, RBAC, token handling
  5. k8s-client-usage.md - User token vs service account patterns
  6. error-handling.md - Consistent error patterns
  7. react-query-usage.md - Data fetching patterns

Impact: This review used your repository's specific code quality standards, security patterns, and best practices from the memory system (PRs #359, #360) - not just generic code review guidelines.
