Skip to content

fix: actually grab metric measurements in UI verifications query#1022

Merged
adityachoudhari26 merged 2 commits intomainfrom
grab-metric-measurements
Apr 21, 2026
Merged

fix: actually grab metric measurements in UI verifications query#1022
adityachoudhari26 merged 2 commits intomainfrom
grab-metric-measurements

Conversation

@adityachoudhari26
Copy link
Copy Markdown
Member

@adityachoudhari26 adityachoudhari26 commented Apr 21, 2026

fixes #1018

Summary by CodeRabbit

  • Bug Fixes
    • Fixed verification status lookup to use internal database queries instead of external HTTP requests, improving reliability and response times.

Copilot AI review requested due to automatic review settings April 21, 2026 18:42
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Apr 21, 2026

📝 Walkthrough

Walkthrough

The status protectedProcedure in the verifications router was refactored to replace HTTP-based external verification status lookup with internal database queries. The new implementation retrieves metric status records and measurements, groups measurements by metric, computes phase derivations using helper functions, and aggregates results across metrics to determine overall status.

Changes

Cohort / File(s) Summary
Verification Status Logic
packages/trpc/src/routes/verifications.ts
Replaced HTTP client call with database-driven status computation. Added helper functions for phase derivation based on failure/success thresholds and measurement streaks. Updated procedure signature to include ctx parameter. Removed TRPCError exception handling previously tied to HTTP calls.

Sequence Diagram

sequenceDiagram
    participant Client
    participant Procedure as Status<br/>Procedure
    participant DB as Database
    participant Logic as Phase<br/>Logic

    Client->>Procedure: Request status for jobId
    Procedure->>DB: Query jobVerificationMetricStatus<br/>(where jobId = ?)
    DB-->>Procedure: metric status rows
    
    alt Metrics exist
        Procedure->>DB: Load jobVerificationMetricMeasurement<br/>rows (ordered) for all<br/>metric status IDs
        DB-->>Procedure: measurement rows
        
        Procedure->>Logic: Group measurements<br/>by metric
        Logic-->>Procedure: grouped measurements
        
        Procedure->>Logic: Derive per-metric phase<br/>(failed count, passed streak,<br/>thresholds, count cap)
        Logic-->>Procedure: phase per metric
        
        Procedure->>Logic: Aggregate phases across<br/>metrics (failed > running > passed)
        Logic-->>Procedure: final status
        
        Procedure-->>Client: status result
    else No metrics found
        Procedure-->>Client: empty string ("")
    end
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

Poem

🐰 From HTTP calls to database dreams,
We query metrics by the seams,
Group and phase with careful math,
Following verification's path! ✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'fix: actually grab metric measurements in UI verifications query' accurately describes the main change: replacing external HTTP-based verification lookup with internal database-driven computation that properly retrieves metric measurements.
Linked Issues check ✅ Passed The PR directly addresses issue #1018 by implementing proper metric measurement retrieval in the verifications query, fixing the 'stuck in verifying state' problem through database-driven status computation instead of external HTTP lookups.
Out of Scope Changes check ✅ Passed All changes are scoped to the verifications status query implementation, replacing HTTP-based lookup with database-driven computation to fix metric measurement retrieval, with no extraneous modifications.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch grab-metric-measurements

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Fixes verification status computation so metrics that meet success_threshold via consecutive passes are treated as terminal/passed, preventing jobs from remaining stuck in a “verifying” state (issue #1018).

Changes:

  • Extend GetSiblingMetricStatuses terminal detection to include consecutive-pass early termination.
  • Extend GetAggregateJobVerificationStatus so metrics that hit success_threshold no longer keep the job in running.
  • Regenerate sqlc output to reflect the updated SQL.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

File Description
apps/workspace-engine/pkg/db/queries/job_verification_metric.sql Adds consecutive-pass counting logic and updates terminal/aggregate status rules to respect success_threshold.
apps/workspace-engine/pkg/db/job_verification_metric.sql.go Regenerated sqlc code embedding the updated SQL and comments.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +66 to +70
LEFT JOIN LATERAL (
SELECT COUNT(*)::int AS consecutive_passes
FROM job_verification_metric_measurement mm
WHERE mm.job_verification_metric_status_id = m.id
AND mm.status = 'passed'
Copy link

Copilot AI Apr 21, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new consecutive_passes lateral join introduces additional scans of job_verification_metric_measurement (including a correlated MAX(measured_at) subquery). In the current schema there is no index on job_verification_metric_measurement.job_verification_metric_status_id (see apps/workspace-engine/pkg/db/queries/schema.sql:419+), so this is likely to devolve into repeated seq scans as measurement volume grows. Consider adding an index (or partial indexes) to support lookups by (job_verification_metric_status_id, status, measured_at) so these terminal-status queries remain fast.

Copilot uses AI. Check for mistakes.
Comment on lines +52 to +55
OR (
COALESCE(m.success_threshold, 0) > 0
AND COALESCE(cp.consecutive_passes, 0) >= m.success_threshold
)
Copy link

Copilot AI Apr 21, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR changes the definition of is_terminal to allow early termination when success_threshold consecutive passes are reached. There are already controller-level tests for success-threshold behavior, but there doesn’t appear to be a DB/query-level regression test to ensure GetSiblingMetricStatuses (and the aggregate status query) reflects the same logic. Adding an integration test that inserts measurements and asserts terminal/aggregate status would help prevent the UI from getting stuck in "verifying" again.

Copilot uses AI. Check for mistakes.
//
// 'running' if any metric is still incomplete,
// 'failed' if any metric has exceeded its failure threshold.
// Returns ” (empty string) if the job has no verification metrics.
Copy link

Copilot AI Apr 21, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The generated Go doc comment has a typographic closing quote (”) where it looks like it intended to show an empty string literal. This reads like a copy/paste/smart-quote artifact; please update the source SQL comment (and regenerate) so the Go comment renders correctly.

Suggested change
// Returns (empty string) if the job has no verification metrics.
// Returns "" (empty string) if the job has no verification metrics.

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
packages/trpc/src/routes/verifications.ts (1)

23-33: Minor: single-pass grouping is cleaner and O(n) instead of O(metrics × measurements).

Not a correctness issue — current rowsets are small — but for each metric the code re-scans the entire measurements array. A single pass is also easier to read:

Proposed refactor
-function groupMeasurementsByMetric(
-  metricIds: string[],
-  measurements: MeasurementRow[],
-): Map<string, MeasurementStatus[]> {
-  return new Map(
-    metricIds.map((id) => [
-      id,
-      measurements.filter((r) => r.metricId === id).map((r) => r.status),
-    ]),
-  );
-}
+function groupMeasurementsByMetric(
+  metricIds: string[],
+  measurements: MeasurementRow[],
+): Map<string, MeasurementStatus[]> {
+  const byMetric = new Map<string, MeasurementStatus[]>(
+    metricIds.map((id) => [id, []]),
+  );
+  for (const r of measurements) {
+    if (r.metricId == null) continue;
+    byMetric.get(r.metricId)?.push(r.status);
+  }
+  return byMetric;
+}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/trpc/src/routes/verifications.ts` around lines 23 - 33, The current
groupMeasurementsByMetric scans measurements once per metric; instead build the
Map in a single pass: initialize a Map with each metricId -> empty
MeasurementStatus[] (use metricIds array), then iterate measurements exactly
once and for each MeasurementRow r do if map.has(r.metricId)
map.get(r.metricId)!.push(r.status); finally return the map. Update the function
groupMeasurementsByMetric to use metricIds, measurements, MeasurementRow, and
MeasurementStatus as described.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@packages/trpc/src/routes/verifications.ts`:
- Around line 35-60: computeMetricPhase currently treats failureThreshold with
strict '>' (fails after exceeding) and successThreshold with '>=' (passes when
reaching), and it treats reaching count as automatic pass when successThreshold
is null/0; confirm intended semantics and update accordingly: either unify to
symmetric comparisons (use '>=' for failures to fail when failedCount >=
failureThreshold, or use '>' for both to require exceeding) inside
computeMetricPhase (referencing failureThreshold, failedCount, successThreshold,
consecutiveSuccessCount) and change the count-cap fallback behavior (the final
if using measurements.length >= count) to instead treat inconclusive/no-success
runs as "running" or a new "inconclusive" state per spec; ensure logic still
consults getFailedCount and getConsecutiveSuccessCount and add/tests to reflect
the chosen behavior.

---

Nitpick comments:
In `@packages/trpc/src/routes/verifications.ts`:
- Around line 23-33: The current groupMeasurementsByMetric scans measurements
once per metric; instead build the Map in a single pass: initialize a Map with
each metricId -> empty MeasurementStatus[] (use metricIds array), then iterate
measurements exactly once and for each MeasurementRow r do if
map.has(r.metricId) map.get(r.metricId)!.push(r.status); finally return the map.
Update the function groupMeasurementsByMetric to use metricIds, measurements,
MeasurementRow, and MeasurementStatus as described.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 95b839eb-5976-448a-add8-0130f18be67d

📥 Commits

Reviewing files that changed from the base of the PR and between 0d77063 and 9a18598.

📒 Files selected for processing (1)
  • packages/trpc/src/routes/verifications.ts

Comment on lines +35 to +60
function computeMetricPhase(
measurements: MeasurementStatus[],
count: number,
successThreshold: number | null,
failureThreshold: number | null,
): "passed" | "failed" | "running" {
const failureLimit = failureThreshold ?? 0;
const failedCount = getFailedCount(measurements);
const consecutiveSuccessCount = getConsecutiveSuccessCount(measurements);

const hasAnyFailures = failedCount > 0;
const isFailureLimitExceeded = failureLimit > 0 && failedCount > failureLimit;
if ((failureLimit === 0 && hasAnyFailures) || isFailureLimitExceeded)
return "failed";

if (
successThreshold != null &&
successThreshold > 0 &&
consecutiveSuccessCount >= successThreshold
)
return "passed";

if (measurements.length >= count) return "passed";

return "running";
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

❓ Verification inconclusive

🏁 Script executed:

# First, explore the verifications.ts file structure and context
cd packages/trpc/src/routes && wc -l verifications.ts

Repository: ctrlplanedev/ctrlplane

Length of output: 86


🏁 Script executed:

# Check if there are test files for this module
find . -type f -name "*verification*" | head -20

Repository: ctrlplanedev/ctrlplane

Length of output: 633


🏁 Script executed:

# Search for the full context of computeMetricPhase and related functions
cd packages/trpc/src/routes && sed -n '1,100p' verifications.ts

Repository: ctrlplanedev/ctrlplane

Length of output: 3359


🏁 Script executed:

# Look for tests that might clarify the expected behavior
find . -type f \( -name "*.test.ts" -o -name "*.spec.ts" \) | xargs grep -l "computeMetricPhase\|Measurement\|threshold" 2>/dev/null | head -10

Repository: ctrlplanedev/ctrlplane

Length of output: 48


🏁 Script executed:

# Search for any documentation or comments about thresholds
rg "failureThreshold|successThreshold" --type ts -B2 -A2

Repository: ctrlplanedev/ctrlplane

Length of output: 17061


🏁 Script executed:

# Check if there's documentation about the verification logic
find . -type f \( -name "*.md" -o -name "README*" \) | xargs grep -l "verification\|threshold" 2>/dev/null

Repository: ctrlplanedev/ctrlplane

Length of output: 189


🏁 Script executed:

# Look for test files specifically for the verifications route
find . -path "*/node_modules" -prune -o -type f \( -name "*verifications*test*" -o -name "*verifications*spec*" \) -print

Repository: ctrlplanedev/ctrlplane

Length of output: 48


🏁 Script executed:

# Search for test files that might contain tests for computeMetricPhase or similar logic
rg "computeMetricPhase\|getFailedCount\|getConsecutiveSuccessCount" --type ts

Repository: ctrlplanedev/ctrlplane

Length of output: 48


🏁 Script executed:

# Look at the full status procedure to understand the flow better
cd packages/trpc/src/routes && sed -n '70,136p' verifications.ts

Repository: ctrlplanedev/ctrlplane

Length of output: 2374


🏁 Script executed:

# Check if there are any comments or documentation about verification requirements
rg "verification.*metric|threshold.*behavior" --type ts -i -B2 -A2

Repository: ctrlplanedev/ctrlplane

Length of output: 33046


🏁 Script executed:

# Look at the database schema to understand defaults and constraints
cd packages/db/src/schema && cat -n job-verification-metric.ts

Repository: ctrlplanedev/ctrlplane

Length of output: 3625


🏁 Script executed:

# Check the deployment verification documentation
find . -name "deployment-verification.mdx" -exec cat {} \;

Repository: ctrlplanedev/ctrlplane

Length of output: 6522


🏁 Script executed:

# Search for examples or documentation about threshold semantics
rg "successThreshold|failureThreshold" --type mdx -B3 -A3

Repository: ctrlplanedev/ctrlplane

Length of output: 93


🏁 Script executed:

# Look for any Go implementation in the workspace-engine (backend) that might clarify semantics
find apps/workspace-engine -name "*.go" -type f | xargs grep -l "failureThreshold\|successThreshold" 2>/dev/null | head -5

Repository: ctrlplanedev/ctrlplane

Length of output: 369


🏁 Script executed:

# Check if there are any comments or logic in the Go code
find apps/workspace-engine -name "*.go" -type f | xargs grep -B5 -A10 "failureThreshold\|successThreshold" 2>/dev/null | head -80

Repository: ctrlplanedev/ctrlplane

Length of output: 6959


🏁 Script executed:

# Get the full test file to understand the threshold semantics
cat apps/workspace-engine/pkg/oapi/job_verification_test.go

Repository: ctrlplanedev/ctrlplane

Length of output: 3045


🏁 Script executed:

# Also look at the Status() method implementation to understand the logic
rg "func.*Status\(\)" apps/workspace-engine/pkg/oapi/ -A15

Repository: ctrlplanedev/ctrlplane

Length of output: 1175


🏁 Script executed:

# Get the full Status() method implementation
sed -n '/func (jv \*JobVerification) Status()/,/^func /p' apps/workspace-engine/pkg/oapi/oapi.go | head -80

Repository: ctrlplanedev/ctrlplane

Length of output: 1391


Verify threshold semantics against your verification spec and decision on count-cap fallback.

The code has two semantic behaviors worth confirming:

  1. Asymmetric thresholds are intentional. Failure uses strict > while success uses >=: with failureThreshold = 3 the metric fails on the 4th failure, but with successThreshold = 3 it passes on the 3rd success. This is confirmed by the backend test suite (FailureLimitReached_NotFailedBeforeExceeded shows failureThreshold=2 with 2 failures does NOT fail). Decide if this asymmetry matches your intent or should be unified.

  2. Count-cap fallback behavior. When successThreshold is not set (null or 0—the database defaults), reaching count measurements without exceeding the failure limit declares success, even if no actual successes occurred (e.g., all inconclusive). This is the current design, but consider whether inconclusive runs should be treated as "running" or "inconclusive" instead of "passed".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/trpc/src/routes/verifications.ts` around lines 35 - 60,
computeMetricPhase currently treats failureThreshold with strict '>' (fails
after exceeding) and successThreshold with '>=' (passes when reaching), and it
treats reaching count as automatic pass when successThreshold is null/0; confirm
intended semantics and update accordingly: either unify to symmetric comparisons
(use '>=' for failures to fail when failedCount >= failureThreshold, or use '>'
for both to require exceeding) inside computeMetricPhase (referencing
failureThreshold, failedCount, successThreshold, consecutiveSuccessCount) and
change the count-cap fallback behavior (the final if using measurements.length
>= count) to instead treat inconclusive/no-success runs as "running" or a new
"inconclusive" state per spec; ensure logic still consults getFailedCount and
getConsecutiveSuccessCount and add/tests to reflect the chosen behavior.

@adityachoudhari26 adityachoudhari26 merged commit 46198ea into main Apr 21, 2026
12 checks passed
@adityachoudhari26 adityachoudhari26 deleted the grab-metric-measurements branch April 21, 2026 20:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

stuff is stuck in a verifying state

2 participants