Use Codex structured outputs for threat detection parsing by Copilot · Pull Request #35061 · github/gh-aw

Copilot · 2026-05-26T21:53:25Z

Codex threat detection produces false parse_error warnings when SSE/tracing noise in the log stream prevents THREAT_DETECTION_RESULT:{...} from being scraped correctly, even when the model's verdict is clean. This replaces log scraping with Codex CLI's native structured output mechanism.

Engine changes (`codex_engine.go`)

Switches from -c response_schema='...' (config flag, result embedded in log stream) to:

--output-schema /tmp/gh-aw/threat-detection/detection_schema.json \
-o /tmp/gh-aw/threat-detection/detection_result.json

The schema file is written before Codex runs via a detectionSchemaWriteCmd:

AWF mode: written in PathSetup (host-side, before the container starts; /tmp/gh-aw/ is the shared read-write tree)
Non-AWF mode: prepended as a shell line in the command preamble

Parser changes (`parse_threat_detection_results.cjs`)

Adds parseStructuredResultFile() — reads and validates detection_result.json (checks boolean types on all three verdict fields). In runMain(), the structured file is tried first:

File present and valid → use verdict directly, skip log parsing
File present but malformed → warn and fall back to log scraping (conservative)
File absent → fall back to log scraping (backward compatible with non-Codex engines)

Test updates

codex_engine_test.go: asserts --output-schema / -o flags and correct file paths; asserts legacy response_schema flag is absent
parse_threat_detection_results.test.cjs: adds parseStructuredResultFile describe block; fs-dependent cases follow existing describe.skip pattern for the CJS mock limitation

- codex_engine.go: switch from -c response_schema config flag to --output-schema <file> and -o/--output-last-message <file> flags. Write detection JSON schema to detectionSchemaFilePath before Codex runs (AWF PathSetup for AWF mode; command preamble for non-AWF mode). Codex writes the structured verdict directly to detectionResultFilePath. - parse_threat_detection_results.cjs: add parseStructuredResultFile() helper that reads and validates detection_result.json. In runMain(), try the structured result file before falling back to log scraping. Export the new function. - constants.cjs: add DETECTION_RESULT_FILENAME constant. - codex_engine_test.go: update TestCodexEngineDetectionRunUsesStructuredOutputSchema to verify --output-schema/-o flags and schema/result file paths instead of the old response_schema config flag. - parse_threat_detection_results.test.cjs: add parseStructuredResultFile describe block with null-check test and describe.skip for fs-dependent scenarios (matching existing CJS mock limitation pattern). Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>

- parse_threat_detection_results.cjs: remove potentially sensitive file content excerpt from JSON parse error message - codex_engine.go: eliminate duplicated non-AWF command construction by using a conditional schemaWriteLine variable instead of four separate fmt.Sprintf blocks Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>

github-actions · 2026-05-26T22:42:57Z

🧪 Test Quality Sentinel completed test quality analysis.

github-actions · 2026-05-26T22:42:58Z

🧠 Matt Pocock Skills Reviewer has completed the skills-based review. ✅

github-actions · 2026-05-26T22:43:00Z

✅ PR Code Quality Reviewer completed the code quality review.

github-actions · 2026-05-26T22:43:04Z

✅ Design Decision Gate 🏗️ completed the design decision gate check.

No ADR enforcement needed: PR #35061 does not have the 'implementation' label and has 92 new lines (≤100 threshold) in default business logic directories.

Copilot

Pull request overview

Replaces log scraping of THREAT_DETECTION_RESULT: with Codex CLI's native structured output (--output-schema + -o) to eliminate false parse_error warnings from noisy SSE/tracing logs. The parser tries the structured result file first and falls back to log scraping if the file is missing or malformed.

Changes:

codex_engine.go: emits --output-schema <schema> / -o <result> for detection runs and writes the schema file via PathSetup (AWF) or a preamble shell line (non-AWF).
parse_threat_detection_results.cjs: adds parseStructuredResultFile() and uses it as the primary verdict source, with conservative fallback to log parsing.
Tests/constants: new DETECTION_RESULT_FILENAME, updated Codex engine flag assertions, new (largely describe.skip-pended) parser test cases.

Show a summary per file

File	Description
pkg/workflow/codex_engine.go	Switch detection runs from `-c response_schema=...` to `--output-schema`/`-o`, write schema file via `PathSetup` or preamble.
pkg/workflow/codex_engine_test.go	Assert new flags and absence of legacy `response_schema`.
actions/setup/js/constants.cjs	Add `DETECTION_RESULT_FILENAME` constant.
actions/setup/js/parse_threat_detection_results.cjs	Add `parseStructuredResultFile`; consume structured file first in `runMain`, fall back to log parsing.
actions/setup/js/parse_threat_detection_results.test.cjs	Add `parseStructuredResultFile` test block (most cases skipped due to fs mock limitation).

Copilot's findings

Tip

Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Files reviewed: 5/5 changed files
Comments generated: 3

+        // Jump directly to verdict evaluation — no log parsing needed.
+        const verdict = structuredResult.verdict;
+        core.info("📋 Threat detection verdict (from structured result file):");
+        core.info(`   prompt_injection : ${verdict.prompt_injection}`);
+        core.info(`   secret_leak      : ${verdict.secret_leak}`);
+        core.info(`   malicious_patch  : ${verdict.malicious_patch}`);
+        if (verdict.reasons && verdict.reasons.length > 0) {
+          core.info(`   reasons (${verdict.reasons.length}):`);
+          verdict.reasons.forEach((reason, i) => core.info(`     [${i + 1}] ${reason}`));
+        } else {
+          core.info("   reasons          : (none)");
+        }
+
+        // ── Evaluate verdict and set conclusion ───────────────────────────
+        const threatsDetected = verdict.prompt_injection || verdict.secret_leak || verdict.malicious_patch;
+        if (threatsDetected) {
+          const threats = [];
+          if (verdict.prompt_injection) threats.push("prompt injection");
+          if (verdict.secret_leak) threats.push("secret leak");
+          if (verdict.malicious_patch) threats.push("malicious patch");
+          const reasonsText = verdict.reasons && verdict.reasons.length > 0 ? "\nReasons: " + verdict.reasons.join("; ") : "";
+          core.error("🚨 Security threats detected: " + threats.join(", "));
+          if (verdict.reasons && verdict.reasons.length > 0) {
+            core.error("   Reasons: " + verdict.reasons.join("; "));
+          }
+          setDetectionFailure("threat_detected", `${ERR_VALIDATION}: ❌ Security threats detected: ${threats.join(", ")}${reasonsText}`);
+        } else {
+          core.info("✅ No security threats detected. Safe outputs may proceed.");
+          core.setOutput("conclusion", "success");
+          core.setOutput("success", "true");
+          core.setOutput("reason", "");
+        }
+
+        core.info("════════════════════════════════════════════════════════");
+        core.info("🛡️  Threat detection conclusion complete.");
+        core.info("════════════════════════════════════════════════════════");
+        return;


+          const reasonsText = verdict.reasons && verdict.reasons.length > 0 ? "\nReasons: " + verdict.reasons.join("; ") : "";
+          core.error("🚨 Security threats detected: " + threats.join(", "));
+          if (verdict.reasons && verdict.reasons.length > 0) {
+            core.error("   Reasons: " + verdict.reasons.join("; "));
+          }
+          setDetectionFailure("threat_detected", `${ERR_VALIDATION}: ❌ Security threats detected: ${threats.join(", ")}${reasonsText}`);


+		// Optionally prepend the detection schema write command for detection runs.
+		schemaWriteLine := ""
+		if workflowData.IsDetectionRun {
+			schemaWriteLine = detectionSchemaWriteCmd + "\n"
+		}
+
 		if harnessScriptName != "" {
 			// Harness handles prompt reading via --prompt-file; no INSTRUCTION variable needed.
 			command = fmt.Sprintf(`set -o pipefail
 printf '%%s' "$(date +%%s%%3N)" > %s
 touch %s
 (umask 177 && touch %s)
 mkdir -p "$CODEX_HOME/logs"
-%s 2>&1 | tee %s`, AgentCLIStartMsPath, AgentStepSummaryPath, logFile, codexCommand, logFile)
+%s%s 2>&1 | tee %s`, AgentCLIStartMsPath, AgentStepSummaryPath, logFile, schemaWriteLine, codexCommand, logFile)
 		} else {
 			command = fmt.Sprintf(`set -o pipefail
 printf '%%s' "$(date +%%s%%3N)" > %s
 touch %s
 (umask 177 && touch %s)
 INSTRUCTION="$(cat "$GH_AW_PROMPT")"
 mkdir -p "$CODEX_HOME/logs"
-%s 2>&1 | tee %s`, AgentCLIStartMsPath, AgentStepSummaryPath, logFile, codexCommand, logFile)
+%s%s 2>&1 | tee %s`, AgentCLIStartMsPath, AgentStepSummaryPath, logFile, schemaWriteLine, codexCommand, logFile)


github-actions · 2026-05-26T22:48:03Z

🧪 Test Quality Sentinel Report

✅ Test Quality Score: 95/100 — Excellent

Analyzed 2 active test(s) across 1 Go file and 1 JavaScript file: 2 design tests, 0 implementation tests, 0 guideline violations. 8 additional tests exist but are skipped due to a CJS fs mock limitation — noted below.

📊 Metrics & Test Classification (2 active tests analyzed)

Metric	Value
New/modified tests analyzed	2 active (+ 8 skipped)
✅ Design tests (behavioral contracts)	2 (100%)
⚠️ Implementation tests (low value)	0 (0%)
Tests with error/edge cases	2 (100%)
Duplicate test clusters	0
Test inflation detected	No
🚨 Coding-guideline violations	0

Test Classification Details

Test	File	Classification	Issues Detected
`parseStructuredResultFile - should return null when file does not exist`	`actions/setup/js/parse_threat_detection_results.test.cjs`	✅ Design	None — uses real fs, verifies null return for absent file
`TestCodexEngineDetectionRunUsesStructuredOutputSchema` (modified)	`pkg/workflow/codex_engine_test.go`	✅ Design	None — table-driven, covers detection and agent run modes, positive and negative assertions

Language Support

Tests analyzed:

🐹 Go (*_test.go): 1 test — unit (//go:build !integration)
🟨 JavaScript (*.test.cjs): 1 active test + 8 skipped (vitest)

⚠️ Noted: 8 Skipped Tests in `describe.skip` (informational)

The describe.skip("with structured result file present (CJS fs mock limitation)") block contains 8 well-written tests for parseStructuredResultFile that are skipped because vi.mock("fs") does not intercept require("fs") for built-in modules in the CJS+vitest configuration used by this project (same limitation documented in safe_output_validator.test.cjs).

The skipped tests cover important behavioral contracts:

Valid clean JSON → returns verdict object
Reasons field populated → verdict.reasons contains expected values
Invalid JSON → returns { error } object
Empty file → error contains "empty"
Missing required boolean field → error names the missing field
Boolean field with wrong type → error names the incorrect field
JSON root is an array → error contains "array"
readFileSync throws → error contains "Failed to read"

These are not counted in the score (skipped tests provide no runtime coverage), but they represent good behavioral thinking. Consider whether there is a way to exercise this logic in integration/E2E tests or by restructuring parseStructuredResultFile to accept an injectable fs adapter, which would allow proper unit testing.

Verdict

✅ Check passed. 0% of active tests are implementation tests (threshold: 30%). Both active tests verify observable behavioral contracts with edge-case coverage.

📖 Understanding Test Classifications

Design Tests (High Value) verify what the system does:

Assert on observable outputs, return values, or state changes
Cover error paths and boundary conditions
Would catch a behavioral regression if deleted
Remain valid even after internal refactoring

Implementation Tests (Low Value) verify how the system does it:

Assert on internal function calls (mocking internals)
Only test the happy path with typical inputs
Break during legitimate refactoring even when behavior is correct
Give false assurance: they pass even when the system is wrong

Goal: Shift toward tests that describe the system's behavioral contract — the promises it makes to its users and collaborators.

🧪 Test quality analysis by Test Quality Sentinel · sonnet46 1.2M · ◷

github-actions

✅ Test Quality Sentinel: 95/100. Test quality is excellent — 0% of new tests are implementation tests (threshold: 30%). Both active tests verify observable behavioral contracts with edge-case coverage.

github-actions

REQUEST_CHANGES — The structured-output approach is the right fix for the log-scraping false-positive problem, but several issues need to be resolved before merging.

Blocking themes

Verdict evaluation logic is duplicated

The structured-result path (new Step 2) re-implements the full verdict evaluation — logging each field, computing threatsDetected, assembling threats/reasonsText, calling setDetectionFailure/core.setOutput, printing the divider banner — identically to the existing Steps 6–7 log-parsing path. Any change to verdict handling now has to be made twice and the two branches will silently drift. Extract a shared evaluateAndReportVerdict(verdict, source) helper. (Existing comment.)

Non-AWF schema write failure is silently swallowed

In the non-AWF command preamble schemaWriteLine is a plain newline-terminated shell line with no && joining it to codexCommand. If mkdir/printf fails, the script continues and Codex runs with --output-schema pointing at a non-existent file, producing a confusing Codex CLI error instead of a clear schema-write failure. Chain with && to match AWF mode. (Existing comment.)

Concurrent jobs clobber shared schema/result files

Both schema and result file paths are fixed strings under /tmp/gh-aw/threat-detection/. Two detection jobs running in parallel on the same runner will overwrite each other's files, producing a wrong verdict for one job with no error raised. Scope the directory to a per-run/per-job identifier. (New comment on codex_engine.go.)

`reasons` array items not validated as strings

Array.isArray(parsed.reasons) ? parsed.reasons : [] admits non-string elements that coerce to "[object Object]" or "null" in downstream join() calls, silently producing misleading threat reasons. (New comment.)

`describe.skip` leaves the new parsing function entirely untested

All interesting code paths (valid JSON, threat verdicts, wrong types, empty file, read error) are skipped. The CJS mock limitation is avoidable by writing real temp files in the test. (New comment.)

🔎 Code quality review by PR Code Quality Reviewer · sonnet46 1.8M

github-actions · 2026-05-26T22:49:53Z

+
+// detectionResultFilePath is the path where Codex writes the final structured
+// verdict via --output-last-message. The parser reads this file directly instead
+// of scraping the log stream, eliminating false parse_error warnings from noisy


Concurrent detection runs on the same runner will clobber each other's schema and result files, producing a wrong or missing structured verdict with no error surfaced.

💡 Details

Both detectionSchemaFilePath and detectionResultFilePath are fixed paths under /tmp/gh-aw/threat-detection/. In a self-hosted runner pool — or a single runner handling two detection jobs in parallel — Job A's detection_result.json can be overwritten by Job B's Codex run before Job A's parser reads it. Job A then reads Job B's verdict, with no indication that anything is wrong.

Fix: scope the directory to a per-run/per-job identifier:

// Use GITHUB_RUN_ID + GITHUB_JOB, injected as env vars at generation time detectionDir := fmt.Sprintf("/tmp/gh-aw/threat-detection/%s-%s", runID, jobName)

Alternatively, write the dir path into a step output or env var so the parser can read the same job-unique path.

github-actions · 2026-05-26T22:49:53Z

+        malicious_patch: parsed.malicious_patch,
+        reasons: Array.isArray(parsed.reasons) ? parsed.reasons : [],
+      },
+    };


reasons array items are not validated to be strings, so non-string elements (objects, numbers, nulls) flow silently through to core.error() and setDetectionFailure().

💡 Details

The check Array.isArray(parsed.reasons) ? parsed.reasons : [] accepts any array, including [{"nested":true}, null, 42]. Downstream, verdict.reasons.join("; ") coerces those to "[object Object]", "null", "42", producing misleading threat reasons in the Actions log and failure message. If the model output is malformed, this silently hides the issue.

Add an element-level validation:

if (!parsed.reasons.every(r => typeof r === "string")) { return { error: `"reasons" array must contain only strings` }; }

github-actions · 2026-05-26T22:49:53Z

+
+  // Note: The following tests are skipped because mocking fs for CJS modules
+  // is difficult in vitest (vi.mock("fs") does not intercept require("fs") for
+  // built-in modules in this CJS+vitest setup, same as safe_output_validator.test.cjs).


All meaningful test paths for parseStructuredResultFile are unconditionally skipped, meaning the new feature ships with effectively zero test coverage for its primary code paths.

💡 Details

The one test that actually runs checks only that the function returns null when the file does not exist — the trivial fallthrough case that requires no parsing logic at all. Every interesting path (valid JSON → clean verdict, threat verdict, invalid JSON, wrong field types, empty file, read error) lives in describe.skip and never executes in CI.

The stated reason is "CJS fs mock limitation", but this is solvable without mocking the real fs module. A simple alternative: create real temporary files in the test:

const os = require("os"); const tmpFile = path.join(os.tmpdir(), `detection_result_${Date.now()}.json`); fs.writeFileSync(tmpFile, JSON.stringify({ prompt_injection: false, ... })); try { const result = parseStructuredResultFile(tmpFile); expect(result.verdict.prompt_injection).toBe(false); } finally { fs.unlinkSync(tmpFile); }

This avoids the CJS mock limitation entirely and tests the real I/O path.

github-actions

Skills-Based Review 🧠

Applied /zoom-out, /tdd, /grill-with-docs, and /diagnose — no blocking issues, but four things worth addressing before merge.

📋 Key Themes & Highlights

Key Themes

Duplicated verdict evaluation (/zoom-out): The ~25-line threat/success evaluation block is copy-pasted into the structured-file branch and the log-parsing branch. Extracting a evaluateAndSetConclusion(verdict) helper would remove the drift risk.
Fragile non-skipped test (/tdd): The one live test for parseStructuredResultFile passes only because a specific /tmp path doesn't exist on the runner — a stale file breaks it silently. Use a UUID-suffixed temp path instead.
-o vs --output-last-message inconsistency (/grill-with-docs): The JS comment says --output-last-message; the Go engine uses -o. Clarify whether these are aliases.
Shell quoting fragility (/diagnose): printf '%s' '<JSON>' breaks silently if a single quote is ever added to the schema constant. Consider base64 or a compile-time assertion.
Cross-language contract undocumented (/grill-with-docs): DETECTION_RESULT_FILENAME (JS) and detectionResultFilePath (Go) must stay in sync but have no cross-reference comment.

Positive Highlights

✅ Clean fallback chain — structured file → log scraping → error, with no silent swallowing
✅ Conservative error handling: a malformed result file warns and falls back rather than silently clearing threats
✅ Good test coverage for the new Go flags (presence of --output-schema, absence of legacy response_schema)
✅ Two-mode (AWF PathSetup vs non-AWF preamble) schema-file write is well-commented

🧠 Reviewed using Matt Pocock's skills by Matt Pocock Skills Reviewer · sonnet46 1.8M

github-actions · 2026-05-26T22:50:22Z

+        core.info("✔️  Structured result file found and parsed successfully.");
+        // Jump directly to verdict evaluation — no log parsing needed.
+        const verdict = structuredResult.verdict;
+        core.info("📋 Threat detection verdict (from structured result file):");


[/zoom-out] Verdict evaluation is duplicated between the structured-file path and the log-parsing path — any future change (new threat field, output format, conclusion logic) must be applied in two places.

💡 Suggested refactor

Extract a shared helper:

function evaluateAndSetConclusion(verdict) { const threatsDetected = verdict.prompt_injection || verdict.secret_leak || verdict.malicious_patch; if (threatsDetected) { const threats = []; if (verdict.prompt_injection) threats.push("prompt injection"); if (verdict.secret_leak) threats.push("secret leak"); if (verdict.malicious_patch) threats.push("malicious patch"); const reasonsText = verdict.reasons?.length ? "\nReasons: " + verdict.reasons.join("; ") : ""; core.error("🚨 Security threats detected: " + threats.join(", ")); setDetectionFailure("threat_detected", `${ERR_VALIDATION}: ❌ Security threats detected: ${threats.join(", ")}${reasonsText}`); } else { core.info("✅ No security threats detected."); core.setOutput("conclusion", "success"); core.setOutput("success", "true"); core.setOutput("reason", ""); } }

Both the structured-file branch and the log-parsing branch then call evaluateAndSetConclusion(verdict) instead of duplicating ~25 lines.

github-actions · 2026-05-26T22:50:22Z

+    const result = parseStructuredResultFile("/tmp/gh-aw/threat-detection/detection_result.json");
+    expect(result).toBeNull();
+  });
+


[/tdd] The single non-skipped test implicitly relies on /tmp/gh-aw/threat-detection/detection_result.json not existing on the test runner. A stale file left by a prior run would cause the function to enter the parse path rather than returning null, making this a passing-but-wrong assertion.

💡 Suggested fix

Use a guaranteed-unique path:

const crypto = require("crypto"); const os = require("os"); it("should return null when file does not exist", () => { const nonExistentPath = path.join(os.tmpdir(), `detection_result_${crypto.randomUUID()}.json`); const result = parseStructuredResultFile(nonExistentPath); expect(result).toBeNull(); });

This removes the implicit filesystem-state dependency entirely.

github-actions · 2026-05-26T22:50:22Z

@@ -112,6 +112,15 @@ const GITHUB_RATE_LIMITS_JSONL_PATH = `${TMP_GH_AW_PATH}/github_rate_limits.json
 */
 const DETECTION_LOG_FILENAME = "detection.log";



[/grill-with-docs] The JSDoc says the file is written via \--output-last-message`but the Go engine actually passes-o. If -ois an alias for--output-last-message` that should be confirmed and documented; if they are different flags this comment is misleading about what actually happens at runtime.

💡 Suggested update

/** * Filename of the structured threat detection result written by the Codex engine * via `-o` (`--output-last-message`). When present, the parser reads this file * directly instead of scraping the detection log... */

github-actions · 2026-05-26T22:50:22Z

+	var detectionSchemaWriteCmd string
 	if workflowData.IsDetectionRun {
-		structuredOutputParam = fmt.Sprintf(` -c response_schema='%s'`, detectionResponseSchema)
+		// --output-schema <file>: constrain model output to the threat detection schema


[/diagnose] The printf '%s' '<JSON>' shell command works today because detectionResponseSchema contains no single quotes, but this is an invisible constraint — adding a ' to the schema (e.g. a description or example field) would silently produce a broken shell command that fails at runtime on the agent runner.

💡 More robust alternative

Write the schema file using a Go-generated script that base64-encodes the content:

encoded := base64.StdEncoding.EncodeToString([]byte(detectionResponseSchema)) detectionSchemaWriteCmd = fmt.Sprintf( "mkdir -p /tmp/gh-aw/threat-detection && echo %s | base64 -d > %s", encoded, detectionSchemaFilePath, )

This survives any JSON content without shell-quoting concerns.

Alternatively, add a compile-time assertion: if strings.Contains(detectionResponseSchema, "'") { panic(...) } to make the constraint explicit and immediately visible.

github-actions · 2026-05-26T22:50:22Z

+ * caused by noisy SSE/tracing output in the log stream.
+ * @type {string}
+ */
+const DETECTION_RESULT_FILENAME = "detection_result.json";


[/grill-with-docs] DETECTION_RESULT_FILENAME is defined here as "detection_result.json" and the Go constant detectionResultFilePath is defined separately as "/tmp/gh-aw/threat-detection/detection_result.json". These are the same contract expressed independently — if one changes the other silently diverges, causing the parser to fall back to log-scraping with no error.

💡 Suggestion

Add a comment to each constant explicitly naming its counterpart:

// ⚠️ Must match detectionResultFilePath in pkg/workflow/codex_engine.go const DETECTION_RESULT_FILENAME = "detection_result.json";

// ⚠️ Must match DETECTION_RESULT_FILENAME in actions/setup/js/constants.cjs const detectionResultFilePath = "/tmp/gh-aw/threat-detection/detection_result.json"

Longer-term, a single source of truth (e.g. a shared config file or a test that reads both values) would eliminate the drift risk entirely.

pelikhan · 2026-05-26T22:52:17Z

@copilot review all comments

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>

Copilot · 2026-05-26T23:10:57Z

@copilot review all comments

Addressed the review feedback in commit 492b2ab.

Consolidated structured-file and log-parsing verdict handling into a shared helper in parse_threat_detection_results.cjs.
Added a defensive fallback when a structured result is parsed without a verdict.
Updated non-AWF Codex execution to chain schema write with && so schema write failures stop execution early.
Fixed missing argument separator so -o ... --prompt-file is always correctly spaced.
Added/updated Codex engine tests for non-AWF schema chaining and argument spacing.
Ran targeted validations (make fmt-check-cjs, make build-js, parser vitest, and go test ./pkg/workflow -run 'TestCodexEngineDetectionRun|TestCodexEngine').

pelikhan · 2026-05-26T23:25:53Z

@copilot merge main and recompile

…uctured-outputs Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>

Copilot · 2026-05-26T23:39:22Z

@copilot merge main and recompile

Done in a317363: merged main into this branch and recompiled workflows.

github-actions · 2026-05-26T23:49:05Z

@copilot review all comments
Please address the unresolved review feedback on this PR.

Generated by 👨‍🍳 PR Sous Chef · gpt54 17.2M · ◷

done

pelikhan · 2026-05-27T00:01:19Z

@copilot lint go, update wasm golden

https://github.com/github/gh-aw/actions/runs/26481384175/job/77981328795

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>

Copilot · 2026-05-27T00:10:55Z

@copilot lint go, update wasm golden

https://github.com/github/gh-aw/actions/runs/26481384175/job/77981328795

Done in 8d3d63f: ran Go lint fixes and updated wasm golden output.

github-actions · 2026-05-27T00:20:36Z

@copilot review all comments
Please address the reviewer note in the verdict-reporting path and post a brief update.

Generated by 👨‍🍳 PR Sous Chef · gpt54 23.9M · ◷

Initial plan

0265dfe

Copilot AI assigned Copilot and pelikhan May 26, 2026

Copilot started work on behalf of pelikhan May 26, 2026 21:53 View session

Copilot AI linked an issue May 26, 2026 that may be closed by this pull request

Use Codex structured outputs for threat detection parsing #34980

Closed

Copilot AI and others added 2 commits May 26, 2026 22:15

Copilot AI changed the title ~~[WIP] Use Codex structured outputs for threat detection parsing~~ Use Codex structured outputs for threat detection parsing May 26, 2026

Copilot finished work on behalf of pelikhan May 26, 2026 22:18

Copilot AI requested a review from pelikhan May 26, 2026 22:18

pelikhan marked this pull request as ready for review May 26, 2026 22:39

Copilot AI review requested due to automatic review settings May 26, 2026 22:39

Copilot started reviewing on behalf of pelikhan May 26, 2026 22:40 View session

Copilot AI reviewed May 26, 2026

View reviewed changes

github-actions Bot mentioned this pull request May 26, 2026

[aw] No-Op Runs #34919

Open

github-actions Bot approved these changes May 26, 2026

View reviewed changes

github-actions Bot previously requested changes May 26, 2026

View reviewed changes

github-actions Bot reviewed May 26, 2026

View reviewed changes

Copilot started work on behalf of pelikhan May 26, 2026 22:52 View session

Copilot AI and others added 3 commits May 26, 2026 22:57

Plan review comment follow-up

0cdf50e

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>

Address review feedback on threat detection parsing

2182c23

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>

Fix codex argument spacing and consolidate verdict handling

492b2ab

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>

Copilot finished work on behalf of pelikhan May 26, 2026 23:11

Copilot started work on behalf of pelikhan May 26, 2026 23:26 View session

Copilot AI and others added 2 commits May 26, 2026 23:33

Merge remote-tracking branch 'origin/main' into copilot/use-codex-str…

ca7f1b2

…uctured-outputs Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>

Merge main and recompile workflow locks

a317363

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>

Copilot finished work on behalf of pelikhan May 26, 2026 23:39

pelikhan removed their request for review May 26, 2026 23:48

Copilot started work on behalf of pelikhan May 27, 2026 00:01 View session

Run go lint fixes and refresh codex wasm golden

ad201ac

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>

Copilot finished work on behalf of pelikhan May 27, 2026 00:11

Copilot AI requested a review from pelikhan May 27, 2026 00:11

pelikhan merged commit c017f7b into main May 27, 2026
32 checks passed

pelikhan deleted the copilot/use-codex-structured-outputs branch May 27, 2026 00:28

github-actions Bot mentioned this pull request May 27, 2026

[compiler-threat-spec] spec: compiler threat detection spec v1.0.13 (2026-05-27 daily audit) #35103

Merged

		@@ -112,6 +112,15 @@ const GITHUB_RATE_LIMITS_JSONL_PATH = `${TMP_GH_AW_PATH}/github_rate_limits.json
		*/
		const DETECTION_LOG_FILENAME = "detection.log";

Conversation

Copilot AI commented May 26, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Engine changes (codex_engine.go)

Parser changes (parse_threat_detection_results.cjs)

Test updates

Uh oh!

github-actions Bot commented May 26, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

github-actions Bot commented May 26, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

github-actions Bot commented May 26, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

github-actions Bot commented May 26, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Copilot's findings

Uh oh!

github-actions Bot commented May 26, 2026

🧪 Test Quality Sentinel Report

Test Classification Details

Language Support

Verdict

Uh oh!

github-actions Bot left a comment

Choose a reason for hiding this comment

Uh oh!

github-actions Bot left a comment

Choose a reason for hiding this comment

Verdict evaluation logic is duplicated

Non-AWF schema write failure is silently swallowed

Concurrent jobs clobber shared schema/result files

reasons array items not validated as strings

describe.skip leaves the new parsing function entirely untested

Uh oh!

github-actions Bot May 26, 2026

Choose a reason for hiding this comment

Uh oh!

github-actions Bot May 26, 2026

Choose a reason for hiding this comment

Uh oh!

github-actions Bot May 26, 2026

Choose a reason for hiding this comment

Uh oh!

github-actions Bot left a comment

Choose a reason for hiding this comment

Skills-Based Review 🧠

Key Themes

Positive Highlights

Uh oh!

github-actions Bot May 26, 2026

Choose a reason for hiding this comment

Uh oh!

github-actions Bot May 26, 2026

Choose a reason for hiding this comment

Uh oh!

github-actions Bot May 26, 2026

Choose a reason for hiding this comment

Uh oh!

github-actions Bot May 26, 2026

Choose a reason for hiding this comment

Uh oh!

github-actions Bot May 26, 2026

Choose a reason for hiding this comment

Uh oh!

pelikhan commented May 26, 2026

Uh oh!

Copilot AI commented May 26, 2026

Uh oh!

pelikhan commented May 26, 2026

Uh oh!

Copilot AI commented May 26, 2026

Uh oh!

github-actions Bot commented May 26, 2026

Copilot AI commented May 26, 2026 •

edited

Loading

Engine changes (`codex_engine.go`)

Parser changes (`parse_threat_detection_results.cjs`)

github-actions Bot commented May 26, 2026 •

edited

Loading

github-actions Bot commented May 26, 2026 •

edited

Loading

github-actions Bot commented May 26, 2026 •

edited

Loading

github-actions Bot commented May 26, 2026 •

edited

Loading

`reasons` array items not validated as strings

`describe.skip` leaves the new parsing function entirely untested