Use Codex structured outputs for threat detection parsing#35061
Conversation
- codex_engine.go: switch from -c response_schema config flag to --output-schema <file> and -o/--output-last-message <file> flags. Write detection JSON schema to detectionSchemaFilePath before Codex runs (AWF PathSetup for AWF mode; command preamble for non-AWF mode). Codex writes the structured verdict directly to detectionResultFilePath. - parse_threat_detection_results.cjs: add parseStructuredResultFile() helper that reads and validates detection_result.json. In runMain(), try the structured result file before falling back to log scraping. Export the new function. - constants.cjs: add DETECTION_RESULT_FILENAME constant. - codex_engine_test.go: update TestCodexEngineDetectionRunUsesStructuredOutputSchema to verify --output-schema/-o flags and schema/result file paths instead of the old response_schema config flag. - parse_threat_detection_results.test.cjs: add parseStructuredResultFile describe block with null-check test and describe.skip for fs-dependent scenarios (matching existing CJS mock limitation pattern). Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
- parse_threat_detection_results.cjs: remove potentially sensitive file content excerpt from JSON parse error message - codex_engine.go: eliminate duplicated non-AWF command construction by using a conditional schemaWriteLine variable instead of four separate fmt.Sprintf blocks Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
|
🧪 Test Quality Sentinel completed test quality analysis. |
|
🧠 Matt Pocock Skills Reviewer has completed the skills-based review. ✅ |
|
✅ PR Code Quality Reviewer completed the code quality review. |
|
✅ Design Decision Gate 🏗️ completed the design decision gate check. No ADR enforcement needed: PR #35061 does not have the 'implementation' label and has 92 new lines (≤100 threshold) in default business logic directories. |
There was a problem hiding this comment.
Pull request overview
Replaces log scraping of THREAT_DETECTION_RESULT: with Codex CLI's native structured output (--output-schema + -o) to eliminate false parse_error warnings from noisy SSE/tracing logs. The parser tries the structured result file first and falls back to log scraping if the file is missing or malformed.
Changes:
codex_engine.go: emits--output-schema <schema>/-o <result>for detection runs and writes the schema file viaPathSetup(AWF) or a preamble shell line (non-AWF).parse_threat_detection_results.cjs: addsparseStructuredResultFile()and uses it as the primary verdict source, with conservative fallback to log parsing.- Tests/constants: new
DETECTION_RESULT_FILENAME, updated Codex engine flag assertions, new (largelydescribe.skip-pended) parser test cases.
Show a summary per file
| File | Description |
|---|---|
| pkg/workflow/codex_engine.go | Switch detection runs from -c response_schema=... to --output-schema/-o, write schema file via PathSetup or preamble. |
| pkg/workflow/codex_engine_test.go | Assert new flags and absence of legacy response_schema. |
| actions/setup/js/constants.cjs | Add DETECTION_RESULT_FILENAME constant. |
| actions/setup/js/parse_threat_detection_results.cjs | Add parseStructuredResultFile; consume structured file first in runMain, fall back to log parsing. |
| actions/setup/js/parse_threat_detection_results.test.cjs | Add parseStructuredResultFile test block (most cases skipped due to fs mock limitation). |
Copilot's findings
Tip
Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
- Files reviewed: 5/5 changed files
- Comments generated: 3
| // Jump directly to verdict evaluation — no log parsing needed. | ||
| const verdict = structuredResult.verdict; | ||
| core.info("📋 Threat detection verdict (from structured result file):"); | ||
| core.info(` prompt_injection : ${verdict.prompt_injection}`); | ||
| core.info(` secret_leak : ${verdict.secret_leak}`); | ||
| core.info(` malicious_patch : ${verdict.malicious_patch}`); | ||
| if (verdict.reasons && verdict.reasons.length > 0) { | ||
| core.info(` reasons (${verdict.reasons.length}):`); | ||
| verdict.reasons.forEach((reason, i) => core.info(` [${i + 1}] ${reason}`)); | ||
| } else { | ||
| core.info(" reasons : (none)"); | ||
| } | ||
|
|
||
| // ── Evaluate verdict and set conclusion ─────────────────────────── | ||
| const threatsDetected = verdict.prompt_injection || verdict.secret_leak || verdict.malicious_patch; | ||
| if (threatsDetected) { | ||
| const threats = []; | ||
| if (verdict.prompt_injection) threats.push("prompt injection"); | ||
| if (verdict.secret_leak) threats.push("secret leak"); | ||
| if (verdict.malicious_patch) threats.push("malicious patch"); | ||
| const reasonsText = verdict.reasons && verdict.reasons.length > 0 ? "\nReasons: " + verdict.reasons.join("; ") : ""; | ||
| core.error("🚨 Security threats detected: " + threats.join(", ")); | ||
| if (verdict.reasons && verdict.reasons.length > 0) { | ||
| core.error(" Reasons: " + verdict.reasons.join("; ")); | ||
| } | ||
| setDetectionFailure("threat_detected", `${ERR_VALIDATION}: ❌ Security threats detected: ${threats.join(", ")}${reasonsText}`); | ||
| } else { | ||
| core.info("✅ No security threats detected. Safe outputs may proceed."); | ||
| core.setOutput("conclusion", "success"); | ||
| core.setOutput("success", "true"); | ||
| core.setOutput("reason", ""); | ||
| } | ||
|
|
||
| core.info("════════════════════════════════════════════════════════"); | ||
| core.info("🛡️ Threat detection conclusion complete."); | ||
| core.info("════════════════════════════════════════════════════════"); | ||
| return; |
| const reasonsText = verdict.reasons && verdict.reasons.length > 0 ? "\nReasons: " + verdict.reasons.join("; ") : ""; | ||
| core.error("🚨 Security threats detected: " + threats.join(", ")); | ||
| if (verdict.reasons && verdict.reasons.length > 0) { | ||
| core.error(" Reasons: " + verdict.reasons.join("; ")); | ||
| } | ||
| setDetectionFailure("threat_detected", `${ERR_VALIDATION}: ❌ Security threats detected: ${threats.join(", ")}${reasonsText}`); |
| // Optionally prepend the detection schema write command for detection runs. | ||
| schemaWriteLine := "" | ||
| if workflowData.IsDetectionRun { | ||
| schemaWriteLine = detectionSchemaWriteCmd + "\n" | ||
| } | ||
|
|
||
| if harnessScriptName != "" { | ||
| // Harness handles prompt reading via --prompt-file; no INSTRUCTION variable needed. | ||
| command = fmt.Sprintf(`set -o pipefail | ||
| printf '%%s' "$(date +%%s%%3N)" > %s | ||
| touch %s | ||
| (umask 177 && touch %s) | ||
| mkdir -p "$CODEX_HOME/logs" | ||
| %s 2>&1 | tee %s`, AgentCLIStartMsPath, AgentStepSummaryPath, logFile, codexCommand, logFile) | ||
| %s%s 2>&1 | tee %s`, AgentCLIStartMsPath, AgentStepSummaryPath, logFile, schemaWriteLine, codexCommand, logFile) | ||
| } else { | ||
| command = fmt.Sprintf(`set -o pipefail | ||
| printf '%%s' "$(date +%%s%%3N)" > %s | ||
| touch %s | ||
| (umask 177 && touch %s) | ||
| INSTRUCTION="$(cat "$GH_AW_PROMPT")" | ||
| mkdir -p "$CODEX_HOME/logs" | ||
| %s 2>&1 | tee %s`, AgentCLIStartMsPath, AgentStepSummaryPath, logFile, codexCommand, logFile) | ||
| %s%s 2>&1 | tee %s`, AgentCLIStartMsPath, AgentStepSummaryPath, logFile, schemaWriteLine, codexCommand, logFile) |
🧪 Test Quality Sentinel Report✅ Test Quality Score: 95/100 — Excellent
📊 Metrics & Test Classification (2 active tests analyzed)
Test Classification Details
Language SupportTests analyzed:
|
There was a problem hiding this comment.
REQUEST_CHANGES — The structured-output approach is the right fix for the log-scraping false-positive problem, but several issues need to be resolved before merging.
Blocking themes
Verdict evaluation logic is duplicated
The structured-result path (new Step 2) re-implements the full verdict evaluation — logging each field, computing threatsDetected, assembling threats/reasonsText, calling setDetectionFailure/core.setOutput, printing the divider banner — identically to the existing Steps 6–7 log-parsing path. Any change to verdict handling now has to be made twice and the two branches will silently drift. Extract a shared evaluateAndReportVerdict(verdict, source) helper. (Existing comment.)
Non-AWF schema write failure is silently swallowed
In the non-AWF command preamble schemaWriteLine is a plain newline-terminated shell line with no && joining it to codexCommand. If mkdir/printf fails, the script continues and Codex runs with --output-schema pointing at a non-existent file, producing a confusing Codex CLI error instead of a clear schema-write failure. Chain with && to match AWF mode. (Existing comment.)
Concurrent jobs clobber shared schema/result files
Both schema and result file paths are fixed strings under /tmp/gh-aw/threat-detection/. Two detection jobs running in parallel on the same runner will overwrite each other's files, producing a wrong verdict for one job with no error raised. Scope the directory to a per-run/per-job identifier. (New comment on codex_engine.go.)
reasons array items not validated as strings
Array.isArray(parsed.reasons) ? parsed.reasons : [] admits non-string elements that coerce to "[object Object]" or "null" in downstream join() calls, silently producing misleading threat reasons. (New comment.)
describe.skip leaves the new parsing function entirely untested
All interesting code paths (valid JSON, threat verdicts, wrong types, empty file, read error) are skipped. The CJS mock limitation is avoidable by writing real temp files in the test. (New comment.)
🔎 Code quality review by PR Code Quality Reviewer · sonnet46 1.8M
|
|
||
| // detectionResultFilePath is the path where Codex writes the final structured | ||
| // verdict via --output-last-message. The parser reads this file directly instead | ||
| // of scraping the log stream, eliminating false parse_error warnings from noisy |
There was a problem hiding this comment.
Concurrent detection runs on the same runner will clobber each other's schema and result files, producing a wrong or missing structured verdict with no error surfaced.
💡 Details
Both detectionSchemaFilePath and detectionResultFilePath are fixed paths under /tmp/gh-aw/threat-detection/. In a self-hosted runner pool — or a single runner handling two detection jobs in parallel — Job A's detection_result.json can be overwritten by Job B's Codex run before Job A's parser reads it. Job A then reads Job B's verdict, with no indication that anything is wrong.
Fix: scope the directory to a per-run/per-job identifier:
// Use GITHUB_RUN_ID + GITHUB_JOB, injected as env vars at generation time
detectionDir := fmt.Sprintf("/tmp/gh-aw/threat-detection/%s-%s", runID, jobName)Alternatively, write the dir path into a step output or env var so the parser can read the same job-unique path.
| malicious_patch: parsed.malicious_patch, | ||
| reasons: Array.isArray(parsed.reasons) ? parsed.reasons : [], | ||
| }, | ||
| }; |
There was a problem hiding this comment.
reasons array items are not validated to be strings, so non-string elements (objects, numbers, nulls) flow silently through to core.error() and setDetectionFailure().
💡 Details
The check Array.isArray(parsed.reasons) ? parsed.reasons : [] accepts any array, including [{"nested":true}, null, 42]. Downstream, verdict.reasons.join("; ") coerces those to "[object Object]", "null", "42", producing misleading threat reasons in the Actions log and failure message. If the model output is malformed, this silently hides the issue.
Add an element-level validation:
if (!parsed.reasons.every(r => typeof r === "string")) {
return { error: `"reasons" array must contain only strings` };
}|
|
||
| // Note: The following tests are skipped because mocking fs for CJS modules | ||
| // is difficult in vitest (vi.mock("fs") does not intercept require("fs") for | ||
| // built-in modules in this CJS+vitest setup, same as safe_output_validator.test.cjs). |
There was a problem hiding this comment.
All meaningful test paths for parseStructuredResultFile are unconditionally skipped, meaning the new feature ships with effectively zero test coverage for its primary code paths.
💡 Details
The one test that actually runs checks only that the function returns null when the file does not exist — the trivial fallthrough case that requires no parsing logic at all. Every interesting path (valid JSON → clean verdict, threat verdict, invalid JSON, wrong field types, empty file, read error) lives in describe.skip and never executes in CI.
The stated reason is "CJS fs mock limitation", but this is solvable without mocking the real fs module. A simple alternative: create real temporary files in the test:
const os = require("os");
const tmpFile = path.join(os.tmpdir(), `detection_result_${Date.now()}.json`);
fs.writeFileSync(tmpFile, JSON.stringify({ prompt_injection: false, ... }));
try {
const result = parseStructuredResultFile(tmpFile);
expect(result.verdict.prompt_injection).toBe(false);
} finally {
fs.unlinkSync(tmpFile);
}This avoids the CJS mock limitation entirely and tests the real I/O path.
There was a problem hiding this comment.
Skills-Based Review 🧠
Applied /zoom-out, /tdd, /grill-with-docs, and /diagnose — no blocking issues, but four things worth addressing before merge.
📋 Key Themes & Highlights
Key Themes
- Duplicated verdict evaluation (
/zoom-out): The ~25-line threat/success evaluation block is copy-pasted into the structured-file branch and the log-parsing branch. Extracting aevaluateAndSetConclusion(verdict)helper would remove the drift risk. - Fragile non-skipped test (
/tdd): The one live test forparseStructuredResultFilepasses only because a specific/tmppath doesn't exist on the runner — a stale file breaks it silently. Use a UUID-suffixed temp path instead. -ovs--output-last-messageinconsistency (/grill-with-docs): The JS comment says--output-last-message; the Go engine uses-o. Clarify whether these are aliases.- Shell quoting fragility (
/diagnose):printf '%s' '<JSON>'breaks silently if a single quote is ever added to the schema constant. Consider base64 or a compile-time assertion. - Cross-language contract undocumented (
/grill-with-docs):DETECTION_RESULT_FILENAME(JS) anddetectionResultFilePath(Go) must stay in sync but have no cross-reference comment.
Positive Highlights
- ✅ Clean fallback chain — structured file → log scraping → error, with no silent swallowing
- ✅ Conservative error handling: a malformed result file warns and falls back rather than silently clearing threats
- ✅ Good test coverage for the new Go flags (presence of
--output-schema, absence of legacyresponse_schema) - ✅ Two-mode (AWF
PathSetupvs non-AWF preamble) schema-file write is well-commented
🧠 Reviewed using Matt Pocock's skills by Matt Pocock Skills Reviewer · sonnet46 1.8M
| core.info("✔️ Structured result file found and parsed successfully."); | ||
| // Jump directly to verdict evaluation — no log parsing needed. | ||
| const verdict = structuredResult.verdict; | ||
| core.info("📋 Threat detection verdict (from structured result file):"); |
There was a problem hiding this comment.
[/zoom-out] Verdict evaluation is duplicated between the structured-file path and the log-parsing path — any future change (new threat field, output format, conclusion logic) must be applied in two places.
💡 Suggested refactor
Extract a shared helper:
function evaluateAndSetConclusion(verdict) {
const threatsDetected = verdict.prompt_injection || verdict.secret_leak || verdict.malicious_patch;
if (threatsDetected) {
const threats = [];
if (verdict.prompt_injection) threats.push("prompt injection");
if (verdict.secret_leak) threats.push("secret leak");
if (verdict.malicious_patch) threats.push("malicious patch");
const reasonsText = verdict.reasons?.length ? "\nReasons: " + verdict.reasons.join("; ") : "";
core.error("🚨 Security threats detected: " + threats.join(", "));
setDetectionFailure("threat_detected", `${ERR_VALIDATION}: ❌ Security threats detected: ${threats.join(", ")}${reasonsText}`);
} else {
core.info("✅ No security threats detected.");
core.setOutput("conclusion", "success");
core.setOutput("success", "true");
core.setOutput("reason", "");
}
}Both the structured-file branch and the log-parsing branch then call evaluateAndSetConclusion(verdict) instead of duplicating ~25 lines.
| const result = parseStructuredResultFile("/tmp/gh-aw/threat-detection/detection_result.json"); | ||
| expect(result).toBeNull(); | ||
| }); | ||
|
|
There was a problem hiding this comment.
[/tdd] The single non-skipped test implicitly relies on /tmp/gh-aw/threat-detection/detection_result.json not existing on the test runner. A stale file left by a prior run would cause the function to enter the parse path rather than returning null, making this a passing-but-wrong assertion.
💡 Suggested fix
Use a guaranteed-unique path:
const crypto = require("crypto");
const os = require("os");
it("should return null when file does not exist", () => {
const nonExistentPath = path.join(os.tmpdir(), `detection_result_${crypto.randomUUID()}.json`);
const result = parseStructuredResultFile(nonExistentPath);
expect(result).toBeNull();
});This removes the implicit filesystem-state dependency entirely.
| @@ -112,6 +112,15 @@ const GITHUB_RATE_LIMITS_JSONL_PATH = `${TMP_GH_AW_PATH}/github_rate_limits.json | |||
| */ | |||
| const DETECTION_LOG_FILENAME = "detection.log"; | |||
|
|
|||
There was a problem hiding this comment.
[/grill-with-docs] The JSDoc says the file is written via \--output-last-message`but the Go engine actually passes-o. If -ois an alias for--output-last-message` that should be confirmed and documented; if they are different flags this comment is misleading about what actually happens at runtime.
💡 Suggested update
/**
* Filename of the structured threat detection result written by the Codex engine
* via `-o` (`--output-last-message`). When present, the parser reads this file
* directly instead of scraping the detection log...
*/| var detectionSchemaWriteCmd string | ||
| if workflowData.IsDetectionRun { | ||
| structuredOutputParam = fmt.Sprintf(` -c response_schema='%s'`, detectionResponseSchema) | ||
| // --output-schema <file>: constrain model output to the threat detection schema |
There was a problem hiding this comment.
[/diagnose] The printf '%s' '<JSON>' shell command works today because detectionResponseSchema contains no single quotes, but this is an invisible constraint — adding a ' to the schema (e.g. a description or example field) would silently produce a broken shell command that fails at runtime on the agent runner.
💡 More robust alternative
Write the schema file using a Go-generated script that base64-encodes the content:
encoded := base64.StdEncoding.EncodeToString([]byte(detectionResponseSchema))
detectionSchemaWriteCmd = fmt.Sprintf(
"mkdir -p /tmp/gh-aw/threat-detection && echo %s | base64 -d > %s",
encoded, detectionSchemaFilePath,
)This survives any JSON content without shell-quoting concerns.
Alternatively, add a compile-time assertion: if strings.Contains(detectionResponseSchema, "'") { panic(...) } to make the constraint explicit and immediately visible.
| * caused by noisy SSE/tracing output in the log stream. | ||
| * @type {string} | ||
| */ | ||
| const DETECTION_RESULT_FILENAME = "detection_result.json"; |
There was a problem hiding this comment.
[/grill-with-docs] DETECTION_RESULT_FILENAME is defined here as "detection_result.json" and the Go constant detectionResultFilePath is defined separately as "/tmp/gh-aw/threat-detection/detection_result.json". These are the same contract expressed independently — if one changes the other silently diverges, causing the parser to fall back to log-scraping with no error.
💡 Suggestion
Add a comment to each constant explicitly naming its counterpart:
// ⚠️ Must match detectionResultFilePath in pkg/workflow/codex_engine.go
const DETECTION_RESULT_FILENAME = "detection_result.json";// ⚠️ Must match DETECTION_RESULT_FILENAME in actions/setup/js/constants.cjs
const detectionResultFilePath = "/tmp/gh-aw/threat-detection/detection_result.json"Longer-term, a single source of truth (e.g. a shared config file or a test that reads both values) would eliminate the drift risk entirely.
|
@copilot review all comments |
Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Addressed the review feedback in commit 492b2ab.
|
|
@copilot merge main and recompile |
…uctured-outputs Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
|
|
|
@copilot lint go, update wasm golden https://github.com/github/gh-aw/actions/runs/26481384175/job/77981328795 |
Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Done in 8d3d63f: ran Go lint fixes and updated wasm golden output. |
|
|
Codex threat detection produces false
parse_errorwarnings when SSE/tracing noise in the log stream preventsTHREAT_DETECTION_RESULT:{...}from being scraped correctly, even when the model's verdict is clean. This replaces log scraping with Codex CLI's native structured output mechanism.Engine changes (
codex_engine.go)Switches from
-c response_schema='...'(config flag, result embedded in log stream) to:The schema file is written before Codex runs via a
detectionSchemaWriteCmd:PathSetup(host-side, before the container starts;/tmp/gh-aw/is the shared read-write tree)Parser changes (
parse_threat_detection_results.cjs)Adds
parseStructuredResultFile()— reads and validatesdetection_result.json(checks boolean types on all three verdict fields). InrunMain(), the structured file is tried first:Test updates
codex_engine_test.go: asserts--output-schema/-oflags and correct file paths; asserts legacyresponse_schemaflag is absentparse_threat_detection_results.test.cjs: addsparseStructuredResultFiledescribe block; fs-dependent cases follow existingdescribe.skippattern for the CJS mock limitation