Skip to content

Use Codex structured outputs for threat detection parsing#35061

Merged
pelikhan merged 9 commits into
mainfrom
copilot/use-codex-structured-outputs
May 27, 2026
Merged

Use Codex structured outputs for threat detection parsing#35061
pelikhan merged 9 commits into
mainfrom
copilot/use-codex-structured-outputs

Conversation

Copy link
Copy Markdown
Contributor

Copilot AI commented May 26, 2026

Codex threat detection produces false parse_error warnings when SSE/tracing noise in the log stream prevents THREAT_DETECTION_RESULT:{...} from being scraped correctly, even when the model's verdict is clean. This replaces log scraping with Codex CLI's native structured output mechanism.

Engine changes (codex_engine.go)

Switches from -c response_schema='...' (config flag, result embedded in log stream) to:

--output-schema /tmp/gh-aw/threat-detection/detection_schema.json \
-o /tmp/gh-aw/threat-detection/detection_result.json

The schema file is written before Codex runs via a detectionSchemaWriteCmd:

  • AWF mode: written in PathSetup (host-side, before the container starts; /tmp/gh-aw/ is the shared read-write tree)
  • Non-AWF mode: prepended as a shell line in the command preamble

Parser changes (parse_threat_detection_results.cjs)

Adds parseStructuredResultFile() — reads and validates detection_result.json (checks boolean types on all three verdict fields). In runMain(), the structured file is tried first:

  • File present and valid → use verdict directly, skip log parsing
  • File present but malformed → warn and fall back to log scraping (conservative)
  • File absent → fall back to log scraping (backward compatible with non-Codex engines)

Test updates

  • codex_engine_test.go: asserts --output-schema / -o flags and correct file paths; asserts legacy response_schema flag is absent
  • parse_threat_detection_results.test.cjs: adds parseStructuredResultFile describe block; fs-dependent cases follow existing describe.skip pattern for the CJS mock limitation

Copilot AI and others added 2 commits May 26, 2026 22:15
- codex_engine.go: switch from -c response_schema config flag to
  --output-schema <file> and -o/--output-last-message <file> flags.
  Write detection JSON schema to detectionSchemaFilePath before
  Codex runs (AWF PathSetup for AWF mode; command preamble for
  non-AWF mode). Codex writes the structured verdict directly to
  detectionResultFilePath.

- parse_threat_detection_results.cjs: add parseStructuredResultFile()
  helper that reads and validates detection_result.json. In runMain(),
  try the structured result file before falling back to log scraping.
  Export the new function.

- constants.cjs: add DETECTION_RESULT_FILENAME constant.

- codex_engine_test.go: update TestCodexEngineDetectionRunUsesStructuredOutputSchema
  to verify --output-schema/-o flags and schema/result file paths
  instead of the old response_schema config flag.

- parse_threat_detection_results.test.cjs: add parseStructuredResultFile
  describe block with null-check test and describe.skip for fs-dependent
  scenarios (matching existing CJS mock limitation pattern).

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
- parse_threat_detection_results.cjs: remove potentially sensitive file
  content excerpt from JSON parse error message
- codex_engine.go: eliminate duplicated non-AWF command construction by
  using a conditional schemaWriteLine variable instead of four separate
  fmt.Sprintf blocks

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Copilot AI changed the title [WIP] Use Codex structured outputs for threat detection parsing Use Codex structured outputs for threat detection parsing May 26, 2026
Copilot AI requested a review from pelikhan May 26, 2026 22:18
@pelikhan pelikhan marked this pull request as ready for review May 26, 2026 22:39
Copilot AI review requested due to automatic review settings May 26, 2026 22:39
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 26, 2026

🧪 Test Quality Sentinel completed test quality analysis.

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 26, 2026

🧠 Matt Pocock Skills Reviewer has completed the skills-based review. ✅

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 26, 2026

PR Code Quality Reviewer completed the code quality review.

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 26, 2026

Design Decision Gate 🏗️ completed the design decision gate check.

No ADR enforcement needed: PR #35061 does not have the 'implementation' label and has 92 new lines (≤100 threshold) in default business logic directories.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Replaces log scraping of THREAT_DETECTION_RESULT: with Codex CLI's native structured output (--output-schema + -o) to eliminate false parse_error warnings from noisy SSE/tracing logs. The parser tries the structured result file first and falls back to log scraping if the file is missing or malformed.

Changes:

  • codex_engine.go: emits --output-schema <schema> / -o <result> for detection runs and writes the schema file via PathSetup (AWF) or a preamble shell line (non-AWF).
  • parse_threat_detection_results.cjs: adds parseStructuredResultFile() and uses it as the primary verdict source, with conservative fallback to log parsing.
  • Tests/constants: new DETECTION_RESULT_FILENAME, updated Codex engine flag assertions, new (largely describe.skip-pended) parser test cases.
Show a summary per file
File Description
pkg/workflow/codex_engine.go Switch detection runs from -c response_schema=... to --output-schema/-o, write schema file via PathSetup or preamble.
pkg/workflow/codex_engine_test.go Assert new flags and absence of legacy response_schema.
actions/setup/js/constants.cjs Add DETECTION_RESULT_FILENAME constant.
actions/setup/js/parse_threat_detection_results.cjs Add parseStructuredResultFile; consume structured file first in runMain, fall back to log parsing.
actions/setup/js/parse_threat_detection_results.test.cjs Add parseStructuredResultFile test block (most cases skipped due to fs mock limitation).

Copilot's findings

Tip

Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

  • Files reviewed: 5/5 changed files
  • Comments generated: 3

Comment on lines +500 to +536
// Jump directly to verdict evaluation — no log parsing needed.
const verdict = structuredResult.verdict;
core.info("📋 Threat detection verdict (from structured result file):");
core.info(` prompt_injection : ${verdict.prompt_injection}`);
core.info(` secret_leak : ${verdict.secret_leak}`);
core.info(` malicious_patch : ${verdict.malicious_patch}`);
if (verdict.reasons && verdict.reasons.length > 0) {
core.info(` reasons (${verdict.reasons.length}):`);
verdict.reasons.forEach((reason, i) => core.info(` [${i + 1}] ${reason}`));
} else {
core.info(" reasons : (none)");
}

// ── Evaluate verdict and set conclusion ───────────────────────────
const threatsDetected = verdict.prompt_injection || verdict.secret_leak || verdict.malicious_patch;
if (threatsDetected) {
const threats = [];
if (verdict.prompt_injection) threats.push("prompt injection");
if (verdict.secret_leak) threats.push("secret leak");
if (verdict.malicious_patch) threats.push("malicious patch");
const reasonsText = verdict.reasons && verdict.reasons.length > 0 ? "\nReasons: " + verdict.reasons.join("; ") : "";
core.error("🚨 Security threats detected: " + threats.join(", "));
if (verdict.reasons && verdict.reasons.length > 0) {
core.error(" Reasons: " + verdict.reasons.join("; "));
}
setDetectionFailure("threat_detected", `${ERR_VALIDATION}: ❌ Security threats detected: ${threats.join(", ")}${reasonsText}`);
} else {
core.info("✅ No security threats detected. Safe outputs may proceed.");
core.setOutput("conclusion", "success");
core.setOutput("success", "true");
core.setOutput("reason", "");
}

core.info("════════════════════════════════════════════════════════");
core.info("🛡️ Threat detection conclusion complete.");
core.info("════════════════════════════════════════════════════════");
return;
Comment on lines +520 to +525
const reasonsText = verdict.reasons && verdict.reasons.length > 0 ? "\nReasons: " + verdict.reasons.join("; ") : "";
core.error("🚨 Security threats detected: " + threats.join(", "));
if (verdict.reasons && verdict.reasons.length > 0) {
core.error(" Reasons: " + verdict.reasons.join("; "));
}
setDetectionFailure("threat_detected", `${ERR_VALIDATION}: ❌ Security threats detected: ${threats.join(", ")}${reasonsText}`);
Comment thread pkg/workflow/codex_engine.go Outdated
Comment on lines +350 to +371
// Optionally prepend the detection schema write command for detection runs.
schemaWriteLine := ""
if workflowData.IsDetectionRun {
schemaWriteLine = detectionSchemaWriteCmd + "\n"
}

if harnessScriptName != "" {
// Harness handles prompt reading via --prompt-file; no INSTRUCTION variable needed.
command = fmt.Sprintf(`set -o pipefail
printf '%%s' "$(date +%%s%%3N)" > %s
touch %s
(umask 177 && touch %s)
mkdir -p "$CODEX_HOME/logs"
%s 2>&1 | tee %s`, AgentCLIStartMsPath, AgentStepSummaryPath, logFile, codexCommand, logFile)
%s%s 2>&1 | tee %s`, AgentCLIStartMsPath, AgentStepSummaryPath, logFile, schemaWriteLine, codexCommand, logFile)
} else {
command = fmt.Sprintf(`set -o pipefail
printf '%%s' "$(date +%%s%%3N)" > %s
touch %s
(umask 177 && touch %s)
INSTRUCTION="$(cat "$GH_AW_PROMPT")"
mkdir -p "$CODEX_HOME/logs"
%s 2>&1 | tee %s`, AgentCLIStartMsPath, AgentStepSummaryPath, logFile, codexCommand, logFile)
%s%s 2>&1 | tee %s`, AgentCLIStartMsPath, AgentStepSummaryPath, logFile, schemaWriteLine, codexCommand, logFile)
@github-actions github-actions Bot mentioned this pull request May 26, 2026
@github-actions
Copy link
Copy Markdown
Contributor

🧪 Test Quality Sentinel Report

Test Quality Score: 95/100 — Excellent

Analyzed 2 active test(s) across 1 Go file and 1 JavaScript file: 2 design tests, 0 implementation tests, 0 guideline violations. 8 additional tests exist but are skipped due to a CJS fs mock limitation — noted below.

📊 Metrics & Test Classification (2 active tests analyzed)
Metric Value
New/modified tests analyzed 2 active (+ 8 skipped)
✅ Design tests (behavioral contracts) 2 (100%)
⚠️ Implementation tests (low value) 0 (0%)
Tests with error/edge cases 2 (100%)
Duplicate test clusters 0
Test inflation detected No
🚨 Coding-guideline violations 0

Test Classification Details

Test File Classification Issues Detected
parseStructuredResultFile - should return null when file does not exist actions/setup/js/parse_threat_detection_results.test.cjs ✅ Design None — uses real fs, verifies null return for absent file
TestCodexEngineDetectionRunUsesStructuredOutputSchema (modified) pkg/workflow/codex_engine_test.go ✅ Design None — table-driven, covers detection and agent run modes, positive and negative assertions

Language Support

Tests analyzed:

  • 🐹 Go (*_test.go): 1 test — unit (//go:build !integration)
  • 🟨 JavaScript (*.test.cjs): 1 active test + 8 skipped (vitest)
⚠️ Noted: 8 Skipped Tests in `describe.skip` (informational)

The describe.skip("with structured result file present (CJS fs mock limitation)") block contains 8 well-written tests for parseStructuredResultFile that are skipped because vi.mock("fs") does not intercept require("fs") for built-in modules in the CJS+vitest configuration used by this project (same limitation documented in safe_output_validator.test.cjs).

The skipped tests cover important behavioral contracts:

  • Valid clean JSON → returns verdict object
  • Reasons field populated → verdict.reasons contains expected values
  • Invalid JSON → returns { error } object
  • Empty file → error contains "empty"
  • Missing required boolean field → error names the missing field
  • Boolean field with wrong type → error names the incorrect field
  • JSON root is an array → error contains "array"
  • readFileSync throws → error contains "Failed to read"

These are not counted in the score (skipped tests provide no runtime coverage), but they represent good behavioral thinking. Consider whether there is a way to exercise this logic in integration/E2E tests or by restructuring parseStructuredResultFile to accept an injectable fs adapter, which would allow proper unit testing.

Verdict

Check passed. 0% of active tests are implementation tests (threshold: 30%). Both active tests verify observable behavioral contracts with edge-case coverage.

📖 Understanding Test Classifications

Design Tests (High Value) verify what the system does:

  • Assert on observable outputs, return values, or state changes
  • Cover error paths and boundary conditions
  • Would catch a behavioral regression if deleted
  • Remain valid even after internal refactoring

Implementation Tests (Low Value) verify how the system does it:

  • Assert on internal function calls (mocking internals)
  • Only test the happy path with typical inputs
  • Break during legitimate refactoring even when behavior is correct
  • Give false assurance: they pass even when the system is wrong

Goal: Shift toward tests that describe the system's behavioral contract — the promises it makes to its users and collaborators.

🧪 Test quality analysis by Test Quality Sentinel · sonnet46 1.2M ·

Copy link
Copy Markdown
Contributor

@github-actions github-actions Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ Test Quality Sentinel: 95/100. Test quality is excellent — 0% of new tests are implementation tests (threshold: 30%). Both active tests verify observable behavioral contracts with edge-case coverage.

Copy link
Copy Markdown
Contributor

@github-actions github-actions Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

REQUEST_CHANGES — The structured-output approach is the right fix for the log-scraping false-positive problem, but several issues need to be resolved before merging.

Blocking themes

Verdict evaluation logic is duplicated

The structured-result path (new Step 2) re-implements the full verdict evaluation — logging each field, computing threatsDetected, assembling threats/reasonsText, calling setDetectionFailure/core.setOutput, printing the divider banner — identically to the existing Steps 6–7 log-parsing path. Any change to verdict handling now has to be made twice and the two branches will silently drift. Extract a shared evaluateAndReportVerdict(verdict, source) helper. (Existing comment.)

Non-AWF schema write failure is silently swallowed

In the non-AWF command preamble schemaWriteLine is a plain newline-terminated shell line with no && joining it to codexCommand. If mkdir/printf fails, the script continues and Codex runs with --output-schema pointing at a non-existent file, producing a confusing Codex CLI error instead of a clear schema-write failure. Chain with && to match AWF mode. (Existing comment.)

Concurrent jobs clobber shared schema/result files

Both schema and result file paths are fixed strings under /tmp/gh-aw/threat-detection/. Two detection jobs running in parallel on the same runner will overwrite each other's files, producing a wrong verdict for one job with no error raised. Scope the directory to a per-run/per-job identifier. (New comment on codex_engine.go.)

reasons array items not validated as strings

Array.isArray(parsed.reasons) ? parsed.reasons : [] admits non-string elements that coerce to "[object Object]" or "null" in downstream join() calls, silently producing misleading threat reasons. (New comment.)

describe.skip leaves the new parsing function entirely untested

All interesting code paths (valid JSON, threat verdicts, wrong types, empty file, read error) are skipped. The CJS mock limitation is avoidable by writing real temp files in the test. (New comment.)

🔎 Code quality review by PR Code Quality Reviewer · sonnet46 1.8M


// detectionResultFilePath is the path where Codex writes the final structured
// verdict via --output-last-message. The parser reads this file directly instead
// of scraping the log stream, eliminating false parse_error warnings from noisy
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Concurrent detection runs on the same runner will clobber each other's schema and result files, producing a wrong or missing structured verdict with no error surfaced.

💡 Details

Both detectionSchemaFilePath and detectionResultFilePath are fixed paths under /tmp/gh-aw/threat-detection/. In a self-hosted runner pool — or a single runner handling two detection jobs in parallel — Job A's detection_result.json can be overwritten by Job B's Codex run before Job A's parser reads it. Job A then reads Job B's verdict, with no indication that anything is wrong.

Fix: scope the directory to a per-run/per-job identifier:

// Use GITHUB_RUN_ID + GITHUB_JOB, injected as env vars at generation time
detectionDir := fmt.Sprintf("/tmp/gh-aw/threat-detection/%s-%s", runID, jobName)

Alternatively, write the dir path into a step output or env var so the parser can read the same job-unique path.

malicious_patch: parsed.malicious_patch,
reasons: Array.isArray(parsed.reasons) ? parsed.reasons : [],
},
};
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

reasons array items are not validated to be strings, so non-string elements (objects, numbers, nulls) flow silently through to core.error() and setDetectionFailure().

💡 Details

The check Array.isArray(parsed.reasons) ? parsed.reasons : [] accepts any array, including [{"nested":true}, null, 42]. Downstream, verdict.reasons.join("; ") coerces those to "[object Object]", "null", "42", producing misleading threat reasons in the Actions log and failure message. If the model output is malformed, this silently hides the issue.

Add an element-level validation:

if (!parsed.reasons.every(r => typeof r === "string")) {
  return { error: `"reasons" array must contain only strings` };
}


// Note: The following tests are skipped because mocking fs for CJS modules
// is difficult in vitest (vi.mock("fs") does not intercept require("fs") for
// built-in modules in this CJS+vitest setup, same as safe_output_validator.test.cjs).
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All meaningful test paths for parseStructuredResultFile are unconditionally skipped, meaning the new feature ships with effectively zero test coverage for its primary code paths.

💡 Details

The one test that actually runs checks only that the function returns null when the file does not exist — the trivial fallthrough case that requires no parsing logic at all. Every interesting path (valid JSON → clean verdict, threat verdict, invalid JSON, wrong field types, empty file, read error) lives in describe.skip and never executes in CI.

The stated reason is "CJS fs mock limitation", but this is solvable without mocking the real fs module. A simple alternative: create real temporary files in the test:

const os = require("os");
const tmpFile = path.join(os.tmpdir(), `detection_result_${Date.now()}.json`);
fs.writeFileSync(tmpFile, JSON.stringify({ prompt_injection: false, ... }));
try {
  const result = parseStructuredResultFile(tmpFile);
  expect(result.verdict.prompt_injection).toBe(false);
} finally {
  fs.unlinkSync(tmpFile);
}

This avoids the CJS mock limitation entirely and tests the real I/O path.

Copy link
Copy Markdown
Contributor

@github-actions github-actions Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Skills-Based Review 🧠

Applied /zoom-out, /tdd, /grill-with-docs, and /diagnose — no blocking issues, but four things worth addressing before merge.

📋 Key Themes & Highlights

Key Themes

  • Duplicated verdict evaluation (/zoom-out): The ~25-line threat/success evaluation block is copy-pasted into the structured-file branch and the log-parsing branch. Extracting a evaluateAndSetConclusion(verdict) helper would remove the drift risk.
  • Fragile non-skipped test (/tdd): The one live test for parseStructuredResultFile passes only because a specific /tmp path doesn't exist on the runner — a stale file breaks it silently. Use a UUID-suffixed temp path instead.
  • -o vs --output-last-message inconsistency (/grill-with-docs): The JS comment says --output-last-message; the Go engine uses -o. Clarify whether these are aliases.
  • Shell quoting fragility (/diagnose): printf '%s' '<JSON>' breaks silently if a single quote is ever added to the schema constant. Consider base64 or a compile-time assertion.
  • Cross-language contract undocumented (/grill-with-docs): DETECTION_RESULT_FILENAME (JS) and detectionResultFilePath (Go) must stay in sync but have no cross-reference comment.

Positive Highlights

  • ✅ Clean fallback chain — structured file → log scraping → error, with no silent swallowing
  • ✅ Conservative error handling: a malformed result file warns and falls back rather than silently clearing threats
  • ✅ Good test coverage for the new Go flags (presence of --output-schema, absence of legacy response_schema)
  • ✅ Two-mode (AWF PathSetup vs non-AWF preamble) schema-file write is well-commented

🧠 Reviewed using Matt Pocock's skills by Matt Pocock Skills Reviewer · sonnet46 1.8M

core.info("✔️ Structured result file found and parsed successfully.");
// Jump directly to verdict evaluation — no log parsing needed.
const verdict = structuredResult.verdict;
core.info("📋 Threat detection verdict (from structured result file):");
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[/zoom-out] Verdict evaluation is duplicated between the structured-file path and the log-parsing path — any future change (new threat field, output format, conclusion logic) must be applied in two places.

💡 Suggested refactor

Extract a shared helper:

function evaluateAndSetConclusion(verdict) {
  const threatsDetected = verdict.prompt_injection || verdict.secret_leak || verdict.malicious_patch;
  if (threatsDetected) {
    const threats = [];
    if (verdict.prompt_injection) threats.push("prompt injection");
    if (verdict.secret_leak) threats.push("secret leak");
    if (verdict.malicious_patch) threats.push("malicious patch");
    const reasonsText = verdict.reasons?.length ? "\nReasons: " + verdict.reasons.join("; ") : "";
    core.error("🚨 Security threats detected: " + threats.join(", "));
    setDetectionFailure("threat_detected", `${ERR_VALIDATION}: ❌ Security threats detected: ${threats.join(", ")}${reasonsText}`);
  } else {
    core.info("✅ No security threats detected.");
    core.setOutput("conclusion", "success");
    core.setOutput("success", "true");
    core.setOutput("reason", "");
  }
}

Both the structured-file branch and the log-parsing branch then call evaluateAndSetConclusion(verdict) instead of duplicating ~25 lines.

const result = parseStructuredResultFile("/tmp/gh-aw/threat-detection/detection_result.json");
expect(result).toBeNull();
});

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[/tdd] The single non-skipped test implicitly relies on /tmp/gh-aw/threat-detection/detection_result.json not existing on the test runner. A stale file left by a prior run would cause the function to enter the parse path rather than returning null, making this a passing-but-wrong assertion.

💡 Suggested fix

Use a guaranteed-unique path:

const crypto = require("crypto");
const os = require("os");

it("should return null when file does not exist", () => {
  const nonExistentPath = path.join(os.tmpdir(), `detection_result_${crypto.randomUUID()}.json`);
  const result = parseStructuredResultFile(nonExistentPath);
  expect(result).toBeNull();
});

This removes the implicit filesystem-state dependency entirely.

@@ -112,6 +112,15 @@ const GITHUB_RATE_LIMITS_JSONL_PATH = `${TMP_GH_AW_PATH}/github_rate_limits.json
*/
const DETECTION_LOG_FILENAME = "detection.log";

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[/grill-with-docs] The JSDoc says the file is written via \--output-last-message`but the Go engine actually passes-o. If -ois an alias for--output-last-message` that should be confirmed and documented; if they are different flags this comment is misleading about what actually happens at runtime.

💡 Suggested update
/**
 * Filename of the structured threat detection result written by the Codex engine
 * via `-o` (`--output-last-message`). When present, the parser reads this file
 * directly instead of scraping the detection log...
 */

var detectionSchemaWriteCmd string
if workflowData.IsDetectionRun {
structuredOutputParam = fmt.Sprintf(` -c response_schema='%s'`, detectionResponseSchema)
// --output-schema <file>: constrain model output to the threat detection schema
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[/diagnose] The printf '%s' '<JSON>' shell command works today because detectionResponseSchema contains no single quotes, but this is an invisible constraint — adding a ' to the schema (e.g. a description or example field) would silently produce a broken shell command that fails at runtime on the agent runner.

💡 More robust alternative

Write the schema file using a Go-generated script that base64-encodes the content:

encoded := base64.StdEncoding.EncodeToString([]byte(detectionResponseSchema))
detectionSchemaWriteCmd = fmt.Sprintf(
    "mkdir -p /tmp/gh-aw/threat-detection && echo %s | base64 -d > %s",
    encoded, detectionSchemaFilePath,
)

This survives any JSON content without shell-quoting concerns.

Alternatively, add a compile-time assertion: if strings.Contains(detectionResponseSchema, "'") { panic(...) } to make the constraint explicit and immediately visible.

* caused by noisy SSE/tracing output in the log stream.
* @type {string}
*/
const DETECTION_RESULT_FILENAME = "detection_result.json";
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[/grill-with-docs] DETECTION_RESULT_FILENAME is defined here as "detection_result.json" and the Go constant detectionResultFilePath is defined separately as "/tmp/gh-aw/threat-detection/detection_result.json". These are the same contract expressed independently — if one changes the other silently diverges, causing the parser to fall back to log-scraping with no error.

💡 Suggestion

Add a comment to each constant explicitly naming its counterpart:

// ⚠️ Must match detectionResultFilePath in pkg/workflow/codex_engine.go
const DETECTION_RESULT_FILENAME = "detection_result.json";
// ⚠️ Must match DETECTION_RESULT_FILENAME in actions/setup/js/constants.cjs
const detectionResultFilePath = "/tmp/gh-aw/threat-detection/detection_result.json"

Longer-term, a single source of truth (e.g. a shared config file or a test that reads both values) would eliminate the drift risk entirely.

@pelikhan
Copy link
Copy Markdown
Collaborator

@copilot review all comments

Copilot AI and others added 3 commits May 26, 2026 22:57
Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Copy link
Copy Markdown
Contributor Author

Copilot AI commented May 26, 2026

@copilot review all comments

Addressed the review feedback in commit 492b2ab.

  • Consolidated structured-file and log-parsing verdict handling into a shared helper in parse_threat_detection_results.cjs.
  • Added a defensive fallback when a structured result is parsed without a verdict.
  • Updated non-AWF Codex execution to chain schema write with && so schema write failures stop execution early.
  • Fixed missing argument separator so -o ... --prompt-file is always correctly spaced.
  • Added/updated Codex engine tests for non-AWF schema chaining and argument spacing.
  • Ran targeted validations (make fmt-check-cjs, make build-js, parser vitest, and go test ./pkg/workflow -run 'TestCodexEngineDetectionRun|TestCodexEngine').

@pelikhan
Copy link
Copy Markdown
Collaborator

@copilot merge main and recompile

Copilot AI and others added 2 commits May 26, 2026 23:33
…uctured-outputs

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Copy link
Copy Markdown
Contributor Author

Copilot AI commented May 26, 2026

@copilot merge main and recompile

Done in a317363: merged main into this branch and recompiled workflows.

@pelikhan pelikhan removed their request for review May 26, 2026 23:48
@github-actions
Copy link
Copy Markdown
Contributor

@copilot review all comments
Please address the unresolved review feedback on this PR.

Generated by 👨‍🍳 PR Sous Chef · gpt54 17.2M ·

@pelikhan
Copy link
Copy Markdown
Collaborator

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Copy link
Copy Markdown
Contributor Author

Copilot AI commented May 27, 2026

@copilot lint go, update wasm golden

https://github.com/github/gh-aw/actions/runs/26481384175/job/77981328795

Done in 8d3d63f: ran Go lint fixes and updated wasm golden output.

Copilot AI requested a review from pelikhan May 27, 2026 00:11
@github-actions
Copy link
Copy Markdown
Contributor

@copilot review all comments
Please address the reviewer note in the verdict-reporting path and post a brief update.

Generated by 👨‍🍳 PR Sous Chef · gpt54 23.9M ·

@pelikhan pelikhan merged commit c017f7b into main May 27, 2026
32 checks passed
@pelikhan pelikhan deleted the copilot/use-codex-structured-outputs branch May 27, 2026 00:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Use Codex structured outputs for threat detection parsing

3 participants