Skip to content

fix(manifest/bazel): harden Maven extraction completeness and show_extension handling#1364

Merged
Simon (simonhj) merged 7 commits into
v1.xfrom
simon/bazel-maven-hardening-v1x
Jun 16, 2026
Merged

fix(manifest/bazel): harden Maven extraction completeness and show_extension handling#1364
Simon (simonhj) merged 7 commits into
v1.xfrom
simon/bazel-maven-hardening-v1x

Conversation

@simonhj

@simonhj Simon (simonhj) commented Jun 15, 2026

Copy link
Copy Markdown

What

Test driven hardening of the maven bazel extractor, bundled together in one PR. I found these when expanding the corpus with more tests.

Fixes

One bad one:

  • --lockfile_mode=off on show_extension so a read-only scan never
    mutates the user's MODULE.bazel.lock.

Mostly cosmetic:

  • Honest incomplete-extraction signal. When extraction is partial (some
    hubs succeed, others fail) the SBOM is still emitted, but the run now carries
    a machine-readable completeness signal and warns loudly naming the incomplete
    hubs, on both the --auto-manifest and explicit socket manifest bazel
    paths. A run is only reported complete when it is.
  • Committed-lockfile gate scoped to the workspace root (depth-0). The gate
    that skips re-emitting a hub already covered by a committed
    maven_install.json now checks only the hub's own workspace root, not the
    whole subtree. A nested or fixture lockfile can no longer mask an uncovered
    root hub and drop its coordinates.
  • bazel mod show_extension non-zero exits are classified, not blanket-handled.
    A repo that simply doesn't use rules_jvm_external exits non-zero
    (No module ... exists in the dependency graph) — that's an authoritative
    "no Maven here" and must not fail the scan. A genuine MODULE.bazel
    evaluation failure (unbound name, syntax error) is indeterminate and must not
    be reported as complete. The classifier distinguishes them; the
    not-in-graph and eval-failure stderr shapes were verified against real
    bazel mod show_extension output.
  • cquery timeouts detected by the actual kill shape the process spawner
    emits (was checking a shape that never occurs).

Testing

  • pnpm check (lint + type-check): clean (one pre-existing, unrelated lint
    warning on v1.x left untouched).
  • pnpm test:unit for the Bazel + auto-manifest suites: all green (287 passed)
    with dist built.
  • The show_extension classification was validated against real bazel
    output on JVM and non-JVM bzlmod repos and a forced MODULE.bazel
    evaluation failure.

Note

Medium Risk
Changes dependency-discovery and completeness semantics for security scanning; misclassification could still under-report or over-fail scans, but behavior is heavily unit-tested and biases toward not aborting no-Maven repos.

Overview
Hardens Bazel Maven manifest extraction so incomplete or ambiguous scans are not presented as complete SBOMs.

Completeness signaling: Extraction returns complete, per-workspace/per-hub workspaceOutcomes, and writes socket-bazel-manifest-summary.json. Runs are complete only when every hub succeeded, no workspace failed to load, and no probe or show_extension enumeration was indeterminate. Partial runs still emit manifests but get prominent CLI/auto-manifest warnings plus complete=false logging.

Discovery correctness: Adds classifyShowExtensionResult to separate “no rules_jvm_external / no Maven” (authoritative not-defined, including common non-zero exits) from real MODULE.bazel eval failures (indeterminate). Hub probes and thrown probes now map unrecognized failures to indeterminate instead of silent not-defined.

Committed lockfile gate: Skips synthetic manifests only when the hub’s lockfile (maven_install.json or <hub>_maven_install.json) sits at depth-0 in that workspace; ignores CLI output dirs and nested fixture lockfiles so uncovered root hubs are still extracted.

CLI / tooling: socket manifest bazel gains --per-repo-timeout (default 120s, socket.json support). cquery timeouts detect killed + SIGTERM/SIGKILL (not a nonexistent timedOut flag). bazel mod show_extension for Maven/PyPI adds --lockfile_mode=off.

Reviewed by Cursor Bugbot for commit 632d5e9. Configure here.

The mod show_extension argv builders for the Maven and pip extensions did
not pass --lockfile_mode=off, so a read-only dependency scan could rewrite
the user's MODULE.bazel.lock. Mirror the query/cquery builders and pin the
lockfile read-only before user flags.
…mits

The cquery timeout detection tested for a `timedOut` flag the registry spawn
never sets; on a timeout the child is killed and Node reports `killed: true`
with `signal: SIGTERM` (or SIGKILL). Drop the phantom `timedOut` branch and
update the test to assert the real kill shape.
A Bazel Maven extraction that under-reports dependencies must never be
presented downstream as a complete SBOM. This reworks the shared status
model so the CLI is honest about partial and unanalyzable runs:

- Add an `indeterminate` probe state. An unrecognized non-zero probe exit
  or a thrown probe is no longer swallowed as `not-defined` ("no Maven
  here"); it is propagated so the run is never reported complete.
- A workspace that fails to load is recorded as a load failure rather than
  silently skipped: a hard failure when nothing else was analyzable, a
  partial otherwise.
- Enrich the extraction result with a machine-readable completeness signal
  (a `complete` flag plus per-workspace / per-hub outcomes) and write a
  completeness summary alongside the manifests for downstream consumers.
  The partial SBOM is still emitted and the warning stays loud.
- Make both the explicit command and auto-manifest honest: exit non-zero
  only on hard failure, exit 0 on partial with a prominent warning and the
  completeness signal echoed.
- Gate synthetic hub manifests on committed lockfiles: when a committed
  maven_install.json / <hub>_maven_install.json already covers a hub, skip
  re-emitting it since the server already ingests committed lockfiles.
- Wire a --per-repo-timeout flag (and socket.json setting) with a 120s
  default for the explicit command, longer than the 60s auto-manifest
  default; drop the misleading comment claiming the default already existed.

Per-state diagnostics are logged under --verbose.
…lookup

Array.prototype.at requires the es2022 lib; the stricter typechecker flagged
it on the mock.calls tuple type. Use length-1 index access, which is lib-agnostic.
…ce root

The committed-lockfile gate walked the whole tree under a workspace root
and matched a maven_install.json at any depth. A nested workspace or test
fixture lockfile then wrongly marked the root @maven hub as already
covered, so its synthetic manifest was skipped and its distinct
coordinates were never emitted while the run could still report complete.

Scope the gate to depth-0: a committed lockfile only covers the workspace
it lives directly in, since each workspace is analyzed independently and
the server walker ingests every committed lockfile against its own
workspace. Also exclude the CLI's own synthetic output directories
(.socket-auto-manifest, bazel-manifests) by name so a stale prior-run
manifest can never be misread as a committed lockfile.

Mark Maven hub enumeration indeterminate when `bazel mod show_extension`
fails to execute (non-zero exit), distinct from a clean run that finds no
maven extension. A failed enumeration may have missed custom-named hubs,
so the run is reported known-incomplete (partial when other hubs
succeeded, hard failure when nothing is analyzable) instead of silently
falling back to conventional-name probes and reporting complete.
…f blanket-flagging

bazel mod show_extension for the maven extension exits non-zero on every
bzlmod repo that doesn't depend on rules_jvm_external, because its argument
resolution throws before any Starlark runs. Treating any non-zero exit as an
indeterminate enumeration wrongly flipped those legitimate no-Maven repos to
a hard failure, aborting the user's whole auto-manifest scan.

Classify the result by stderr shape via classifyShowExtensionResult:
- extension/module not in the dependency graph (arg-resolution error) is a
  legitimate not-defined, contributing no Maven (noEcosystem)
- a genuine MODULE.bazel evaluation/load failure, or a missing binary
  (normalized code -1), stays indeterminate so the run is never complete
- an unrecognized non-zero exit biases to not-defined; only positive
  eval-failure stderr escalates to indeterminate

Replace the old hard-fail test (whose mock was a legitimate no-Maven repo)
with a noEcosystem assertion, and add coverage for the not-in-graph and
genuine-eval-failure paths.
The not-in-graph classifier relied on the conservative default for Bazel's
actual phrasing ('No module with the apparent repo name X exists in the
dependency graph'). Add that verified wording explicitly and pin both the
real not-in-graph (exit 2 -> not-defined) and real unbound-name eval failure
(exit 2 -> indeterminate, eval-failure checked first) with tests.

@cursor cursor Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes using high effort and found 1 potential issue.

Fix All in Cursor

Bugbot Autofix is ON. A cloud agent has been kicked off to fix the reported issue.

Comment @cursor review or bugbot run to trigger another review on this PR

Reviewed by Cursor Bugbot for commit 632d5e9. Configure here.

Comment thread src/commands/manifest/bazel/bazel-query-runner.mts

@mtorp Martin Torp (mtorp) left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM ✅

@simonhj Simon (simonhj) merged commit 0e41460 into v1.x Jun 16, 2026
13 checks passed
@simonhj Simon (simonhj) deleted the simon/bazel-maven-hardening-v1x branch June 16, 2026 06:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Development

Successfully merging this pull request may close these issues.

2 participants