Skip to content

[pull] main from jsr-io:main#101

Merged
pull[bot] merged 1 commit into
code:mainfrom
jsr-io:main
Jun 10, 2026
Merged

[pull] main from jsr-io:main#101
pull[bot] merged 1 commit into
code:mainfrom
jsr-io:main

Conversation

@pull

@pull pull Bot commented Jun 10, 2026

Copy link
Copy Markdown

See Commits and Changes for more details.


Created by pull[bot] (v2.0.0-alpha.4)

Can you help keep this open source service alive? 💖 Please sponsor : )

## Problem

Several recent reports (`@sys/tools@0.0.450`,
`@alice-and-bot/core@0.0.323` in #1448, and "jump to latest" looping on
its own version) are the same failure: a **publishing task stranded in a
non-terminal state, leaving the package-level `meta.json` — the index
Deno's resolver reads — never regenerated.**

The publish state machine has two phases:
- `Pending → Processed`: writes the immutable per-version `_meta.json`
and atomically commits the version row + npm tarball.
- `Processed → Success`: regenerates `meta.json` from the DB, uploads
it, and purges the CDN.

If the queue worker dies between them (Cloud Run timeout, cancelled CI
run during Sigstore provenance signing, transient S3/Cloudflare error),
the task strands:
- **`processed`** → version row + npm mirror exist, the version page
renders, `_meta.json` returns 200 — but `meta.json` still lacks the
version, so the resolver fails with `Could not find version ... that
matches '<v>'`.
- **`processing`** → nothing recovers it (`publish_task` just errors
`already processing`), and the `status != 'failure'` guard in
`create_publishing_task` blocks re-publishing that exact version. This
is the "stale transaction lock" described in #1448.

A secondary contributor: the manifest `Cache-Control` used
`s-maxage=86400`, so a *failed* best-effort CDN purge could keep a
correctly-regenerated `meta.json` invisible at the edge for up to 24h.

## Fix

1. **Reaper** — new `POST /tasks/requeue_stuck_publishing_tasks`,
scheduled every 5 min via Cloud Scheduler. It finds tasks stuck in
`processing`/`processed` for >30 min, resets stale `processing` tasks to
`pending` (safe — their version row never committed, the finalize txn is
atomic), and re-buffers everything through the publish queue, which
drives `publish_task`'s state machine to completion (regenerating
`meta.json`). This is the automated counterpart to the existing manual
admin requeue endpoint.
2. **Cache-control** — `s-maxage` on `CACHE_CONTROL_MANIFEST` lowered
`86400 → 60`, so a failed purge self-heals in ~60s via
`stale-while-revalidate` with no added resolver latency.

## Notes

- Existing stranded packages still need a one-off manual `POST
/api/admin/publishing_tasks/:id/requeue` until this deploys + schedules.
- The client/server-side asks in #1448 (timeouts on Sigstore signing,
releasing the lock on client disconnect) are real but separate from this
server-side stranding fix.

## Test plan

- New DB test `list_stale_publishing_tasks` covering the status filter
and time threshold.
- `cargo clippy --all-targets`, `cargo fmt --check`, and
`SQLX_OFFLINE=true cargo check --all-targets` all clean; sqlx offline
cache regenerated.
@pull pull Bot locked and limited conversation to collaborators Jun 10, 2026
@pull pull Bot added the ⤵️ pull label Jun 10, 2026
@pull pull Bot merged commit d768249 into code:main Jun 10, 2026
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant