[pull] main from jsr-io:main #100

Merged
pull[bot] merged 2 commits into code:main from jsr-io:main on Jun 9, 2026

pull[bot] commented Jun 9, 2026


See Commits and Changes for more details.


Created by pull[bot] (v2.0.0-alpha.4)

Can you help keep this open source service alive? 💖 Please sponsor : )

crowlKats added 2 commits June 9, 2026 11:59
The `err` attribute would mark spans as errors even if they actually
returned 400s.
…te limit) (#1444)

## Why

Origin load from docs/diff/source pages is dominated by **crawler cardinality**,
confirmed in the live data: the lb cache key is the full URL, and these
pages fan out across every `symbol × entrypoint × version` (and for
diff, every `old→new` version pair). So each *distinct* URL is a cold
docs render at the origin that caching fundamentally can't collapse —
and as the origin gets healthier, crawlers complete more of the walk,
pushing load onto this uncacheable tail. Cutting the crawl is the lever.
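
To make the fan-out concrete, here is a rough back-of-the-envelope sketch. The `PackageShape` type and every number in it are invented for illustration (they are not taken from the live data), and the diff count assumes ordered `old→new` pairs of distinct versions:

```typescript
// Hypothetical package shape; all numbers used with it are made up.
interface PackageShape {
  symbols: number;
  entrypoints: number;
  versions: number;
}

// Distinct per-symbol doc URLs: one per symbol × entrypoint × version.
function docUrlCount(p: PackageShape): number {
  return p.symbols * p.entrypoints * p.versions;
}

// Distinct diff URLs, assuming ordered old→new pairs of different versions,
// each fanned out across every symbol.
function diffUrlCount(p: PackageShape): number {
  return p.versions * (p.versions - 1) * p.symbols;
}

const example: PackageShape = { symbols: 200, entrypoints: 3, versions: 50 };
console.log(docUrlCount(example)); // 30000
console.log(diffUrlCount(example)); // 490000
```

Because the cache key is the full URL, each of those URLs would be its own cold render the first time a crawler reaches it.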

This PR attacks it from two complementary angles: stop *inviting* the
crawl (noindex), and *cap* it when it happens anyway (per-IP rate
limit).

## Change

### 1. Unconditional `noindex` on high-cardinality, zero-SEO-value pages

- **Per-symbol doc pages** (`.../doc/.../~/:symbol`) — previously
`noindex` only for pinned `@version` URLs, so the **latest** symbol
pages (exactly what crawlers walk) stayed indexable.
- **All diff pages** — they key on `oldVersion`/`newVersion`, never
`:version`, so the existing `ctx.params.version` guard was **dead code**
and *every* diff page was indexable (N² version-pairs × symbols, no SEO
value).

Canonical, low-cardinality pages stay indexable: the package index
(`/@scope/pkg`), the docs landing/per-entrypoint pages, and
`all_symbols`.
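
The resulting policy can be sketched as a small lookup. This is a minimal illustration, not the code in this PR: the `PageKind` names and both helper functions are hypothetical, delivery via an `X-Robots-Tag` response header is an assumption, and any separate handling of pinned `@version` URLs on other pages is not modelled:

```typescript
// Hypothetical page classification; the real routes are not reproduced here.
type PageKind =
  | "package_index"
  | "docs_landing"
  | "docs_entrypoint"
  | "all_symbols"
  | "symbol"
  | "diff";

// Per-symbol and diff pages are noindex unconditionally;
// the canonical, low-cardinality pages stay indexable.
function isNoindex(kind: PageKind): boolean {
  return kind === "symbol" || kind === "diff";
}

// Assumption: noindex is sent as an X-Robots-Tag response header.
function robotsHeaders(kind: PageKind): Record<string, string> {
  return isNoindex(kind) ? { "X-Robots-Tag": "noindex" } : {};
}
```

The point of the unconditional form is that the decision no longer depends on a `version` route parameter, which diff pages never carry.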

### 2. Stricter per-IP rate limit on docs/diff/source pages

A new `DOCS_RATELIMIT` (60 req / 60s) at the lb, applied **only** to the
doc, diff, and source package pages via `isDocsDiffSourceRoute` in
`lb/main.ts`. Keyed on `CF-Connecting-IP` — the real client IP at the
Cloudflare edge — and stacked on top of the existing
`FRONTEND_RATELIMIT` (240/60s), which stays as the broad frontend
backstop.

This works where rate limiting was previously thought infeasible: the
*API server* only sees the frontend's identity on SSR-driven renders,
but the **edge lb sees the real client IP on every route**, so the gate
sits in front of the SSR render that drives the origin load.
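
A minimal sketch of how the two stacked limits interact, assuming an in-memory fixed-window counter as a stand-in for whatever rate-limit mechanism the lb actually uses. `FixedWindowLimiter` and `createGate` are hypothetical names, `clientIp` stands for the `CF-Connecting-IP` header value, and `isDocsDiffSource` stands for the result of `isDocsDiffSourceRoute`:

```typescript
interface Limit {
  max: number;
  windowMs: number;
}

// Stand-in limiter: counts requests per key within a fixed time window.
class FixedWindowLimiter {
  private readonly limit: Limit;
  private readonly counts = new Map<string, { windowStart: number; count: number }>();

  constructor(limit: Limit) {
    this.limit = limit;
  }

  allow(key: string, nowMs: number): boolean {
    const entry = this.counts.get(key);
    // Start a fresh window if there is none or the old one has expired.
    if (entry === undefined || nowMs - entry.windowStart >= this.limit.windowMs) {
      this.counts.set(key, { windowStart: nowMs, count: 1 });
      return true;
    }
    if (entry.count >= this.limit.max) return false;
    entry.count += 1;
    return true;
  }
}

// Builds a gate with the two limits named in this PR (60/60s and 240/60s).
function createGate(): (clientIp: string, isDocsDiffSource: boolean, nowMs: number) => boolean {
  const docs = new FixedWindowLimiter({ max: 60, windowMs: 60_000 });
  const frontend = new FixedWindowLimiter({ max: 240, windowMs: 60_000 });
  return (clientIp, isDocsDiffSource, nowMs) => {
    // The stricter limit applies only to docs/diff/source pages...
    if (isDocsDiffSource && !docs.allow(clientIp, nowMs)) return false;
    // ...and the broad frontend limit applies to everything that gets this far.
    return frontend.allow(clientIp, nowMs);
  };
}
```

In this sketch a request rejected by the stricter limit does not consume frontend budget; whether the real lb orders the checks that way is not stated here.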

## Notes

- `noindex` reduces crawl *over time* (Google stops re-crawling and
following into noindexed URLs); it's not instant, since a crawler must
fetch a page once to see the header. The rate limit caps load
immediately, including for clients that ignore `robots`/`noindex`.
- The limit is per-IP, so it fully bites a scraper running from one or a
few IPs; a widely distributed crawl needs the noindex lever (and the
render-cost work) to do the heavy lifting.
- Scoped to the **pages** only. The backing `/api/.../docs` and
`/api/.../diff` endpoints remain directly reachable without this limit
(`isAPIRoute` routes them to `handleAPIRequest`, which has no guard) —
capping the pages caps the SSR-driven renders, but a direct-API crawl is
a separate vector left for follow-up.

pull[bot] locked and limited conversation to collaborators Jun 9, 2026
pull[bot] added the ⤵️ pull label Jun 9, 2026
pull[bot] merged commit 180d064 into code:main Jun 9, 2026