[pull] main from jsr-io:main by pull[bot] · Pull Request #100 · code/lib-jsr

pull · 2026-06-09T18:11:06Z

See Commits and Changes for more details.

Created by pull[bot] (v2.0.0-alpha.4)

Can you help keep this open source service alive? 💖 Please sponsor : )


…te limit) (#1444)

## Why

docs/diff/source origin load is dominated by **crawler cardinality**, confirmed in the live data: the lb cache key is the full URL, and these pages fan out across every `symbol × entrypoint × version` (and for diff, every `old→new` version pair). So each *distinct* URL is a cold docs render at the origin that caching fundamentally can't collapse, and as the origin gets healthier, crawlers complete more of the walk, pushing load onto this uncacheable tail. Cutting the crawl is the lever.

This PR attacks it from two complementary angles: stop *inviting* the crawl (noindex), and *cap* it when it happens anyway (per-IP rate limit).

## Change

### 1. Unconditional `noindex` on high-cardinality, zero-SEO-value pages

- **Per-symbol doc pages** (`.../doc/.../~/:symbol`): previously `noindex` only for pinned `@version` URLs, so the **latest** symbol pages (exactly what crawlers walk) stayed indexable.
- **All diff pages**: they key on `oldVersion`/`newVersion`, never `:version`, so the existing `ctx.params.version` guard was **dead code** and *every* diff page was indexable (N² version-pairs × symbols, no SEO value).

Canonical, low-cardinality pages stay indexable: the package index (`/@scope/pkg`), the docs landing/per-entrypoint pages, and `all_symbols`.

### 2. Stricter per-IP rate limit on docs/diff/source pages

A new `DOCS_RATELIMIT` (60 req / 60s) at the lb, applied **only** to the doc, diff, and source package pages via `isDocsDiffSourceRoute` in `lb/main.ts`. Keyed on `CF-Connecting-IP` (the real client IP at the Cloudflare edge) and stacked on top of the existing `FRONTEND_RATELIMIT` (240/60s), which stays as the broad frontend backstop.

This works where rate limiting was previously thought infeasible: the *API server* only sees the frontend's identity on SSR-driven renders, but the **edge lb sees the real client IP on every route**, so the gate sits in front of the SSR render that drives the origin load.

## Notes

- `noindex` reduces crawl *over time* (Google stops re-crawling and following into noindexed URLs); it's not instant, since a crawler must fetch a page once to see the header. The rate limit caps load immediately, including for clients that ignore `robots`/`noindex`.
- The limit is per-IP, so it fully bites a single/few-IP scraper; a widely distributed crawl needs the noindex lever (and the render-cost work) to do the heavy lifting.
- Scoped to the **pages** only. The backing `/api/.../docs` and `/api/.../diff` endpoints remain directly reachable without this limit (`isAPIRoute` routes them to `handleAPIRequest`, which has no guard); capping the pages caps the SSR-driven renders, but a direct-API crawl is a separate vector left for follow-up.
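The stacked per-IP gate described under "Change" can be illustrated with a minimal sketch. This is not the code in `lb/main.ts`: the in-memory `FixedWindowLimiter`, the `admit` helper, and the route regex in `isDocsDiffSourceRouteSketch` are all hypothetical stand-ins, and how the real `DOCS_RATELIMIT` and `FRONTEND_RATELIMIT` count requests is not shown in this PR. Only the budgets (60/60s and 240/60s) and the idea of keying on the client IP come from the commit message.

```typescript
// Hypothetical fixed-window counter keyed by a string (here: the client IP).
// Stand-in for whatever DOCS_RATELIMIT / FRONTEND_RATELIMIT really are.
class FixedWindowLimiter {
  private readonly limit: number;
  private readonly windowMs: number;
  private readonly windows = new Map<string, { start: number; count: number }>();

  constructor(limit: number, windowMs: number) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  // Returns true if the request fits in the current window for this key.
  allow(key: string, now: number = Date.now()): boolean {
    const w = this.windows.get(key);
    if (w === undefined || now - w.start >= this.windowMs) {
      this.windows.set(key, { start: now, count: 1 });
      return true;
    }
    if (w.count >= this.limit) return false;
    w.count += 1;
    return true;
  }
}

// Assumed URL shape: /@scope/pkg[/version]/(doc|diff|source)[/...].
// The real isDocsDiffSourceRoute may match differently.
const DOCS_DIFF_SOURCE = /^\/@[^/]+\/[^/]+(?:\/[^/]+)?\/(?:doc|diff|source)(?:\/|$)/;

function isDocsDiffSourceRouteSketch(pathname: string): boolean {
  return DOCS_DIFF_SOURCE.test(pathname);
}

interface Limits {
  frontend: FixedWindowLimiter; // broad backstop, 240 req / 60s
  docs: FixedWindowLimiter; // stricter limit, 60 req / 60s
}

// clientIp would come from the CF-Connecting-IP header at the edge.
function admit(pathname: string, clientIp: string, limits: Limits, now: number = Date.now()): boolean {
  // The broad frontend limit applies to every page request.
  if (!limits.frontend.allow(clientIp, now)) return false;
  // The stricter limit is stacked on top for docs/diff/source pages only.
  if (isDocsDiffSourceRouteSketch(pathname) && !limits.docs.allow(clientIp, now)) return false;
  return true;
}
```

In this sketch a rejected request would translate to a 429 response at the lb. Because the key is a single IP, a crawl spread across many addresses passes the gate largely untouched, which is the limitation the Notes section calls out.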

crowlKats added 2 commits June 9, 2026 11:59

fix: more accurately report errors in otel (#1443)

9a8b93b

The `err` attribute would mark spans as errors even if they actually returned 400s.
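The diff for #1443 is not shown here, so the following is only a sketch of the general idea: for a server-side span, a client error (4xx) is not a server failure and should not flag the span as an error, while a 5xx should. The `SpanStatus` type and `serverSpanStatus` function are hypothetical names, not part of the jsr codebase.

```typescript
// Hypothetical helper: decide a server span's status from the HTTP response code.
type SpanStatus = "unset" | "error";

function serverSpanStatus(httpStatus: number): SpanStatus {
  // Only server-side failures (5xx) mark the span as an error;
  // 4xx responses are the client's problem and leave the status unset.
  return httpStatus >= 500 ? "error" : "unset";
}
```

This mirrors the OpenTelemetry HTTP semantic conventions for server spans, where 4xx responses leave the span status unset.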

pull Bot locked and limited conversation to collaborators Jun 9, 2026

pull Bot added the ⤵️ pull label Jun 9, 2026

pull Bot merged commit 180d064 into code:main Jun 9, 2026

pull[bot] merged 2 commits into code:main from jsr-io:main
