[pull] main from jsr-io:main #100
Merged
The `err` attribute would mark spans as errors even if they actually returned 400s.
…te limit) (#1444)

## Why

docs/diff/source origin load is dominated by **crawler cardinality**, confirmed in the live data: the lb cache key is the full URL, and these pages fan out across every `symbol × entrypoint × version` (and for diff, every `old→new` version pair). So each *distinct* URL is a cold docs render at the origin that caching fundamentally can't collapse, and as the origin gets healthier, crawlers complete more of the walk, pushing load onto this uncacheable tail. Cutting the crawl is the lever.

This PR attacks it from two complementary angles: stop *inviting* the crawl (noindex), and *cap* it when it happens anyway (per-IP rate limit).

## Change

### 1. Unconditional `noindex` on high-cardinality, zero-SEO-value pages

- **Per-symbol doc pages** (`.../doc/.../~/:symbol`): previously `noindex` only for pinned `@version` URLs, so the **latest** symbol pages (exactly what crawlers walk) stayed indexable.
- **All diff pages**: they key on `oldVersion`/`newVersion`, never `:version`, so the existing `ctx.params.version` guard was **dead code** and *every* diff page was indexable (N² version-pairs × symbols, no SEO value).

Canonical, low-cardinality pages stay indexable: the package index (`/@scope/pkg`), the docs landing/per-entrypoint pages, and `all_symbols`.

### 2. Stricter per-IP rate limit on docs/diff/source pages

A new `DOCS_RATELIMIT` (60 req / 60s) at the lb, applied **only** to the doc, diff, and source package pages via `isDocsDiffSourceRoute` in `lb/main.ts`. Keyed on `CF-Connecting-IP` (the real client IP at the Cloudflare edge) and stacked on top of the existing `FRONTEND_RATELIMIT` (240/60s), which stays as the broad frontend backstop.

This works where rate limiting was previously thought infeasible: the *API server* only sees the frontend's identity on SSR-driven renders, but the **edge lb sees the real client IP on every route**, so the gate sits in front of the SSR render that drives the origin load.
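The stacked limits described in section 2 can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code in `lb/main.ts`: the `RateLimiter` interface, the in-memory `FixedWindowLimiter`, the `gate` function, and the path regex in `looksLikeDocsDiffSourceRoute` are all hypothetical stand-ins, and the real `isDocsDiffSourceRoute` and the real limiters may behave differently (for example, they may not use a fixed window).

```typescript
// Hypothetical limiter shape: anything exposing `limit({ key })`.
interface RateLimiter {
  limit(opts: { key: string }): Promise<{ success: boolean }>;
}

// In-memory fixed-window counter, for illustration only (expects max >= 1).
class FixedWindowLimiter implements RateLimiter {
  private readonly windows = new Map<string, { start: number; count: number }>();
  private readonly max: number;
  private readonly windowMs: number;
  private readonly now: () => number;

  constructor(max: number, windowMs: number, now: () => number = () => Date.now()) {
    this.max = max;
    this.windowMs = windowMs;
    this.now = now;
  }

  // Counts one request for `key`; returns false once the window is over budget.
  take(key: string): boolean {
    const t = this.now();
    const w = this.windows.get(key);
    if (w === undefined || t - w.start >= this.windowMs) {
      this.windows.set(key, { start: t, count: 1 });
      return true;
    }
    w.count += 1;
    return w.count <= this.max;
  }

  async limit(opts: { key: string }): Promise<{ success: boolean }> {
    return { success: this.take(opts.key) };
  }
}

// Assumed path shape: /@scope/pkg[@version]/(doc|diff|source)[/...].
const DOCS_DIFF_SOURCE = /^\/@[^/]+\/[^/@]+(?:@[^/]+)?\/(?:doc|diff|source)(?:\/|$)/;

function looksLikeDocsDiffSourceRoute(pathname: string): boolean {
  return DOCS_DIFF_SOURCE.test(pathname);
}

// Returns a 429 response when a limit is hit, or null to let the request through.
async function gate(
  req: Request,
  frontend: RateLimiter,
  docs: RateLimiter,
): Promise<Response | null> {
  // Key on the client IP reported by the edge; the fallback key is an assumption.
  const ip = req.headers.get("CF-Connecting-IP") ?? "unknown";
  const { pathname } = new URL(req.url);

  // Broad backstop first, then the stricter limit on the expensive pages only.
  if (!(await frontend.limit({ key: ip })).success) {
    return new Response("Too Many Requests", { status: 429 });
  }
  if (looksLikeDocsDiffSourceRoute(pathname) && !(await docs.limit({ key: ip })).success) {
    return new Response("Too Many Requests", { status: 429 });
  }
  return null;
}
```

With the numbers from the description, the two limiters would be built as `new FixedWindowLimiter(240, 60_000)` and `new FixedWindowLimiter(60, 60_000)`. Because the gate runs before the SSR render, a rejected request never reaches the origin.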
## Notes

- `noindex` reduces crawl *over time* (Google stops re-crawling and following into noindexed URLs); it's not instant, since a crawler must fetch a page once to see the header. The rate limit caps load immediately, including for clients that ignore `robots`/`noindex`.
- The limit is per-IP, so it fully bites a single/few-IP scraper; a widely distributed crawl needs the noindex lever (and the render-cost work) to do the heavy lifting.
- Scoped to the **pages** only. The backing `/api/.../docs` and `/api/.../diff` endpoints remain directly reachable without this limit (`isAPIRoute` routes them to `handleAPIRequest`, which has no guard). Capping the pages caps the SSR-driven renders, but a direct-API crawl is a separate vector left for follow-up.
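The `noindex` rule from section 1 can be sketched as a small decision function. Everything here is an assumption made for illustration: the `PageKind` names, the `pinnedVersion` flag, the choice to keep `noindex` on pinned `@version` URLs for the other page kinds, and the use of an `X-Robots-Tag` response header (the actual change may emit the directive differently).

```typescript
// Hypothetical page classification; the names are illustrative only.
type PageKind =
  | "package_index"
  | "docs_landing"
  | "docs_entrypoint"
  | "all_symbols"
  | "symbol"
  | "diff";

interface PageInfo {
  kind: PageKind;
  // True when the URL pins an explicit @version.
  pinnedVersion: boolean;
}

function shouldNoindex(page: PageInfo): boolean {
  // High-cardinality pages: noindex regardless of version.
  if (page.kind === "symbol" || page.kind === "diff") return true;
  // Assumption: other pages stay indexable unless the URL pins a version.
  return page.pinnedVersion;
}

// Extra response headers for a page; X-Robots-Tag is one standard way to send noindex.
function robotsHeaders(page: PageInfo): Record<string, string> {
  return shouldNoindex(page) ? { "X-Robots-Tag": "noindex" } : {};
}
```

Under these assumptions, a latest (unpinned) symbol page and every diff page get the header, while an unpinned package index or `all_symbols` page does not.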
See Commits and Changes for more details.
Created by pull[bot] (v2.0.0-alpha.4)