perf(pm): probe — #2818 minus worker-pool #2836

Closed
elrrrrrrr wants to merge 103 commits into next from perf/strip-worker-pool

Conversation

@elrrrrrrr
Contributor

Summary

Take #2818's full bundle (rebased in #2834, gives p1_resolve -52%) and revert ONLY the preload worker-pool commit (`ce574d58 perf(ruborist): preload worker-pool replaces FuturesUnordered`).

If perf is still -52% → the worker-pool isn't part of the driver mix; the gain comes purely from the network/cache layer (aws-lc-rs, OnceMap, DNS, etc.)
If perf drops back toward baseline → the worker-pool IS part of the synergistic driver

Context

probe p1_resolve takeaway:

  • #2832 mt-pool only: 4.59s ±1.66 (small mean drop, huge σ)
  • #2835 aws-lc-rs only: 6.13s ±1.00 (no improvement)
  • #2834 all 101 commits: 2.62s ±0.07 (-52%, very tight σ)
  • this PR = #2834 − worker-pool: TBD

Test plan

  • cargo build pass
  • CI bench-phases-linux

🤖 Generated with Claude Code

elrrrrrrr and others added 30 commits April 27, 2026 18:02
Replace intra-package `par_iter` with a sequential loop when writing
extracted tar entries to disk. Each tar entry is typically small and
writes complete in microseconds, so splitting them into rayon tasks
was causing heavy work-stealing (futex park/unpark) and dominating
context switches on large dep graphs. Cross-package parallelism is
preserved by the outer `rayon::spawn` in `extract_tarball`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Cold bench: drop `| tail -1` so hyperfine's full summary (mean,
  stddev, range) reaches the log. Failure detection now uses exit
  status instead of piping.
- `BENCH_WARM_RUNS=0` skips the warm phase entirely (previously the
  warm function always ran and hyperfine would reject --runs 0).
- Result aggregator tolerates empty or malformed export-json files
  (e.g. when a PM's cold install fails): the offending file is
  reported and skipped instead of crashing the whole summary.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Replace the sequential `for` loop over extracted tar entries with
`par_chunks(WRITE_CHUNK_SIZE)` — each rayon task writes a contiguous
run of 32 files sequentially. This retains multi-core IO overlap for
large packages while cutting the rayon task count (and its work-
stealing futex traffic) by the chunk factor versus a per-file
par_iter. Cross-package parallelism is preserved by the outer
rayon::spawn in extract_tarball.

Local (macOS, antd-test, 3 runs avg):
  before par_iter: wall 17.2s  sys 6.18s  ivcsw 208k
  for-loop:        wall 15.3s  sys 2.36s  ivcsw  61k
  par_chunks(32):  wall 13.9s  sys 5.77s  ivcsw 191k

chunks wins wall but loses the ctx-switch reduction relative to the
pure sequential version; CI with a large dep graph (ant-design-x)
is the authoritative measurement.
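The chunking idea above can be sketched with std threads standing in for rayon tasks (the real code uses `par_chunks` on rayon's pool; `write_entry` here is a hypothetical stand-in for the per-file disk write):

```rust
use std::thread;

// Hypothetical stand-in for writing one extracted tar entry to disk;
// returning the length makes the chunked fan-out observable in a test.
fn write_entry(entry: &str) -> usize {
    entry.len()
}

const WRITE_CHUNK_SIZE: usize = 32;

// One task per contiguous run of 32 entries, mirroring
// par_chunks(WRITE_CHUNK_SIZE): multi-core overlap is retained while
// task count (and work-stealing traffic) drops by the chunk factor
// versus one task per file.
fn write_entries_chunked(entries: &[String]) -> usize {
    thread::scope(|s| {
        let handles: Vec<_> = entries
            .chunks(WRITE_CHUNK_SIZE)
            .map(|chunk| s.spawn(move || chunk.iter().map(|e| write_entry(e)).sum::<usize>()))
            .collect();
        handles.into_iter().map(|h| h.join().unwrap()).sum()
    })
}
```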

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Accumulate wall microseconds for download, extract, and clone across
all packages during install. Print a one-line summary alongside the
existing `added / reused / downloaded` counts, e.g.

  + 513 added · 3017 reused · 123 downloaded
    download 135.8s · extract 2.3s · clone 0.4s · 19.0 MB fetched

The sums are non-exclusive across cores: dividing by wall clock
gives the effective concurrency for each phase, and the ratio
between phases shows where cold-install CPU time actually lands.
Overhead is three atomics per downloaded tarball.

Local antd-test (macOS, npmmirror, 77 packages, wall 16s): download
dominates 98% of the CPU budget, extract 1.6%, clone 0.3% — reshapes
where we should look for cold-install wins.
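A minimal sketch of the accumulator pattern, assuming the three-atomics design described above (static names are illustrative, not the actual utoo symbols):

```rust
use std::sync::atomic::{AtomicU64, Ordering};

// Per-phase wall-microsecond accumulators: one relaxed fetch_add per
// package, summed across all concurrent download/extract/clone tasks.
static DOWNLOAD_US: AtomicU64 = AtomicU64::new(0);
static EXTRACT_US: AtomicU64 = AtomicU64::new(0);

fn record(counter: &AtomicU64, micros: u64) {
    counter.fetch_add(micros, Ordering::Relaxed);
}

// The sums are non-exclusive across cores, so dividing by wall clock
// yields the effective concurrency of a phase.
fn effective_concurrency(sum_us: u64, wall_us: u64) -> f64 {
    sum_us as f64 / wall_us as f64
}
```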

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Needed so the per-phase timings line (`download · extract · clone · bytes`)
printed at the end of each install reaches the CI log. Trade-off is noisier
logs — registry INFO/WARN lines come through — but that's the price for
visibility into where cold-install CPU actually lands.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Separates three independent measurements for utoo vs bun so each
phase's improvement can be judged on its own baseline:

  Phase 1 · resolve     utoo deps          / bun install --lockfile-only
  Phase 3 · cold install utoo install      / bun install   (empty cache)
  Phase 4 · warm link    utoo install      / bun install   (cache warm)

Phase 3 uses the lockfile generated by phase 1, with cache reset
between iterations. Phase 4 resets only node_modules so only the
cache → node_modules link step is measured.

Uses hyperfine --show-output so utoo's phase-timings line
(`download · extract · clone · bytes`) reaches the CI log alongside
the wall-clock summary.

Triggered via workflow_dispatch with configurable project / registry
/ runs. Defaults to ant-design against npmjs.org, 3 runs per phase.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…anch merge

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Previous inline bash -c prepare was silently no-op on CI: utoo's run 2/3
showed '3280 reused' meaning the cache wasn't actually cleared, and bun hit
InvalidNPMLockfile because utoo's package-lock.json leaked across
iterations.

Now each phase writes a dedicated prepare shell script per-PM that:
- always drops node_modules (incl. workspace package trees),
- clears exactly the lockfiles that would confuse this PM,
- wipes the right cache for this phase,
- prints a '[prep]' line so the CI log proves prepare ran.

Also factored out seed_for_phase so lockfile / cache warmup happens once
before the benchmark, not leaking into the measurement.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…che wipe

Path-based rm -rf of $HOME/.cache/nm wasn't actually emptying the cache
on the CI runner — utoo runs 2/3 of phase 3 still showed '3280 reused',
wall was 0.8-1.1s instead of the 10s cold-install baseline, hyperfine
itself warned about caches not being filled until after run 1.

Let each PM clean its own cache via its CLI so we don't rely on
guessing where it stores things.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
`utoo clean` / `bun pm cache rm` didn't empty the cache on the CI
runner either — so now use explicit bench-local paths the rm -rf
prepare can guarantee to wipe:

  utoo: --cache-dir=/tmp/utoo-bench-cache on every invocation
  bun:  BUN_INSTALL_CACHE_DIR=/tmp/bun-bench-cache (env var)

Gets us deterministic cold/warm state between hyperfine iterations.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Drop into diagnostic mode to figure out why hyperfine's --prepare
still leaves utoo's cache intact across iterations despite the
explicit --cache-dir. Prints the generated prepare script, and logs
each per-iteration invocation's before/after du -sh of both caches.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The `case $phase in` patterns (`p1)`, `p3)`, `p4)`) never matched
against actual phase strings like "p1_resolve" / "p3_cold_install" /
"p4_warm_link". Result: write_prepare produced a script containing
only the common header and no phase-specific cache-wipe logic, so
every run after the first hit a warm cache and timings collapsed.

Same off-by-name bug in seed_for_phase: "p3:utoo" pattern never
matched "p3_cold_install:utoo", skipping lockfile seeding and
warm-cache priming. Switched both to "p*_*" globs.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The cache-size before/after logs + generated-script dumps were
diagnostic scaffolding used to trace the p* vs p*_resolve pattern
mismatch. With that fixed, keep the plain hyperfine --prepare
invocation so CI logs are readable.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…time

Each hyperfine iteration now runs inside a metrics wrapper that greps
/usr/bin/time -v output for RSS, voluntary/involuntary context switches,
page faults, and IO read/write counts. Per-PM per-phase averages across
the 3 runs are shown alongside the wall-clock table so we can see, e.g.,
whether utoo's resolve phase costs more syscalls than bun's, or whether
its warm-link advantage comes at a memory cost.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Expand the metrics wrapper to collect everything that's cheap on Linux:

- user / sys CPU seconds (from /usr/bin/time -v, lets us see CPU share)
- RSS, voluntary + involuntary ctx, major + minor page faults
- network RX / TX bytes (system-wide /proc/net/dev delta, excludes lo)
- disk page-in / page-out bytes (/proc/vmstat pgpg{in,out} × 4K pages)

Summary prints two tables per phase:
  A. wall / ±σ / user / sys / RSS / minor faults
  B. vCtx / iCtx / net RX / net TX / disk R / disk W

This makes resolve-phase vs link-phase comparison legible: e.g. network
cost should dominate download phases while disk writes dominate link.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Previous run attributed 525MB of writes to utoo's resolve phase when
local check showed utoo only wrote ~28MB to its cache. The overshoot
came from /proc/vmstat pgpgout being system-wide — it picked up ext4
journal, page-cache writeback, and other kernel activity unrelated to
the benchmarked process.

Switch to du-before/after on the paths that matter (cache dir, project
node_modules, lockfiles) for a per-PM figure that reflects what the
install actually produced. Summary now shows Δcache / Δnode_mod / Δlock
per phase.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Measuring disk footprint via du before+after each iteration added
2-3s of traversal to every run (wall jumped from 2.3s → 4.9s on the
warm-link phase). Both snapshots happened inside hyperfine's timed
region because the wrapper runs as the benchmark command.

Hot path keeps only /usr/bin/time + /proc/net/dev snapshots now. After
hyperfine exits, capture_footprint does one du pass per phase/PM to
record the final on-disk size of the cache, node_modules, and
lockfile. Summary prints absolute sizes instead of per-iteration
deltas — single sample is enough to compare what each PM produced.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
parseKey matched both `_${phase}_${pm}.json` (hyperfine export) and
`_${phase}_${pm}_footprint.json` (our new du snapshot), so the loop
tried to read .results[0] off the footprint and crashed the whole
summary. Add footprint suffix to the exclusion filter.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
npm registries compress manifest responses ~13× (antd abbreviated goes
from 4.2MB to 309KB with gzip), but ruborist's reqwest client had
neither compression feature enabled — so it never advertised
`Accept-Encoding: gzip,br` and the server delivered raw JSON.

Adding `gzip` + `brotli` to the feature list cuts the cold
`utoo deps` manifest traffic on ant-design from ~275 MB of JSON
over the wire to ~21 MB. Wall improvement is modest on high-latency
links (connection setup dominates) but the bandwidth reduction is
real and the CPU cost of decompression is negligible next to simd_json.
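The change amounts to a Cargo feature toggle; a sketch of the dependency line (the version number is a placeholder, not taken from the repo):

```toml
# Enabling these reqwest features makes the client advertise
# Accept-Encoding: gzip, br and transparently decompress responses.
reqwest = { version = "0.12", features = ["gzip", "brotli"] }
```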

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
reqwest's HTTP/2 client multiplexes every manifest fetch over a SINGLE
TCP connection to each registry host. Bun opens ~10 parallel HTTP/2
connections and gets proportional extra bandwidth; we can't reproduce
that through reqwest without custom pooling.

Falling back to HTTP/1.1 with pool_max_idle_per_host(64) lets the pool
open independent connections (one request per connection, 64 parallel).
Local cold `utoo deps` on ant-design against registry.antgroup-inc.cn:

  HTTP/2 single connection: 4.9s avg
  HTTP/1.1 + pool of 64:    4.0s avg  (-18%)
  bun (reference):          3.2s

Full parity with bun still wants multi-connection HTTP/2 (bun's
strategy), which reqwest doesn't expose without a custom client pool —
future work.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Temporary diagnostic. Tracks send_us / body_us / bytes per
fetch_full_manifest call and prints p50/p90/p99/max every 500 samples
so the final output reflects the tail distribution of the full run.

Remove before merge.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
reqwest multiplexes all requests over a single HTTP/2 connection by
default, which causes head-of-line blocking on npm registries with
high RTT: a slow tail response stalls the whole manifest fetch phase.

An HTTP/1.1 pool lets concurrent manifest requests open independent
TCP streams, so a single slow response no longer blocks the rest.
Locally on ant-design with npmjs, this cut cold deps-resolve from
~121s (H2 single) to ~21s (H1 pool) — 5.75× faster. On low-latency
registries (antgroup) the two are neutral, so there is no downside.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds a per-name single-flight gate to UnifiedRegistry::resolve_full_manifest.
Concurrent callers for the same package name now serialize on a per-name
mutex; the first caller hits the network and populates the memory cache,
the rest re-check the cache after the gate and return the cached manifest.

On ant-design cold deps this eliminates ~100+ duplicate full-manifest
fetches observed when many deps point at the same transitive package.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Reverts the temporary record_sample() and per-request timing diagnostics
added in 14f2777 / 50a7014. The distribution data was used to identify
HTTP/2 head-of-line blocking; now that H1 + pool and dedup are in, the
diagnostic prints are no longer needed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Runs the complete cold install (utoo install / bun install) with
everything wiped — lockfile, all caches, node_modules. Matches the
end-to-end "freshly cloned repo" user scenario and is directly
comparable to pm-bench.yml's cold install number.

Reported alongside the existing p1_resolve / p3_cold_install / p4_warm_link
phases; does not replace any of them.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
reqwest pins every new connection to the first resolved IP even when
DNS returns multiple A records. On registries backed by a CDN with
many IPs (antgroup returns 8, npm/Cloudflare returns 2-4) this means
all concurrent pool connections land on one IP, which caps effective
parallelism regardless of `pool_max_idle_per_host`.

Rotate the returned address list by an atomic counter on every
`resolve` call so reqwest's connect loop picks a different IP per
new connection. Connections end up uniformly distributed across all
A records returned by DNS.

Measured on ant-design / antgroup registry (cold deps, local):
- utoo-h1 (single IP): 5.38s HTTP phase, 120 conn on 1 IP
- utoo-h1 + DNS rotation: 3.95s HTTP phase, 8 IPs × 8 conn each
- bun baseline: 3.72s HTTP phase, 4 IPs × 64 conn each

Total deps-resolve wall time now matches bun (~3.3s vs 3.3s).
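The rotation itself is a one-atomic-counter trick; a std-only sketch (the real code hooks this into reqwest's `dns_resolver`, which is not shown here):

```rust
use std::net::{IpAddr, Ipv4Addr};
use std::sync::atomic::{AtomicUsize, Ordering};

// Shared counter rotates the A-record list on every resolve() so each
// new connection starts its connect loop at a different IP.
struct RotatingResolver {
    counter: AtomicUsize,
}

impl RotatingResolver {
    fn resolve(&self, addrs: &[IpAddr]) -> Vec<IpAddr> {
        if addrs.is_empty() {
            return Vec::new();
        }
        let start = self.counter.fetch_add(1, Ordering::Relaxed) % addrs.len();
        // Rotate: [start..] then [..start], preserving fallback order.
        addrs[start..].iter().chain(addrs[..start].iter()).copied().collect()
    }
}
```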

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Local antgroup runs show DNS rotation cuts utoo's resolve HTTP phase
from 5.38s to 3.95s (matching bun). On CI against npmjs however the
resolve wall time is flat — possibly because:
  - npmjs from GH Actions returns fewer A records (Cloudflare Anycast)
  - low RTT already masks HOL tail

Capture a single cold resolve run per PM under tcpdump so we can see
the actual connection topology on CI and compare against the local
antgroup evidence. Output uploaded as pm-bench-pcap artifact.

Runs once after the main phased bench; reuses the already-cloned
project directory and wipes lockfiles + caches itself.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
pcap comparison against bun on both local (antgroup) and CI (npmjs)
consistently shows bun opens ~256 parallel TCP connections during
a cold install (4 IPs × 64 conn each), while utoo was capped at
64 — ~1/4 the effective parallelism even after the DNS round-robin
fix, because reqwest treats all addresses of a host as a single pool
rather than per-IP like bun.

Raise the default concurrent manifest fetch count from 64 to 256 to
match bun's observed network footprint. The CLI flag
`--manifests-concurrency-limit` still overrides it. Pool idle cap
bumped to 256 so the keep-alive pool can park every in-flight
connection without churning.

Risk: with DNS returning few A records the 256 connections may
concentrate on one IP and trigger per-IP rate limits. Pushing to
CI to measure before committing to this as the default.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
elrrrrrrr and others added 22 commits April 27, 2026 18:03
Standalone manifest-bench cap=128 hits avg_conc=95 with the same
reqwest stack; ruborist stalls at avg_conc=56. Per-completion
indicatif Mutex contention is the remaining gap source after
dropping log_progress(format!()) (commit f455a0b) and reverting
the over-aggressive dedup-by-name.

Each PreloadQueued / PreloadProgress event calls
PROGRESS_BAR.inc[_length](1), each grabbing indicatif's internal
ProgressBar Mutex. With 4571 dispatches + 4571 completions the
main task pays ~9000 lock acquisitions during a 3-4 s phase, all
contending with the steady_tick draw thread (100 ms). That cap on
main loop throughput is what holds avg_conc at 56 vs the
standalone reqwest-only sweep's 95.

Drop the per-event bar updates entirely during preload. Phase
spinner still animates via steady_tick so the user sees activity;
PreloadComplete prints the final ok/fail summary. The numeric
during-preload counter is gone but the phase is short (3-4 s) and
the user sees the finished totals.

Expected: ruborist p1_resolve preload wall drops toward standalone
manifest-bench's 2.4 s, closing most of the remaining gap to bun.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Standalone manifest-bench cap=128 hits avg_conc=95 with the same
reqwest stack; ruborist stuck at avg_conc=56 even after dropping
indicatif Mutex calls (commit 2b89d0b). Same-CI-run comparison
under matched Cloudflare conditions: standalone wall=2.06s vs
ruborist wall=3.09s — 15-conc gap that isn't HTTP, isn't parse, and
isn't progress-bar lock contention.

Hypothesis: `MemoryCache::get_full_manifest` returned `FullManifest`
by value, deep-cloning the per-version `HashMap<String,
Arc<simd_json::OwnedValue>>` (100-500 entries, key Strings + Arc
bumps per entry) on every cache hit. Each `resolve_package` call
issues this read at line 226 of registry.rs as its first sync step,
running on the main task that owns `FuturesUnordered` — so the
deep clone serialises directly with the fill-and-drain loop and
caps in-flight count.

Change cache storage to `Arc<FullManifest>`:
- `MemoryCache.full_manifests: RwLock<HashMap<String, Arc<FullManifest>>>`
- `get_full_manifest -> Option<Arc<FullManifest>>` (atomic-bump clone)
- `set_full_manifest(name, Arc<FullManifest>)` (avoid wrapping at boundary)
- `FullManifestResult::Full(Arc<FullManifest>)` so OnceMap dedup also
  hands shared `Arc`s to coalesced waiters instead of cloning the
  whole struct per caller

`UnifiedRegistry::resolve_full_manifest` constructs the `Arc` once
on the network path (line 281, 318) and passes the same handle to
both `cache.set` and `Ok(FullManifestResult::Full)`. Trait method
`get_cached_full_manifest` keeps its `Option<FullManifest>`
signature (one external caller is `ut view`, off the hot path) and
deep-clones on demand from the `Arc`.
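The shape of the Arc-backed cache can be sketched with `FullManifest` reduced to a plain struct (types simplified for illustration; the real per-version values are `Arc<simd_json::OwnedValue>`):

```rust
use std::collections::HashMap;
use std::sync::{Arc, RwLock};

struct FullManifest {
    versions: HashMap<String, String>, // simplified stand-in
}

struct MemoryCache {
    full_manifests: RwLock<HashMap<String, Arc<FullManifest>>>,
}

impl MemoryCache {
    // Cache hit clones only the Arc handle (an atomic refcount bump),
    // not the per-version HashMap.
    fn get_full_manifest(&self, name: &str) -> Option<Arc<FullManifest>> {
        self.full_manifests.read().unwrap().get(name).map(Arc::clone)
    }

    // Caller passes an Arc so the boundary never re-wraps.
    fn set_full_manifest(&self, name: String, manifest: Arc<FullManifest>) {
        self.full_manifests.write().unwrap().insert(name, manifest);
    }
}
```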

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Final hypothesis after Arc<FullManifest> didn't lift the avg_conc=56
ceiling: ruborist hot paths emit ~5-10 `tracing::debug!()` per
resolved manifest (cache hits, preload events, BFS dispatch). With
2730+ manifests during cold preload that's 15-30k events. Even
through tracing_appender's non_blocking channel, each event pays
format/serialise CPU on the resolving thread before the channel
send. The standalone manifest-bench has zero tracing calls and
hits avg_conc=92 at cap=128 with the same reqwest stack.

Drop file-layer default from `utoo=debug` to `utoo=info`. The hot
debug events stop firing entirely (no format, no channel send).

Override path preserved: `UTOO_FILE_LOG=debug` (or any
RUST_LOG-style spec) re-enables verbose file capture when actually
diagnosing. Console filter behaviour unchanged.

Expected: avg_conc lifts from 56 toward standalone's 92, p1_resolve
preload wall drops toward standalone's 2.0-2.4 s.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
`resolve_package`'s full-manifest cache-hit branch (registry.rs:541)
was cloning the entire `versions.keys: Vec<String>` (100-500 entries
per package) just to pass `&[String]` to `resolve_target_version`.

Cold ant-design preload hits this branch ~1800 times (every dep
beyond the first unique-(name) pop falls through here once preload
has populated the full manifest). 1800 × ~200 entries = ≈360k
String allocations on the resolver worker pool — global allocator
contention that doesn't show up in our HTTP/parse diag because it
runs on resumed-future threads, not the main task.

Borrow `&full_manifest.versions.keys` directly; `Arc<FullManifest>`
auto-derefs and the slice coercion satisfies the API. Zero alloc.

Diagnostic context: standalone manifest-bench cap=128 hits
avg_conc=92 with the same reqwest stack; ruborist held at 55-57
even after Mutex/clone hot-path eliminations elsewhere. Allocator
pressure on resolver threads is a remaining structural source.
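The clone-vs-borrow fix in miniature (types and the version-selection helper are simplified stand-ins, not the real ruborist signatures):

```rust
struct FullManifest {
    version_keys: Vec<String>, // stand-in for the cached versions.keys
}

// Stand-in for resolve_target_version: it only needs a borrowed slice.
fn resolve_target_version(keys: &[String]) -> Option<&String> {
    keys.last()
}

// Zero-alloc path: &Vec<String> coerces to &[String], so the cache-hit
// branch never clones the key Vec (previously ~200 String allocations
// per call just to satisfy the parameter type).
fn pick_with_borrow(m: &FullManifest) -> Option<&String> {
    resolve_target_version(&m.version_keys)
}
```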

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
`normalize_spec` unconditionally allocated `(String, String)` —
including the ~99 % case where the spec has no `npm:` or
`workspace:` prefix and no normalisation is needed. ~5460 String
allocs per ant-design preload (2 per `resolve_package` call ×
2730 unique deps), all on resolver futures driven by main task's
cooperative polling.

Switch return type to `(Cow<'a, str>, Cow<'a, str>)`. Common path
returns `Cow::Borrowed` and pays zero allocations. `npm:` /
`workspace:` prefix paths still build the substring borrow without
allocating (they're already slices into the input). Callers (3
sites: traits/registry.rs, service/registry.rs, resolver/registry.rs)
work unchanged thanks to Cow's `Deref<Target=str>`.
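A sketch of the Cow-returning shape, assuming a simplified `npm:` rewrite rule (the real normalize_spec also handles `workspace:` and more spec forms):

```rust
use std::borrow::Cow;

// Common case returns Cow::Borrowed and pays zero allocations; the
// npm:alias@range path also returns borrowed sub-slices of the input.
fn normalize_spec<'a>(name: &'a str, spec: &'a str) -> (Cow<'a, str>, Cow<'a, str>) {
    if let Some(rest) = spec.strip_prefix("npm:") {
        match rest.rsplit_once('@') {
            Some((alias, range)) if !alias.is_empty() => {
                (Cow::Borrowed(alias), Cow::Borrowed(range))
            }
            _ => (Cow::Borrowed(rest), Cow::Borrowed("*")),
        }
    } else {
        // ~99% case: nothing to normalize.
        (Cow::Borrowed(name), Cow::Borrowed(spec))
    }
}
```

Callers keep working unchanged because `Cow<str>` derefs to `&str`.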

Diagnostic context: standalone manifest-bench cap=128 reaches
avg_conc=92 with the same reqwest stack; ruborist held at 55-58
even after Mutex / FullManifest / progress-bar / tracing /
keys.clone() eliminations. Allocator pressure on the resolver
worker pool — each per-future hot-path String alloc compounds
across 2700+ futures.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Old design: main task owned `FuturesUnordered`, polled all preload
futures cooperatively, and ran every per-future continuation
(post-await body, completion handler, dispatch refill) on the same
single task. The deeper await chain inside `resolve_package`
(cache check + `OnceMap::get_or_init` + `RetryIf` + `request.send` +
`bytes` + parse `spawn_blocking`) made each future yield 5+ times,
and every yield round-tripped through main — saturating it. CI
ant-design preload sustained avg_conc=55-61 even after Mutex /
allocator hot-path eliminations, while the standalone manifest-bench
(same reqwest stack, no resolver) hit 92 at the same cap.

New design: N long-lived `tokio::spawn` workers pulling from a
shared lock-free `SegQueue<Dep>` with `DashSet` dedup. Each worker
owns an `Arc<R>` clone and runs `resolve_package` on tokio's global
executor — futures progress fully independently, no cooperative
poll bottleneck. Main task only drains an `mpsc::unbounded_channel`
of completions to fire receiver events + on_manifest callback.

Termination: workers track `dispatched`/`completed: AtomicUsize` and
park on a shared `Notify` when the queue is empty. When the last
completion makes `completed == dispatched` and the queue is empty,
the finishing worker raises a `shutdown` flag and wakes others; all
workers drop their result_tx clones, the channel closes, and the
main `recv().await` loop exits.
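The queue/counter/termination shape can be sketched with std threads standing in for tokio workers (the real code uses a lock-free `SegQueue`, `DashSet` dedup, and a `Notify` for parking; here a `Mutex<VecDeque>` and yield-spin keep the sketch self-contained, and `format!` stands in for `resolve_package`):

```rust
use std::collections::VecDeque;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{mpsc, Arc, Mutex};
use std::thread;

fn preload(deps: Vec<String>, workers: usize) -> Vec<String> {
    let queue = Arc::new(Mutex::new(VecDeque::from(deps)));
    let dispatched = Arc::new(AtomicUsize::new(queue.lock().unwrap().len()));
    let completed = Arc::new(AtomicUsize::new(0));
    let (tx, rx) = mpsc::channel::<String>();

    for _ in 0..workers {
        let (queue, completed, dispatched, tx) =
            (queue.clone(), completed.clone(), dispatched.clone(), tx.clone());
        thread::spawn(move || loop {
            let dep = queue.lock().unwrap().pop_front();
            match dep {
                Some(d) => {
                    // Stand-in for resolve_package + completion event.
                    tx.send(format!("resolved:{d}")).unwrap();
                    completed.fetch_add(1, Ordering::SeqCst);
                }
                // Queue empty and every dispatch accounted for → shut down;
                // dropping this tx clone moves the channel toward closing.
                None if completed.load(Ordering::SeqCst)
                    == dispatched.load(Ordering::SeqCst) => break,
                None => thread::yield_now(),
            }
        });
    }
    drop(tx); // main holds no sender; recv ends once all workers exit
    rx.into_iter().collect() // main task only drains completions
}
```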

Trait surface change:
- `RegistryClient`'s default-method futures gained `+ Send` bounds
  (and `Self: Sync` where blanket-default fn calls into `&self`)
- `MockRegistryClient` + `MockPackage` now `derive(Clone)` so tests
  can wrap the mock in `Arc` for the new signature
- `preload_manifests` takes `registry: Arc<R>` (was `&R`); call site
  in `run_preload_phase` clones the borrowed registry into a fresh
  `Arc`. Bound at every public surface up the chain bumped to
  `R: RegistryClient + Clone + Send + Sync + 'static`,
  `R::Error: Send`.
- `resolve_package` / `resolve_registry_dep` / `process_dependency`
  helper bounds gained `+ Sync` (their `R::Future: Send` bounds are
  inherited from the trait change above).

Local npmmirror smoke (cap=256 via DEFAULT_CONCURRENCY): avg_conc
jumped from ~55 (old) to 86.8 (new). Worker-pool delivers the
parallelism standalone manifest-bench was already showing.

Tests use `#[tokio::test(flavor = "multi_thread", worker_threads = 2)]`
since worker-pool needs spawn-able runtime; ruborist's
dev-dependencies on `tokio` add the `rt-multi-thread` feature.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Worker-pool preload (ruborist ed7b551) sustains avg_conc=66 at
cap=96 on CI vs the prior FuturesUnordered's 58 — and same-run
standalone manifest-bench reached 93/2.14s at cap=128 with the
identical reqwest stack. With workers running independently on
tokio's global executor (no cooperative-poll serialisation through
one task), more cap slots translate directly to more parallel
TCP requests in flight.

The Cloudflare per-req throttle curve we measured under the old
architecture (per-req wall doubled at cap 128→256) was conflated
with the FuturesUnordered ceiling. With workers decoupled the
curve needs re-measurement; cap=128 is the cheapest experiment
that brings ruborist to standalone parity.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Worker-pool sweep on CI ant-design p1_resolve:
  cap=96:  wall=2.23s avg_conc=66 per-req=53ms
  cap=128: wall=2.15s avg_conc=84 per-req=66ms
  → per-req drops with cap (refutes the FuturesUnordered-era
    "server throttle past 70 conc" reading; that was main-task
    saturation). Same-run standalone manifest-bench cap=192 hit
    130 conc / 2.10s, so cap=160 should bring another 0.1-0.2s
    out of preload before the curve flattens.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Worker-pool preload at cap=160 surfaced parse blocking-pool queue
saturation: parse diag showed `queue p95=200ms sum=70-89s` over
2730 manifests — ~26ms average queue wait per parse. That accounted
for the entire ruborist-vs-standalone per-req gap (55ms vs 28ms
under identical Cloudflare conditions).

Cause: blocking pool is sized to `worker_threads` (= num_cpus = 4 on
CI). Worker-pool preload sustains 80+ concurrent fetches; each
spawn_blocking parse goes into a 4-slot queue and waits behind
others. Original spawn_blocking offload was justified under
FuturesUnordered + main-task polling (would have stalled the single
poll loop), but worker-pool runs each future on tokio's global
executor — a brief 1-5ms sync CPU burst on a worker is cheaper than
spawn_blocking dispatch + queue wait.

Inline simd_json parse on the resolving worker. Each worker thread
parses its own response immediately after `bytes().await`; no extra
hop. Worker-pool's independent task scheduling means one stalled
worker doesn't starve the others — we just lose ~5ms of one
worker's cycle, which is far less than the dispatch-and-queue
round-trip we were paying.

Both fetch sites updated (`fetch_full_manifest` for npmjs full
manifest path, `fetch_version_manifest` for semver registries like
npmmirror).

Expected: ruborist preload per-req drops from 55-66ms → ~30-40ms
(matching standalone), wall toward ~1.7s.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
cap=160 + inline parse pushed avg_conc to 119 — past the
per-source Cloudflare throttle threshold. Per-req inflated
55 ms → 93 ms; net wall flat at 2.14s.

cap=128 + inline parse: avg_conc target ~85-95 (matching standalone
manifest-bench cap=128 = 70-90 / 1.6-2.0s under similar Cloudflare
conditions). Inline parse alone (no spawn_blocking queue) plus
sane concurrency should land preload at ~1.7-1.8s.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
`find_workspaces_from_pkg` was reading every workspace's package.json
sequentially in a `for path in matched_paths { read_package_json(...).await }`
loop. Ant-design has ~200 workspace packages; at ~1 ms per single-file
async FS round-trip on CI runners that's ~150-200 ms of serial I/O —
the largest unmeasured chunk between preload completion and lockfile
write (hyperfine total p1 minus instrumented sub-phases).

Collect workspace paths from every glob pattern first, then dispatch
all `read_package_json` calls into a `FuturesUnordered` for parallel
execution. Each read is small (typical workspace package.json < 4 KB)
so completion order is irrelevant — just push results as they land.
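The collect-then-fan-out pattern, sketched with std threads in place of async fs + `FuturesUnordered` (function name is illustrative):

```rust
use std::fs;
use std::io;
use std::path::Path;
use std::thread;

// All workspace paths are gathered first, then every package.json read
// runs concurrently instead of one awaited read per loop iteration.
fn read_all_package_jsons(paths: &[&Path]) -> Vec<io::Result<String>> {
    thread::scope(|s| {
        let handles: Vec<_> = paths
            .iter()
            .map(|p| s.spawn(move || fs::read_to_string(p)))
            .collect();
        // Completion order is irrelevant; join in dispatch order.
        handles.into_iter().map(|h| h.join().unwrap()).collect()
    })
}
```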

Expected: ant-design p1_resolve hyperfine wall drops by 100-150 ms
(toward ~2.40s vs current 2.58s).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
p1_resolve hyperfine still has ~80 ms of unmeasured wall after
parallel workspace reads (commit bf14995). Suspected: 2-3 MB
package-lock.json serialize + atomic-write-rename. Add per-step
timing log so we know which knob to turn (compact-json,
to_writer streaming, async fs::rename quirks, etc).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Add timer covering find_root_path → read root package.json → engines
inject → graph init → root edges → workspace discovery → workspace
nodes/edges. This is the chunk between hyperfine start and
build_deps entry — currently uninstrumented and the residual ~85ms
gap source after lockfile timing showed save is only 11ms.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Linter-applied formatting cleanup, no behavior change.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Original cap was sized for the FuturesUnordered preload that
dispatched 128 simd_json parses through `spawn_blocking` in a
burst — letting the default 512 cap run gave bimodal wall (M2:
2.7s fast / 6.9s thrash). Capping at `worker_threads` eliminated
the thrash peak.

After commit f3f616d (inline parse) preload no longer uses the
blocking pool. The dominant consumer is now `cloner.rs` during
the install phase: every file's hardlink / clonefile / copy goes
through `spawn_blocking`, ~50000 short syscalls per ant-design
install. Each syscall is near-instant, so the cap rarely
backpressures, but cap=4 on CI does limit how fast cloner can
fire syscalls in parallel.

Raise cap to `max(worker_threads * 4, 32)`: enough headroom for
cloner to keep multiple syscalls in flight, low enough that the
historical thrash regime (hundreds of churning threads) stays
avoided. Pool is per-runtime; idle threads die after 10s.

Expected: small p3_cold_install improvement (current utoo 5.74s
vs bun 7.71s); preload phase unchanged.
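The cap formula itself is a one-liner; a sketch of the sizing rule described above:

```rust
// Scale the blocking-pool cap with cores, but never drop below a floor
// that keeps cloner's short syscalls overlapped (32 here, per the
// max(worker_threads * 4, 32) rule).
fn blocking_pool_cap(worker_threads: usize) -> usize {
    std::cmp::max(worker_threads * 4, 32)
}
```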

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
A/B test: replace `entries.par_chunks(WRITE_CHUNK_SIZE).try_for_each`
with a plain sequential `for entry in &entries` loop. Each tarball
still runs in its own outer `rayon::spawn` task (cross-package
parallelism preserved); only the within-tarball write fan-out is
removed.

Goal: measure whether rayon's intra-package parallelism still earns
its keep after the worker-pool preload rewrite. Cross-package
parallelism alone may already saturate IO; if so, removing the
inner par_chunks cuts work-stealing futex traffic + thread sync
overhead with zero throughput cost.

If p3_cold_install regresses ≥0.3s → intra-package writes are
genuinely IO-bound across cores, restore par_chunks.
If p3 unchanged or improves → simpler sequential code wins.

This is a test commit. Will be reverted if regression measured.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
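The A/B being measured reduces to this std-only sketch: scoped threads stand in for rayon's `par_chunks`, and the per-entry disk write is stubbed as a sum so the two variants are directly comparable. Names here are hypothetical.

```rust
// Variant B (this commit): plain sequential loop over entries.
fn process_sequential(entries: &[u32]) -> u32 {
    entries.iter().sum()
}

// Variant A (previous code): chunked fan-out, here via std scoped threads
// instead of rayon's par_chunks(WRITE_CHUNK_SIZE).
fn process_chunked_parallel(entries: &[u32]) -> u32 {
    std::thread::scope(|s| {
        let handles: Vec<_> = entries
            .chunks(32)
            .map(|chunk| s.spawn(move || chunk.iter().sum::<u32>()))
            .collect();
        handles.into_iter().map(|h| h.join().unwrap()).sum()
    })
}

fn main() {
    let entries: Vec<u32> = (0..1000).collect();
    // Both variants produce the same result; only scheduling overhead differs,
    // which is exactly what the p3_cold_install A/B is probing.
    assert_eq!(process_sequential(&entries), process_chunked_parallel(&entries));
}
```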
`clone_dir` (Linux hardlink/copy path) was using
`tokio::task::spawn_blocking` per package — at default cap=4 on CI,
only 4 packages cloned at once, each running all file hardlinks
sequentially internally. ~3500 packages × N files per install all
funneled through that bounded pool.

Switch to the same pattern `extractor.rs` already uses:
- `rayon::spawn` per package replaces `spawn_blocking` (cross-package
  parallelism via rayon work-stealing — global pool, not capped at
  worker_threads)
- `par_chunks(CLONE_CHUNK_SIZE)` for the inner hardlink/copy loop
  (intra-package fan-out across cores; same chunk size = 32 as
  extractor)

Trade-offs:
- EXDEV `force_copy` latch is now per-chunk instead of global per
  clone — chunks each rediscover cross-device errors and fall back
  locally. A few extra hardlink-then-copy round-trips at chunk
  boundaries, acceptable for the rare cross-device install.
- Pool unification: tokio blocking pool now mostly idle (just git +
  http tarball + a few one-shot commands), rayon handles all the
  high-volume IO. Cuts the 3-pool fragmentation observed earlier.

Tested:
- Iter 1 of this loop (cap bump from N to max(N*4, 32)): no p3 win,
  p4 regressed → cap raise alone wasn't the answer.
- Iter 2 (drop intra-package par_chunks in extractor): p3 +3.67s,
  σ exploded 0.04 → 2.85s → intra-package fan-out is essential.
- This commit applies the same fan-out to clone_dir for the same
  reason.

macOS `clonefile` path (target_os = "macos") unchanged — clonefile
is a single syscall per file, different perf profile.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
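A std-only sketch of the two-level fan-out described above: scoped threads stand in for `rayon::spawn` (outer, per package) and `par_chunks` (inner, per file chunk), and the hardlink/copy syscall is stubbed as an atomic counter. `clone_all` and the data shapes are hypothetical.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

const CLONE_CHUNK_SIZE: usize = 32;

// Each inner Vec stands in for one package's file list.
fn clone_all(packages: &[Vec<u32>]) -> usize {
    let linked = AtomicUsize::new(0);
    std::thread::scope(|s| {
        for files in packages {
            let linked = &linked;
            // Outer fan-out: one task per package (rayon::spawn in the real code).
            s.spawn(move || {
                // Inner fan-out: chunked like par_chunks(CLONE_CHUNK_SIZE).
                std::thread::scope(|inner| {
                    for chunk in files.chunks(CLONE_CHUNK_SIZE) {
                        inner.spawn(move || {
                            for _file in chunk {
                                // Stands in for the hardlink / copy syscall.
                                linked.fetch_add(1, Ordering::Relaxed);
                            }
                        });
                    }
                });
            });
        }
    });
    linked.load(Ordering::Relaxed)
}

fn main() {
    let packages: Vec<Vec<u32>> = (0..8).map(|_| (0..100).collect()).collect();
    assert_eq!(clone_all(&packages), 800);
}
```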
- delete crates/manifest-bench (debug-only, never merged)
- tombi format crates/ruborist/Cargo.toml
- typos: unparseable → unparsable in bench/pm-bench.sh
@elrrrrrrr elrrrrrrr added benchmark Run pm-bench on PR bench-phases Run pm-bench-phases workflow labels Apr 27, 2026
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request introduces significant performance optimizations to the dependency resolver and installer, targeting CPU overhead, memory allocations, and network efficiency. Key enhancements include refactoring manifest parsing to use lazy simd_json subtrees with memoization, integrating aws-lc-rs for faster TLS handshakes, and adopting lock-free queues for dependency management. The update also features a round-robin DNS resolver, parallelized workspace discovery, and fire-and-forget disk cache writes. Detailed diagnostic logging for HTTP and parsing has been added to monitor pipeline performance. Review feedback suggests further reducing allocations in the DNS rotation logic and addressing a thundering herd risk in manifest fetch deduplication.

Comment on lines +113 to +140
    fn rotate_addrs(addrs: &[SocketAddr], offset: usize) -> Vec<SocketAddr> {
        if addrs.is_empty() {
            return Vec::new();
        }
        let rotate = |slice: &[SocketAddr]| -> Vec<SocketAddr> {
            if slice.is_empty() {
                return Vec::new();
            }
            let start = offset % slice.len();
            slice[start..]
                .iter()
                .chain(&slice[..start])
                .copied()
                .collect()
        };
        let v6: Vec<SocketAddr> = addrs.iter().filter(|a| a.is_ipv6()).copied().collect();
        let v4: Vec<SocketAddr> = addrs.iter().filter(|a| a.is_ipv4()).copied().collect();
        let v6_rot = rotate(&v6);
        let v4_rot = rotate(&v4);
        // Preserve v6-first ordering if that's what the resolver gave us;
        // Happy Eyeballs will still prefer v6 when it's reachable.
        let v6_first = addrs.first().map(|a| a.is_ipv6()).unwrap_or(true);
        if v6_first {
            v6_rot.into_iter().chain(v4_rot).collect()
        } else {
            v4_rot.into_iter().chain(v6_rot).collect()
        }
    }
medium

The rotate_addrs function is on the hot path for connection establishment and currently performs up to 5 allocations per call (two for filtering families, two for rotating them, and one for the final collection). Given the PR's focus on reducing allocator pressure, this can be optimized to a single allocation by iterating over the input slice multiple times instead of creating intermediate vectors.

    fn rotate_addrs(addrs: &[SocketAddr], offset: usize) -> Vec<SocketAddr> {
        if addrs.is_empty() {
            return Vec::new();
        }
        let mut result = Vec::with_capacity(addrs.len());
        let v6_first = addrs[0].is_ipv6();

        let mut append_family = |is_v6: bool| {
            let family_count = addrs.iter().filter(|a| a.is_ipv6() == is_v6).count();
            if family_count == 0 {
                return;
            }
            let start_offset = offset % family_count;
            // First pass: from start_offset to end
            let mut current_count = 0;
            for addr in addrs.iter().filter(|a| a.is_ipv6() == is_v6) {
                if current_count >= start_offset {
                    result.push(*addr);
                }
                current_count += 1;
            }
            // Second pass: from beginning to start_offset
            current_count = 0;
            for addr in addrs.iter().filter(|a| a.is_ipv6() == is_v6) {
                if current_count < start_offset {
                    result.push(*addr);
                    current_count += 1;
                } else {
                    break;
                }
            }
        };

        if v6_first {
            append_family(true);
            append_family(false);
        } else {
            append_family(false);
            append_family(true);
        }
        result
    }

Comment on lines +234 to +249
    let shared = self
        .inflight
        .get_or_init(name.to_string(), || async {
            self.fetch_full_manifest_network(name).await.ok()
        })
        .await;

    match shared {
        Some(arc) => Ok((*arc).clone()),
        None => {
            // OnceMap clears the key on None, so the next caller
            // retries the fetch. Retry once here with a fresh error
            // so we surface a useful message to this caller.
            self.fetch_full_manifest_network(name).await
        }
    }

medium

This implementation introduces a thundering herd problem when a manifest fetch fails. The OnceMap::get_or_init closure returns None on failure (via .ok()), which typically causes the OnceMap to clear the entry. Consequently, all concurrent callers waiting for the same package will receive None and proceed to execute fetch_full_manifest_network simultaneously at line 247.

To fix this, consider storing a Result<Arc<FullManifestResult>, Arc<RegistryError>> in the OnceMap so that failures are also deduplicated and shared among all waiters.
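A hedged, synchronous sketch of that suggestion: cache the `Result` itself so a failure is shared with every waiter instead of each retrying. Types and names here are stand-ins — the real async code would store `Result<Arc<FullManifestResult>, Arc<RegistryError>>` in the `OnceMap`.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

// Stand-in types: a manifest body and an error string.
type FetchResult = Result<Arc<String>, Arc<String>>;

struct ManifestCache {
    inner: Mutex<HashMap<String, FetchResult>>,
}

impl ManifestCache {
    // The first caller runs `fetch`; everyone else gets the cached Result,
    // including a cached *error* — no thundering herd on failure.
    // (A real implementation would evict cached errors after a while
    // so later installs can retry.)
    fn get_or_fetch(
        &self,
        name: &str,
        fetch: impl FnOnce() -> Result<String, String>,
    ) -> FetchResult {
        let mut map = self.inner.lock().unwrap();
        map.entry(name.to_string())
            .or_insert_with(|| fetch().map(Arc::new).map_err(Arc::new))
            .clone()
    }
}

fn main() {
    let cache = ManifestCache { inner: Mutex::new(HashMap::new()) };
    let mut calls = 0;
    let first = cache.get_or_fetch("left-pad", || { calls += 1; Err("503".into()) });
    let second = cache.get_or_fetch("left-pad", || { calls += 1; Err("503".into()) });
    assert!(first.is_err() && second.is_err());
    assert_eq!(calls, 1); // the failing fetch ran exactly once
}
```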

@github-actions

📊 pm-bench-phases · a263293 · linux (ubuntu-latest)

Workflow run — ant-design

PMs: utoo (this branch) · utoo-npm (latest published) · bun (latest)

npmjs.org

p0_full_cold

| PM | wall | ±σ | user | sys | RSS | pgMinor |
| --- | --- | --- | --- | --- | --- | --- |
| bun | 9.15s | 0.14s | 10.05s | 9.87s | 636M | 325.2K |
| utoo-npm | 12.65s | 2.31s | 11.33s | 13.56s | 1.27G | 163.7K |
| utoo | 8.80s | 1.08s | 10.25s | 11.93s | 2.11G | 253.9K |

| PM | vCtx | iCtx | netRX | netTX | cache | node_mod | lock |
| --- | --- | --- | --- | --- | --- | --- | --- |
| bun | 16.1K | 17.1K | 1.17G | 6M | 1.83G | 1.72G | 1M |
| utoo-npm | 197.1K | 172.7K | 1.14G | 6M | 1.68G | 1.68G | 2M |
| utoo | 112.6K | 65.9K | 1.13G | 6M | 1.68G | 1.68G | 2M |

p1_resolve

| PM | wall | ±σ | user | sys | RSS | pgMinor |
| --- | --- | --- | --- | --- | --- | --- |
| bun | 2.85s | 0.93s | 3.78s | 1.15s | 513M | 190.6K |
| utoo-npm | 5.59s | 0.13s | 5.02s | 1.77s | 431M | 73.4K |
| utoo | 4.27s | 0.05s | 4.13s | 2.06s | 1.37G | 169.7K |

| PM | vCtx | iCtx | netRX | netTX | cache | node_mod | lock |
| --- | --- | --- | --- | --- | --- | --- | --- |
| bun | 11.3K | 3.6K | 202M | 3M | 104M | - | 1M |
| utoo-npm | 66.1K | 2.6K | 202M | 2M | 9M | 5M | 2M |
| utoo | 82.7K | 5.2K | 197M | 3M | 7M | 5M | 2M |

p3_cold_install

| PM | wall | ±σ | user | sys | RSS | pgMinor |
| --- | --- | --- | --- | --- | --- | --- |
| bun | 7.20s | 0.86s | 6.18s | 9.53s | 598M | 202.5K |
| utoo-npm | 7.56s | 1.39s | 5.54s | 11.04s | 932M | 117.6K |
| utoo | 9.74s | 0.57s | 5.50s | 10.93s | 840M | 110.9K |

| PM | vCtx | iCtx | netRX | netTX | cache | node_mod | lock |
| --- | --- | --- | --- | --- | --- | --- | --- |
| bun | 5.6K | 6.7K | 993M | 4M | 1.73G | 1.73G | 1M |
| utoo-npm | 120.8K | 78.0K | 965M | 3M | 1.67G | 1.67G | 2M |
| utoo | 142.4K | 67.0K | 966M | 5M | 1.67G | 1.67G | 2M |

p4_warm_link

| PM | wall | ±σ | user | sys | RSS | pgMinor |
| --- | --- | --- | --- | --- | --- | --- |
| bun | 3.52s | 0.08s | 0.16s | 2.51s | 134M | 31.0K |
| utoo-npm | 2.48s | 0.21s | 0.61s | 3.92s | 82M | 19.1K |
| utoo | 2.07s | 0.06s | 0.41s | 3.40s | 62M | 13.6K |

| PM | vCtx | iCtx | netRX | netTX | cache | node_mod | lock |
| --- | --- | --- | --- | --- | --- | --- | --- |
| bun | 194 | 20 | 6K | 33K | 1.84G | 1.73G | 1M |
| utoo-npm | 50.3K | 21.7K | 19K | 15K | 1.67G | 1.67G | 2M |
| utoo | 16.4K | 9.4K | 17K | 25K | 1.68G | 1.67G | 2M |

npmmirror.com

p0_full_cold

| PM | wall | ±σ | user | sys | RSS | pgMinor |
| --- | --- | --- | --- | --- | --- | --- |
| bun | 25.70s | 6.24s | 9.35s | 9.92s | 553M | 397.7K |
| utoo-npm | 21.52s | 3.34s | 8.00s | 13.43s | 719M | 114.7K |
| utoo | 12.80s | 2.86s | 7.22s | 11.53s | 770M | 119.7K |

| PM | vCtx | iCtx | netRX | netTX | cache | node_mod | lock |
| --- | --- | --- | --- | --- | --- | --- | --- |
| bun | 58.7K | 5.6K | 1.12G | 10M | 1.85G | 1.73G | 2M |
| utoo-npm | 236.0K | 103.1K | 977M | 8M | 1.67G | 1.68G | 2M |
| utoo | 153.3K | 59.3K | 983M | 8M | 1.67G | 1.68G | 2M |

p1_resolve

| PM | wall | ±σ | user | sys | RSS | pgMinor |
| --- | --- | --- | --- | --- | --- | --- |
| bun | 1.52s | 0.09s | 4.04s | 1.12s | 655M | 188.4K |
| utoo-npm | 3.18s | 0.08s | 1.47s | 0.80s | 75M | 16.3K |
| utoo | 0.86s | 0.03s | 0.87s | 0.34s | 81M | 17.2K |

| PM | vCtx | iCtx | netRX | netTX | cache | node_mod | lock |
| --- | --- | --- | --- | --- | --- | --- | --- |
| bun | 5.1K | 6.2K | 152M | 2M | 106M | - | 2M |
| utoo-npm | 44.4K | 1.1K | 12M | 2M | - | 4M | 2M |
| utoo | 16.4K | 321 | 16M | 2M | - | 4M | 2M |

p3_cold_install

| PM | wall | ±σ | user | sys | RSS | pgMinor |
| --- | --- | --- | --- | --- | --- | --- |
| bun | 18.30s | 1.06s | 5.85s | 8.86s | 248M | 99.3K |
| utoo-npm | 21.92s | 0.87s | 6.24s | 12.20s | 714M | 88.9K |
| utoo | 18.95s | 6.97s | 5.85s | 10.98s | 667M | 84.0K |

| PM | vCtx | iCtx | netRX | netTX | cache | node_mod | lock |
| --- | --- | --- | --- | --- | --- | --- | --- |
| bun | 36.7K | 3.5K | 998M | 7M | 1.73G | 1.73G | 2M |
| utoo-npm | 195.2K | 110.2K | 965M | 6M | 1.67G | 1.67G | 2M |
| utoo | 135.8K | 61.1K | 966M | 6M | 1.67G | 1.67G | 2M |

p4_warm_link

| PM | wall | ±σ | user | sys | RSS | pgMinor |
| --- | --- | --- | --- | --- | --- | --- |
| bun | 3.19s | 0.18s | 0.20s | 2.40s | 136M | 31.6K |
| utoo-npm | 2.49s | 0.20s | 0.60s | 3.91s | 82M | 19.6K |
| utoo | 2.19s | 0.10s | 0.42s | 3.45s | 62M | 13.6K |

| PM | vCtx | iCtx | netRX | netTX | cache | node_mod | lock |
| --- | --- | --- | --- | --- | --- | --- | --- |
| bun | 383 | 22 | 7M | 41K | 1.88G | 1.72G | 2M |
| utoo-npm | 48.2K | 20.7K | 59K | 11K | 1.67G | 1.67G | 2M |
| utoo | 16.7K | 8.9K | 62K | 12K | 1.67G | 1.67G | 2M |

@github-actions

📊 pm-bench-phases · a263293 · mac (macos-latest)

Workflow run — ant-design

PMs: utoo (this branch) · utoo-npm (latest published) · bun (latest)

npmjs.org

p0_full_cold

| PM | wall | ±σ | user | sys | RSS | pgMinor |
| --- | --- | --- | --- | --- | --- | --- |
| bun | 14.72s | 0.20s | 5.56s | 14.61s | 794M | 51.3K |
| utoo-npm | 13.94s | 0.51s | 7.49s | 14.74s | 900M | 98.3K |
| utoo | 13.62s | 1.37s | 6.95s | 14.11s | 1.96G | 173.6K |

| PM | vCtx | iCtx | netRX | netTX | cache | node_mod | lock |
| --- | --- | --- | --- | --- | --- | --- | --- |
| bun | 16.0K | 142.6K | - | - | 1.76G | 1.91G | 1M |
| utoo-npm | 13.0K | 362.5K | - | - | 1.63G | 1.86G | 2M |
| utoo | 9.8K | 224.6K | - | - | 1.63G | 1.86G | 2M |

p1_resolve

| PM | wall | ±σ | user | sys | RSS | pgMinor |
| --- | --- | --- | --- | --- | --- | --- |
| bun | 2.28s | 0.02s | 2.46s | 1.00s | 478M | 31.2K |
| utoo-npm | 4.67s | 0.21s | 3.80s | 1.74s | 546M | 37.4K |
| utoo | 4.31s | 0.10s | 3.55s | 1.98s | 1.62G | 106.9K |

| PM | vCtx | iCtx | netRX | netTX | cache | node_mod | lock |
| --- | --- | --- | --- | --- | --- | --- | --- |
| bun | 10 | 25.5K | - | - | 110M | - | 1M |
| utoo-npm | 16 | 77.2K | - | - | 28M | 5M | 2M |
| utoo | 36 | 91.5K | - | - | 27M | 5M | 2M |

p3_cold_install

| PM | wall | ±σ | user | sys | RSS | pgMinor |
| --- | --- | --- | --- | --- | --- | --- |
| bun | 14.79s | 4.42s | 3.17s | 14.23s | 520M | 33.9K |
| utoo-npm | 11.87s | 3.13s | 3.25s | 12.93s | 823M | 80.7K |
| utoo | 10.21s | 0.42s | 3.08s | 12.75s | 714M | 79.8K |

| PM | vCtx | iCtx | netRX | netTX | cache | node_mod | lock |
| --- | --- | --- | --- | --- | --- | --- | --- |
| bun | 5.4K | 133.0K | - | - | 1.70G | 1.94G | 1M |
| utoo-npm | 1.4K | 235.7K | - | - | 1.60G | 1.87G | 2M |
| utoo | 1.3K | 156.6K | - | - | 1.60G | 1.87G | 2M |

p4_warm_link

| PM | wall | ±σ | user | sys | RSS | pgMinor |
| --- | --- | --- | --- | --- | --- | --- |
| bun | 4.17s | 0.87s | 0.10s | 2.06s | 52M | 3.9K |
| utoo-npm | 2.91s | 0.13s | 0.48s | 2.47s | 88M | 6.6K |
| utoo | 2.87s | 0.40s | 0.31s | 2.16s | 82M | 5.9K |

| PM | vCtx | iCtx | netRX | netTX | cache | node_mod | lock |
| --- | --- | --- | --- | --- | --- | --- | --- |
| bun | 16.7K | 1.4K | - | - | 1.86G | 1.91G | 1M |
| utoo-npm | 12.4K | 70.8K | - | - | 1.60G | 1.85G | 2M |
| utoo | 13.2K | 19.1K | - | - | 1.63G | 1.85G | 2M |

npmmirror.com

p0_full_cold

| PM | wall | ±σ | user | sys | RSS | pgMinor |
| --- | --- | --- | --- | --- | --- | --- |
| bun | 25.29s | 0.80s | 5.66s | 14.83s | 583M | 37.7K |
| utoo-npm | 24.58s | 5.15s | 6.13s | 16.64s | 734M | 75.2K |
| utoo | 15.90s | 2.48s | 4.93s | 14.12s | 687M | 74.4K |

| PM | vCtx | iCtx | netRX | netTX | cache | node_mod | lock |
| --- | --- | --- | --- | --- | --- | --- | --- |
| bun | 14.3K | 150.5K | - | - | 1.79G | 1.89G | 2M |
| utoo-npm | 4.0K | 434.8K | - | - | 1.61G | 1.84G | 2M |
| utoo | 4.4K | 256.9K | - | - | 1.61G | 1.87G | 2M |

p1_resolve

| PM | wall | ±σ | user | sys | RSS | pgMinor |
| --- | --- | --- | --- | --- | --- | --- |
| bun | 2.54s | 0.13s | 2.35s | 1.14s | 534M | 34.8K |
| utoo-npm | 4.82s | 0.04s | 2.25s | 1.34s | 81M | 5.9K |
| utoo | 7.05s | 8.48s | 1.40s | 0.57s | 82M | 6.0K |

| PM | vCtx | iCtx | netRX | netTX | cache | node_mod | lock |
| --- | --- | --- | --- | --- | --- | --- | --- |
| bun | 8 | 30.2K | - | - | 111M | - | 2M |
| utoo-npm | 5 | 41.8K | - | - | - | 4M | 2M |
| utoo | 30 | 25.3K | - | - | - | 4M | 2M |

p3_cold_install

| PM | wall | ±σ | user | sys | RSS | pgMinor |
| --- | --- | --- | --- | --- | --- | --- |
| bun | 21.42s | 0.84s | 3.33s | 14.06s | 251M | 16.7K |
| utoo-npm | 27.86s | 0.88s | 4.54s | 15.20s | 650M | 72.4K |
| utoo | 25.61s | 6.95s | 4.03s | 13.92s | 747M | 72.7K |

| PM | vCtx | iCtx | netRX | netTX | cache | node_mod | lock |
| --- | --- | --- | --- | --- | --- | --- | --- |
| bun | 1.9K | 137.0K | - | - | 1.70G | 1.94G | 2M |
| utoo-npm | 1.5K | 374.5K | - | - | 1.61G | 1.87G | 2M |
| utoo | 1.3K | 230.6K | - | - | 1.61G | 1.87G | 2M |

p4_warm_link

| PM | wall | ±σ | user | sys | RSS | pgMinor |
| --- | --- | --- | --- | --- | --- | --- |
| bun | 3.97s | 0.49s | 0.10s | 1.96s | 50M | 3.8K |
| utoo-npm | 3.65s | 0.01s | 0.52s | 2.64s | 97M | 7.1K |
| utoo | 3.59s | 0.32s | 0.33s | 2.29s | 87M | 6.2K |

| PM | vCtx | iCtx | netRX | netTX | cache | node_mod | lock |
| --- | --- | --- | --- | --- | --- | --- | --- |
| bun | 13.8K | 1.1K | - | - | 1.87G | 1.91G | 2M |
| utoo-npm | 12.3K | 72.3K | - | - | 1.61G | 1.83G | 2M |
| utoo | 13.3K | 19.9K | - | - | 1.61G | 1.83G | 2M |
