fix(git): cap default zstd-threads to 4 in server config#281
Draft
The git strategy's zstd-threads config defaulted to 0, which expands to runtime.NumCPU() at compression/decompression time. On high-core hosts (e.g. 32-64 vCPU nodes) this means each snapshot subprocess spawns one zstd worker per core, with each worker holding several MB of buffers.

cachewd schedules three periodic jobs per warm mirror (snapshot, lfs-snapshot, mirror-snapshot), and they fire concurrently across all mirrors at startup. Combined with all-cores zstd, total transient memory during warm-up scales as repos × 3 × cores, easily reaching tens of GB on a many-mirror server and contributing to OOM kills.

Cap the server-side default to 4 threads, which is enough to keep zstd off the critical path without blowing up per-process memory. Operators who want the legacy behaviour can still set zstd-threads = 0 explicitly.

The CLI's zstd-threads flags (cmd/cachew) keep 0 as their default because CLI invocations are short-lived single operations where using all cores is the right behaviour.

Amp-Thread-ID: https://ampcode.com/threads/T-019ddaf9-b3be-7604-90b5-18566c260d1f
Co-authored-by: Amp <amp@ampcode.com>
Problem
The git strategy's `zstd-threads` config defaulted to `0`, which expands to `runtime.NumCPU()` at compression/decompression time (see `client/archive.go`).

On hosts with many cores (e.g. 32–64 vCPU), every snapshot subprocess invokes `zstd -T<NumCPU>`, and each zstd worker holds several MB of buffers. That's already a few hundred MB per snapshot subprocess.

This compounds badly with cachewd's snapshot scheduling: `scheduleSnapshotJobs` submits three periodic jobs per warm mirror (`snapshot-periodic`, `lfs-snapshot-periodic`, `mirror-snapshot-periodic`), and on startup all three fire immediately for every mirror discovered by `warmExistingRepos`. Total transient memory during warm-up scales as `mirrors × 3 × cores`, easily reaching tens of GB on a many-mirror server.

Change
Cap the server-side default to `4` threads. Operators who want the legacy behaviour can set `zstd-threads = 0` (or any other value) explicitly in their HCL config.

Why 4?
4 threads is enough to keep zstd off the critical path for typical mirror sizes without letting any single snapshot spike memory and CPU across the whole node. It's a conservative starting point; operators sizing for very large monorepos can still tune it up.
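The warm-up scaling (`mirrors × 3 × threads`) can be made concrete with a back-of-the-envelope estimate. This is a sketch under loudly stated assumptions: the 8 MiB per worker is a placeholder for "several MB" (not a measurement), the 50-mirror count is hypothetical, and `peakWarmupMiB` is an illustrative helper, not part of cachew.

```go
package main

import "fmt"

// Assumption for illustration only: "several MB" of buffers per zstd worker,
// modelled as 8 MiB. This is a placeholder, not a measured value.
const perWorkerMiB = 8

// Three periodic snapshot jobs per warm mirror (snapshot, lfs-snapshot,
// mirror-snapshot), all firing at startup.
const jobsPerMirror = 3

// peakWarmupMiB estimates transient zstd buffer memory if every job for every
// mirror runs at once: mirrors × 3 × threads × per-worker buffers.
func peakWarmupMiB(mirrors, threads int) int {
	return mirrors * jobsPerMirror * threads * perWorkerMiB
}

func main() {
	const mirrors = 50 // hypothetical mirror count
	for _, threads := range []int{64, 4} {
		mib := peakWarmupMiB(mirrors, threads)
		fmt.Printf("threads=%2d  ~%d MiB (%.1f GiB)\n", threads, mib, float64(mib)/1024)
	}
}
```

Under these placeholder numbers, a 64-core host goes from roughly 75 GiB of worst-case warm-up buffers to under 5 GiB with the cap. The absolute figures depend entirely on the assumed per-worker size, but the ratio is simply `cores / 4` (16× here).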
What this does NOT change
The CLI flags in `cmd/cachew/{main,git}.go` keep `0` as their default, because `cachew save/restore/git clone` are short-lived single operations where using all cores is the right behaviour.

Follow-ups (separate PRs)
`zstd-threads` setting.