perf: use hardlinks for code snapshots to reduce disk and inode usage #2476

Draft
kajalj22 wants to merge 2 commits into main from kajalj/hardlink-snapshots

@kajalj22
Contributor

Summary

  • Replace full rsync copies with hardlinked copies via --link-dest in tools/code_snapshot.sh
  • Each test snapshot still gets its own isolated directory, but unchanged files share storage on disk until a snapshot's copy is replaced (written to a new file and renamed over the old one)
  • On Lustre this is near-instant and reduces nightly CI disk usage from ~N×repo_size to ~1×repo_size (e.g. 254 GB → ~13 GB on pretyche for one pipeline)

Motivation

In nightly CI with ~20 tests, code_snapshot.sh creates ~20 full copies of the repo on Lustre. The copies contain identical code and differ only in the output logs written per job. Together they occupy ~254 GB and ~194K files per pipeline, nearly all of it redundant.

How it works

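rsync --link-dest=DIR hardlinks a destination file to the matching file in DIR whenever rsync considers it unchanged (same size, modification time, and preserved attributes, rather than a byte-by-byte content comparison). All hardlinks to a file share one inode, so no additional file data is stored; each snapshot still has its own directories and directory entries. Hardlinks are not copy-on-write. If a file is replaced by writing a temporary file and renaming it over the old name (save-as-rename), only that snapshot's copy diverges and isolation is preserved. A write that modifies the file in place, however, is visible through every link.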
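
The mechanism can be sketched as a small shell function. This is a minimal illustration under stated assumptions, not the actual contents of tools/code_snapshot.sh: the function name `snapshot_with_hardlinks` and its two arguments are hypothetical, and only the `rsync -a --link-dest` call reflects what the description above names.

```shell
# Hypothetical helper (not taken from tools/code_snapshot.sh).
# Usage: snapshot_with_hardlinks <source-dir> <snapshot-dir>
snapshot_with_hardlinks() {
    # Resolve the source to an absolute path: a relative --link-dest
    # is interpreted relative to the destination directory.
    local src dest
    src=$(cd "$1" && pwd) || return 1
    dest=$2
    mkdir -p "$dest" || return 1

    # -a preserves mtime and permissions, so every file in the source
    # matches its counterpart in --link-dest and is hardlinked into the
    # snapshot instead of being copied. Directories are still created anew.
    rsync -a --link-dest="$src" "$src/" "$dest/"
}
```

A call might look like `snapshot_with_hardlinks /path/to/repo code_snapshots/$JOB_ID`, where the paths are placeholders. Source and snapshot must be on the same filesystem, since hardlinks cannot cross mounts.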

Safety analysis

  • No snapshotted file is ever modified in-place during test execution — all writes (continue.sh, $JOB_ID-logs/, metrics.json, etc.) create new files
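  • Editors that save via temp-file-and-rename break the hardlink on save, so an edit stays local to one snapshot. Editors that write in place do not: Vim's default backupcopy=auto, for example, overwrites the existing file when it is a hard link, so the developer workflow should be verified for each editor in use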
  • Lustre supports hardlinks within the same filesystem; source and destination are always on the same mount
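
The safety argument hinges on how a file is written. The following self-contained demo, run in a throwaway directory with illustrative file names, shows the two cases: an in-place write goes through the shared inode, while a rename replaces only one directory entry.

```shell
# Throwaway directory; all names here are illustrative only.
demo=$(mktemp -d)
printf 'v1\n' > "$demo/original"
ln "$demo/original" "$demo/snapshot"      # second name for the same inode

# Case 1: in-place write. `>` truncates and rewrites the existing inode,
# so the change is visible through both names.
printf 'v2\n' > "$demo/snapshot"
inplace_result=$(cat "$demo/original")

# Case 2: temp-file-and-rename. mv replaces the directory entry for
# "snapshot" with a new inode; "original" keeps the old content.
printf 'v3\n' > "$demo/snapshot.tmp"
mv "$demo/snapshot.tmp" "$demo/snapshot"
rename_result=$(cat "$demo/original")
snapshot_result=$(cat "$demo/snapshot")

echo "after in-place write: original=$inplace_result"
echo "after rename:         original=$rename_result snapshot=$snapshot_result"
rm -rf "$demo"
```

This prints `original=v2` after the in-place write and `original=v2 snapshot=v3` after the rename. Any tool that rewrites a snapshotted file in place therefore needs the same scrutiny as the test jobs themselves.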

Test plan

  • Run a nightly CI pipeline and verify snapshots are created successfully
  • Confirm disk usage is reduced (du -sh code_snapshots/ before/after)
  • Verify ls -li shows shared inodes across snapshot directories
  • Confirm test outputs (logs, metrics) are written correctly and independently
  • Test developer workflow: edit a file in one snapshot, verify other snapshots are unaffected
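
The inode checks in this plan could be scripted roughly as follows. Both helper names are hypothetical and the sketch assumes snapshots live side by side under one parent directory such as code_snapshots/.

```shell
# Hypothetical helper: count regular files under a directory that have
# more than one hard link (i.e. share their inode with another name).
count_hardlinked_files() {
    find "$1" -type f -links +1 | wc -l | tr -d ' '
}

# Hypothetical helper: succeed if the same relative path in two
# snapshot directories refers to one inode on one device.
# Usage: same_inode <snapshot-a> <snapshot-b> <relative-path>
same_inode() {
    [ "$1/$3" -ef "$2/$3" ]
}
```

When comparing disk usage, note that du counts a hardlinked file once per invocation: `du -sh code_snapshots/` reflects the shared storage, whereas running du on each snapshot separately reports the full size every time.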

🤖 Generated with Claude Code

Replace full rsync copies with hardlinked copies via --link-dest. Each
test snapshot still gets its own isolated directory, but files share
storage on disk until modified. On Lustre this is near-instant and
reduces nightly CI disk usage from ~N*repo_size to ~1*repo_size
(e.g. 254 GB → ~13 GB on pretyche for one pipeline).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Kajal Jain <kajalj@nvidia.com>
@kajalj22 kajalj22 requested a review from a team as a code owner May 12, 2026 19:23
@copy-pr-bot

copy-pr-bot Bot commented May 12, 2026

Auto-sync is disabled for ready for review pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

Align test scripts with PY_EXECUTABLES and the uv design doc
(docs/design-docs/uv.md), which mandate --locked to prevent
accidental lockfile updates. This also prevents uv.lock corruption
when snapshots use hardlinks (see previous commit).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Kajal Jain <kajalj@nvidia.com>
@kajalj22 kajalj22 requested a review from a team as a code owner May 12, 2026 21:17
@kajalj22
Contributor Author

/ok to test 2d43719

@copy-pr-bot

copy-pr-bot Bot commented May 13, 2026

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.
