
Cache all cids in memory in pruner#175

Draft
hannahhoward wants to merge 5 commits into master from feat/faster-pruner

Conversation

@hannahhoward
Collaborator

Goals

Performance testing indicates the pruner is still a huge bottleneck on performance, and the hot spot is reading from the all-keys channel.

By our calculations, we can keep a list of all keys in memory at an acceptable memory cost, and thus avoid this bottleneck.

Implementation

  • instead of tracking pins, track all CIDs, including whether each is pinned
  • periodically sync with disk
  • use a sync.Pool to avoid unneeded allocations

For discussion

This is complicated enough that I want to write tests before anyone merges it, but I'd like folks to review the approach now.

for performance, keep a list of cids in memory
@codecov-commenter

codecov-commenter commented Feb 3, 2023

⚠️ Please install the Codecov GitHub app to ensure uploads and comments are reliably processed by Codecov.

Codecov Report

❌ Patch coverage is 0% with 60 lines in your changes missing coverage. Please review.
✅ Project coverage is 5.31%. Comparing base (27eabc8) to head (a1610e9).
⚠️ Report is 1 commit behind head on master.

Files with missing lines Patch % Lines
blocks/randompruner.go 0.00% 60 Missing ⚠️
❗ Your organization needs to install the Codecov GitHub app to enable full functionality.
Additional details and impacted files
@@            Coverage Diff            @@
##           master    #175      +/-   ##
=========================================
- Coverage    5.43%   5.31%   -0.12%     
=========================================
  Files          14      14              
  Lines        1639    1674      +35     
=========================================
  Hits           89      89              
- Misses       1545    1580      +35     
  Partials        5       5              

☔ View full report in Codecov by Sentry.

@rvagg
Collaborator

rvagg commented Feb 3, 2023

I still hate this file-writing business. I reckon it'd now be more efficient, and safer, to just iterate over the CID map and roll the dice on each one with some small probability, say 1%. Delete entries that lose the roll and aren't pinned, stop iterating if we reach our threshold, or iterate again if we haven't (with some safety around it, like not looping more than 20 times). Go's unstable map iteration ordering even helps a bit here.

hannahhoward and others added 3 commits February 3, 2023 10:32
Co-authored-by: Rod Vagg <rod@vagg.org>
Co-authored-by: Rod Vagg <rod@vagg.org>
Co-authored-by: Rod Vagg <rod@vagg.org>
@elijaharita
Contributor

I didn't take the effort to estimate how much memory the CIDs would take up when I wrote this; in my head it sounded big. But you're right: with a blockstore target size of around 100GiB it seems fairly negligible, and a server that wants a bigger cache should have plenty of memory available too.
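A rough back-of-the-envelope check supports this. Assuming (hypothetically, since the comment doesn't give figures) an average block size of 256 KiB and roughly 100 bytes per in-memory entry (CID bytes plus map overhead):

```go
package main

import "fmt"

// estimate returns the number of CIDs a blockstore of storeBytes would
// hold at avgBlockBytes per block, and the memory in bytes needed to
// track them at bytesPerEntry each.
func estimate(storeBytes, avgBlockBytes, bytesPerEntry int) (numCids, memBytes int) {
	numCids = storeBytes / avgBlockBytes
	memBytes = numCids * bytesPerEntry
	return
}

func main() {
	// Assumed figures: 100 GiB target store, 256 KiB average block,
	// ~100 bytes per tracked entry.
	n, m := estimate(100<<30, 256<<10, 100)
	fmt.Printf("%d CIDs, ~%d MiB of tracking memory\n", n, m>>20)
	// prints "409600 CIDs, ~39 MiB of tracking memory"
}
```

A few tens of MiB to track a 100 GiB store is indeed negligible on server hardware.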

if we do keep all the CIDs in memory, it opens the door to a much smarter pruner, e.g. FIFO eviction. It doesn't need to be implemented now, but we probably should at some point.
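For illustration, FIFO eviction becomes straightforward once every CID is tracked in memory: keep the CIDs in insertion order and evict the oldest unpinned ones first. This is a hypothetical sketch of the idea floated above, not part of the PR.

```go
package main

import (
	"container/list"
	"fmt"
)

// fifoPruner tracks CIDs in insertion order and evicts the oldest
// unpinned entries first; cheap once all CIDs already live in memory.
type fifoPruner struct {
	order  *list.List // CIDs in insertion order (oldest at front)
	pinned map[string]bool
}

func newFifoPruner() *fifoPruner {
	return &fifoPruner{order: list.New(), pinned: make(map[string]bool)}
}

// Add records a CID and whether it is pinned.
func (f *fifoPruner) Add(c string, pin bool) {
	f.order.PushBack(c)
	f.pinned[c] = pin
}

// Evict removes up to n of the oldest unpinned CIDs and returns them.
func (f *fifoPruner) Evict(n int) []string {
	var out []string
	for e := f.order.Front(); e != nil && len(out) < n; {
		next := e.Next()
		c := e.Value.(string)
		if !f.pinned[c] {
			f.order.Remove(e)
			delete(f.pinned, c)
			out = append(out, c)
		}
		e = next
	}
	return out
}

func main() {
	f := newFifoPruner()
	f.Add("old", false)
	f.Add("kept", true)
	f.Add("new", false)
	fmt.Println(f.Evict(1)) // prints [old]
}
```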

@elijaharita
Contributor

And I think @rvagg is right: the file I/O should definitely go away now that the keys are already in memory.

Co-authored-by: Rod Vagg <rod@vagg.org>
