Skip to content

HBASE-30038: RefCnt Leak error when caching#7995

Open
dParikesit wants to merge 2 commits intoapache:masterfrom
dParikesit:5-RefCnt-release
Open

HBASE-30038: RefCnt Leak error when caching#7995
dParikesit wants to merge 2 commits intoapache:masterfrom
dParikesit:5-RefCnt-release

Conversation

@dParikesit
Copy link
Copy Markdown
Contributor

JIRA: HBASE-30038

This bug is similar to HBASE-28890

HFileBlock.Writer.getBlockForCaching() creates a ref-counted HFileBlock, and the caller must release its own reference after the cache has taken the ownership it needs.

In HFileWriterImpl.java, the leak happened when shouldCacheBlock() returned before cacheFormatBlock.release() ran.

In HFileBlockIndex.java, the intermediate-index path cached blockForCaching but never released the reference at all.

Because these blocks can be backed by pooled off-heap ByteBuffs, repeated HFile writes could steadily drain allocator buffers and effectively leak memory, even though the blocks were only meant to live long enough to be considered for cache-on-write.

Mockito.verify(cache, Mockito.never()).cacheBlock(Mockito.any(), Mockito.any(),
Mockito.anyBoolean(), Mockito.anyBoolean());
System.gc();
Thread.sleep(1000);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dParikesit
I think the 1-second assumption might bite us on slower systems and make the test flaky.
What if we wrap this in a loop instead? We could retry up to 15 times with a small delay—that way it succeeds as soon as it's ready rather than relying on luck.

writeDataBlocksAndCreateIndex(hbw, outputStream, biw);
assertTrue(biw.getNumLevels() >= 3);
System.gc();
Thread.sleep(1000);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dParikesit
I think the 1-second assumption might bite us on slower systems and make the test flaky.
What if we wrap this in a loop instead? We could retry up to 15 times with a small delay—that way it succeeds as soon as it's ready rather than relying on luck.

@wchevreuil
Copy link
Copy Markdown
Contributor

Please consider @vaijosh suggestions and also fix spotless issues.

@dParikesit
Copy link
Copy Markdown
Contributor Author

Thanks for the suggestions. I'll get back to you after I fix it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants