Skip to content

Backport release/v6.5: Wait for blocksync goroutines on Stop to fix leveldb shutdown panic#3442

Closed
seidroid[bot] wants to merge 1 commit into
release/v6.5from
backport-3415-to-release/v6.5
Closed

Backport release/v6.5: Wait for blocksync goroutines on Stop to fix leveldb shutdown panic#3442
seidroid[bot] wants to merge 1 commit into
release/v6.5from
backport-3415-to-release/v6.5

Conversation

@seidroid
Copy link
Copy Markdown

@seidroid seidroid Bot commented May 15, 2026

Backport of #3415 to release/v6.5.

…3415)

Reactor.OnStart and BlockPool.OnStart started their long-running
goroutines (requestRoutine, poolRoutine, processBlockSyncCh,
processPeerUpdates, makeRequestersRoutine) with raw `go fn(ctx)` using
the outer context. They were therefore not registered with the
BaseService WaitGroup, and Stop() never waited for them. The outer ctx
also outlived Stop, so the goroutines kept running after Stop returned.

During node shutdown this raced nodeImpl.OnStop's blockStore.Close():
poolRoutine, still inside SaveBlock -> Base() -> bs.db.Iterator,
observed its leveldb table reader released and panicked with
"leveldb/table: reader released".

Route each goroutine through BaseService.Spawn so it is tracked by the
WaitGroup and bound to inner.ctx. Stop() now cancels them and blocks
until they exit, which happens before the node closes the BlockStore DB.
Add a regression test that asserts no blocksync goroutines remain after
Reactor.Stop() returns.

<!-- CURSOR_SUMMARY -->
---

> [!NOTE]
> **Medium Risk**
> Changes blocksync/consensus goroutine lifecycles and shutdown
ordering; mistakes could cause hangs or missed transitions, but the
change is localized and covered by a new regression test.
>
> **Overview**
> Fixes blocksync shutdown races by moving long-running goroutines off
raw `go` launches and onto `BaseService.Spawn`/`SpawnCritical`, ensuring
`Stop()` cancels the correct context and waits for all blocksync
routines to exit before the block store is closed.
>
> Adds readiness gates (`blocksyncReady`, `consensusReady`) so routines
can be pre-spawned in `Reactor.OnStart` yet only begin work when block
sync starts or the consensus handoff completes, and updates
`BlockPool`/`bpRequester` shutdown to avoid blocking on a full
`requestsCh`.
>
> Updates the consensus handoff API (`SwitchToConsensus` signature) and
adds a regression test (`TestReactor_OnStopWaitsForGoroutines`) that
asserts no `internal/blocksync` goroutines remain after `Reactor.Stop()`
returns.
>
> <sup>Reviewed by [Cursor Bugbot](https://cursor.com/bugbot) for commit
5479315. Bugbot is set up for automated
code reviews on this repo. Configure
[here](https://www.cursor.com/dashboard/bugbot).</sup>
<!-- /CURSOR_SUMMARY -->

(cherry picked from commit a27b9d6)
@cursor
Copy link
Copy Markdown

cursor Bot commented May 15, 2026

PR Summary

Medium Risk
Touches blocksync lifecycle/concurrency (goroutine spawning, shutdown ordering, consensus handoff), so regressions could manifest as stuck shutdowns or stalled syncing, but changes are localized and covered by a regression test.

Overview
Prevents blocksync shutdown races by moving long-running goroutines under libs/service structured concurrency so Stop() cancels the correct context and waits for exit (instead of leaking go routines past shutdown).

Blocksync routines (requestRoutine, poolRoutine, channel/peer processors, and the pool requester loop) are now pre-spawned in Reactor.OnStart/BlockPool.OnStart and gated via AtomicSend signals for later activation; the blocksync→consensus handoff now uses SwitchToConsensus(state, skipWAL) and unblocks the pre-spawned auto-restart monitor.

BlockPool shutdown is made non-blocking with respect to a full requestsCh (context-aware sendRequest, stop requesters outside locks), and a new regression test asserts no blocksync goroutines remain after Reactor.Stop().

Reviewed by Cursor Bugbot for commit cc27e2a. Bugbot is set up for automated code reviews on this repo. Configure here.

@github-actions
Copy link
Copy Markdown

The latest Buf updates on your PR. Results from workflow Buf / buf (pull_request).

BuildFormatLintBreakingUpdated (UTC)
✅ passed✅ passed✅ passed✅ passedMay 15, 2026, 2:53 PM

@github-actions
Copy link
Copy Markdown

The latest Buf updates on your PR. Results from workflow Buf / buf (pull_request).

BuildFormatLintBreakingUpdated (UTC)
✅ passed✅ passed✅ passed✅ passedMay 15, 2026, 2:53 PM

@masih
Copy link
Copy Markdown
Collaborator

masih commented May 15, 2026

Dropping this backport, given the utility malarky is not in release yet, and it takes too much time to sort it out.

@masih masih closed this May 15, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant