Backport release/v6.5: Wait for blocksync goroutines on Stop to fix leveldb shutdown panic#3442
Backport release/v6.5: Wait for blocksync goroutines on Stop to fix leveldb shutdown panic#3442seidroid[bot] wants to merge 1 commit into
release/v6.5: Wait for blocksync goroutines on Stop to fix leveldb shutdown panic#3442Conversation
…3415) Reactor.OnStart and BlockPool.OnStart started their long-running goroutines (requestRoutine, poolRoutine, processBlockSyncCh, processPeerUpdates, makeRequestersRoutine) with raw `go fn(ctx)` using the outer context. They were therefore not registered with the BaseService WaitGroup, and Stop() never waited for them. The outer ctx also outlived Stop, so the goroutines kept running after Stop returned. During node shutdown this raced nodeImpl.OnStop's blockStore.Close(): poolRoutine, still inside SaveBlock -> Base() -> bs.db.Iterator, observed its leveldb table reader released and panicked with "leveldb/table: reader released". Route each goroutine through BaseService.Spawn so it is tracked by the WaitGroup and bound to inner.ctx. Stop() now cancels them and blocks until they exit, which happens before the node closes the BlockStore DB. Add a regression test that asserts no blocksync goroutines remain after Reactor.Stop() returns. <!-- CURSOR_SUMMARY --> --- > [!NOTE] > **Medium Risk** > Changes blocksync/consensus goroutine lifecycles and shutdown ordering; mistakes could cause hangs or missed transitions, but the change is localized and covered by a new regression test. > > **Overview** > Fixes blocksync shutdown races by moving long-running goroutines off raw `go` launches and onto `BaseService.Spawn`/`SpawnCritical`, ensuring `Stop()` cancels the correct context and waits for all blocksync routines to exit before the block store is closed. > > Adds readiness gates (`blocksyncReady`, `consensusReady`) so routines can be pre-spawned in `Reactor.OnStart` yet only begin work when block sync starts or the consensus handoff completes, and updates `BlockPool`/`bpRequester` shutdown to avoid blocking on a full `requestsCh`. > > Updates the consensus handoff API (`SwitchToConsensus` signature) and adds a regression test (`TestReactor_OnStopWaitsForGoroutines`) that asserts no `internal/blocksync` goroutines remain after `Reactor.Stop()` returns. > > <sup>Reviewed by [Cursor Bugbot](https://cursor.com/bugbot) for commit 5479315. Bugbot is set up for automated code reviews on this repo. Configure [here](https://www.cursor.com/dashboard/bugbot).</sup> <!-- /CURSOR_SUMMARY --> (cherry picked from commit a27b9d6)
PR SummaryMedium Risk Overview Blocksync routines (
Reviewed by Cursor Bugbot for commit cc27e2a. Bugbot is set up for automated code reviews on this repo. Configure here. |
|
The latest Buf updates on your PR. Results from workflow Buf / buf (pull_request).
|
|
The latest Buf updates on your PR. Results from workflow Buf / buf (pull_request).
|
|
Dropping this backport, given the utility malarky is not in release yet, and it takes too much time to sort it out. |
Backport of #3415 to
release/v6.5.