fix: prevent RN migration from overwriting local channel monitors#495

Draft
jvsena42 wants to merge 2 commits into master from fix/channel-monitor-stale-data

Conversation

@jvsena42
Member

@jvsena42 jvsena42 commented Mar 17, 2026

Description

This PR prevents old RN channel monitors from overwriting newer local ones during orphaned channel recovery, which caused LDK to refuse to start and display stale balances.

The root cause: PR #462 introduced orphaned channel monitor recovery that re-fetches all RN monitors from the remote backup. LDK's setChannelDataMigration() blindly writes migration monitors to storage, overwriting existing ones -- including monitors that have advanced far beyond the old RN version. PR #480 fixed the channel manager overwrite but not the monitor overwrite.

The fix adds a filtering step: before applying RN migration monitors, the app checks which channels already have local monitors and skips those. This prevents stale RN monitors from replacing up-to-date local ones.
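The filtering step can be sketched as a set-difference over channel IDs. This is a hypothetical shape of the check, not the actual Bitkit code (the real implementation works on serialized monitor blobs on the Swift side):

```rust
use std::collections::HashSet;

// Hypothetical stand-in for a serialized RN migration monitor.
struct MonitorBlob {
    channel_id: String,
    bytes: Vec<u8>,
}

/// Keep only migration monitors whose channel has no local monitor yet,
/// so stale RN state can never replace an up-to-date local monitor.
fn filter_migration_monitors(
    migration: Vec<MonitorBlob>,
    local_channel_ids: &HashSet<String>,
) -> Vec<MonitorBlob> {
    migration
        .into_iter()
        .filter(|m| !local_channel_ids.contains(&m.channel_id))
        .collect()
}
```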

Note: recovery for already-affected users requires ldk-node changes (migration-aware writes or force-close of stale channels). This PR only addresses prevention.

Logs from the bug

bitkit_logs_2026-03-17_14-16-30.zip

Linked Issues/Tasks

Related PRs:

Screenshot / Video

N/A - backend logic change

@jvsena42 jvsena42 self-assigned this Mar 17, 2026
@jvsena42
Member Author

Required ldk-node change

This PR prevents the bug on the Swift side, but ldk-node also needs a fix to prevent setChannelDataMigration() from blindly overwriting existing monitors.

Recommended: Migration-aware write in builder.rs

In the migration loop (builder.rs:1627-1658), check if a monitor already exists before writing:

// In the migration loop
for monitor_data in &migration.channel_monitors {
    // ... deserialize to get monitor_key ...

    // Check if monitor already exists in store
    let existing = kv_store.read(
        CHANNEL_MONITOR_PERSISTENCE_PRIMARY_NAMESPACE,
        CHANNEL_MONITOR_PERSISTENCE_SECONDARY_NAMESPACE,
        &monitor_key,
    );
    if existing.is_ok() {
        log_info!(logger, "Skipping migration monitor {} — already exists in store", monitor_key);
        continue;
    }

    // Only write if not present
    kv_store.write(
        CHANNEL_MONITOR_PERSISTENCE_PRIMARY_NAMESPACE,
        CHANNEL_MONITOR_PERSISTENCE_SECONDARY_NAMESPACE,
        &monitor_key,
        &monitor_data,
    )?;
}

This is the safest option — it prevents the overwrite entirely at the ldk-node level, regardless of what the app passes.

For already-affected users: clearChannelData() on Builder

Add a method that deletes the channel manager and all channel monitors from the KVStore (including VSS) before node build:

pub fn clear_channel_data(&self) -> Result<(), BuildError> {
    // 1. Create the KVStore (same as build_with_vss_store)
    // 2. Remove channel_manager key
    // 3. List and remove all channel_monitors
    // 4. App can then call build() to start fresh
}

Tradeoff: All Lightning channels are lost. Counterparty force-closes, funds claimable on-chain after timelock. But this unblocks users whose node currently refuses to start due to stale monitors.
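Steps 2–3 above can be sketched against an in-memory stand-in for the store. The MemStore type and the namespace strings here are illustrative only, not the real ldk-node KVStore or its persistence constants:

```rust
use std::collections::HashMap;

// In-memory stand-in mirroring the shape of a namespaced key-value store;
// keyed by (primary namespace, secondary namespace, key).
#[derive(Default)]
struct MemStore {
    map: HashMap<(String, String, String), Vec<u8>>,
}

impl MemStore {
    fn list(&self, primary: &str, secondary: &str) -> Vec<String> {
        self.map
            .keys()
            .filter(|k| k.0 == primary && k.1 == secondary)
            .map(|k| k.2.clone())
            .collect()
    }
    fn remove(&mut self, primary: &str, secondary: &str, key: &str) {
        self.map
            .remove(&(primary.to_string(), secondary.to_string(), key.to_string()));
    }
}

/// Sketch of clear_channel_data(): drop the channel manager blob, then
/// list and drop every channel monitor, leaving the store channel-free.
fn clear_channel_data(store: &mut MemStore) {
    store.remove("", "", "channel_manager");
    for key in store.list("monitors", "") {
        store.remove("monitors", "", &key);
    }
}
```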

Context

The root cause: setChannelDataMigration() uses KVStore::write() which unconditionally overwrites any existing data at the same key. When old RN monitors are passed as migration data for channels that already have up-to-date monitors in VSS, the current monitors get overwritten with stale versions. LDK then detects the mismatch (update_id 10 vs update_id 120) and refuses to start.
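The overwrite-then-mismatch sequence can be reproduced in miniature with toy types (not LDK's actual storage or monitor types):

```rust
use std::collections::HashMap;

// Toy model of the failure: a KVStore-style write() replaces whatever is
// at the key, so a stale migration monitor (update_id 10) clobbers the
// live one (update_id 120) and the manager/monitor ids stop matching.
fn blind_write(store: &mut HashMap<String, u64>, key: &str, update_id: u64) {
    store.insert(key.to_string(), update_id); // unconditional overwrite
}

fn monitor_is_stale(monitor_update_id: u64, manager_update_id: u64) -> bool {
    monitor_update_id < manager_update_id
}
```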

@jvsena42
Member Author

jvsena42 commented Mar 17, 2026

Safest recovery path for already-affected users

The safest approach is to accept stale monitors and cooperative-close the affected channels, rather than wiping data or force-closing with revoked commitments.

Why this is the safest option

| Option | Funds at risk? | Other channels? | Mechanism |
| --- | --- | --- | --- |
| Wipe all channel data | Funds may go unclaimed (no monitor watching for force-close) | All lost | Counterparty eventually force-closes |
| Force-close with stale monitor | ALL funds at risk: broadcasting a revoked commitment (update_id 10 has been superseded 110 times, counterparty holds revocation keys) | Preserved | Penalty transaction |
| Accept stale + cooperative close | No | Preserved | Cooperative close uses agreed-upon final balances |

How it works

  1. Node starts with a stale monitor (warn instead of refusing)
  2. Immediately initiates cooperative close of the stale channel
  3. Both parties agree on correct final balances → closing tx → funds returned safely

The only risk window is between startup and cooperative close: if the counterparty broadcasts a revoked state between updates 10–120, our stale monitor can't punish them. But the counterparty is Blocktank (our LSP), which has no incentive to cheat.

Required changes

rust-lightning (ChannelManager deserialization):

The stale check and DangerousValue error live in channelmanager.rs:~17355. A flag is needed to warn and continue instead of refusing to start:

// Instead of returning Err(DecodeError::DangerousValue) when monitor is stale:
if args.accept_stale_channel_monitors {
    log_warn!(logger,
        "Accepting stale monitor for channel {} (monitor update_id {} vs manager update_id {}). \
         Cooperative close recommended.",
        channel_id, monitor_update_id, manager_update_id
    );
    stale_channels.push(channel_id);
    // Continue deserialization instead of returning error
} else {
    return Err(DecodeError::DangerousValue); // existing behavior
}

ldk-node (builder.rs):

Expose the flag via the builder API:

pub fn set_accept_stale_channel_monitors(&mut self, accept: bool) -> &mut Self {
    self.accept_stale_monitors = accept;
    self
}

After the node starts, auto-close stale channels:

for channel_id in stale_channels {
    // counterparty_node_id is looked up from the stale channel's details
    node.close_channel(&channel_id, counterparty_node_id)?;
}

Swift side (after ldk-node change):

On the first startup failure due to stale monitors, retry with the flag enabled:

// In LightningService.setup() or WalletViewModel.start()
// If node fails with stale monitor error, rebuild with:
builder.setAcceptStaleChannelMonitors(true)
// Node starts → stale channels are auto-closed cooperatively

Combine with migration-aware write (prevention)

For defense-in-depth, also add the migration-aware write from the previous comment to setChannelDataMigration() in builder.rs — skip writing monitors that already exist in the KVStore. This prevents the bug at the ldk-node level regardless of what the app passes.

@jvsena42 jvsena42 marked this pull request as ready for review March 17, 2026 17:29
@jvsena42 jvsena42 marked this pull request as draft March 17, 2026 17:29
@jvsena42
Member Author

Impact on displayed balance

The balance values (totalBalanceSats, totalOnchainSats, totalLightningSats, etc.) are all @AppStorage (UserDefaults-backed) properties in WalletViewModel. They persist across app launches and are never reset to 0 unless explicitly overwritten.

When the node fails to start due to the stale monitor:

  1. updateBalanceState() fails because lightningService.refreshCache() / balanceManager.deriveBalanceState() require a running node
  2. The error is caught and none of the balance properties are updated
  3. The @AppStorage values retain whatever was last written from the previous successful session (before the update to build 182)

Result: The user sees a plausible-looking but completely frozen balance — the last known values from before the app update. It's not zero, so it doesn't immediately look broken. But:

  • No new payments are reflected (incoming or outgoing)
  • Both on-chain and Lightning balances are frozen (even though on-chain could theoretically refresh independently, deriveBalanceState() fails entirely when the node isn't running)
  • Send/receive operations fail because the node isn't started
  • The home screen, savings view, and spending view all display stale data
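The freeze behavior reduces to a simple pattern, sketched here with a toy type (not the actual Swift WalletViewModel): when the refresh errors out, the cached value is left untouched, so the UI keeps showing the last successful snapshot.

```rust
// Toy model of the balance freeze: None models the node failing to start
// (refreshCache()/deriveBalanceState() erroring), in which case the
// cached balance is simply never overwritten.
struct Wallet {
    total_sats_cache: u64,
}

impl Wallet {
    fn refresh(&mut self, node_balance: Option<u64>) -> Result<(), &'static str> {
        let fresh = node_balance.ok_or("node not running")?;
        self.total_sats_cache = fresh; // only reached on success
        Ok(())
    }
}
```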
