fix: prevent RN migration from overwriting local channel monitors#495

Draft
jvsena42 wants to merge 2 commits into master from fix/channel-monitor-stale-data

Conversation

@jvsena42
Member

@jvsena42 jvsena42 commented Mar 17, 2026

Description

This PR prevents old RN channel monitors from overwriting newer local ones during orphaned channel recovery, which caused LDK to refuse to start and display stale balances.

The root cause: PR #462 introduced orphaned channel monitor recovery that re-fetches all RN monitors from the remote backup. LDK's setChannelDataMigration() blindly writes migration monitors to storage, overwriting existing ones -- including monitors that have advanced far beyond the old RN version. PR #480 fixed the channel manager overwrite but not the monitor overwrite.

The fix adds a filtering step: before applying RN migration monitors, the app checks which channels already have local monitors and skips those. This prevents stale RN monitors from replacing up-to-date local ones.
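The filtering step can be sketched as a set-difference over channel IDs. This is a hypothetical shape of the check, not the actual Bitkit code (the real implementation works on serialized monitor blobs on the Swift side):

```rust
use std::collections::HashSet;

// Hypothetical stand-in for a serialized RN migration monitor.
struct MonitorBlob {
    channel_id: String,
    bytes: Vec<u8>,
}

/// Keep only migration monitors whose channel has no local monitor yet,
/// so stale RN state can never replace an up-to-date local monitor.
fn filter_migration_monitors(
    migration: Vec<MonitorBlob>,
    local_channel_ids: &HashSet<String>,
) -> Vec<MonitorBlob> {
    migration
        .into_iter()
        .filter(|m| !local_channel_ids.contains(&m.channel_id))
        .collect()
}
```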

Note: recovery for already-affected users requires ldk-node changes (migration-aware writes or force-close of stale channels). This PR only addresses prevention.

Logs from the bug

bitkit_logs_2026-03-17_14-16-30.zip

Linked Issues/Tasks

Related PRs:

Screenshot / Video

N/A - backend logic change

@jvsena42 jvsena42 self-assigned this Mar 17, 2026
@jvsena42
Member Author

Required ldk-node change

This PR prevents the bug on the Swift side, but ldk-node also needs a fix to prevent setChannelDataMigration() from blindly overwriting existing monitors.

Recommended: Migration-aware write in builder.rs

In the migration loop (builder.rs:1627-1658), check if a monitor already exists before writing:

// In the migration loop
for monitor_data in &migration.channel_monitors {
    // ... deserialize to get monitor_key ...

    // Check if monitor already exists in store
    let existing = kv_store.read(
        CHANNEL_MONITOR_PERSISTENCE_PRIMARY_NAMESPACE,
        CHANNEL_MONITOR_PERSISTENCE_SECONDARY_NAMESPACE,
        &monitor_key,
    );
    if existing.is_ok() {
        log_info!(logger, "Skipping migration monitor {} — already exists in store", monitor_key);
        continue;
    }

    // Only write if not present
    kv_store.write(
        CHANNEL_MONITOR_PERSISTENCE_PRIMARY_NAMESPACE,
        CHANNEL_MONITOR_PERSISTENCE_SECONDARY_NAMESPACE,
        &monitor_key,
        &monitor_data,
    )?;
}

This is the safest option — it prevents the overwrite entirely at the ldk-node level, regardless of what the app passes.

For already-affected users: clearChannelData() on Builder

Add a method that deletes the channel manager and all channel monitors from the KVStore (including VSS) before node build:

pub fn clear_channel_data(&self) -> Result<(), BuildError> {
    // 1. Create the KVStore (same as build_with_vss_store)
    // 2. Remove channel_manager key
    // 3. List and remove all channel_monitors
    // 4. App can then call build() to start fresh
}

Tradeoff: All Lightning channels are lost. Counterparty force-closes, funds claimable on-chain after timelock. But this unblocks users whose node currently refuses to start due to stale monitors.
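Steps 2–3 above can be sketched against an in-memory stand-in for the store. The MemStore type and the namespace strings here are illustrative only, not the real ldk-node KVStore or its persistence constants:

```rust
use std::collections::HashMap;

// In-memory stand-in mirroring the shape of a namespaced key-value store;
// keyed by (primary namespace, secondary namespace, key).
#[derive(Default)]
struct MemStore {
    map: HashMap<(String, String, String), Vec<u8>>,
}

impl MemStore {
    fn list(&self, primary: &str, secondary: &str) -> Vec<String> {
        self.map
            .keys()
            .filter(|k| k.0 == primary && k.1 == secondary)
            .map(|k| k.2.clone())
            .collect()
    }
    fn remove(&mut self, primary: &str, secondary: &str, key: &str) {
        self.map
            .remove(&(primary.to_string(), secondary.to_string(), key.to_string()));
    }
}

/// Sketch of clear_channel_data(): drop the channel manager blob, then
/// list and drop every channel monitor, leaving the store channel-free.
fn clear_channel_data(store: &mut MemStore) {
    store.remove("", "", "channel_manager");
    for key in store.list("monitors", "") {
        store.remove("monitors", "", &key);
    }
}
```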

Context

The root cause: setChannelDataMigration() uses KVStore::write() which unconditionally overwrites any existing data at the same key. When old RN monitors are passed as migration data for channels that already have up-to-date monitors in VSS, the current monitors get overwritten with stale versions. LDK then detects the mismatch (update_id 10 vs update_id 120) and refuses to start.
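The overwrite-then-mismatch sequence can be reproduced in miniature with toy types (not LDK's actual storage or monitor types):

```rust
use std::collections::HashMap;

// Toy model of the failure: a KVStore-style write() replaces whatever is
// at the key, so a stale migration monitor (update_id 10) clobbers the
// live one (update_id 120) and the manager/monitor ids stop matching.
fn blind_write(store: &mut HashMap<String, u64>, key: &str, update_id: u64) {
    store.insert(key.to_string(), update_id); // unconditional overwrite
}

fn monitor_is_stale(monitor_update_id: u64, manager_update_id: u64) -> bool {
    monitor_update_id < manager_update_id
}
```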

@jvsena42
Member Author

jvsena42 commented Mar 17, 2026

Safest recovery path for already-affected users

The safest approach is to accept stale monitors and cooperative-close the affected channels, rather than wiping data or force-closing with revoked commitments.

Why this is the safest option

| Option | Funds at risk? | Other channels? | Mechanism |
| --- | --- | --- | --- |
| Wipe all channel data | Funds may go unclaimed (no monitor watching for force-close) | All lost | Counterparty eventually force-closes |
| Force-close with stale monitor | ALL funds at risk: broadcasting a revoked commitment (update_id 10 has been superseded 110 times, counterparty holds revocation keys) | Preserved | Penalty transaction |
| Accept stale + cooperative close | No | Preserved | Cooperative close uses agreed-upon final balances |

How it works

  1. Node starts with a stale monitor (warn instead of refusing)
  2. Immediately initiates cooperative close of the stale channel
  3. Both parties agree on correct final balances → closing tx → funds returned safely

The only risk window is between startup and cooperative close: if the counterparty broadcasts a revoked state between updates 10–120, our stale monitor can't punish them. But the counterparty is Blocktank (our LSP), which has no incentive to cheat.

Required changes

rust-lightning (ChannelManager deserialization):

The stale check and DangerousValue error live in channelmanager.rs:~17355. A flag is needed to warn and continue instead of refusing to start:

// Instead of returning Err(DecodeError::DangerousValue) when monitor is stale:
if args.accept_stale_channel_monitors {
    log_warn!(logger,
        "Accepting stale monitor for channel {} (monitor update_id {} vs manager update_id {}). \
         Cooperative close recommended.",
        channel_id, monitor_update_id, manager_update_id
    );
    stale_channels.push(channel_id);
    // Continue deserialization instead of returning error
} else {
    return Err(DecodeError::DangerousValue); // existing behavior
}

ldk-node (builder.rs):

Expose the flag via the builder API:

pub fn set_accept_stale_channel_monitors(&mut self, accept: bool) -> &mut Self {
    self.accept_stale_monitors = accept;
    self
}

After the node starts, auto-close stale channels:

for channel_id in stale_channels {
    // counterparty_node_id is looked up from the stale channel's details
    node.close_channel(&channel_id, counterparty_node_id)?;
}

Swift side (after ldk-node change):

On the first startup failure due to stale monitors, retry with the flag enabled:

// In LightningService.setup() or WalletViewModel.start()
// If node fails with stale monitor error, rebuild with:
builder.setAcceptStaleChannelMonitors(true)
// Node starts → stale channels are auto-closed cooperatively

Combine with migration-aware write (prevention)

For defense-in-depth, also add the migration-aware write from the previous comment to setChannelDataMigration() in builder.rs — skip writing monitors that already exist in the KVStore. This prevents the bug at the ldk-node level regardless of what the app passes.

@jvsena42 jvsena42 marked this pull request as ready for review March 17, 2026 17:29
@jvsena42 jvsena42 marked this pull request as draft March 17, 2026 17:29
@jvsena42
Member Author

Impact on displayed balance

The balance values (totalBalanceSats, totalOnchainSats, totalLightningSats, etc.) are all @AppStorage (UserDefaults-backed) properties in WalletViewModel. They persist across app launches and are never reset to 0 unless explicitly overwritten.

When the node fails to start due to the stale monitor:

  1. updateBalanceState() fails because lightningService.refreshCache() / balanceManager.deriveBalanceState() require a running node
  2. The error is caught and none of the balance properties are updated
  3. The @AppStorage values retain whatever was last written from the previous successful session (before the update to build 182)

Result: The user sees a plausible-looking but completely frozen balance — the last known values from before the app update. It's not zero, so it doesn't immediately look broken. But:

  • No new payments are reflected (incoming or outgoing)
  • Both on-chain and Lightning balances are frozen (even though on-chain could theoretically refresh independently, deriveBalanceState() fails entirely when the node isn't running)
  • Send/receive operations fail because the node isn't started
  • The home screen, savings view, and spending view all display stale data
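The freeze behavior reduces to a simple pattern, sketched here with a toy type (not the actual Swift WalletViewModel): when the refresh errors out, the cached value is left untouched, so the UI keeps showing the last successful snapshot.

```rust
// Toy model of the balance freeze: None models the node failing to start
// (refreshCache()/deriveBalanceState() erroring), in which case the
// cached balance is simply never overwritten.
struct Wallet {
    total_sats_cache: u64,
}

impl Wallet {
    fn refresh(&mut self, node_balance: Option<u64>) -> Result<(), &'static str> {
        let fresh = node_balance.ok_or("node not running")?;
        self.total_sats_cache = fresh; // only reached on success
        Ok(())
    }
}
```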
