
fix: bump ldk-node to v0.7.0-rc.33#498

Closed
jvsena42 wants to merge 3 commits into master from fix/channel-monitor-stale-data-v2

Conversation

@jvsena42
Member

@jvsena42 jvsena42 commented Mar 18, 2026

Description

This PR bumps ldk-node from c5698d0 (post-rc.32) to v0.7.0-rc.33, which includes migration-aware write protection in apply_channel_data_migration().

The root cause: PR #462 introduced orphaned channel monitor recovery that re-fetches all RN monitors from the remote backup. The previous ldk-node version blindly wrote migration monitors to storage, overwriting existing ones — including monitors that had advanced far beyond the old RN version. This caused LDK to detect stale monitors and refuse to start, leaving users with a frozen balance.

v0.7.0-rc.33 compares update_id before writing and skips if the existing monitor is newer or equal, preventing the overwrite entirely at the Rust level.

Note: recovery for already-affected users still requires a separate ldk-node change (accept stale monitors + cooperative close). See PR #495 comments for details.

Linked Issues/Tasks

Related PRs:

Screenshot / Video

N/A - dependency bump

@ovitrif
Collaborator

ovitrif commented Mar 18, 2026

@claude regarding this conclusion:

The root cause: PR #462 introduced orphaned channel monitor recovery that re-fetches all RN monitors from the remote backup. The previous ldk-node version blindly wrote migration monitors to storage, overwriting existing ones — including monitors that had advanced far beyond the old RN version. This caused LDK to detect stale monitors and refuse to start, leaving users with a frozen balance.

How do we simulate this scenario to test it?!

@ovitrif ovitrif self-requested a review March 18, 2026 12:20
@claude

claude bot commented Mar 18, 2026

Claude finished @ovitrif's task in 1m 27s


How to Simulate the Stale Channel Monitor Scenario

  • Read MigrationsService.swift and WalletViewModel.swift to understand the migration flow
  • Review the channel monitor write protection logic
  • Formulate a step-by-step simulation guide

Background: What makes this happen

The bug requires three conditions to be true simultaneously:

  1. A migrated wallet — the user went through the RN→iOS migration, so monitors exist in both local LDK storage and the RN VSS remote backup
  2. Advanced local monitors — after migration, Lightning activity (payments, channel updates, sync) advanced the local monitor's update_id beyond the version stored in the RN backup
  3. Recovery check re-runs — isChannelRecoveryChecked is false, so fetchOrphanedChannelMonitorsIfNeeded fetches the now-stale RN monitors and passes them to ldk-node during startup (WalletViewModel.swift:144)

Step-by-Step Simulation

Prerequisites:

  • Two builds: one with old ldk-node (pre-rc.33) and one with this PR's ldk-node (rc.33)
  • A regtest environment with the RN app to create an original LN channel
  • Or: an existing device that already went through RN→iOS migration

Option A — Controlled Regtest Setup (Recommended)

  1. Create a channel via RN app on regtest — open a channel with Blocktank, make at least one payment to advance the monitor state. The RN VSS backup now has a monitor at update_id = N.

  2. Run the iOS native app migration — go through the full migration flow. The monitor is written to local LDK storage at update_id = N.

  3. Advance the local monitor — sync the wallet or make another payment in the iOS app. LDK updates the monitor to update_id = N+K. The RN backup is not updated — it stays at N.

  4. Reset the recovery flag — in Xcode's debugger or via a debug toggle, reset isChannelRecoveryChecked to false:

    UserDefaults.standard.set(false, forKey: "rnChannelRecoveryChecked")

    This causes fetchOrphanedChannelMonitorsIfNeeded to run on the next node start (MigrationsService.swift:441).

  5. Kill and relaunch the app

  6. Observe the behavior:

    • Old ldk-node (pre-rc.33): The stale monitor at update_id = N is blindly written over the local monitor at N+K. On the next startup LDK detects the rollback and logs a stale monitor error, refusing to start → frozen balance.
    • New ldk-node (rc.33, this PR): apply_channel_data_migration() compares update_id, sees N < N+K, skips the write → node starts normally.

Option B — Already-Migrated Device (Shortcut)

If you have a device that already completed migration and has had Lightning activity since:

  1. Confirm isChannelRecoveryChecked is true in UserDefaults (it should be)
  2. Reset it to false as shown above
  3. Kill and relaunch the app
  4. Check logs for "Filtered X already-migrated monitors" (MigrationsService.swift:2136) — that's the Swift-side filter from the previous fix attempt (PR #495, "fix: prevent RN migration from overwriting local channel monitors")
  5. To reproduce the original bug, you'd also need to revert the localChannelIds filtering in fetchRNRemoteLdkData (lines 2113–2131) to bypass the Swift-side guard and let the stale monitor reach ldk-node

What to Check in Logs

When the scenario is triggered, look for these log entries:

| Log message | Meaning |
| --- | --- |
| "Running pre-startup channel monitor recovery check" | Recovery path entered |
| "Found N monitors on RN backup for pre-startup recovery" | Stale monitors fetched from VSS |
| "Filtered X already-migrated monitors, Y orphaned remaining" | Swift-side filter active (PR #495 safeguard) |
| LDK error: "stale channel monitor" / node fails to start | Bug reproduced (old ldk-node) |
| Node starts normally | Fix confirmed (new ldk-node rc.33) |

Key Code Paths

@jvsena42
Member Author

Replaced by #502

@jvsena42 jvsena42 closed this Mar 19, 2026
