Conversation

@valentinewallace
Contributor

@valentinewallace valentinewallace commented Nov 17, 2025

We have an overarching goal of (mostly) getting rid of ChannelManager persistence and rebuilding the ChannelManager's state from existing ChannelMonitors, due to issues when the two structs are out-of-sync on restart. The main issue that can arise is channel force closure.

Here we start this process by rebuilding ChannelManager::decode_update_add_htlcs, forward_htlcs, and pending_intercepted_htlcs from the Channels, which will soon be included in the ChannelMonitors as part of #4218.

  • test upgrading from a manager containing pending HTLC forwards that was serialized on <= LDK 0.2, i.e. where Channels will not contain committed update_add_htlcs
  • currently, no tests fail when we force use of the new rebuilt decode_update_add_htlcs map and ignore the legacy maps. This may indicate missing test coverage, since in theory we sometimes need to re-forward these HTLCs so they go back into the forward_htlcs map for processing
  • only use the old legacy maps if the manager and its channels were last serialized on <= 0.2. Currently this is not guaranteed
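
For orientation, here is a rough, illustrative-only sketch of the reconstruction idea (simplified stand-in types; the real code lives in ChannelManager deserialization and goes through accessors such as get_inbound_committed_update_adds() and outbound_scid_alias(), as seen in the diffs quoted later in this thread):

use std::collections::HashMap;

// Simplified stand-in for msgs::UpdateAddHTLC.
#[derive(Clone)]
struct UpdateAddHtlc {
    htlc_id: u64,
    // onion, amount, cltv_expiry, ... elided
}

// Simplified stand-in for a funded Channel.
struct FundedChannel {
    outbound_scid_alias: u64,
    inbound_committed_update_adds: Vec<UpdateAddHtlc>,
}

// On restart, rebuild `decode_update_add_htlcs` from the channels themselves rather
// than trusting a possibly-stale map serialized in the `ChannelManager`.
fn rebuild_decode_update_add_htlcs(
    channels: &[FundedChannel],
) -> HashMap<u64, Vec<UpdateAddHtlc>> {
    let mut decode_update_add_htlcs = HashMap::new();
    for chan in channels {
        if !chan.inbound_committed_update_adds.is_empty() {
            decode_update_add_htlcs.insert(
                chan.outbound_scid_alias,
                chan.inbound_committed_update_adds.clone(),
            );
        }
    }
    decode_update_add_htlcs
}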

@ldk-reviews-bot

ldk-reviews-bot commented Nov 17, 2025

👋 Thanks for assigning @tnull as a reviewer!
I'll wait for their review and will help manage the review process.
Once they submit their review, I'll check if a second reviewer would be helpful.

@codecov

codecov bot commented Nov 18, 2025

Codecov Report

❌ Patch coverage is 88.16794% with 31 lines in your changes missing coverage. Please review.
✅ Project coverage is 89.33%. Comparing base (de384ff) to head (77d6dfa).
⚠️ Report is 16 commits behind head on main.

Files with missing lines             Patch %   Lines
lightning/src/ln/channelmanager.rs   86.92%    15 Missing and 5 partials ⚠️
lightning/src/ln/channel.rs          80.00%    9 Missing and 1 partial ⚠️
lightning/src/ln/reload_tests.rs     98.30%    1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #4227      +/-   ##
==========================================
- Coverage   89.33%   89.33%   -0.01%     
==========================================
  Files         180      180              
  Lines      139042   139263     +221     
  Branches   139042   139263     +221     
==========================================
+ Hits       124219   124410     +191     
- Misses      12196    12222      +26     
- Partials     2627     2631       +4     
Flag      Coverage           Δ
fuzzing   34.95% <21.28%>    (-1.02%) ⬇️
tests     88.70% <88.16%>    (-0.01%) ⬇️

Flags with carried forward coverage won't be shown.

@valentinewallace
Contributor Author

valentinewallace commented Nov 19, 2025

@joostjager helped me realize this may be way overcomplicated: essentially all tests pass on main when we simply read-and-discard the pending forwards maps. It's a bit suspicious that all tests pass, though, so additional test coverage seems like it could be useful.

Nvm, our test coverage for reload of these maps is just pretty incomplete.

@valentinewallace valentinewallace force-pushed the 2025-10-reconstruct-mgr-fwd-htlcs branch from 0b4eb68 to ce2ccac Compare November 19, 2025 21:56
@valentinewallace
Contributor Author

Updated with new testing and a few tweaks: diff

Will rebase next

@valentinewallace valentinewallace force-pushed the 2025-10-reconstruct-mgr-fwd-htlcs branch from ce2ccac to 1e86521 Compare November 19, 2025 22:03
Contributor

@joostjager joostjager left a comment

Did an initial review pass. Even though the change is pretty contained, I still found it difficult to fully follow what's happening.

Overall I think comments are very much on the light side in LDK, and the code areas touched in this PR are no exception in my opinion. Maybe, now that you've invested the time to build understanding, you can liberally sprinkle comments on your changes and nearby code?

payment_hash: PaymentHash(Sha256::hash(&[42; 32]).to_byte_array()),
cltv_expiry: 300000000,
state: InboundHTLCState::Committed,
state: InboundHTLCState::Committed { update_add_htlc_opt: None },
Contributor

Is it ok to never populate update_add_htlc_opt in the tests in this commit, if that isn't supposed to happen in prod?

Contributor Author

Yeah, because that field is only used by the manager on restart. Hopefully it's clearer now that docs are on the field.

Contributor

I thought maybe the copying of the update_add message into InboundHTLCState needs to be covered in this commit. It isn't currently. But coverage is probably obtained later on.
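
For context on the thread above, a hedged sketch of the variant shape under discussion; UpdateAddHtlc here is just a placeholder for LDK's msgs::UpdateAddHTLC and the other InboundHTLCState variants are elided:

// Placeholder for msgs::UpdateAddHTLC; sketch only.
#[allow(dead_code)]
struct UpdateAddHtlc;

#[allow(dead_code)]
enum InboundHTLCState {
    // ...other states elided...
    Committed {
        // The counterparty's original update_add message, retained so the
        // `ChannelManager` can rebuild its forwarding maps on restart. It is not
        // read during normal channel operation, which is why tests can leave it
        // as `None`.
        update_add_htlc_opt: Option<UpdateAddHtlc>,
    },
}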

args_b_c.send_announcement_sigs = (true, true);
reconnect_nodes(args_b_c);

// Forward the HTLC and ensure we can claim it post-reload.
Contributor

What would happen again without the changes in this PR? I assume the htlc wouldn't be forwarded here, but would it be failed back instead?

Contributor Author

You mean if we did the 3-line fix discussed offline? I don't think so, because the manager will only fail HTLCs that it knows about. I think we just wouldn't handle the HTLC and the channel would FC.

Contributor

Yes indeed, if we just discard the forward htlcs. I thought there was still some inbound htlc timer somewhere, but I guess not then?

Contributor

I wonder how fixing the FC by failing back the htlc and then completely forgetting about forwards on restart would compare to the current approach.

Contributor Author

Hmm... seems like a significant UX downgrade for routing nodes?

Contributor

But is it significant? Restarts are probably rare on a routing node, and restarting right when a forward is being processed is rarer still?

Contributor

Still interested in your view on this, even though this PR is already pretty far along. Is it significant because the htlc will linger until close to timeout? I can see that that is undesirable, even though it might be acceptable.

Contributor Author

@valentinewallace valentinewallace Dec 4, 2025

At a brief glance I'm not convinced the code would be that much simpler this way. We'd have to decode htlc onions and queue them for failure during manager read. Feel free to get Matt's take but I'm not convinced on the tradeoff 🤔

Similar to the other /target directories we ignore, where a bunch of files
are generated during testing.
@valentinewallace valentinewallace force-pushed the 2025-10-reconstruct-mgr-fwd-htlcs branch from 1e86521 to 26c6dc7 Compare December 2, 2025 00:51
@valentinewallace
Contributor Author

I still need to finish adding some comments and responding to feedback, but I believe the tests are done and this should be mostly there so pushed up the latest state.

@tnull
Contributor

tnull commented Dec 2, 2025

I still need to finish adding some comments and responding to feedback, but I believe the tests are done and this should be mostly there so pushed up the latest state.

We had discussed whether to split out receive_htlcs as part of this work, since you're already reconstructing the forwards_htlcs map. Are you still considering doing this as part of this PR or rather not?

@valentinewallace valentinewallace force-pushed the 2025-10-reconstruct-mgr-fwd-htlcs branch from 26c6dc7 to ad79155 Compare December 2, 2025 23:00
@valentinewallace valentinewallace marked this pull request as ready for review December 2, 2025 23:01
@valentinewallace valentinewallace requested review from joostjager and removed request for jkczyz December 2, 2025 23:03
@valentinewallace
Contributor Author

I still need to finish adding some comments and responding to feedback, but I believe the tests are done and this should be mostly there so pushed up the latest state.

We had discussed whether to split out receive_htlcs as part of this work, since you're already reconstructing the forwards_htlcs map. Are you still considering doing this as part of this PR or rather not?

Probably not in this PR since it's already decently sized and has a clear scope, but I will look into it for follow-up!

@valentinewallace valentinewallace force-pushed the 2025-10-reconstruct-mgr-fwd-htlcs branch 2 times, most recently from 6b919ce to 0b83e80 Compare December 3, 2025 01:19
@tnull
Contributor

tnull commented Dec 3, 2025

I still need to finish adding some comments and responding to feedback, but I believe the tests are done and this should be mostly there so pushed up the latest state.

We had discussed whether to split out receive_htlcs as part of this work, since you're already reconstructing the forwards_htlcs map. Are you still considering doing this as part of this PR or rather not?

Probably not in this PR since it's already decently sized and has a clear scope, but I will look into it for follow-up!

SGTM!

@tnull
Contributor

tnull commented Dec 3, 2025

Btw, feel free to ping me whenever you think this is ready for a secondary reviewer!

Contributor

@joostjager joostjager left a comment

I like the new commit structure. Left a few remarks and questions, some probably due to a few pieces of understanding missing in my mental model.

// Reconstruct `ChannelManager::decode_update_add_htlcs` from the serialized
// `Channel`, as part of removing the requirement to regularly persist the
// `ChannelManager`.
decode_update_add_htlcs.insert(
Contributor

Can't this be done in the next loop over the monitors where the update_add_htlcs are deduped? This loop seems to be about payments.

Contributor Author

Hmm, IMO the first loop is about repopulating HTLC maps from monitors, and the second loop is about pruning maps from monitors (hence having two loops so we don't prune before the maps are completely filled out), so this approach seems to fit.

Contributor

I thought the first loop is about repopulating (outbound) payments. We probably mean the same thing.

Contributor

But just for populating decode_update_add_htlcs, I don't think a two-step process is needed. I've been playing around a bit along this line. Not completely the same, but you get the idea.

diff --git a/lightning/src/ln/channelmanager.rs b/lightning/src/ln/channelmanager.rs
index 299ff60ba..5fedf6fb9 100644
--- a/lightning/src/ln/channelmanager.rs
+++ b/lightning/src/ln/channelmanager.rs
@@ -17670,21 +17670,6 @@ where
 					let mut peer_state_lock = peer_state_mtx.lock().unwrap();
 					let peer_state = &mut *peer_state_lock;
 					is_channel_closed = !peer_state.channel_by_id.contains_key(channel_id);
-					if let Some(chan) = peer_state.channel_by_id.get(channel_id) {
-						if let Some(funded_chan) = chan.as_funded() {
-							let inbound_committed_update_adds =
-								funded_chan.get_inbound_committed_update_adds();
-							if !inbound_committed_update_adds.is_empty() {
-								// Reconstruct `ChannelManager::decode_update_add_htlcs` from the serialized
-								// `Channel`, as part of removing the requirement to regularly persist the
-								// `ChannelManager`.
-								decode_update_add_htlcs.insert(
-									funded_chan.context.outbound_scid_alias(),
-									inbound_committed_update_adds,
-								);
-							}
-						}
-					}
 				}
 
 				if is_channel_closed {
@@ -17719,9 +17704,17 @@ where
 			for (channel_id, monitor) in args.channel_monitors.iter() {
 				let mut is_channel_closed = true;
 				let counterparty_node_id = monitor.get_counterparty_node_id();
+				let mut inbound_committed_update_adds = Vec::new();
 				if let Some(peer_state_mtx) = per_peer_state.get(&counterparty_node_id) {
 					let mut peer_state_lock = peer_state_mtx.lock().unwrap();
 					let peer_state = &mut *peer_state_lock;
+					if let Some(chan) = peer_state.channel_by_id.get(channel_id) {
+						if let Some(funded_chan) = chan.as_funded() {
+							inbound_committed_update_adds =
+								funded_chan.get_inbound_committed_update_adds();
+						}
+					}
+
 					is_channel_closed = !peer_state.channel_by_id.contains_key(channel_id);
 				}
 
@@ -17746,12 +17739,9 @@ where
 								// still have an entry for this HTLC in `forward_htlcs`,
 								// `pending_intercepted_htlcs`, or `decode_update_add_htlcs`, we were apparently not
 								// persisted after the monitor was when forwarding the payment.
-								dedup_decode_update_add_htlcs(
-									&mut decode_update_add_htlcs,
-									&prev_hop_data,
-									"HTLC was forwarded to the closed channel",
-									&args.logger,
-								);
+								inbound_committed_update_adds.retain(|update_add| {
+									update_add.htlc_id != prev_hop_data.htlc_id
+								});
 								dedup_decode_update_add_htlcs(
 									&mut decode_update_add_htlcs_legacy,
 									&prev_hop_data,
@@ -17877,6 +17867,8 @@ where
 					}
 				}
 
+				decode_update_add_htlcs.insert(*channel_id, inbound_committed_update_adds);
+
 				// Whether the downstream channel was closed or not, try to re-apply any payment
 				// preimages from it which may be needed in upstream channels for forwarded
 				// payments.

Contributor Author

Are tests passing for you with that diff? I tried moving the repopulating down to the lower loop and got test failures

Contributor

I didn't try, but they also fail for me. Did a few more restructurings below that make the tests pass. But the restructuring did lead to the realization that when the channel is closed, there are no inbound_committed_update_adds. And that there is no point in deduplicating them and adding to decode_update_add_htlcs? Am I overlooking something here?

diff --git a/lightning/src/ln/channelmanager.rs b/lightning/src/ln/channelmanager.rs
index 299ff60ba..8a101e791 100644
--- a/lightning/src/ln/channelmanager.rs
+++ b/lightning/src/ln/channelmanager.rs
@@ -17670,21 +17670,6 @@ where
 					let mut peer_state_lock = peer_state_mtx.lock().unwrap();
 					let peer_state = &mut *peer_state_lock;
 					is_channel_closed = !peer_state.channel_by_id.contains_key(channel_id);
-					if let Some(chan) = peer_state.channel_by_id.get(channel_id) {
-						if let Some(funded_chan) = chan.as_funded() {
-							let inbound_committed_update_adds =
-								funded_chan.get_inbound_committed_update_adds();
-							if !inbound_committed_update_adds.is_empty() {
-								// Reconstruct `ChannelManager::decode_update_add_htlcs` from the serialized
-								// `Channel`, as part of removing the requirement to regularly persist the
-								// `ChannelManager`.
-								decode_update_add_htlcs.insert(
-									funded_chan.context.outbound_scid_alias(),
-									inbound_committed_update_adds,
-								);
-							}
-						}
-					}
 				}
 
 				if is_channel_closed {
@@ -17717,12 +17702,21 @@ where
 				}
 			}
 			for (channel_id, monitor) in args.channel_monitors.iter() {
+				let mut chan_state = None;
 				let mut is_channel_closed = true;
 				let counterparty_node_id = monitor.get_counterparty_node_id();
 				if let Some(peer_state_mtx) = per_peer_state.get(&counterparty_node_id) {
 					let mut peer_state_lock = peer_state_mtx.lock().unwrap();
 					let peer_state = &mut *peer_state_lock;
-					is_channel_closed = !peer_state.channel_by_id.contains_key(channel_id);
+					if let Some(chan) = peer_state.channel_by_id.get(channel_id) {
+						if let Some(funded_chan) = chan.as_funded() {
+							chan_state = Some((
+								funded_chan.context.outbound_scid_alias(),
+								funded_chan.get_inbound_committed_update_adds(),
+							));
+						}
+						is_channel_closed = false;
+					}
 				}
 
 				if is_channel_closed {
@@ -17746,12 +17740,12 @@ where
 								// still have an entry for this HTLC in `forward_htlcs`,
 								// `pending_intercepted_htlcs`, or `decode_update_add_htlcs`, we were apparently not
 								// persisted after the monitor was when forwarding the payment.
-								dedup_decode_update_add_htlcs(
-									&mut decode_update_add_htlcs,
-									&prev_hop_data,
-									"HTLC was forwarded to the closed channel",
-									&args.logger,
-								);
+
+								// No need to do this because chan_state is None?
+								//
+								// chan_state.1.retain(|update_add| {
+								// 	update_add.htlc_id != prev_hop_data.htlc_id
+								// });
 								dedup_decode_update_add_htlcs(
 									&mut decode_update_add_htlcs_legacy,
 									&prev_hop_data,
@@ -17875,6 +17869,12 @@ where
 							completion_action,
 						));
 					}
+
+					// Nothing to add because the channel is closed??
+					//
+					// if !chan_state.1.is_empty() {
+					// 	decode_update_add_htlcs.insert(chan_state.0, chan_state.1);
+					// }
 				}
 
 				// Whether the downstream channel was closed or not, try to re-apply any payment

Contributor Author

After some staring and print statements, what's missing from your thoughts above/our offline discussion is the case where we add an HTLC forward to decode_update_add_htlcs when processing the inbound edge channel, then remove it again while processing the (closed) outbound edge channel. So I believe it is important to aggregate them all at once in the first loop and then prune them in the second loop, to avoid failing to prune if we process the outbound edge first.
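
To make the ordering concern concrete, a toy two-phase sketch (plain u64s stand in for the HTLC and prev-hop types; none of this is LDK's actual API): pruning only happens once the map is fully aggregated, so it no longer matters whether the closed outbound edge or the inbound edge is visited first.

use std::collections::HashMap;

type Scid = u64;
type HtlcId = u64;

fn rebuild(
    inbound_adds_per_chan: &HashMap<Scid, Vec<HtlcId>>,
    already_forwarded: &[(Scid, HtlcId)],
) -> HashMap<Scid, Vec<HtlcId>> {
    // Phase 1: aggregate every committed inbound update_add across all channels.
    let mut map = inbound_adds_per_chan.clone();

    // Phase 2: only after the map is complete, prune HTLCs that the (possibly
    // closed) outbound-edge data shows were already forwarded onward. Pruning
    // inside the aggregation loop would silently miss entries whenever the
    // outbound edge happened to be processed before the inbound edge.
    for (prev_scid, htlc_id) in already_forwarded {
        if let Some(htlcs) = map.get_mut(prev_scid) {
            htlcs.retain(|id| id != htlc_id);
        }
    }
    map
}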

@ldk-reviews-bot

👋 The first review has been submitted!

Do you think this PR is ready for a second reviewer? If so, click here to assign a second reviewer.

@valentinewallace valentinewallace force-pushed the 2025-10-reconstruct-mgr-fwd-htlcs branch from 0b83e80 to dfef41a Compare December 3, 2025 23:23
@valentinewallace
Contributor Author

I think the semver CI failure is spurious.

@tnull
Contributor

tnull commented Dec 4, 2025

I think the semver CI failure is spurious.

Fixed by #4260.

@joostjager
Contributor

I've tried rebasing this PR on top of #4151 to see if that fixes some of the test failures: https://github.com/joostjager/rust-lightning/tree/reconstruct-mgr-fwd-htlcs-rebased

Unfortunately that is not the case: the same 44 failures as before, plus 2 extra, which are the tests added in this PR.

The new tests fail because the channel is not in the right state after deserialization of the channel manager. This is probably unrelated to the tests themselves, and rather a more general problem that obscures more specific issues.

Contributor

@joostjager joostjager left a comment

Left some comments, but overall I think this PR is ready for reviewer two.

// to `process_pending_htlc_forwards`. This is part of a larger effort to remove the requirement
// of regularly persisting the `ChannelManager`. However, this new pipeline cannot currently
// handle inbound HTLC receives, some of which may be present in `failed_htlcs`, so continue
// handling HTLCs present in `failed_htlcs` during deserialization for now.
Contributor

From the wording, it sounds like the code below isn't necessary yet, but rather a preparation for the future. Is that right?

Contributor Author

No, the below code is necessary until the new pipeline adds support for HTLC receives. Any suggestions on how to clarify?

Contributor

Describing the future, then mentioning a new pipeline (future pipeline?) and its limitations, followed by code that is surprisingly needed today caused some confusion for me. Perhaps describe what goes wrong today if this code weren't there?

Contributor Author

This code and comment should go away once we support reconstructing claimables the way we (as of this PR) reconstruct forwards, so I just simplified it down a ton.

We have an overarching goal of (mostly) getting rid of ChannelManager
persistence and rebuilding the ChannelManager's state from existing
ChannelMonitors, due to issues when the two structs are out-of-sync on restart.
The main issue that can arise is channel force closure.

As part of this, we plan to store at least parts of Channels in
ChannelMonitors, and that Channel data will be used in rebuilding the manager.

Once we store update_adds in Channels, we can use them on restart when
reconstructing ChannelManager maps such as forward_htlcs and
pending_intercepted_htlcs. Upcoming commits will start doing this
reconstruction.

We have an overarching goal of (mostly) getting rid of ChannelManager
persistence and rebuilding the ChannelManager's state from existing
ChannelMonitors, due to issues when the two structs are out-of-sync on restart.
The main issue that can arise is channel force closure.

As part of rebuilding ChannelManager forward HTLCs maps, we will also add
a fix that will regenerate HTLCIntercepted events for HTLC intercepts that
are present but have no corresponding event in the queue. That fix will use
this new method.

We have an overarching goal of (mostly) getting rid of ChannelManager
persistence and rebuilding the ChannelManager's state from existing
ChannelMonitors, due to issues when the two structs are out-of-sync on restart.
The main issue that can arise is channel force closure.

We'll use this new util when reconstructing the
ChannelManager::decode_update_add_htlcs map from Channel data in upcoming
commits. While the Channel data is not included in the monitors yet, it will be
in future work.

We have an overarching goal of (mostly) getting rid of ChannelManager
persistence and rebuilding the ChannelManager's state from existing
ChannelMonitors, due to issues when the two structs are out-of-sync on restart.
The main issue that can arise is channel force closure.

Soon we'll be reconstructing these now-legacy maps from Channel data (that will
also be included in ChannelMonitors in future work), so rename them as part of
moving towards not needing to persist them in ChannelManager.

Makes an upcoming commit cleaner

We have an overarching goal of (mostly) getting rid of ChannelManager
persistence and rebuilding the ChannelManager's state from existing
ChannelMonitors, due to issues when the two structs are out-of-sync on restart.
The main issue that can arise is channel force closure.

Here we start this process by rebuilding
ChannelManager::decode_update_add_htlcs from the Channels, which will soon be
included in the ChannelMonitors as part of a different series of PRs.

The newly built map is not yet used but will be in the next commit.

We have an overarching goal of (mostly) getting rid of ChannelManager
persistence and rebuilding the ChannelManager's state from existing
ChannelMonitors, due to issues when the two structs are out-of-sync on restart.
The main issue that can arise is channel force closure.

Here we start this process by rebuilding
ChannelManager::decode_update_add_htlcs, forward_htlcs, and
pending_intercepted_htlcs from Channel data, which will soon be included in the
ChannelMonitors as part of a different series of PRs.

We also fix the reload_node test util to use the node's pre-reload config after
restart. The previous behavior was a bit surprising and led to one of this
commit's tests failing.

We have an overarching goal of (mostly) getting rid of ChannelManager
persistence and rebuilding the ChannelManager's state from existing
ChannelMonitors, due to issues when the two structs are out-of-sync on restart.
The main issue that can arise is channel force closure.

In the previous commit we started this process by rebuilding
ChannelManager::decode_update_add_htlcs, forward_htlcs, and
pending_intercepted_htlcs from the Channel data, which will soon be included in the
ChannelMonitors as part of a different series of PRs.

Here we test that HTLC forwards that were originally received on 0.2 can still
be successfully forwarded using the new reload + legacy handling code that will
be merged for 0.3.
@valentinewallace valentinewallace force-pushed the 2025-10-reconstruct-mgr-fwd-htlcs branch from dfef41a to 77d6dfa Compare December 5, 2025 23:55
@ldk-reviews-bot

🔔 1st Reminder

Hey @tnull! This PR has been waiting for your review.
Please take a look when you have a chance. If you're unable to review, please let us know so we can find another reviewer.

Contributor

@tnull tnull left a comment

AFAICT the changes look good (though I might ofc lack context on the overall plan/next steps).

Just some minor comments on the tests.

lightning-invoice = { path = "../lightning-invoice", default-features = false }
lightning-macros = { path = "../lightning-macros" }
lightning = { path = "../lightning", features = ["_test_utils"] }
lightning_0_2 = { package = "lightning", git = "https://github.com/lightningdevkit/rust-lightning.git", branch = "0.2", features = ["_test_utils"] }
Contributor

I think this should use the v0.2 release now, just as we do with the lightning_0_1 and lightning_0_0_125 deps?

#[test]
fn upgrade_pre_htlc_forward_onion_decode() {
	do_upgrade_mid_htlc_forward(MidHtlcForwardCase::PreOnionDecode);
}
Contributor

nit: Missing whitespace between test cases.

// In 0.3, we started reconstructing the `ChannelManager`'s HTLC forwards maps from the HTLCs
// contained in `Channel`s, as part of removing the requirement to regularly persist the
// `ChannelManager`. However, HTLC forwards can only be reconstructed this way if they were
// received on 0.3 or higher. Test that HTLC forwards that were serialized on <=0.2 will still
Contributor

Seems we don't actually test <=0.2, but just 0.2. Given the comment in the first commit on pre-0.1 specifics of InboundHTLCResolution::Resolved, I do wonder if it's worth also running this test based on lightning_0_0_125?

Contributor

@elnosh elnosh left a comment

Reaching the limits of my knowledge of this part of the code. Took some time to build context on the problem and understand the change. Is this the problematic part https://github.com/valentinewallace/rust-lightning/blob/77d6dfa3c2170c768e01302081d3550920a836af/lightning/src/ln/channelmanager.rs#L16945-L17029 that we want to get rid of? So in the future that channel state will come from the monitors and avoid these out-of-sync issues.

As far as my understanding goes, I think the changes here look good.

Comment on lines +16822 to +16825
// If the HTLC corresponding to `prev_hop_data` is present in `decode_update_add_htlcs`, remove it
// from the map as it is already being stored and processed elsewhere.
fn dedup_decode_update_add_htlcs<L: Deref>(
	decode_update_add_htlcs: &mut HashMap<u64, Vec<msgs::UpdateAddHTLC>>,
Contributor

Will this be used outside of the channel manager's read? If not, it feels like this could be a closure, as it seems very specific to the filtering that needs to happen there.
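
If it did end up as a closure local to the read path, it might look roughly like this self-contained toy (plain u64s in place of msgs::UpdateAddHTLC and the prev-hop data, purely to illustrate the shape, not a drop-in change):

use std::collections::HashMap;

fn main() {
    // `decode_update_add_htlcs` keyed by prev-hop SCID; values reduced to htlc_ids.
    let mut decode_update_add_htlcs: HashMap<u64, Vec<u64>> =
        HashMap::from([(42, vec![0, 1, 2])]);

    // Drop a single HTLC from the map when it is already tracked elsewhere.
    let dedup = |map: &mut HashMap<u64, Vec<u64>>, prev_scid: u64, htlc_id: u64| {
        if let Some(htlcs) = map.get_mut(&prev_scid) {
            htlcs.retain(|id| *id != htlc_id);
        }
    };

    dedup(&mut decode_update_add_htlcs, 42, 1);
    assert_eq!(decode_update_add_htlcs[&42], vec![0, 2]);
}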
