Set Dont_Forward bit in Channel Update Message by ziggie1984 · Pull Request #7759 · lightningnetwork/lnd

ziggie1984 · 2023-06-12T10:36:55Z

Change Description

According to the Specification we must set the dont_forward bit when we send a channelupdate msg to the peer.
With this change we make sure that our local channel updates for unannounced channels have this bit set.
Moreover we are performing a migration on the fly for all old unannounced channels as soon as the channel policy is updated.

We currently do not use this flag for our own decisions when forwarding a channel_update to the broader network (we use the AuthProof instead but we have to introduce this change to be spec. compliant.

Currently no new tests where added. Tests in the gossiper_test.go are not affected by this change because the bit is set on higher levels. But open for discussions regarding more tests.

Steps to Test

Steps for reviewers to follow to test the change.

Pull Request Checklist

Testing

Your PR passes all CI checks.
Tests covering the positive and negative (error paths) are included.
Bug fixes contain tests triggering the bug to prevent regressions.

Code Style and Documentation

The change obeys the Code Documentation and Commenting guidelines, and lines wrap at 80.
Commits follow the Ideal Git Commit Structure.
Any new logging statements use an appropriate subsystem and logging level.
There is a change description in the release notes, or [skip ci] in the commit message for small changes.

📝 Please see our Contribution Guidelines for further guidance.

This change is

hieblmi

Hey @ziggie1984 :-), thanks for picking up this feature! LGTM, just a few nits and questions.

Also re the test, I think in general it would be great to get new features covered by a test. Only in case it makes sense here, could you maybe add a check in a private channel test or a scid-alias test to confirm the dont_forward bit is set/unset in a public channel?

ziggie1984 · 2023-06-13T07:25:04Z

@hieblmi thanks for taking a look, incorporated your comments. Agree with the tests, actually the failing unit-tests and itests forced me to update some test procedures to keep them running through and by this we test this new behaviour as well.

hieblmi

Great work @ziggie1984, thanks for the quick turnaround. Only one minor nit.

ProofOfKeags

Comments left in-line

ProofOfKeags · 2023-06-16T00:25:24Z

is there a a reason that this isn't:

Suggested change

announcedChan := completeChan.ChannelFlags&lnwire.FFAnnounceChannel != 0

if peerAlias != nil {

announcedChan = false

}

announcedChan := completeChan.ChannelFlags&lnwire.FFAnnounceChannel != 0 && peerAlias == nil

it seems like this more clearly captures the semantics.

Was doing it because of the readability of the code, the 80 character limit would lead to something like this:

announcedChan := completeChan.ChannelFlags& lnwire.FFAnnounceChannel != 0 && peerAlias == nil

But ok to change if that's better?

the fewer times we reassign a variable the better imo. If you break it up put the second boolean condition on a new line rather than breaking the mask operation across lines.

ProofOfKeags · 2023-06-16T00:32:43Z

This seems to guarantee that if we don't have an AuthProof that we will set the ChanUpdateDontForward bit, but does not specify in any way what should happen if the AuthProof is present. Do we always want to leave it in whatever state it is currently in, or do we want to unset that bit if the AuthProof is present? Secondly, is the AuthProof always absent when the channel is unannounced? If so, is this invariant captured anywhere? It seems unintuitive to me that those two ideas are so tightly linked. However, if we can guarantee that that invariant will hold, then this seems like a safe enough way to do this.

Good Catch, I need to add a testcase somewhere where we switch from unannounced to announced (zero-conf public channel). Because the current code would just have added the same edgeinfo as before including the msg flag (retrieving it from the db).

Lnd makes sure that we only have a valid proof specified when the channel was announced to the network so I think it is even stricter than just looking at the dont_forward bit.

Yeah judging by some of the other parts of this PR, it seems that the AuthProof is a necessary but not sufficient condition for forwarding. We must also have an absence of the dont_forward bit. I think we need to mentally decouple the idea of the presence of the AuthProof and the decision to forward. I do not believe that zero-conf public channels will be the last time we see these ideas differ from one another.

ziggie1984 · 2023-06-21T13:17:53Z

So I reevaluated the situation and fixed the problem which @ProofOfKeags pointed out (whether we need to set the bit in case we have a channel proof). There is a situation where a prior private channel can be switched to public and this is a zeroconf public channel, which is private before 6 confs and later published to the network. In this case the prior codebase made sure we remove the old entry in the channel graph and add it under the real ShortChannel ID. But we did just use the old EdgePolicyInfo so that would have been a bug. Now I implemented it in a way where the MessageFlags are not overwritten.

However still some things I am indecisive about:

I added a Log-Warning message in case we receive a channelupdate msg with the dont_forward bit set although this channel has a AuthProof. So question do we need this?
I added a unit-test related to the public zeroconf channel and basically test that we do not overwrite the MessageFlags. Do you think this is worth it? I also contemplated adding an itest where we check for the MessageFlags in the ChannelUpdate message via the SubscribeToGraph RPC call. There I would have to add the MessageFlags into the EdgeInfoAnnouncement to compare it. I found it not worth it but happy to discuss it.

With that said looking forward to your final conclusion.

yyforyongyu

Thanks for the PR🙏

Mostly questions, also have a question - do we use dont_forward bit anywhere else other than private channels?

yyforyongyu · 2023-09-26T09:32:59Z

 	shortChanID *lnwire.ShortChannelID,
 	peerAlias *lnwire.ShortChannelID,
-	ourPolicy *channeldb.ChannelEdgePolicy) error {
+	ourPolicy *channeldb.ChannelEdgePolicy, shouldAnnounce bool) error {


think we can remove this new param and do the check inside this method?

shouldAnnounce := completeChan.ChannelFlags&lnwire.FFAnnounceChannel == 1 && completeChan.NumConfsRequired >= 6

TIL that the network requires 6 confs and doesn't simply allow it to be accepted by the acceptor's minimum_depth 🤔

yes 6 confs are mandatory according to the spec before announcing to the network: https://github.com/lightning/bolts/blob/4dcc377209509b13cf89a4b91fde7d478f5b46d8/07-routing-gossip.md?plain=1#L86-L89

I pulled the check out for clarity but also ok with putting it back into the addToRouterGraph function (see #7759 (comment)), but also open to change it back.

yyforyongyu · 2023-09-26T09:33:19Z

+	// can be usable before six confirmations but we still keep the channel
+	// unannounced even for public channels until six confirmations are
+	// reached.
+	shouldAnnounce := channel.ChannelFlags&lnwire.FFAnnounceChannel == 1 &&


can we do this check inside addToRouterGraph?

see comment above.

yyforyongyu · 2023-09-26T10:22:53Z

 	// max_htlc field.
 	msgFlags := lnwire.ChanUpdateRequiredMaxHtlc

+	if !shouldAnnounce {


actually even further I'm wondering if we could use ourPolicy.MessageFlags here, but this won't be feasible unless we also modify storedFwdingPolicy, maybe a future refactor PR.

hmm not sure, so the ourPolicy flag is only not nil, it the channel is a ZeroConf channel to make sure we persist the policies which were present before the 6 confirmations. And because the msg flag bits might change see comment below, we cannot use the ourPolicy variable ?

// We are not reusing the MessageFlags because a channel might // change from unannounced to announced during its lifetime.

yyforyongyu · 2023-09-26T10:41:31Z

-		})
+		// Only add the ChanUpdate to the gossiper if the
+		// __dont_forward__ bit signals it.
+		if !e1Ann.MessageFlags.HasDontForward() {


I'm a bit confused about this change - so I guess when we must be the participant of the channel when receiving this channel_announcement? Otherwise it means we received channel_update which shouldn't be forwarded in the first place since they have dont_forward bit set?

Let me see if I can make sense of this. This happens in the scenario where we are processing a rejected edge. The only time we add an edge to our graph that we didn't validate the AuthProof for is if we are one of the parties on the channel. I'm speculating because I'm still new to this segment of the codebase but it seems that this only happens when we add the edge prior to announcement_signatures and then possibly process the remote's channel_update prior to a valid channel_announcement which contains the proof. So in this scenario it actually stands to reason that this bit should always be set here, right?

Good questions!
So as it is coded currently, both lnd peers will readd the channel after 6 confirmations are reached with the _dont_forward_bit unset here: https://github.com/lightningnetwork/lnd/pull/7759/files#diff-bbccb27bb17be513ad84c77ad8e37a4d76168956f7978309aab29a0cbfdf91acR3487 (for Scid-channels it works a bit different but the channel is readded as well => hence a new ChanUpdate is sent (directly to the peer) so we anticipate the dont_forward bit not set in the above case. However we can not be sure whether the peer ratelimited us or something and the ChanUpdate did not make it into its graph. So thats why I integrated this check to make sure we follow the network rules and don't gossip a ChanUpdate with the bit set, potentially getting punished by other peers for not sticking to the rules?

yyforyongyu · 2023-09-26T10:42:26Z

-
+		// Only add the ChanUpdate to the gossiper if the
+		// __dont_forward__ bit signals it.
+		if !e2Ann.MessageFlags.HasDontForward() {


Q: do e2 and e1 always have the same message flags?

I believe in this case since we are processing a rejected edge, the only case where this happens is one where these flags are set, or even if they aren't, they ought to be.

We don't know for sure, depending on how strict the peer rate-limits us, but in the best case scenario we should have the same message bits set here. I think it makes sense to check for the bit nonetheless just in case we somehow did not update the Graph-Data and do not broadcast a ChanUpdate which was never met to be broadcasted.

This code seems fine as-is, but can you elaborate -- what does the peer rate-limiting have to do with this?

So I was referring to the rate limiting because we consider a ChanUpdate as stale if it has not a greater timestamp than the prior one. (even if the content differ). This had effects in the itests where blocks were mined quickly and the time between block 3 and block 6 was lower than 1 sec, and when we or our peer announced the channel after 6 blocks it had happened that the old ChanUpdate from our Peer was still in our db. That should never happen in the real world but thats why this check is also important here (at least for the itests).

yyforyongyu · 2023-09-26T10:45:54Z

@@ -2271,6 +2282,16 @@ func IsKeepAliveUpdate(update *lnwire.ChannelUpdate,
 	if update.MessageFlags.HasMaxHtlc() && !prev.MessageFlags.HasMaxHtlc() {


looks like this line can be removed

Sure also to remain backwards compatible ? There might be old nodes out there which have not updated to the latest release where the MaxHTLC flag became mandatory so we don't want to consider those ChanUpdates as keepalive msgs ?

yyforyongyu · 2023-09-26T10:52:32Z

+			log.Debugf("Update edge for short_chan_id(%v) got: %v",
+				shortChanID, err)
+		} else if strings.Contains(err.Error(),
+			channeldb.ErrEdgeNotFound.Error()) {


I don't think we can do this here as it creates a possible DoS since it bypasses the isRecentlyRejectedMsg check.

You are right this is something I still not figured out (referring to the race condition here), I don't think its a DoS vector tho because the case where we do not find the corresponding edge is already covered here:

lnd/discovery/gossiper.go

Line 2768 in 14a6f53

case channeldb.ErrEdgeNotFound:

The problem why we need to check for this case here, is that somehow for Alias channels we remove and add the edge after six confirmations and we might run into the case where we are receiving the new ChanUpdate (with the dont_foward bit unset) and also removing the edge leading to the case where the itests break. I guess I need to make sure that this call :

lnd/funding/manager.go

Line 3474 in 14a6f53

ourPolicy, err := f.cfg.DeleteAliasEdge(baseScid)

and the call here:

lnd/discovery/gossiper.go

Line 2941 in 14a6f53

if err := d.cfg.Router.UpdateEdge(update, ops...); err != nil {

do not interfere.

This should be a separate commit with step-by-step info in the commit body. One TODO was to make the deletion + addition of edges atomic. That could be a separate PR that lands before this one.

sounds good, I separated this commit and added a detailed commit body. I also included a TODO so that depending on whether this separate PR which makes the deletion+addition atomic lands before of after this change we can just delete it again.
However I am also in favour of adding a separate PR before this one and removing this commit altogether because I don't think we need it if its atomic. I will create an issue or a PR directly.

I don't think this PR should land until atomic deletion and addition is done, just to keep things simple and not have band-aids in our code

yyforyongyu · 2023-09-26T11:03:18Z

+	// Make sure Alice which is not a direct peer will see this channnel
+	// as well after it is confirmed. A public zeroconf channel is only
+	// announced to the broader network after six confirmations.
+	ht.AssertTopologyChannelOpen(ht.Alice, fundingPoint3)


don't think this change is necessary - if we wanna test relaying gossip we should test it in graph tests.

Ok will remove

yyforyongyu · 2023-09-26T11:04:56Z

+	// after 6 confirmations are reached is not considered stale. This
+	// ChanUpdate is important because it unsets the __dont_forward__ bit
+	// signaling the broadcasting to the network.
+	time.Sleep(time.Second)


I don't think we need to sleep here? If the channel updates have diffs they shouldn't be considered as stale?

So the problem which manifests in the itests lies here:

lnd/discovery/gossiper.go

Lines 2721 to 2732 in 14a6f53

if d.cfg.Router.IsStaleEdgePolicy(

graphScid, timestamp, upd.ChannelFlags,

) {

log.Debugf("Ignored stale edge policy for short_chan_id(%v): "+

"peer=%v, msg=%s, is_remote=%v", shortChanID,

nMsg.peer, nMsg.msg.MsgType(), nMsg.isRemote,

)

nMsg.err <- nil

return nil, true

}

our timestamp granularity is 1 second, now I introduced the Change where we send a ChanUpdate after 3 blocks(if default value is active) but also after 6 blocks with the dont_forward bit unset, because in the itests blocks are mined very fast, we hit exactly the case where the ChanUpdates have the exact same timestamp. We did not see this effect prior to this change tho. So to catch this case in the itests, we have to include a check in the isStaleEdgePolicy that ChanUpdates which are different but have the same timestamp are not considered stale ? Wondering if this could have some side effects ?

ProofOfKeags · 2023-09-26T22:14:38Z

+	// channel has been announced. We anticipate the local ChanAnnouncement
+	// and the local ChanUpdate.


Nice clarification here. I do wonder if instead of just counting the messages though that we should actually check that both of those distinct messages appear

good idea will add the msg check.

ProofOfKeags · 2023-09-26T22:34:13Z

-		})
+		// Only add the ChanUpdate to the gossiper if the
+		// __dont_forward__ bit signals it.
+		if !e1Ann.MessageFlags.HasDontForward() {


Let me see if I can make sense of this. This happens in the scenario where we are processing a rejected edge. The only time we add an edge to our graph that we didn't validate the AuthProof for is if we are one of the parties on the channel. I'm speculating because I'm still new to this segment of the codebase but it seems that this only happens when we add the edge prior to announcement_signatures and then possibly process the remote's channel_update prior to a valid channel_announcement which contains the proof. So in this scenario it actually stands to reason that this bit should always be set here, right?

ProofOfKeags · 2023-09-26T22:36:39Z

-
+		// Only add the ChanUpdate to the gossiper if the
+		// __dont_forward__ bit signals it.
+		if !e2Ann.MessageFlags.HasDontForward() {


I believe in this case since we are processing a rejected edge, the only case where this happens is one where these flags are set, or even if they aren't, they ought to be.

ProofOfKeags · 2023-09-26T22:46:27Z

+	// Every public channel starts with the dont_forward bit set before the
+	// mandatory six confirmations are reached. We will see 2 different
+	// ChanUpdates before a public channel is announced to the broader
+	// network. We need to make sure we don't consider the second one as a
+	// keepalive msg.
+	if !update.MessageFlags.HasDontForward() &&
+		prev.MessageFlags.HasDontForward() {
+
+		return false
+	}


ProofOfKeags · 2023-09-26T22:50:47Z

 	shortChanID *lnwire.ShortChannelID,
 	peerAlias *lnwire.ShortChannelID,
-	ourPolicy *channeldb.ChannelEdgePolicy) error {
+	ourPolicy *channeldb.ChannelEdgePolicy, shouldAnnounce bool) error {


TIL that the network requires 6 confs and doesn't simply allow it to be accepted by the acceptor's minimum_depth 🤔

Crypt-iQ · 2023-10-02T16:20:30Z

+	// unannounced even for public channels until six confirmations are
+	// reached.
+	shouldAnnounce := channel.ChannelFlags&lnwire.FFAnnounceChannel == 1 &&
+		channel.NumConfsRequired >= 6


NumConfsRequired is the min accept depth setting, but the channel may have less than 6 confirmations at this point. This means that we'll announce the channel w/o the dont_forward bit before 6 confs have been reached

nice catch, I forgot about the zero-conf public channels.

Crypt-iQ · 2023-10-02T16:37:15Z

+	// edge is persisted to disk later when propagating the new channel
+	// update.
+	if edgeInfo.AuthProof == nil {
+		edge.MessageFlags |= lnwire.ChanUpdateDontForward


This should be fixed now since addToRouterGraph is now called in annAfterSixConfs

Crypt-iQ · 2023-10-02T17:06:16Z

-
+		// Only add the ChanUpdate to the gossiper if the
+		// __dont_forward__ bit signals it.
+		if !e2Ann.MessageFlags.HasDontForward() {


This code seems fine as-is, but can you elaborate -- what does the peer rate-limiting have to do with this?

Crypt-iQ · 2023-10-02T17:12:28Z

+			log.Debugf("Update edge for short_chan_id(%v) got: %v",
+				shortChanID, err)
+		} else if strings.Contains(err.Error(),
+			channeldb.ErrEdgeNotFound.Error()) {


This should be a separate commit with step-by-step info in the commit body. One TODO was to make the deletion + addition of edges atomic. That could be a separate PR that lands before this one.

Crypt-iQ · 2023-10-03T15:57:18Z

+			log.Debugf("Update edge for short_chan_id(%v) got: %v",
+				shortChanID, err)
+		} else if strings.Contains(err.Error(),
+			channeldb.ErrEdgeNotFound.Error()) {


I don't think this PR should land until atomic deletion and addition is done, just to keep things simple and not have band-aids in our code

Crypt-iQ · 2023-10-03T15:59:34Z

Shouldn't this only be done when the actual number of confirmations is >= 6? NumConfsRequired is min_accept_depth, not the number of confirmations that the funding tx has

Agree maybe not that intuitive, but for normal channels we only execute this function if the channel has reached the minimum_accepted depth or am I misunderstanding something ?
Nonetheless I will base it on mined blocks rather than the min_accepted_height

Ah I see, this should probably have a more clear comment explicitly saying that 6 confs have passed in this case

Cool, I decided to go with the check as it is and added a more detailed comment. Fetching chain data here would overcomplicate things in my view because we would have to fetch chain data here (introducing some level of indirection) wdyt ?

Set the dont_forward bit in the chan update msg when a channel is not (yet) announced to the network. This is either the case for channels which signal this in the ChannelFlags of the OpenChannel msg but also for public channels which have still not reached the minimum depth height of 6 confirmations.

Adopt tests to make sure the dont_forward bit is set when needed.

When a channel update is now executed the msg field will be migrated on the fly including the dont_forward bit.

In case the __dont_forward__ is set in the ChanUpdate msg lnd will not forward this msg to the gossiper hence broadcasting it to all its peers. This makes sure we follow the network rules and dont forward ChanUpdates which do not signal it.

This pr introduced a change in the ChanUpdate msg. We now explictily signal to our peer whether he should broadcast our msg to the broader network. So it is important that we receive a new ChanUpdate msg from our peer after the six confirmations are reached and we should broadcast the existence of this channel to the broader network. However there is a race condition for non-zero-conf public alias channels for now because the channel edge is removed+added after six confirmations to make sure we do not broadcast any ChanUpdate which uses the scid instead of the real short channel id. During the removal and the addition of the non-zero-conf public alias channel we might receive an update for this channel (because our peer was faster readding the channel in its db and sending out the new ChanUpdate using the confirmed-scid). If we did not catch this edge case messages from this peer would be rate limited and the new ChanUpdate msg where the __dont_forward__ bit is unset would not be received in time. This edge case should be removed as soon as we implement atomic deleting and addition of channel edges (for alias channels).

We test that a public zeroconf channel is gossiped through the network after 6 confirmation. Moreover itests had to be adopted because we are now sending an additional ChanUpdate msg for normal public channels signaling the broadcasting to the broader network.

ziggie1984 · 2023-10-04T16:06:27Z

I analysed the itests which were failing occasionally and could find the problem. So because we now send the ChanUpdate after six_confirmation again, for slower backends we run into the problem where 2 consecutive ChanUpdates are sent in the itests very fast. First one here:
https://github.com/lightningnetwork/lnd/pull/7759/files#diff-bbccb27bb17be513ad84c77ad8e37a4d76168956f7978309aab29a0cbfdf91acR3487-R3490

and the second one here after the Proofs of the peer are received:
https://github.com/lightningnetwork/lnd/pull/7759/files#diff-1eda595bbebe495bd74a6a0431c46b66cb4e8b53beb311067c010feac2665dcbR3350-R3356

This can happen so fast that our check here:

lnd/discovery/gossiper.go

Lines 2721 to 2732 in 42dfa2c

    
           if d.cfg.Router.IsStaleEdgePolicy( 
        
           	graphScid, timestamp, upd.ChannelFlags, 
        
           ) { 
        
           	log.Debugf("Ignored stale edge policy for short_chan_id(%v): "+ 
        
           		"peer=%v, msg=%s, is_remote=%v", shortChanID, 
        
           		nMsg.peer, nMsg.msg.MsgType(), nMsg.isRemote, 
        
           	) 
        
           	nMsg.err <- nil 
        
           	return nil, true 
        
           }

fails to distinguish those two consecutive ChanUpdates not marking the latter as stale (because the db is not updated fast enough with the newest update). So for itests where we test the rate-limiting this can lead to some effects where we rate-limit too much and therefore not recognizing the channel update policies.

I am not sure whether we want to accept this because a lot of people could experience failed itests in their PRs solely based on this behaviour above.

Happy to hear your conclusion.

Apart from that this PR is in stall mode currently and waiting for a PR which makes the deletion and addition of edges atomic (basically a new command maybe called readd_edge). This command is needed because for alias channels we have to readd the edge to make sure we do not propagate any ChanUpdates to the broader network using the alias instead of the real_chan_id. Doing this not atomic can lead to side-effects observed during the current PR.

lightninglabs-deploy · 2023-10-25T17:23:52Z

@hieblmi: review reminder
@saubyk: review reminder
@ziggie1984, remember to re-request review from reviewers when ready

ziggie1984 · 2023-10-25T19:49:33Z

!lightninglabs-deploy mute

Roasbeef · 2023-11-10T21:32:19Z

Deleted the staging branch, but this should go into master.

I can't re-open this for some reason.

ziggie1984 force-pushed the dont-forward-bit branch 3 times, most recently from 8530b00 to aaf3da6 Compare June 12, 2023 11:03

ziggie1984 marked this pull request as ready for review June 12, 2023 11:13

ziggie1984 force-pushed the dont-forward-bit branch from aaf3da6 to 2066eaa Compare June 12, 2023 15:08

saubyk requested a review from hieblmi June 12, 2023 18:22