Set Dont_Forward bit in Channel Update Message#7759
Conversation
8530b00 to
aaf3da6
Compare
aaf3da6 to
2066eaa
Compare
There was a problem hiding this comment.
Hey @ziggie1984 :-), thanks for picking up this feature! LGTM, just a few nits and questions.
Also re the test, I think in general it would be great to get new features covered by a test. Only in case it makes sense here, could you maybe add a check in a private channel test or a scid-alias test to confirm the dont_forward bit is set/unset in a public channel?
aaa1128 to
48f9fe8
Compare
|
@hieblmi thanks for taking a look, incorporated your comments. Agree with the tests, actually the failing unit-tests and itests forced me to update some test procedures to keep them running through and by this we test this new behaviour as well. |
48f9fe8 to
b3c49a2
Compare
hieblmi
left a comment
There was a problem hiding this comment.
Great work @ziggie1984, thanks for the quick turnaround. Only one minor nit.
b3c49a2 to
d45c834
Compare
ProofOfKeags
left a comment
There was a problem hiding this comment.
Comments left in-line
There was a problem hiding this comment.
is there a a reason that this isn't:
| announcedChan := completeChan.ChannelFlags&lnwire.FFAnnounceChannel != 0 | |
| if peerAlias != nil { | |
| announcedChan = false | |
| } | |
| announcedChan := completeChan.ChannelFlags&lnwire.FFAnnounceChannel != 0 && peerAlias == nil |
it seems like this more clearly captures the semantics.
There was a problem hiding this comment.
Was doing it because of the readability of the code, the 80 character limit would lead to something like this:
announcedChan := completeChan.ChannelFlags&
lnwire.FFAnnounceChannel != 0 && peerAlias == nil
But ok to change if that's better?
There was a problem hiding this comment.
the fewer times we reassign a variable the better imo. If you break it up put the second boolean condition on a new line rather than breaking the mask operation across lines.
There was a problem hiding this comment.
This seems to guarantee that if we don't have an AuthProof that we will set the ChanUpdateDontForward bit, but does not specify in any way what should happen if the AuthProof is present. Do we always want to leave it in whatever state it is currently in, or do we want to unset that bit if the AuthProof is present? Secondly, is the AuthProof always absent when the channel is unannounced? If so, is this invariant captured anywhere? It seems unintuitive to me that those two ideas are so tightly linked. However, if we can guarantee that that invariant will hold, then this seems like a safe enough way to do this.
There was a problem hiding this comment.
Good Catch, I need to add a testcase somewhere where we switch from unannounced to announced (zero-conf public channel). Because the current code would just have added the same edgeinfo as before including the msg flag (retrieving it from the db).
Lnd makes sure that we only have a valid proof specified when the channel was announced to the network so I think it is even stricter than just looking at the dont_forward bit.
There was a problem hiding this comment.
Yeah judging by some of the other parts of this PR, it seems that the AuthProof is a necessary but not sufficient condition for forwarding. We must also have an absence of the dont_forward bit. I think we need to mentally decouple the idea of the presence of the AuthProof and the decision to forward. I do not believe that zero-conf public channels will be the last time we see these ideas differ from one another.
71d6847 to
590b811
Compare
|
So I reevaluated the situation and fixed the problem which @ProofOfKeags pointed out (whether we need to set the bit in case we have a channel proof). There is a situation where a prior private channel can be switched to public and this is a zeroconf public channel, which is private before 6 confs and later published to the network. In this case the prior codebase made sure we remove the old entry in the channel graph and add it under the real ShortChannel ID. But we did just use the old EdgePolicyInfo so that would have been a bug. Now I implemented it in a way where the MessageFlags are not overwritten. However still some things I am indecisive about:
With that said looking forward to your final conclusion. |
3317ba5 to
14a6f53
Compare
yyforyongyu
left a comment
There was a problem hiding this comment.
Thanks for the PR🙏
Mostly questions, also have a question - do we use dont_forward bit anywhere else other than private channels?
| shortChanID *lnwire.ShortChannelID, | ||
| peerAlias *lnwire.ShortChannelID, | ||
| ourPolicy *channeldb.ChannelEdgePolicy) error { | ||
| ourPolicy *channeldb.ChannelEdgePolicy, shouldAnnounce bool) error { |
There was a problem hiding this comment.
think we can remove this new param and do the check inside this method?
shouldAnnounce := completeChan.ChannelFlags&lnwire.FFAnnounceChannel == 1 &&
completeChan.NumConfsRequired >= 6There was a problem hiding this comment.
TIL that the network requires 6 confs and doesn't simply allow it to be accepted by the acceptor's minimum_depth 🤔
There was a problem hiding this comment.
yes 6 confs are mandatory according to the spec before announcing to the network: https://github.com/lightning/bolts/blob/4dcc377209509b13cf89a4b91fde7d478f5b46d8/07-routing-gossip.md?plain=1#L86-L89
There was a problem hiding this comment.
I pulled the check out for clarity but also ok with putting it back into the addToRouterGraph function (see #7759 (comment)), but also open to change it back.
| // can be usable before six confirmations but we still keep the channel | ||
| // unannounced even for public channels until six confirmations are | ||
| // reached. | ||
| shouldAnnounce := channel.ChannelFlags&lnwire.FFAnnounceChannel == 1 && |
There was a problem hiding this comment.
can we do this check inside addToRouterGraph?
There was a problem hiding this comment.
see comment above.
| // max_htlc field. | ||
| msgFlags := lnwire.ChanUpdateRequiredMaxHtlc | ||
|
|
||
| if !shouldAnnounce { |
There was a problem hiding this comment.
actually even further I'm wondering if we could use ourPolicy.MessageFlags here, but this won't be feasible unless we also modify storedFwdingPolicy, maybe a future refactor PR.
There was a problem hiding this comment.
hmm not sure, so the ourPolicy flag is only not nil, it the channel is a ZeroConf channel to make sure we persist the policies which were present before the 6 confirmations. And because the msg flag bits might change see comment below, we cannot use the ourPolicy variable ?
// We are not reusing the MessageFlags because a channel might
// change from unannounced to announced during its lifetime.
| }) | ||
| // Only add the ChanUpdate to the gossiper if the | ||
| // __dont_forward__ bit signals it. | ||
| if !e1Ann.MessageFlags.HasDontForward() { |
There was a problem hiding this comment.
I'm a bit confused about this change - so I guess when we must be the participant of the channel when receiving this channel_announcement? Otherwise it means we received channel_update which shouldn't be forwarded in the first place since they have dont_forward bit set?
There was a problem hiding this comment.
Let me see if I can make sense of this. This happens in the scenario where we are processing a rejected edge. The only time we add an edge to our graph that we didn't validate the AuthProof for is if we are one of the parties on the channel. I'm speculating because I'm still new to this segment of the codebase but it seems that this only happens when we add the edge prior to announcement_signatures and then possibly process the remote's channel_update prior to a valid channel_announcement which contains the proof. So in this scenario it actually stands to reason that this bit should always be set here, right?
There was a problem hiding this comment.
Good questions!
So as it is coded currently, both lnd peers will readd the channel after 6 confirmations are reached with the _dont_forward_bit unset here: https://github.com/lightningnetwork/lnd/pull/7759/files#diff-bbccb27bb17be513ad84c77ad8e37a4d76168956f7978309aab29a0cbfdf91acR3487 (for Scid-channels it works a bit different but the channel is readded as well => hence a new ChanUpdate is sent (directly to the peer) so we anticipate the dont_forward bit not set in the above case. However we can not be sure whether the peer ratelimited us or something and the ChanUpdate did not make it into its graph. So thats why I integrated this check to make sure we follow the network rules and don't gossip a ChanUpdate with the bit set, potentially getting punished by other peers for not sticking to the rules?
|
|
||
| // Only add the ChanUpdate to the gossiper if the | ||
| // __dont_forward__ bit signals it. | ||
| if !e2Ann.MessageFlags.HasDontForward() { |
There was a problem hiding this comment.
Q: do e2 and e1 always have the same message flags?
There was a problem hiding this comment.
I believe in this case since we are processing a rejected edge, the only case where this happens is one where these flags are set, or even if they aren't, they ought to be.
There was a problem hiding this comment.
We don't know for sure, depending on how strict the peer rate-limits us, but in the best case scenario we should have the same message bits set here. I think it makes sense to check for the bit nonetheless just in case we somehow did not update the Graph-Data and do not broadcast a ChanUpdate which was never met to be broadcasted.
There was a problem hiding this comment.
This code seems fine as-is, but can you elaborate -- what does the peer rate-limiting have to do with this?
There was a problem hiding this comment.
So I was referring to the rate limiting because we consider a ChanUpdate as stale if it has not a greater timestamp than the prior one. (even if the content differ). This had effects in the itests where blocks were mined quickly and the time between block 3 and block 6 was lower than 1 sec, and when we or our peer announced the channel after 6 blocks it had happened that the old ChanUpdate from our Peer was still in our db. That should never happen in the real world but thats why this check is also important here (at least for the itests).
| @@ -2271,6 +2282,16 @@ func IsKeepAliveUpdate(update *lnwire.ChannelUpdate, | |||
| if update.MessageFlags.HasMaxHtlc() && !prev.MessageFlags.HasMaxHtlc() { | |||
There was a problem hiding this comment.
looks like this line can be removed
There was a problem hiding this comment.
Sure also to remain backwards compatible ? There might be old nodes out there which have not updated to the latest release where the MaxHTLC flag became mandatory so we don't want to consider those ChanUpdates as keepalive msgs ?
| log.Debugf("Update edge for short_chan_id(%v) got: %v", | ||
| shortChanID, err) | ||
| } else if strings.Contains(err.Error(), | ||
| channeldb.ErrEdgeNotFound.Error()) { |
There was a problem hiding this comment.
I don't think we can do this here as it creates a possible DoS since it bypasses the isRecentlyRejectedMsg check.
There was a problem hiding this comment.
You are right this is something I still not figured out (referring to the race condition here), I don't think its a DoS vector tho because the case where we do not find the corresponding edge is already covered here:
Line 2768 in 14a6f53
The problem why we need to check for this case here, is that somehow for Alias channels we remove and add the edge after six confirmations and we might run into the case where we are receiving the new ChanUpdate (with the dont_foward bit unset) and also removing the edge leading to the case where the itests break. I guess I need to make sure that this call :
Line 3474 in 14a6f53
and the call here:
Line 2941 in 14a6f53
do not interfere.
There was a problem hiding this comment.
This should be a separate commit with step-by-step info in the commit body. One TODO was to make the deletion + addition of edges atomic. That could be a separate PR that lands before this one.
There was a problem hiding this comment.
sounds good, I separated this commit and added a detailed commit body. I also included a TODO so that depending on whether this separate PR which makes the deletion+addition atomic lands before of after this change we can just delete it again.
However I am also in favour of adding a separate PR before this one and removing this commit altogether because I don't think we need it if its atomic. I will create an issue or a PR directly.
There was a problem hiding this comment.
I don't think this PR should land until atomic deletion and addition is done, just to keep things simple and not have band-aids in our code
| // Make sure Alice which is not a direct peer will see this channnel | ||
| // as well after it is confirmed. A public zeroconf channel is only | ||
| // announced to the broader network after six confirmations. | ||
| ht.AssertTopologyChannelOpen(ht.Alice, fundingPoint3) |
There was a problem hiding this comment.
don't think this change is necessary - if we wanna test relaying gossip we should test it in graph tests.
| // after 6 confirmations are reached is not considered stale. This | ||
| // ChanUpdate is important because it unsets the __dont_forward__ bit | ||
| // signaling the broadcasting to the network. | ||
| time.Sleep(time.Second) |
There was a problem hiding this comment.
I don't think we need to sleep here? If the channel updates have diffs they shouldn't be considered as stale?
There was a problem hiding this comment.
So the problem which manifests in the itests lies here:
Lines 2721 to 2732 in 14a6f53
our timestamp granularity is 1 second, now I introduced the Change where we send a ChanUpdate after 3 blocks(if default value is active) but also after 6 blocks with the dont_forward bit unset, because in the itests blocks are mined very fast, we hit exactly the case where the ChanUpdates have the exact same timestamp. We did not see this effect prior to this change tho. So to catch this case in the itests, we have to include a check in the isStaleEdgePolicy that ChanUpdates which are different but have the same timestamp are not considered stale ? Wondering if this could have some side effects ?
| // channel has been announced. We anticipate the local ChanAnnouncement | ||
| // and the local ChanUpdate. |
There was a problem hiding this comment.
Nice clarification here. I do wonder if instead of just counting the messages though that we should actually check that both of those distinct messages appear
There was a problem hiding this comment.
good idea will add the msg check.
| }) | ||
| // Only add the ChanUpdate to the gossiper if the | ||
| // __dont_forward__ bit signals it. | ||
| if !e1Ann.MessageFlags.HasDontForward() { |
There was a problem hiding this comment.
Let me see if I can make sense of this. This happens in the scenario where we are processing a rejected edge. The only time we add an edge to our graph that we didn't validate the AuthProof for is if we are one of the parties on the channel. I'm speculating because I'm still new to this segment of the codebase but it seems that this only happens when we add the edge prior to announcement_signatures and then possibly process the remote's channel_update prior to a valid channel_announcement which contains the proof. So in this scenario it actually stands to reason that this bit should always be set here, right?
|
|
||
| // Only add the ChanUpdate to the gossiper if the | ||
| // __dont_forward__ bit signals it. | ||
| if !e2Ann.MessageFlags.HasDontForward() { |
There was a problem hiding this comment.
I believe in this case since we are processing a rejected edge, the only case where this happens is one where these flags are set, or even if they aren't, they ought to be.
| // Every public channel starts with the dont_forward bit set before the | ||
| // mandatory six confirmations are reached. We will see 2 different | ||
| // ChanUpdates before a public channel is announced to the broader | ||
| // network. We need to make sure we don't consider the second one as a | ||
| // keepalive msg. | ||
| if !update.MessageFlags.HasDontForward() && | ||
| prev.MessageFlags.HasDontForward() { | ||
|
|
||
| return false | ||
| } |
| shortChanID *lnwire.ShortChannelID, | ||
| peerAlias *lnwire.ShortChannelID, | ||
| ourPolicy *channeldb.ChannelEdgePolicy) error { | ||
| ourPolicy *channeldb.ChannelEdgePolicy, shouldAnnounce bool) error { |
There was a problem hiding this comment.
TIL that the network requires 6 confs and doesn't simply allow it to be accepted by the acceptor's minimum_depth 🤔
14a6f53 to
1d2b0c0
Compare
| // unannounced even for public channels until six confirmations are | ||
| // reached. | ||
| shouldAnnounce := channel.ChannelFlags&lnwire.FFAnnounceChannel == 1 && | ||
| channel.NumConfsRequired >= 6 |
There was a problem hiding this comment.
NumConfsRequired is the min accept depth setting, but the channel may have less than 6 confirmations at this point. This means that we'll announce the channel w/o the dont_forward bit before 6 confs have been reached
There was a problem hiding this comment.
nice catch, I forgot about the zero-conf public channels.
| // edge is persisted to disk later when propagating the new channel | ||
| // update. | ||
| if edgeInfo.AuthProof == nil { | ||
| edge.MessageFlags |= lnwire.ChanUpdateDontForward |
There was a problem hiding this comment.
This should be fixed now since addToRouterGraph is now called in annAfterSixConfs
|
|
||
| // Only add the ChanUpdate to the gossiper if the | ||
| // __dont_forward__ bit signals it. | ||
| if !e2Ann.MessageFlags.HasDontForward() { |
There was a problem hiding this comment.
This code seems fine as-is, but can you elaborate -- what does the peer rate-limiting have to do with this?
| log.Debugf("Update edge for short_chan_id(%v) got: %v", | ||
| shortChanID, err) | ||
| } else if strings.Contains(err.Error(), | ||
| channeldb.ErrEdgeNotFound.Error()) { |
There was a problem hiding this comment.
This should be a separate commit with step-by-step info in the commit body. One TODO was to make the deletion + addition of edges atomic. That could be a separate PR that lands before this one.
9748760 to
04eb1f0
Compare
| log.Debugf("Update edge for short_chan_id(%v) got: %v", | ||
| shortChanID, err) | ||
| } else if strings.Contains(err.Error(), | ||
| channeldb.ErrEdgeNotFound.Error()) { |
There was a problem hiding this comment.
I don't think this PR should land until atomic deletion and addition is done, just to keep things simple and not have band-aids in our code
There was a problem hiding this comment.
Shouldn't this only be done when the actual number of confirmations is >= 6? NumConfsRequired is min_accept_depth, not the number of confirmations that the funding tx has
There was a problem hiding this comment.
Agree maybe not that intuitive, but for normal channels we only execute this function if the channel has reached the minimum_accepted depth or am I misunderstanding something ?
Nonetheless I will base it on mined blocks rather than the min_accepted_height
There was a problem hiding this comment.
Ah I see, this should probably have a more clear comment explicitly saying that 6 confs have passed in this case
There was a problem hiding this comment.
Cool, I decided to go with the check as it is and added a more detailed comment. Fetching chain data here would overcomplicate things in my view because we would have to fetch chain data here (introducing some level of indirection) wdyt ?
b44fc60 to
0bee285
Compare
Set the dont_forward bit in the chan update msg when a channel is not (yet) announced to the network. This is either the case for channels which signal this in the ChannelFlags of the OpenChannel msg but also for public channels which have still not reached the minimum depth height of 6 confirmations.
Adopt tests to make sure the dont_forward bit is set when needed.
When a channel update is now executed the msg field will be migrated on the fly including the dont_forward bit.
In case the __dont_forward__ is set in the ChanUpdate msg lnd will not forward this msg to the gossiper hence broadcasting it to all its peers. This makes sure we follow the network rules and dont forward ChanUpdates which do not signal it.
This pr introduced a change in the ChanUpdate msg. We now explictily signal to our peer whether he should broadcast our msg to the broader network. So it is important that we receive a new ChanUpdate msg from our peer after the six confirmations are reached and we should broadcast the existence of this channel to the broader network. However there is a race condition for non-zero-conf public alias channels for now because the channel edge is removed+added after six confirmations to make sure we do not broadcast any ChanUpdate which uses the scid instead of the real short channel id. During the removal and the addition of the non-zero-conf public alias channel we might receive an update for this channel (because our peer was faster readding the channel in its db and sending out the new ChanUpdate using the confirmed-scid). If we did not catch this edge case messages from this peer would be rate limited and the new ChanUpdate msg where the __dont_forward__ bit is unset would not be received in time. This edge case should be removed as soon as we implement atomic deleting and addition of channel edges (for alias channels).
We test that a public zeroconf channel is gossiped through the network after 6 confirmation. Moreover itests had to be adopted because we are now sending an additional ChanUpdate msg for normal public channels signaling the broadcasting to the broader network.
0bee285 to
42dfa2c
Compare
|
I analysed the itests which were failing occasionally and could find the problem. So because we now send the ChanUpdate after six_confirmation again, for slower backends we run into the problem where 2 consecutive ChanUpdates are sent in the itests very fast. First one here: and the second one here after the Proofs of the peer are received: This can happen so fast that our check here: Lines 2721 to 2732 in 42dfa2c fails to distinguish those two consecutive ChanUpdates not marking the latter as stale (because the db is not updated fast enough with the newest update). So for itests where we test the rate-limiting this can lead to some effects where we rate-limit too much and therefore not recognizing the channel update policies. I am not sure whether we want to accept this because a lot of people could experience failed itests in their PRs solely based on this behaviour above. Happy to hear your conclusion. Apart from that this PR is in stall mode currently and waiting for a PR which makes the deletion and addition of edges atomic (basically a new command maybe called |
|
@hieblmi: review reminder |
|
!lightninglabs-deploy mute |
|
Deleted the staging branch, but this should go into master. I can't re-open this for some reason. |
Fixes #7416
Change Description
According to the Specification we must set the
dont_forwardbit when we send achannelupdatemsg to the peer.With this change we make sure that our local channel updates for unannounced channels have this bit set.
Moreover we are performing a migration on the fly for all old unannounced channels as soon as the channel policy is updated.
We currently do not use this flag for our own decisions when forwarding a channel_update to the broader network (we use theAuthProofinstead but we have to introduce this change to be spec. compliant.Currently no new tests where added. Tests in the
gossiper_test.goare not affected by this change because the bit is set on higher levels. But open for discussions regarding more tests.Steps to Test
Steps for reviewers to follow to test the change.
Pull Request Checklist
Testing
Code Style and Documentation
[skip ci]in the commit message for small changes.📝 Please see our Contribution Guidelines for further guidance.
This change is