Skip to content

Strict channel enabling#2091

Closed
halseth wants to merge 2 commits into
lightningnetwork:masterfrom
halseth:strict-channel-enable
Closed

Strict channel enabling#2091
halseth wants to merge 2 commits into
lightningnetwork:masterfrom
halseth:strict-channel-enable

Conversation

@halseth
Copy link
Copy Markdown
Contributor

@halseth halseth commented Oct 24, 2018

This PR enables strict channel enabling in lnd. This means that similarly to how we disable channels, we require a disabled link to stay active for 20 minutes (default) before re-enabling it.

This is done in two steps:

  1. The path finding algorithm is modified to take into account an extra set of active local channels, that it will use when considering whether to ignore local channels. This improves the current path finding logic when using mission control, as we will no longer attempt to route over local channels that are inactive. Split into path finding: Disable bit decoupling #2115
  2. The channel re-enabling logic carried out when a peer connects is removed, as we will now instead let the server's re-enabling logic take care of enabling the disabled channels when they have been stable for 20 minutes.

This essentially decouples the graph's disabled status from the local path finding, as the local channel status will be more up-to-date than what is found in the graph.

Builds on #2115
Replaces #2080

@halseth halseth force-pushed the strict-channel-enable branch from fcee865 to a6b5089 Compare October 24, 2018 21:06
Comment thread htlcswitch/switch.go Outdated
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should maybe filter by EligibleToForward()?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, good idea! Should the same filtering be done for HasActiveLink below?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes it seems so. in fact, i think we may be sending out channel announcements prematurely in some cases then

Comment thread routing/pathfind.go Outdated
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

Comment thread config.go Outdated
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it'd be useful to have these be distinctly configurable. The naming is somewhat confusing since the config option is overloaded

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would you say it is sufficient to rename it? Making them distinct will complicate the logic a bit, and also seems they should be symmetric.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm my thinking was that they would be asymmetric, or at least that the ability to configure them as such would be useful even if the defaults were the same

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

still seems like overloading the parameter is confusing and that they should be separate

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is also somewhat of a breaking change from the user's perspective. prior to this change, their channels will be reenabled immediately. after applying this change, their channels won't be routable for 20 mins. this behavior happens w/o making any changes to the config because we're overloading the param

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should help with the health of the network, and in the future nodes might get punished (by getting used less for routing) for not disabling unreliable channels.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should help with the health of the network

I agree, which to me indicates that the defaults should be symmetric. though i still don't see a reason for not giving users the option to choose asymmetric timeouts if they want, esp if the code paths for enable/disable will be distinct anyway.

misnaming/overloading parameters has led to a great deal of confusion previously, imo we should try keep argument names as self-descriptive as possible, since most users will never read lnd -h

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree on changing the parameter name to something more descriptive.

But I'm reluctant to making it possible to set them asymmetrically, as we want to discourage that (I think?), and also clutters up the config.

Comment thread peer.go Outdated
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still think we want to have this logic w/in the peer. Depending on the timing of the channelStatusWatcher, it could result in the reenable happening at 2x the specified timeout. I think this is okay for disables, but not something we want for enabling a channel

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#2080 still seems like it should go in, and then thread the reenable timeout to override the default

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the logic within the channelStatusWatcher should ensure that an active channel will stay disabled for at most InactiveChanInterval * 5 / 4 ?

The reason we want this for enabling as well is to attempt making the network converge to a state where nodes going offline often will have their channels disabled. Eventually we might even consider enabling them with an exponential backoff.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the logic within the channelStatusWatcher should ensure that an active channel will stay disabled for at most InactiveChanInterval * 5 / 4 ?

Indeed, it is 1.25x the specified timeout

From my understanding, the channelStatusWatcher is effectively long-polling, and is susceptible to false positives where a peer is online for say 5 mins at a time, but disconnects and reconnects quickly so each time we think it's enabled.

IMO we should only reenable if the connection is actually stable for 20 (or w/e) minutes, which can't really be deduced via long polling. #2080 fixes this by ensuring we always do so only after the prescribed duration since the connection was initialized. We avoid this currently by immediately sending a reenable, so the watcher is never actually detecting the disable -> enable transition.

FWIW, I think long polling is sufficient for detecting disables, as being down four consecutive times should be an indicator that the channel isn't reliable. For enabling however, passing four times isn't really sufficient for determining if the connection is actually stable.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Eventually we might even consider enabling them with an exponential backoff.

Yeah was pondering this as well, I think it'd be a good idea

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO we should only reenable if the connection is actually stable for 20 (or w/e) minutes, which can't really be deduced via long polling.

Totally agree, I like this idea! I would say if we do this we should also make this true for disables; if you cannot maintain a stable connection for 20 minutes, we'll disable you. I think the code would be easier to maintain and extend if they share the logic here.

To allow nodes to do infrequent restarts, we could say something like they must keep a rolling 18.5 out of 20 min uptime, with each disconnect counting as minimum 1 min downtime say ( I haven't given this too much thought yet). I imagine @valentinewallace's channel event service could come in handy here :)

@halseth halseth force-pushed the strict-channel-enable branch 2 times, most recently from cdcf292 to 8b2d113 Compare October 29, 2018 04:34
@halseth
Copy link
Copy Markdown
Contributor Author

halseth commented Oct 29, 2018

The disable bit decoupling is now split into a separate PR, to reduce the size of this one, and since they are really two different discussions: #2115

@Roasbeef Roasbeef added P3 might get fixed, nice to have p2p Code related to the peer-to-peer behaviour htlcswitch server labels Oct 31, 2018
@halseth
Copy link
Copy Markdown
Contributor Author

halseth commented Nov 9, 2018

Might actually just use #2080 nevertheless when #2115 is merged, will demote this to P4 for now. Don't review.

@halseth halseth added P4 low prio and removed P3 might get fixed, nice to have labels Nov 9, 2018
The local path finding is now decoupled from the ChannelUpdate disable
flags in the channel graph, so we don't have to immediately enable the
channels when a peer connects. Instead we let the server's
watchChannelStatus goroutine enable the channel after it is been active
for a duration of inactivechantimeout.
@halseth halseth force-pushed the strict-channel-enable branch from 8b2d113 to 05d2e92 Compare December 5, 2018 12:50
@halseth
Copy link
Copy Markdown
Contributor Author

halseth commented Dec 5, 2018

Replaced by #2080.

@halseth halseth closed this Dec 5, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

htlcswitch p2p Code related to the peer-to-peer behaviour P4 low prio server

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants