Skip to content

peer: send reenable updates after 20m#2080

Closed
cfromknecht wants to merge 3 commits into
lightningnetwork:masterfrom
cfromknecht:delay-channel-enable
Closed

peer: send reenable updates after 20m#2080
cfromknecht wants to merge 3 commits into
lightningnetwork:masterfrom
cfromknecht:delay-channel-enable

Conversation

@cfromknecht
Copy link
Copy Markdown
Contributor

This commit adds a short delay before sending
out channel reenable messages. The aim is to
prevent flooding the network if we have
flapping peers. The approach ensures that
reenabling will only occur within a peer's
lifecycle, which can prevent lingering
goroutines from interfering with
enable/disable attempts across different
connections with the same peer.

@Roasbeef
Copy link
Copy Markdown
Member

announceChanStatus will already delay this, and only enable if the state hasn't changed in the last 20 minutes. In addition, this is an incomplete solution as we need to decouple enable/disable in our local graph, from the announcement we send out to the network. We should immediately enable within our graph, but delay the announcement until we reach a stable point. cc @halseth

@halseth
Copy link
Copy Markdown
Contributor

halseth commented Oct 24, 2018

Will be taking this over to finish the decoupling.

@halseth halseth mentioned this pull request Oct 24, 2018
@cfromknecht
Copy link
Copy Markdown
Contributor Author

@Roasbeef correct, this does not address the decoupling of local graph from network gossip, but neither is that the intention of this PR.

The purpose of this PR is

  1. not enable immediately to avoid flooding in case the peer is unstable.
  2. to remove the stray goroutine by ensuring reenabling happens within the peer lifecycle, as this can properly "cancel" the reenable if the peer disconnects before stability is reached.

announceChanStatus will already delay this, and only enable if the state hasn't changed in the last 20 minutes

where does this happen? to me, it looks like announceChanStatus signs and sends a new update to the gossiper immediately, which is then only delayed by 30s?

this is an incomplete solution as we need to decouple enable/disable in our local graph, from the announcement we send out to the network

again, that is mostly orthogonal to this PR. decoupling them allows us greater flexibility in picking timeouts for each. After decoupling, we are free to increase this timeout before sending out a new announcement.

Additionally, just letting channelStatusWatcher reenable has the downside that it may take up to almost twice the intended timeout before the channel is actually reenabled, where as this will reliably reenable after the chosen timeout.

@cfromknecht
Copy link
Copy Markdown
Contributor Author

downside that it may take up to almost twice the intended timeout

reread through the code, and it's actually 1.25x

@cfromknecht cfromknecht added p2p Code related to the peer-to-peer behaviour P3 might get fixed, nice to have gossip labels Oct 31, 2018
@cfromknecht
Copy link
Copy Markdown
Contributor Author

After discussion w/ @halseth irl, seems like this should go in before #2091 as it is more accurate than allowing reenabling through long polling

@halseth
Copy link
Copy Markdown
Contributor

halseth commented Oct 31, 2018

Yep, #2115 needs to go in first though.

@halseth
Copy link
Copy Markdown
Contributor

halseth commented Nov 9, 2018

With #2115 merged I think this PR can replace #2091 if we set the channelReannounceDelay to InactiveChanTimeout (or create a new config option).

@cfromknecht
Copy link
Copy Markdown
Contributor Author

Created a new config option called ChanStatusInterval, PTAL

@cfromknecht cfromknecht force-pushed the delay-channel-enable branch 2 times, most recently from 113541f to c8d0367 Compare November 11, 2018 00:47
Comment thread peer.go Outdated
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is nilling necessary?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shaves some CPU cycles from select, figured why not?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

adds some CPU cycles in my head at least

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why btw will it save CPU cycles? I think maybe that is only for tickers.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

because the runtime doesn't have to acquire a lock to check if a value is being sent on a nil channel, it just skips it. it saves CPU cycles on every subsequent select

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

func main() {
        const n = 1000000
        const numChans = 10

        chans := make([]chan int, numChans)
        for i := range chans {
                chans[i] = make(chan int)
        }   

        start := time.Now()
        for i := 0; i < n; i++ {
                select {
                case <-chans[0]:
                case <-chans[1]:
                case <-chans[2]:
                case <-chans[3]:
                case <-chans[4]:
                case <-chans[5]:
                case <-chans[6]:
                case <-chans[6]:
                case <-chans[8]:
                case <-chans[9]:
                default:
                }   
        }   
        fmt.Printf("non-nil: %v\n", time.Since(start))

        for i := range chans {
                chans[i] = nil 
        }   

        start = time.Now()
        for i := 0; i < n; i++ {
                select {
                case <-chans[0]:
                case <-chans[1]:
                case <-chans[2]:
                case <-chans[3]:
                case <-chans[4]:
                case <-chans[5]:
                case <-chans[6]:
                case <-chans[6]:
                case <-chans[8]:
                case <-chans[9]:
                default:
                }   
        }   
        fmt.Printf("nil: %v\n", time.Since(start))
}

running this gives, OMM:

non-nil: 521.206479ms
nil: 155.758026ms

Copy link
Copy Markdown
Contributor Author

@cfromknecht cfromknecht Dec 7, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In case anyone was interested, I modified the demo to show how the latency is linearly dependent on the number of nil channels:

func main() {
        const n = 1000000
        const numChans = 10

        chans := make([]chan int, numChans)
        for i := range chans {
                chans[i] = make(chan int)
        }   

        testNNil := func(ii int) {
                start := time.Now()
                for i := range chans[:ii] {
                        chans[i] = nil 
                }   

                start = time.Now()
                for i := 0; i < n; i++ {
                        select {
                        case <-chans[0]:
                        case <-chans[1]:
                        case <-chans[2]:
                        case <-chans[3]:
                        case <-chans[4]:
                        case <-chans[5]:
                        case <-chans[6]:
                        case <-chans[7]:
                        case <-chans[8]:
                        case <-chans[9]:
                        default:
                        }   
                }   
                fmt.Printf("%d-nil: %v\n", ii, time.Since(start))
        }   

        for i := 0; i < 11; i++ {
                testNNil(i)
        }   
}

outputs:

0-nil: 529.313487ms
1-nil: 508.630059ms
2-nil: 492.725965ms
3-nil: 462.589964ms
4-nil: 437.568966ms
5-nil: 408.646654ms
6-nil: 358.446526ms
7-nil: 301.32741ms
8-nil: 244.178853ms
9-nil: 195.43525ms
10-nil: 160.556439ms

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wow, nice! TIL :)

Comment thread config.go Outdated
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

needs a rebase

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

Comment thread server.go Outdated
@cfromknecht cfromknecht force-pushed the delay-channel-enable branch 2 times, most recently from 7a6e779 to 8df4db5 Compare November 15, 2018 01:00
@cfromknecht cfromknecht changed the title peer: send reenable updates after 5s peer: send reenable updates after 20m Nov 15, 2018
@halseth
Copy link
Copy Markdown
Contributor

halseth commented Dec 5, 2018

Needs a rebase now that #2115 is in!

@cfromknecht cfromknecht force-pushed the delay-channel-enable branch 2 times, most recently from 45b6d0d to a04fffd Compare December 6, 2018 03:18
Adds a configuration option for specifying
delay between detecting active/inactive
channels and sending an update to the
network. The default interval is 20m.
Copy link
Copy Markdown
Contributor

@wpaulino wpaulino left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Travis failed with:

    lnd_test.go:108: lnd finished with error (stderr):
        exit status 1
        unknown flag `inactivechantimeout'
        unknown flag `inactivechantimeout'
        
    --- FAIL: TestLightningNetworkDaemon/send_update_disable_channel (36.50s)

Comment thread peer.go Outdated
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/dispatchStatuses/reenableTimeout

This commit adds a configurable delay before
sending out channel reenable messages. The aim is
to prevent flooding the network if we have
flapping peers. The approach ensures that
reenabling will only occur within a peer's
lifecycle, which can prevent lingering
goroutines from interfering with
enable/disable attempts across different
connections with the same peer.
@halseth
Copy link
Copy Markdown
Contributor

halseth commented Dec 7, 2018

Another rebase needed 🤣

Comment thread peer.go
// reenableTimeout will fire once after the configured channel status
// interval has elapsed. This will trigger us to sign new channel
// updates and broadcast them with the "disabled" flag unset.
reenableTimeout := time.After(cfg.ChanStatusInterval)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't it be possible for the existing watchChannelStatus method to also take on this responsibility? As is right now, it's possible for both of these goroutines to conflict, and cause a double-ish re-enable for channels.

Generally, I find the existing logic in watchChannelStatus hard to follow w.r.t the precise delay between enable and disable. I wouldn't be against abstracting it further to isolate to its own package so we can properly unit test it (with necessary interfaces added).

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we decided on not using long-polling for reenable as it can lead to false positives. agree with the comments on the existing logic, i will extend this PR to refactor that to only be responsible for 1) dampening reenables and 2) disable still-enabled channels after the configured interval via long polling

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be cleanest to make the long polling only responsible for disabling channels.

Since we poll to disable, and require 20 minutes (configurable) steady connection to enable, I don't think they should conflict?

@cfromknecht
Copy link
Copy Markdown
Contributor Author

Closing since this was replaced by the #2411

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

gossip p2p Code related to the peer-to-peer behaviour P3 might get fixed, nice to have

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants