Today we implement the `query_channel_range` and `reply_channel_range` messages defined in BOLT 7. The gossiper sub-system uses these, in concert with our set of connected peers, to attempt to reconcile our graph state with theirs over time. This works well for the most part, as we rotate this reconciliation between peers to spot check our view of the graph every 20 minutes or so.
One issue we've seen pop up relatively frequently is missing channel graph data. In this case, a node may have been offline for some period of time, didn't get any updates at all, and then ended up triggering the zombie channel pruning logic. The assumption behind the way we restore zombie channels is that eventually we'll hear of a new channel update from the peer and recognize it as an actual channel once again. In practice, this doesn't always happen, either because a node is online only intermittently (mobile, etc.), because of poor connectivity, or because messages just don't propagate all that well.
Today, whenever we get a `reply_channel_range` message, we check our zombie index for each channel before fetching it. If it's found in the index, then we won't fetch it at all. Avoiding fetching known zombies serves to reduce bandwidth (why download something you think is stale?), and also avoids churn where we add a channel, then decide it's a zombie, then do it all over again.
BOLT 7 was eventually extended so that the reply message can optionally include timestamp data for each `scid` sent. The timestamps are the times at which the channel updates were last sent by the channel's peers. We can use this information to see if our remote peer has a newer channel update for something we currently consider a zombie. This may help to resolve the issue of nodes having gaps in their channel graph.
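A minimal sketch of how the timestamps could change the zombie check, assuming our zombie index remembers the last update timestamps we saw for each side of a channel. The types `ReplyEntry` and `ZombieEntry` and the function `scidsToFetch` are hypothetical names for illustration, not existing lnd code, and the "fetch if either side is newer" rule is just one possible policy:

```go
package main

import "fmt"

// ReplyEntry is one scid from a reply_channel_range message together with
// the optional per-node channel update timestamps (hypothetical type).
type ReplyEntry struct {
	SCID           uint64
	Node1Timestamp uint32
	Node2Timestamp uint32
}

// ZombieEntry holds the newest update timestamps we had seen when we marked
// the channel as a zombie (assumed to be stored in the zombie index).
type ZombieEntry struct {
	Node1Timestamp uint32
	Node2Timestamp uint32
}

// scidsToFetch returns the scids we should request from the peer. Channels
// we already have are skipped. Zombies are skipped unless the peer reports a
// newer update than the one we knew about for at least one side.
func scidsToFetch(reply []ReplyEntry, live map[uint64]bool,
	zombies map[uint64]ZombieEntry) []uint64 {

	var fetch []uint64
	for _, e := range reply {
		if live[e.SCID] {
			continue
		}

		z, isZombie := zombies[e.SCID]
		if !isZombie {
			// Unknown channel: always fetch.
			fetch = append(fetch, e.SCID)
			continue
		}

		// Known zombie: only fetch if the peer has something fresher.
		if e.Node1Timestamp > z.Node1Timestamp ||
			e.Node2Timestamp > z.Node2Timestamp {

			fetch = append(fetch, e.SCID)
		}
	}
	return fetch
}

func main() {
	live := map[uint64]bool{1: true}
	zombies := map[uint64]ZombieEntry{
		2: {Node1Timestamp: 100, Node2Timestamp: 100},
		3: {Node1Timestamp: 100, Node2Timestamp: 100},
	}
	reply := []ReplyEntry{
		{SCID: 1, Node1Timestamp: 200, Node2Timestamp: 200}, // already known
		{SCID: 2, Node1Timestamp: 100, Node2Timestamp: 90},  // still stale
		{SCID: 3, Node1Timestamp: 150, Node2Timestamp: 100}, // newer update
		{SCID: 4}, // never seen before
	}
	fmt.Println(scidsToFetch(reply, live, zombies)) // prints [3 4]
}
```

With this kind of check we keep the bandwidth savings for channels that are still stale, while zombies that the peer has fresher data for get another chance.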
Steps To Completion
- Add the new `channel_update_timestamps` TLV to the `lnwire` package.
- Update the `GossipSyncer` to send the new bit that signals that we want the timestamp information.
- Concurrent with the above, update the `GossipSyncer` to be able to send the timestamp information alongside our normal replies. This information should ideally be obtained from the channel graph cache. We'll also want to take care that the cache remains consistent with the graph.
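As a rough illustration of the first step, the sketch below encodes and decodes the per-scid timestamp pairs. It assumes the uncompressed encoding (encoding type 0) with two big-endian 32-bit timestamps per scid, and covers only the record's value, not the TLV type and length framing. The names `ChanUpdateTimestamps`, `encodeTimestamps` and `decodeTimestamps` are hypothetical and are not the `lnwire` API:

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// ChanUpdateTimestamps is the timestamp pair for a single scid: the last
// channel update time for each of the two nodes (hypothetical type).
type ChanUpdateTimestamps struct {
	Timestamp1 uint32
	Timestamp2 uint32
}

// encodingUncompressed is the encoding type byte assumed by this sketch.
const encodingUncompressed = 0

// encodeTimestamps writes the encoding type byte followed by 8 bytes per
// scid, in the same order as the scids in the reply.
func encodeTimestamps(ts []ChanUpdateTimestamps) []byte {
	buf := make([]byte, 1+8*len(ts))
	buf[0] = encodingUncompressed
	for i, t := range ts {
		off := 1 + 8*i
		binary.BigEndian.PutUint32(buf[off:], t.Timestamp1)
		binary.BigEndian.PutUint32(buf[off+4:], t.Timestamp2)
	}
	return buf
}

// decodeTimestamps is the inverse of encodeTimestamps. It rejects payloads
// with an unexpected encoding type or a truncated trailing pair.
func decodeTimestamps(b []byte) ([]ChanUpdateTimestamps, error) {
	if len(b) < 1 || b[0] != encodingUncompressed {
		return nil, errors.New("unsupported or missing encoding type")
	}
	body := b[1:]
	if len(body)%8 != 0 {
		return nil, errors.New("truncated timestamp pair")
	}

	ts := make([]ChanUpdateTimestamps, 0, len(body)/8)
	for i := 0; i < len(body); i += 8 {
		ts = append(ts, ChanUpdateTimestamps{
			Timestamp1: binary.BigEndian.Uint32(body[i:]),
			Timestamp2: binary.BigEndian.Uint32(body[i+4:]),
		})
	}
	return ts, nil
}

func main() {
	in := []ChanUpdateTimestamps{
		{Timestamp1: 1700000000, Timestamp2: 1700000600},
		{Timestamp1: 0, Timestamp2: 1690000000},
	}
	raw := encodeTimestamps(in)
	out, err := decodeTimestamps(raw)
	fmt.Println(len(raw), out, err)
}
```

The receiver would also need to check that the number of decoded pairs matches the number of scids in the reply, since the two lists are matched up by position.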