In-memory graph cache for faster pathfinding#5642
Conversation
f2382bf to
64a4966
Compare
Okay, PTAL, #5595 now has things a bit reordered. Last 3 commits is the cache + integration and the first 4 commits is pure refactors. Should work well and even tests pass with skipping graph buckets. |
64a4966 to
dde1d22
Compare
bhandras
left a comment
There was a problem hiding this comment.
Looking really good! I'm not super familiar with the routing code base but following the changes in the PR was easy and to me apart from a few nits looks almost ready. Great job Oliver!
dde1d22 to
f6ccffe
Compare
f6ccffe to
229ca32
Compare
|
|
||
| // db points to the actual backend holding the channel state database. | ||
| // This may be a real backend or a cache middleware. | ||
| db kvdb.Backend |
There was a problem hiding this comment.
I think we don't need this db as it's already in LinkNodeDB.db?
There was a problem hiding this comment.
The idea is to give the LinkNodeDB its own instance so we don't share the same reference necessarily. At the moment it will be the same instance but in the future we might point the LinkNodeDB to its own separate namespace.
There was a problem hiding this comment.
Cool. Since LinkNodeDB is embedded here I'd expect to call ChannelStateDB.db to get the LinkNodeDB.db, but then it's overwritten with its own db. Maybe we could choose not to do the embedding here. Non-blocking, just a thought for future changes.
There was a problem hiding this comment.
I decided to take things apart even more clearly in the latest state. Let me know what you think.
| startTime := time.Now() | ||
| log.Debugf("Populating in-memory channel graph, this might take a " + | ||
| "while...") | ||
| err := g.ForEachNode(func(tx kvdb.RTx, node *LightningNode) error { |
There was a problem hiding this comment.
Would be interesting to test this on etcd with the current mainnet graph to see if we encounter any transaction limits. Maybe we'll need a more specific solution without a transaction since this is at startup.
There was a problem hiding this comment.
Quick calculation: assuming we have ~15k nodes => 15k * (32 + 33 + 1) raw bytes for keys which is roughly 1 MiB (per key 32 bytes = bucket, 33 bytes = pubkey, 1 byte = value suffix).
Plus for each we also send the last mod revision which is a 8 byte so another 100 KiB. Plus the protocol overhead, so the txn should be less than 2 MiB. Still manageble.
There was a problem hiding this comment.
Good idea to test this! I'm going to fix up #5561 this week and then attempt a migration of the mainnet graph data to etcd and test this PR with it.
There was a problem hiding this comment.
I wonder if we should build tag this out for mobile clients? So if the mobile build tags are active, we don't load all this in....would be good to get some testing on this front, as worst case maybe this causes some weird OOM stuff for mobile nodes inadvertently.
There was a problem hiding this comment.
Yes, but that would completely disable path finding on mobile devices since we currently don't have a fall back version that uses the graph in the database. Or are you saying we should make it possible to enable the DB based fall back and implement it in this PR?
There was a problem hiding this comment.
I'm suspicious that way bbolt works at some point we probably have most of the graph in memory when we ForEach over all nodes/edges.
There was a problem hiding this comment.
The difference is that w/ bolt, we'll map everything into virtual memory, then as we iterate over the graph, the kernel will automatically swap the necessary pages into resident memory when we access it (page fault).
With this, we'll read everything at the very start (allocating enough resident memory for it all to be held), then over time the unused pages may be swapped back onto disk. In the worst case, everything will need ot be swapped back in if a path finding attempt fails. In the average case, we save a lot though since we cut down on context switching from user <-> kernel space, and also all the locking+copying mechanisms in bolt.
Re the chunky allocation at the very start, we could possibly pre-size the cache itself so the runtime can make one larger allocation instead of a series of smaller ones.
Re the question of if we can add a flag or not here, this was brought up again last week by some users concerned about the memory footprint on mobile phones. At this point, we simply haven't tested that path, but it may not be an issue in practice, we won't know until we either get someone to take this PR for a spin, or spin up a test env ourselves.
229ca32 to
f27ec45
Compare
|
Visit https://dashboard.github.orijtech.com?back=0&pr=5642&remote=true&repo=guggero%2Flnd to see benchmark details. |
yyforyongyu
left a comment
There was a problem hiding this comment.
Very cool feature. This will also fix some pain points we've encountered in our itests, regarding the deadlocks and etc. Got several questions and a few nits.
There was a problem hiding this comment.
nit: missing docs here. I'm still learning the updated structure here. So we have a LinkNodeDB.db and a LinkNode.db, and they might be different?
There was a problem hiding this comment.
Yeah bit of a naming collision here...
Not clear why we need this intermediate struct actually, givne we just need to pass in the kvdb.Backend directly?
There was a problem hiding this comment.
Just to make it more clear where a future code separation could be done. So everything "owned" by the LinkNodeDB could go into its own SQL table.
There was a problem hiding this comment.
unrelated to this PR, so the *LinkNode returned here doesn't have a database backend attached and we could not do operations like *LinkNode.Sync() here right?
There was a problem hiding this comment.
Yes, the backend is only used for the Sync() method. Added a comment and refactored it a bit to make it more clear.
There was a problem hiding this comment.
nit: all the LinkNode related methods seem to belong to LinkNodeDB.
There was a problem hiding this comment.
Not sure what you mean. This change turns the method into a function because no reference to the LinkNodeDB is required here, since we have the transaction instead.
There was a problem hiding this comment.
So we are separating the old big DB into more defined structs right?
There was a problem hiding this comment.
Yes, that's the idea. To eventually (further down the line) separate them into their own databases and possibly SQL tables.
There was a problem hiding this comment.
seems like the graph cache needs a bit more unit tests.
There was a problem hiding this comment.
I think one thing we'll want to do minimally is update the prior set of channel DB tests to have pre and post test assertions w.r.t the state of the channel cache. This should help to catch any instance we may have missed re cache consistency.
There was a problem hiding this comment.
Will start with those additional assertions now.
Roasbeef
left a comment
There was a problem hiding this comment.
Excellent PR!
Completed an initial pass, no major comments so far, planning on also giving this a spin on mainnet as well to see the impact of the added memory load on the newly optimized mater, with the 60k channel loaded in (w/ and w/o strict pruning).
There was a problem hiding this comment.
Yeah bit of a naming collision here...
Not clear why we need this intermediate struct actually, givne we just need to pass in the kvdb.Backend directly?
There was a problem hiding this comment.
In theory, the caller of this method could query the link node then query the graph, and merge it themselves manually. It's down as is for mainly historical and convenience reasons.
The only places that use this method atm is the chanbackup package (to include all the known addresses of a peer in the latest SCB instance). The chanbackup package uses the LiveChannelSource so it isn't tightly bound to the way we extract things here at the database level. Don't think this is super blocking though, just a nice to have to have cleaner seperation here, which may help with some other remote DB features in the future.
| startTime := time.Now() | ||
| log.Debugf("Populating in-memory channel graph, this might take a " + | ||
| "while...") | ||
| err := g.ForEachNode(func(tx kvdb.RTx, node *LightningNode) error { |
There was a problem hiding this comment.
I wonder if we should build tag this out for mobile clients? So if the mobile build tags are active, we don't load all this in....would be good to get some testing on this front, as worst case maybe this causes some weird OOM stuff for mobile nodes inadvertently.
There was a problem hiding this comment.
If we somehow attempt to insert a channel edge policy before the actual node?
There was a problem hiding this comment.
Can also avoid this w/ a nested map layer, so map[Vertex]map[ChannelID]Edge (assuming we don't need to preserve ordering of iteration, maybe it's better not to as then we also get some slight randomization here from the Go runtime?).
There was a problem hiding this comment.
I've updated the cache to use the map[Vertex]map[ChannelID]Edge as suggested. Makes sense here, even if it is perhaps slightly bigger.
There was a problem hiding this comment.
This is for some future where like neutrino nodes eventually fully validate their channel graph? Or splicing perhaps?
There was a problem hiding this comment.
Yeah, currently this is mostly a no-op for the cache. But maybe we'll actually update something later on that needs to be tracked in the graph. That way we already have the update in place. Currently the only call path here is router.AddProof() -> graph.UpdateChannelEdge() -> cache.UpdateChannel().
There was a problem hiding this comment.
I think one thing we'll want to do minimally is update the prior set of channel DB tests to have pre and post test assertions w.r.t the state of the channel cache. This should help to catch any instance we may have missed re cache consistency.
There was a problem hiding this comment.
Might be useful to tack on a comment that we do path finding backards, so we're always interested in the edge that arrives to us from the other node. Haven't really found the prefect terminology for all this though, mainly depends on like how one visualizes it in a sense.
90027fe to
3295a1a
Compare
|
I think this should behave a bit better concerning memory now. The improvements will be more noticeable with a larger number of nodes. The itest is still failing though I cannot really consistently reproduce it locally. Will try again tomorrow. |
Yeah, I think I'm going to add that during the RC phase in a separate PR to get this moving. Feels like the PR is large enough as is. |
bhandras
left a comment
There was a problem hiding this comment.
Re-reviewed the optimizaton part, PR LGTM 👍 , just one question regarding the scratch buffer.
The funding manager doesn't need to know the details of the underlying storage of the opening channel state, so we move the actual store and retrieval into the channel database.
As a preparation to have the method for querying the addresses of a node separate from the channel state, we extract that method out into its own interface.
To further separate the channel graph from the channel state, we refactor the AddrsForNode method to use the graphs's public methods instead of directly accessing any buckets. This makes sure that we can have the channel state cached with just its buckets while not using a kvdb level cache for the graph. At the same time we refactor the graph's test to also be less dependent upon the channel state DB.
Adds an in-memory channel graph cache for faster pathfinding. Original PoC by: Joost Jager Co-Authored by: Oliver Gugger
To avoid the channel map needing to be re-grown while we fill the cache initially, we might as well pre-allocate it with a somewhat sane value to decrease the number of grow events.
With this commit we use an optimized version of the node iteration that causes fewer memory allocations by only loading the part of the graph node that we actually need to know for the cache.
245c6db to
6240851
Compare
This commit fixes a flake in the channel status update itest that occurred if Carol got a channel edge update for a channel before it heard of the channel in the first place. To avoid that, we wait for Carol to sync her graph before sending out channel edge or policy updates. As always when we touch itest code, we bring the formatting and use of the require library up to date.
|
I think I finally nailed the flaky itest (at least the |
You consider these a blocker as well? |
|
Not a blocker necessarily. Just feels a bit scary to see the "vanilla" itest fail (the |
|
Gave this another spin, ended up nocking down that initial burst quite a bit. I think we still want to add the flag to disable for mobile however (along w/ additional testing) |
|
I've fixed up my |
Replaces #5631.
Fixes #5389.
This PR adds an in-memory cache for the graph that is used for pathfinding.
This brings down the average time for finding a path from around 3000ms with bbolt on my machine to between 250 and 300ms.
For remote database backends this gain should be even more significant.
TODO:
Make sure all itests passFurther optimize memory footprint (currently this adds about 60MB of heap)Add more unit testsGet dependent PR channeldb: write through cache for the graph and channel state #5595 merged.Fixes #3266