-
Notifications
You must be signed in to change notification settings - Fork 974
first version of lib-autopilot.py #1888
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
cdecker
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @renepickhardt, excellent start for the autopilot. I went through and as always with new contributors I split every possible hair and was as petty as I could 😄 The following are some general remarks, and I have left a lot of inline comments:
- Name: Why is this called
lib-autopilot.py? It's a sidecar that is supposed to be run alongside the main c-lightning daemon, hence thelib-part is confusing to me. Calling itautopilot.pymight be clearer. - Formatting: the code could use a run of
pyflakesorpep8to fix some minor formatting issues. A spell-checker would also be useful 😉 - Sampling: your current sampling strategy just crops off the long tail of the distribution, meaning that nodes at the lower end of the distribution cannot be possibly chosen. A better strategy would be to use
numpy.random.choicewith thepparameter to bias towards more valuable nodes, but still allowing for the occasional low preference peer. - Disjoint sets: the current way of independently selecting from 4 different mechanisms does not guarantee that we don't get duplicates, e.g., a peer might both be desirable in terms of betweenness and richness.
- I wonder if we can improve the betweenness to consider our own score instead of that of a potential peer, i.e., if we were to make these 2 connections that were proposed, how would our own betweenness change? That's more what we're trying to optimize, rather than the betweenness of our peers.
Other than these, I'm really excited about starting to experiment with various optimization techniques. Do you think a global score, created by combining the various techniques is also doable, as opposed to the current "we just pick nodes from 4 different sets at random"?
contrib/lib-autopilot.py
Outdated
| try: | ||
| self.__logger.info("Try to load graph from file system at: data/networkx_graph") | ||
| with open("data/networkx_graph","rb") as infile: | ||
| self.G = pickle.load(infile) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pickle is also most likely inefficient for storage (storing a lot of metadata about the python classes used in memory, instead of raw data. There are a number of better formats that are also more portable here: https://networkx.github.io/documentation/networkx-1.10/reference/readwrite.html
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
actually I wanted to remove that storage part anyway. it was just for me that I could develop more quickly in a real world setting we also wish to have the most recent state of the graph and fetch it wile running the autopilot. The question is what will happen, if the network becomes really large? other than that do you have a preferred method from that list, I saw pickle was also part of it.
contrib/lib-autopilot.py
Outdated
| cutoff = 10 | ||
| self.__logger.info("cutoff value must be a positive integer. Set back to default value: 10") | ||
|
|
||
| all_pair_shortest_paths = nx.all_pairs_shortest_path(self.G, cutoff=cutoff) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
all_pair_shortest_paths is not actually needed since you discard the path after computing its length, you can use shortest_path_length without specifying the src and dest which will just give you the length, without materializing the path in-memory.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will check that. I thought I might need the paths later but in the current implementation I obviously don't
contrib/lib-autopilot.py
Outdated
| def run(self): | ||
| self.__logger.info("running the autopilot on a graph with {} nodes and {} edges.".format(len(self.G.nodes()), len(self.G.edges()))) | ||
| candidates = self.get_candidates(k=21) | ||
| time.sleep(1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is this sleep for? It doesn't seem like we need it at all.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I used logger and print and they got mixed so I decided while debugging I could wait a second in order to have nicer output. forgot to delete the line
|
Hey Christian thanks for this extensive code review. I will answer your general remarks now. I will probably only after the lightninghackday find the time for a new version.
There would be many ways to learn a global score (either by simulation or by collecting statistics over time). I am sure one could easily work years on optimizing the autopilot. So I guess it will be more a question of resource and focus. If that is wanted it would certainly be something that could be done (which again would speak for a lib autopilot) At this time I believe (obviously without knowing) for the network the best heuristics are I guess it would be really great if the autopilot was not only working on the static graph, but would talk with the network (and maybe even exchange some statistics (e.g. about skewness of channel balances) also there are more features which I kind of ignored. also the current implementation is impractical if the network grows so also for that it make sense to have the autopilot be coordinated in a distributed fashion maybe even as services like watch towers wich could be queried against. Same gist as above. it is easy to provide a hacky solution as I did but the problem can be engineered on for ever (: |
|
@cdecker some of your comments vanished. I am currently imiplementing the improved second version but wanted to look up some of your old comments since I don't remember everything. somehow every comment from you to which I did not answer disappeared. Do you know what has happened? Can the old state easily be restored? (if not no problem, I think I remember most of the comments) |
|
Comments are attached to commit objects in github, and we have the policy to rebase, which makes old commit objects obsolete. github also has some random, unspecified schedule for cleaning up unused commit objects, and hence comments may disappear completely if they are on a commit that was rebased out of existence. But if you did not rebase... possibly some other problem in github. |
I'm trying to get people to create |
the lib can currently generate candidates for an autopilot to open channels with according to four strategies: Random: following the Erdoes Renyi model nodes are drawn from a uniform distribution Central: nodes are sampled from a uniform distribution of the top most central nodes (betweenness) Network_Improovement: nodes are sampled from a uniform distribution of the nodes which are badly connected richness: nodes with high liquidity are taken and it is sampled from a uniform distribution of those this library is work in progress. there is no logic yet implemented on how many funds to allocate to which channel. also due to the randomness the lib might sometimes find less channels than was proposed. also many conor cases will let the code break.
…ill needs some clean up and sanity checks.
ff907c3 to
892e1c4
Compare
|
Ok I added the second version. It is late and I hope I didn't do any stupid mistakes. Biggest updates for the second version:
on the longterm issues are:
docs for command line interface: output of running the autopilot: |
|
|
||
| res = self.__create_pdfs() | ||
|
|
||
| candidats = set() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: spelling candidats -> candidates . Repeated quite a lot.
| efficiently on large scale graphs with more than 100k nodes or on densly | ||
| connected graphs. | ||
| the programm needs the following dependencies: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: s/programm/program/
|
Awesome project. Do you plan on pulling it out as a standalone application? |
|
@leishman thanks for your review. I separated the lib_autopilot from the C-lightning rpc interface on purpose. So the lib_autopilot could also easily be integrated to lnd or eclair. In particular it could be used by researchers to run simulations and find better strategies. A long term goal will be that the autopilots commuicate to each other and exchange some information about channel balances (which might conflict with privacy) and are able to request channels from others. In particular the autopilot currently ignores the channels you already have. There will be still much more work to do for this feature. |
| -b BALANCE, --balance BALANCE | ||
| use specified number of satoshis to open all channels | ||
| -c CHANNELS, --channels CHANNELS | ||
| opens specified amount of channels |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
opens specified amount of channels -> opens the specified number of channels
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will this always open 5 new channels when I specify --channels=5 or is this giving the target number of channels, i.e., if I have 3 channels open, I'll just open 2 more?
| optional arguments: | ||
| -h, --help show this help message and exit | ||
| -b BALANCE, --balance BALANCE | ||
| use specified number of satoshis to open all channels |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use specified -> uses the specified
|
@cdecker any major fixes requested otherwise I would correct those typos. What about integration to lightning-cli? |
cdecker
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Quite some nice improvements. I found a few more things that I commented inline.
| python3 c-lightning-autopilot --help | ||
| in order to get all the command line options |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Capitalization: in -> In
| -b BALANCE, --balance BALANCE | ||
| use specified number of satoshis to open all channels | ||
| -c CHANNELS, --channels CHANNELS | ||
| opens specified amount of channels |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will this always open 5 new channels when I specify --channels=5 or is this giving the target number of channels, i.e., if I have 3 channels open, I'll just open 2 more?
| use specified number of satoshis to open all channels | ||
| -c CHANNELS, --channels CHANNELS | ||
| opens specified amount of channels | ||
| -r PATH_TO_RPC_INTERFACE, --path_to_rpc_interface PATH_TO_RPC_INTERFACE |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
--rpc-socket-path?
| -p PERCENTILE_CUTOFF, --percentile_cutoff PERCENTILE_CUTOFF | ||
| only uses the top percentile of each probability | ||
| distribution | ||
| -d, --dont_store don't store the network on the hard drive |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Usually negatives are prefixed by --no- or take false as an option to disable (I know, it's not a correct sentence). Also I think there is a convention to use - instead of _ but I can't find it in PEPs right now.
| -i INPUT, --input INPUT | ||
| points to a pickle file | ||
| a good example call of the program could look like that: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, it's an example, not sure how good it is 😉
| needed_total_balance, len(pdf))) | ||
| while needed_total_balance > balance and len(pdf) > 1: | ||
| min_val = min(pdf.values()) | ||
| k = [k for k, v in pdf.items() if v == min_val][0] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is basically just an ordered dict. You can also just sort a list of items by their values and effectively get the same thing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That also obviates the need to renormalize on lines 356-357
|
|
||
|
|
||
|
|
||
| def find_candidates(self, num_items=21,strategy = Strategy.DIVERSE, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: spacing in arguments.
|
|
||
|
|
||
| def find_candidates(self, num_items=21,strategy = Strategy.DIVERSE, | ||
| percentile = None): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
percentile really should default to 1, which is the actual effect in this call.
| candidats = self.__sample_from_percentile(merged, percentile, | ||
| num_items) | ||
| """ | ||
| following code prints a list of candidates for debugging |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't leave dead code commented out.
| if strategy == Strategy.DIVERSE: | ||
| for strategy, pdf in res.items(): | ||
| tmp = self.__sample_from_percentile(pdf, percentile, sub_k) | ||
| candidats = candidats.union(set(tmp)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is very likely to result in duplicates: central nodes may also be rich. This then result in less than the expected number of channels being opened.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes and no. the likelihood of duplicates drops tremendously when sampling from the full probability distribution.
My current Idea was to say if duplicates emerge we have less channels. this could of course easily be fixed by just checking the size of the result set and polling channels again. In the best case polling with one of the strategies from which the duplicates emerged.
|
have put a few questions back in the inline statements would be great to hear your thoughts @cdecker before I finish up the requested changes. in particular the question about seednodes and the mapping of the graph from c lightning to lib-autopilot |
| lib_autopilot is a library which based on a networkx graph tries to | ||
| predict which channels should be added for a new node on the network. The | ||
| long term is to generate a lightning network with good topological properties. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
probably missing word: goal
|
I am right now creating a c-lightning plugin that runs the lib_autopilot. I have some UX / UI trouble: Calls to the autopilot can take several minutes to compute. In the current design calls will be made with How am I supposed to have non blocking calls? Is my plugin supposed to maintain its own threadpool and dispatch calls and provide an api to query against the state of certain calls or is there another way how this is supposed to be achieved? I currently imagine a userflow that goes like: The problem with this design would be that there would not be an explicit user feedback once the job is completed. Currently I also don't see a way of sending incremental feedback about the progress (other than pushing it to the logfile) Any ideas how this should be resolved? |
I completely forgot to push these changes upstream, but I have a branch that has async plugin method calls. I opened #2326 which contains the commits in question. |
|
As I am afk for the next 3 months I would be very happy if someone would take over. I know @cdecker has signaled that he might do that. I think there is not much to do. the main work is done. it has basically to be wrapped with the plugin-client lib and a command line api has to be provided. I had already started but couldn't finish it. |
|
@renepickhardt I would be happy to help. I am traveling until the end of March but can definitely work on it after April 1st. So if you don't plan to finish this in the next couple of days @cdecker let me get started first and see how far I get in the first week of April maybe? Do you have the plugin/command line code that you mention somewhere to check out, @renepickhardt, or would it be better to just start over? |
|
I have the simple funds plugin in my c Lightning plugin collection github but that is like the hello World plugin... I have started to wrap the autopilot to plugin code but that is so much work in progress that it is probably saver to start over anyway. Before you read my code you would have reimplemented it. |
|
The plugin has been ported to https://github.com/lightningd/plugins (lightningd/plugins@187c66a). So I'm closing this PR in favor of keeping any eventual changes there. |
As mentioned in the c-lightning list: https://lists.ozlabs.org/pipermail/c-lightning/2018-August/000065.html you wellcome plugins which are not written in c.
So I took the liberty to provide a first sketch for an autopilot for c-lightning written an python. In the beginning of the year I have stated on lnd's autopilot that the Barabasi Albert model was probably not the best choice (c.f.: lightningnetwork/lnd#677 ) This is why my lib can currently generate candidates for an autopilot to open channels with according to four strategies (and it combines those currently):
this library is work in progress. there is no logic yet implemented on how many funds to allocate to which channel and actually opening the channels (though I think that this should be the easy part) Also due to the randomness the lib might sometimes find less channels than the user asked for. Also many conner cases will let the code break.
Therefor the PR is more for code review and suggestions.
Running the code produces this output: (edit: the version in the pull request only prints the nodes without the statistics about balance and channels)
to explain the output. I asked to lib-autopilot to generate 21 candidates. you can see in the table the nodes (by their alias for better readability) which are suggested to connect to.
The first two columns list the percentage of our funds that the lib-autopilot thinks should be used for this channel. In the first column this is given by looking at the average channel balances of the partners and look at the relative frequency (with respect to alle the other average channel balances for the suggested channels). the second column smoothes the results with a weighted uniform distribution (currently the weight is 70% for the first column and 30% for the uniform distribution)
the third colum lists how much balance all the channels of the suggested partner has. the fourth column is the number of active channels and the last one is the alias.
I would be happy about any thoughts regarding my statistics. I think smoothing is usefull.