NN layers refactor #1888
Conversation
@scarlehoff @RoyStegeman Does the plan described above look good to you? And just in case it has changed since the last time I asked: do we still want to maintain the dense_per_flavour layer?
Even if we may never end up using it, I'm afraid we should keep supporting it.
Yes. It is the burden of having published the code! But indeed,
we have promised backwards compatibility; that doesn't mean that every improvement has to affect the entire code.
Ok, no problem, working now! The issue I had with the per_flavour layer came from this old TODO about the basis_size coming from the last entry of the nodes list, while it should come from the runcard; I was overwriting it. Also, I don't think it makes sense to allow combining these layers with dropout, and indeed it wasn't possible before, so I just raise an error in that case.
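The guard described above could look roughly like the following sketch; the names (`check_layer_options`, `dropout_rate`) are hypothetical stand-ins, not the actual n3fit API:

```python
# Hypothetical sketch of the guard described above; check_layer_options,
# layer_type and dropout_rate are illustrative names, not the real n3fit API.
def check_layer_options(layer_type, dropout_rate):
    """Reject option combinations that were never supported."""
    if layer_type == "dense_per_flavour" and dropout_rate > 0:
        # dropout was never implemented for the per_flavour layers,
        # so fail loudly instead of silently ignoring the option
        raise ValueError("dropout is not compatible with dense_per_flavour layers")
    return True
```

Failing early like this keeps the unsupported combination from silently producing a fit in which the dropout option is ignored.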
scarlehoff left a comment:
lgtm, they are all suggestions for style / loops / naming
```python
inits = [
    initializer_generator(initializer_name, replica_seed, i_layer)
    for replica_seed in replica_seeds
]
layers = [
    base_layer_selector(
        layer_type,
        kernel_initializer=init,
        units=nodes_out,
        activation=activation,
        input_shape=(nodes_in,),
        **custom_args,
    )
    for init in inits
]
```
Suggested change:

```diff
-inits = [
-    initializer_generator(initializer_name, replica_seed, i_layer)
-    for replica_seed in replica_seeds
-]
-layers = [
-    base_layer_selector(
-        layer_type,
-        kernel_initializer=init,
-        units=nodes_out,
-        activation=activation,
-        input_shape=(nodes_in,),
-        **custom_args,
-    )
-    for init in inits
-]
+for replica_seed in replica_seeds:
+    init = initializer_generator(replica_seed, i_layer)
+    layers = base_layer_selector(
+        layer_type,
+        kernel_initializer=init,
+        units=nodes_out,
+        activation=activation,
+        input_shape=(nodes_in,),
+        **custom_args,
+    )
```
I think it is better for readability like this (I think you don't need inits later, right? Otherwise of course keep it).
I've also removed the initializer name.
Agreed, I think I was trying to anticipate how it will look with multi dense layers, but it doesn't matter.
Actually, your revision would need a layers = [] and a layers.append(layer), which I think makes it ugly again; how about what I have now? If you prefer the regular loop with appending, I'll change it again.
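For illustration, the two styles under discussion can be compared with a toy stand-in; `make_layer` is hypothetical and just mimics creating one layer per replica seed:

```python
# Toy comparison of the two styles; make_layer is a stand-in for
# base_layer_selector and is purely illustrative.
def make_layer(seed):
    # each "layer" just adds its seed, enough to tell the replicas apart
    return lambda x: x + seed

replica_seeds = [1, 2, 3]

# Style 1: list comprehension (no explicit append)
layers = [make_layer(seed) for seed in replica_seeds]

# Style 2: regular loop, which needs the empty list and the append
layers_alt = []
for seed in replica_seeds:
    layers_alt.append(make_layer(seed))

# Both produce the same per-replica layers
assert [layer(0) for layer in layers] == [layer(0) for layer in layers_alt]
```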
```python
# ... then apply them to the input to create the models
xs = [layer(x) for layer in list_of_pdf_layers[0]]
for layers in list_of_pdf_layers[1:]:
    if type(layers) is list:
```
You mean why the if statement is needed? I added a comment; it's because dropout is shared between layers. I could also remove the if statement and replace dropout_layer with [dropout_layer for _ in range(num_replicas)] or something.
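A minimal sketch of that branch, with plain Python callables standing in for the Keras layers (all names here are illustrative, not the real n3fit objects):

```python
# Minimal sketch of why the if statement is needed: per-replica layers come
# as a list (one layer per replica), while shared layers such as dropout are
# a single object applied to every replica. All names are illustrative.
num_replicas = 3
xs = [0.0, 1.0, 2.0]  # stand-ins for the per-replica tensors

per_replica_layers = [lambda x: 2 * x for _ in range(num_replicas)]
shared_dropout = lambda x: x  # one object shared by all replicas

for layers in [per_replica_layers, shared_dropout]:
    if isinstance(layers, list):
        # one layer per replica: pair them up
        xs = [layer(x) for layer, x in zip(layers, xs)]
    else:
        # a single shared layer (e.g. dropout): apply it to each replica
        xs = [layers(x) for x in xs]
```

The alternative mentioned above, wrapping the shared layer as `[dropout_layer for _ in range(num_replicas)]`, would indeed remove the branch, at the cost of pretending the shared layer is per-replica.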
Greetings from your nice fit 🤖 !
Check the report carefully, and please buy me a ☕, or better, a GPU 😉!
|
Please don't merge this yet!
|
Thanks, I won't merge yet! When is the next tag expected?
|
Hopefully once #1901 is merged |
And when the papers are out... |
The "..." seems to indicate that that will take a while? ;P Maybe it's worth creating a general waiting-for-next-tag branch or something, so that we can keep master as is while not blocking further development?
…nerate_dense and generate_dense_per_flavour
Co-authored-by: Juan M. Cruz-Martinez <juacrumar@lairen.eu>
Force-pushed from 0475196 to 903c75b
This PR does two things that both should leave everything identical:
1. Pull together the 3 functions that were responsible for generating the neural network layers:
- `generate_dense_network`
- `generate_dense_per_flavour_network`
- `generate_nn`

Only the last one remains; the first two had a lot of overlap.
I have also pulled the loop over replicas out of `pdf_NN_layer_generator` into `generate_nn`. This is everything up to and including this commit.
This PR may be easier to follow commit by commit.
2. Reverse the order of the loops over replicas and layers
This is the actual point: currently we do, for all replicas, for all layers, create the layer.
To accommodate the upcoming multi-replica layers, where one layer contains all replicas, the order needed to change.
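Schematically, with made-up layer specs and seeds rather than the real builder code, the reordering looks like:

```python
# Schematic of the loop reordering; the specs and seeds are made up.
replica_seeds = [0, 1]
layer_specs = ["layer_a", "layer_b", "layer_c"]

# Before: for all replicas, for all layers -> one full network per replica
per_replica = [[(spec, seed) for spec in layer_specs] for seed in replica_seeds]

# After: for all layers, for all replicas -> each "layer slot" now holds every
# replica, which is what a future multi-replica layer can collapse into one
per_layer = [[(spec, seed) for seed in replica_seeds] for spec in layer_specs]

# The same layers are created either way, just grouped differently
assert sorted(sum(per_replica, [])) == sorted(sum(per_layer, []))
```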
Status
I think there are 3 relevant choices to test for: the layer type, dropout, and the number of replicas.
The dense_per_flavour layers aren't compatible with multiple replicas or with dropout (they could be, but in the first case a check fails, and in the second case it just wasn't implemented). The dropout and replicas choices should be independent, however, so we have:
default layer:
For each of these I took a simple runcard, ran it for 100 epochs, and compared the last 3 digits of the chi2 between this branch and master (well, replica_axis_first, which is ready to be merged). (Xs mean they pass this test.)
TODO:
- Remove `generate_dense_network` and `generate_dense_per_flavour_network` once the `MultiDense` layer itself is implemented.