n3fit merge-ready PR#499
Conversation
|
Ok, stupid question @Zaharid, I copied the way VP is in the cmake recipe, but shouldn't the default be OFF? Naively I would think we would want the default to be off. EDIT: Indeed, the default is off, I missed the VP_DEV=off flag. |
|
(Had replied by email, but don't see it here as a comment): The reasoning was that you are supposed to use the conda package for deployment. But if you bother to install from source, then you probably want to modify the source. In any case, this discussion is making me think that perhaps we really want one python package rather than several. That is a net win for a user of nnfit because it would automatically set up the validphys dependency, and it is ok for validphys as long as the extra dependencies don't cause too many problems. |
|
In fact, how difficult would it be to move everything into the existing python package? It would be a net win for the infrastructure because you wouldn't have to worry about these kinds of issues, and a win for users of n3fit because it is one less thing to set up, but it would imply that tensorflow and keras become a dependency of everything. |
|
The only thing you need to do is to add the n3fit directory to the validphys setup.py, right? Everything else should be automatic, the nnpdf conda package is just a subset of the n3pdf package. |
|
Do you want me to do it or this harmonization is something you prefer to do yourself? |
|
It should be simple, but one never knows with setup.py files. Please have a go.
|
|
The two solutions I can see mean either putting the n3fit directory together with validphys (bad idea, since we would also be mixing in evolven3fit, the runcards, and the hyperopt plotting) or having a very clunky setup.py (bad idea because it defeats the purpose). So I think it is better to have (for now) two separate packages. The conda package can still be just one. Note: if there is some setuptools syntax that allows for having two separate packages in just one beautiful setup file, I have failed to find it.
I was young and inexperienced... |
|
Firstly, having looked through and used the code, I think it's very nice and in general I find it easier to modify than the C++ code (obviously). One question I had was w.r.t. a comment @Zaharid had made about merging this code but having a PR which could be checked in more detail: what are the practical implications of that? Will this PR be merged but then checked after, or did I miss something? I wrote a really long comment before with one observation for perhaps some slightly longer-term development of this code which I think would improve the readability, cleanliness and future extension; I've tried to summarise it more briefly here. TL;DR: this code could possibly be improved by leveraging reportengine. Example: for starters there could actually be more control on a layer-by-layer basis; if any of those keys are incorrect then the fit would not run (this is less important since the parsing of the runcard in general happens before the actual fitting part, although it could generate cleaner errors). More importantly, there isn't an unresolved dictionary floating around which appears in several places. Then I think it could be more transparent how certain resources are built in this fitting framework, and they could also be passed around in a slightly cleaner way (maybe this is biased by the fact I just know my way around validphys). |
As far as I understand it was for organizational purposes. The history tree of the previous commit became a bit of a mess after I did too many rebases.
I think it is a good idea and would be an interesting enhancement to have. In particular I really like this syntax
What worries me (and I don't have much experience with validphys, so take this with a grain of salt) is that I am not sure how this will play along with the hyperparameter scan. This last part is highly non-trivial to do in a nice way (for instance, having exactly the same functions in ModelTrainer but making them into calls to more complicated functions somewhere else would not be nice imho). I think there are two different things here that can eventually be added to the code:
|
|
@wilsonmr I guess in my opinion this PR needs to be reviewed and exercised in some depth, especially with regards to the overall structure. However at this point I can't do a line-by-line review in a reasonable amount of time (and I don't see a solution involving an actionable amount of effort), so we should settle for merging as soon as the big picture is settled. A few comments.
Of course in actual fact reportengine doesn't do that much, which means that there are various other ways to organize things. And it may be that some library has its own opinion on how to set up the pipeline, which is fine.
|
    This version looks at the sum instead of at the norm
    """

    class MinMaxWeight(MinMaxNorm):
This looks like it could be a top level class rather than some nested method. This creates a new class every time you call it, which is not what you want. If you like, you can override __init__ to take min_value and max_value positionally.
I'll have a look. But I think Keras might want a new class every time.
I would expect and hope not. That would be a gross antipattern.
I am using it in a way different to what it was intended for so it would be my fault.
But you are right, looking again at how constraints work I think I can even get away with using the same instance of the constraint for all weights of the same layer...
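To illustrate the point being conceded here, a pure-python stand-in (not the Keras API): a top-level constraint class whose single instance can be reused across all the weight sets of a layer, mirroring how a keras.constraints.MinMaxNorm object would be shared.

```python
# Sketch only: a shareable, top-level constraint class. The clipping
# logic is a plain-python stand-in for what Keras does with tensors.
class MinMaxWeight:
    def __init__(self, min_value, max_value):
        self.min_value = min_value
        self.max_value = max_value

    def __call__(self, weights):
        # Clip every weight into [min_value, max_value]
        return [min(max(w, self.min_value), self.max_value) for w in weights]

constraint = MinMaxWeight(0.0, 1.0)      # one instance...
layer_a = constraint([-0.5, 0.3, 1.7])   # ...applied to one weight set
layer_b = constraint([0.9, 2.0])         # ...and reused on another
```

Defining the class once at top level (rather than inside a function) avoids creating a new class object on every call.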
Yes. I mean, you would give up some of the control to hyperopt, or would have to implement in some way some of the hyperopt stuff in each of the reportengine providers. I think it could eventually be done, but I think it is non-trivial (or highly messy, which to me are equivalent), so I would put it as low priority for now.
I have really no opinion on this. I like the syntax @wilsonmr proposed, but it would require some work making sure everything works fine. Not difficult, but not just said and done. That said, a vp provider which makes the nice syntax into the ugly dictionary will be the path of least resistance, and would mean already having the structure there if in the future the vp providers need to be the ones creating the network.
Everything in Statistics.py will go away. My plan is to have another PR in the future with the animations, positivity and stopping well separated. I can create those PRs before the merging of this one, but I need a few days. I will be freer after next week so I can do it then.
Most of it does. There are several places where I am copying other names (for instance the function is just a wrapper to some Keras class). I'll have another go at pylint (because last time I got bored) but most of the ones left are motivated. |
    try:
        opt_tuple = self.optimizers[optimizer_name]
    except KeyError:
        raise Exception(f"[MetaModel.compile] Optimizer not found {optimizer_name}")
raise Exception is generally discouraged, because you catch everything along with it. In fact, leaving the original KeyError alone would be a bit better because it gives a clearer traceback and fewer opportunities to catch it unintentionally.
In general python people (including core python, see https://www.python.org/dev/peps/pep-3151/ and https://www.python.org/dev/peps/pep-0479/) have learned that fine-grained exceptions are a good thing.
One option is to define something like class ModelNotFound(KeyError): pass and then raise that. Also, when you raise an exception with additional information, it is good practice to do:

try:
    ...
except SomeGenericException as e:
    raise CustomError(f'{e}') from e

because it gives clearer and more explicit tracebacks (not having from e looks like a bug in the exception handler).
That said, checking for the model existence is one of these things that validphys should do in early stages cc @wilsonmr
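Putting the two suggestions together, a minimal sketch of the pattern (the names are illustrative, not the actual n3fit code): a fine-grained exception subclassing KeyError, raised with explicit chaining.

```python
# Sketch of a fine-grained exception with explicit chaining.
class OptimizerNotFound(KeyError):
    """Raised when the requested optimizer is not registered."""

OPTIMIZERS = {"sgd": "SGD", "adam": "Adam"}  # illustrative registry

def get_optimizer(name):
    try:
        return OPTIMIZERS[name]
    except KeyError as e:
        # "from e" keeps the original traceback attached
        raise OptimizerNotFound(f"Optimizer not found: {name}") from e

# Callers can now catch the specific error without swallowing everything else
try:
    get_optimizer("nadam")
except OptimizerNotFound as err:
    caught = str(err)
```

A bare `except Exception:` elsewhere in the code would no longer be able to accidentally hide this failure mode behind a generic message.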
As a funny piece of trivia related to why generic exceptions are a terrible idea, explain this piece of code:
In [1]: class X:
   ...:     @property
   ...:     def prop(self):
   ...:         return self.patata
   ...:

In [2]: hasattr(X, 'prop')
Out[2]: True

In [3]: hasattr(X(), 'prop')
Out[3]: False

It took me several hours to find a bug related to this once. I even proposed that python should be changed!
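For the record, the puzzle resolves as follows: hasattr works by attempting the attribute access and catching AttributeError. On the class, the lookup just finds the property descriptor; on an instance, the property body runs, self.patata raises AttributeError, and hasattr silently swallows it and reports the attribute as missing:

```python
class X:
    @property
    def prop(self):
        return self.patata  # no such attribute: raises AttributeError when run

# On the class, the property descriptor itself is found, so the lookup succeeds
on_class = hasattr(X, 'prop')
# On an instance, the property body executes and raises internally;
# hasattr catches the AttributeError and reports False
on_instance = hasattr(X(), 'prop')
```

The bug in the property body is invisible precisely because an over-broad exception handler (here, inside hasattr itself) eats it.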
That said, checking for the model existence is one of these things that validphys should do in early stages
I insist, this is cool but not trivial. And before even thinking about doing this there is a lot of "destroying C++" to do, which has way higher priority.
I'm making changes to the code as you guys comment, but I'll wait until the review is finished to push, if that's ok. Then you can comment on those changes; that way it's a bit better organized.
I think the custom error is clearly an easy thing which can be changed now, there are a few of these generic exceptions being raised and I think changing these is manageable now.
Of course it would be nice if the errors were raised at a stage where the runcard is being checked and resources built, as I said; however I think @scarlehoff is right that things need to be done in priority order. In this case the fit will fail fairly early on (like in the first couple of minutes?), which can be improved, but I think it is fine as a jumping-off point for future iterations of n3fit.
Also I think making these changes might be easier if they were done in smaller batches - moving the building of various resources and integration with validphys tools is surely easier if done incrementally? At some point we need a skeleton fitting code which is doing something pretty sensible (which I think this is) and then have smaller pull requests to really clean it up.
ps: at a code call there was a list of priority things mentioned which I believe you all agreed on. I think the items of this list are mentioned in the closed PR and possibly in various issues, but perhaps it would be worth compiling them somewhere in priority order if they aren't already?
I think the custom error is clearly an easy thing which can be changed now, there are a few of these generic exceptions being raised and I think changing these is manageable now.
Yes, I agree; the exceptions are something I am looking at. I was thinking only of the vp integration in the previous comment.
    # Since the history object can have more than one loss (if epochs != 1) but it is always a list
    # Save them as the mean
    total = np.mean(hobj["loss"]) / self.ndata_tr
I was wondering what the mean is taken over here - is it epoch? If so how many epochs, can we not just take the final value?
It is always 1 epoch but it could potentially be more than one.
I decided on the mean because the only reason I found to increase the number of epochs was to avoid cases in which by chance we get a spuriously low loss.
In practice it rarely happens though.
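To make the computation concrete (the shape of hobj is assumed from the Keras History object; the numbers are illustrative):

```python
import numpy as np

# Keras stores History.history["loss"] as a list with one entry per epoch;
# with epochs=1 the mean is just the single final value.
hobj = {"loss": [2.31]}            # typical single-epoch run
ndata_tr = 100                     # number of training points (illustrative)
total = np.mean(hobj["loss"]) / ndata_tr

# With epochs > 1 the mean averages out a single fluke low value:
multi = {"loss": [2.5, 2.4, 2.3]}
total_multi = np.mean(multi["loss"]) / ndata_tr
```

So in the usual epochs=1 case the mean and the final value coincide, and the mean only matters in the multi-epoch case discussed above.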
    try:
        layer_tuple = layers[layer_name]
    except KeyError:
        raise Exception(
probably another place where we could change the exception to be a bit more helpful
|
Ok, I take the point that we should anyway sort out what the providers should look like. On the camp of things that can be changed now I would place changing things like the generic exceptions. |
        self.log_info = log.info
    else:
        self.log_info = print
    self.log = log
Do you have a strong reason not to use a global logger object here? A typical convention is to do

import logging
log = logging.getLogger(__name__)

and that allows you to control the logger from calling code in various ways without having to pass it around as a parameter. A minor annoyance is that by default log.info is disabled, but it isn't in a vp app. Another possible source of annoyance is that by default it prints to stderr, but that can be justified. On the plus side vp prints a nice green text.
To allow for more flexibility, e.g., if you run in a cluster then you can send the messages to some socket in your home computer, or cool stuff like that which will never be implemented and is not that useful anyway. So...
Do you have a strong reason
No.
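A minimal sketch of the module-level logger convention being recommended: the library module just logs, and the application (a vp app, a cluster job sending to a socket handler, etc.) decides centrally what happens to the messages.

```python
# Sketch of the convention: no logger objects passed around.
import logging

log = logging.getLogger(__name__)   # module-level logger

def fit_step():
    # The library only emits messages; it never configures output
    log.info("starting fit step")
    return "done"

# Application-level configuration (a vp app would do its own setup here):
logging.basicConfig(level=logging.INFO)
result = fit_step()
```

Redirecting to a socket later would just mean the application attaching a logging.handlers.SocketHandler, with no change to the library code.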
Maybe I misunderstood how PEP8 works, but I thought classes were supposed to be capitalized (that file is a bit special in that even the class name is different from the filename though). I know the file-per-class thing is not the most pythonic thing, but it is something I am really fond of. I can make all files lower case, but I like being able to see from an ls which files hold classes. |
|
Indeed one typically doesn't do one file per class but OTOH a class with some helper functions is not that atypical (and free functions for when you don't care about the internal state such as in the |
|
Yes, modules yes. Here you have |
Yeah, those are all fine. |
|
That is what the logging module is intended for (with varying degrees of practical success). With the addition that it is the kind of thing that you want to control centrally at application level rather than at library level, so it makes sense to not pass it around.
|
Zaharid
left a comment
I think there is a considerable amount of complexity in ModelTrainer, and at least some of it is likely not strictly required, as it can easily be offloaded to calling code.
I'd aim at making it smaller, and at documenting clearly the state, especially the mutable state (which should be treated as radioactive material and probably separated off into different classes).
| """ If a model_file is set the training model will try to get the weights form here """ | ||
| self.model_file = model_file | ||
|
|
||
| def set_hyperopt(self, on, keys=None): |
The description of this could be clearer, and the way it works, really. It seems to me that on and keys do entirely different things, and so it should be two methods.
    else:
        self.no_validation = False

    def set_model_file(self, model_file):
The idiomatic way of doing this is with @property and the corresponding @model_file.setter.
but is there a use case for mutating this after initialization?
Yes. For doing more than one replica sequentially you don't want to reinitialize the whole thing, but you might have different checkpoints for each of them.
(for doing more than one in parallel I still don't know how to do it, so the method is a bit of a placeholder at the moment)
Agree on the comment about idiomatic(ity) though
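For reference, the idiomatic property/setter pair mentioned above would look roughly like this (class and attribute names follow the discussion; the body is a stub, not the real trainer):

```python
# Sketch of the @property / setter idiom replacing set_model_file.
class Trainer:
    def __init__(self, model_file=None):
        self._model_file = model_file

    @property
    def model_file(self):
        """If set, the training model will try to load weights from here."""
        return self._model_file

    @model_file.setter
    def model_file(self, value):
        self._model_file = value

t = Trainer()
t.model_file = "replica_2_weights.h5"  # hypothetical checkpoint name,
                                       # mutated between sequential replicas
```

Callers read and assign t.model_file like a plain attribute, which covers the sequential-replica use case without an explicit setter method.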
    # and so it should affect all models
    tr_model.load_weights(self.model_file)

    if self.print_summary:
I would remove the print_summary option and use the appropriate log level instead.
The summary is printed by Keras to stdout. You mean use the log level as the if condition?
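Presumably yes: a sketch of gating the summary on the logger's effective level instead of a separate print_summary flag (FakeModel stands in for a Keras model, whose summary() prints to stdout):

```python
import logging

log = logging.getLogger("n3fit.sketch")  # illustrative logger name

class FakeModel:
    """Stand-in for a Keras model."""
    def summary(self):
        print("Layer (type)  Output Shape  Param #")

def maybe_summarise(model):
    # Emit the summary only when debug output is wanted
    if log.isEnabledFor(logging.DEBUG):
        model.summary()
        return True
    return False

log.setLevel(logging.INFO)
shown_at_info = maybe_summarise(FakeModel())
log.setLevel(logging.DEBUG)
shown_at_debug = maybe_summarise(FakeModel())
```

The flag disappears from the class, and verbosity is controlled by whoever configures logging.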
    *called in this way because it accept a dictionary of hyper-parameters which defines the Neural Network
    """
    def __init__(
        self, exp_info, pos_info, flavinfo, nnseed, pass_status="ok", failed_status="fail", log=None, debug=False
I am not sure I understand what pass_status and failed_status are for, or why they are user settable, but it seems to me that this class already does too many things and should use a boolean instead, leaving the formatting elsewhere.
I think you can remove the log argument.
Considering there is a sole use of self.debug in one function and it changes some complicated global state, I'd remove it altogether and let the calling code deal with it.
I am not sure I understand what pass_status and failed_status are for, or why they are user settable, but it seems to me that this class already does too many things and should use a boolean instead, leaving the formatting elsewhere.
Because these are necessary for hyperopt. If you implement some other hyperoptimization library your pass flag might be something else.
I think you can remove the log argument.
Ups, forgot about it.
Considering there is a sole use of self.debug in one function and it changes some complicated global state, I'd remove it altogether and let the calling code deal with it.
What do you mean "the calling code"? I can call it something else, but I need an on-off flag there to decide whether to completely clean the Keras state or not.
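A sketch of the boolean-status suggestion, for concreteness (function names are illustrative): the trainer reports a plain boolean and the hyperopt boundary translates it into whatever flag a given library expects.

```python
# Sketch: status formatting lives at the hyperopt boundary,
# not inside ModelTrainer.
def fit_replica():
    success = True  # outcome of the fit, however it is determined
    return success

def to_hyperopt_status(success, pass_status="ok", failed_status="fail"):
    # Only this adapter knows the strings a particular
    # hyperoptimization library wants to see.
    return pass_status if success else failed_status

status = to_hyperopt_status(fit_replica())
```

Swapping hyperopt for another library would then mean changing only the adapter, not the trainer's constructor signature.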
    ):
        """
        # Arguments:
            - `exp_info`: list of dictionaries containing experiments
This should describe all the arguments, such as nnseed.
It does?
Or do you mean there is not enough description?
    ############################################################################
    # #                       Parametizable functions                       # #
    # #                                                                     # #
    # The functions defined in this block accept a 'params' dictionary which #
Hmm, this is the kind of thing that reportengine solves nicely, allowing you to define functions in terms of the parameters you actually care about while dispatching them automatically. But fine, let's see about that later.
Ok, this is a preference thing, but I would rather have everything that can be parametrized within the same class. The rationale behind this (and this ties in with your initial comment) is the following:
I want to create a ModelTrainer object which has all the information necessary in order to generate the Neural Network and the training, this is:
- All the experimental data and fktables
- All methods that generate pieces of the NN
- All methods dedicated to the training
I can have them in different classes, that's fine, but that won't reduce the complexity, it will just spread it out and at the end of the day the object I need to produce must contain everything that you see in ModelTrainer.py
My goal with this was reducing to the minimum the overhead from one call of hyperopt to the next (I'm sure it can still be reduced though).
Note: there are several things in this class that can be done a bit better to improve readability. But since it is tied to the statistics thing I'll do some work on that on that PR.
    )


    def compute_arclength(fitbasis_layer, verbose=False):
I would use log.debug instead of verbose.
    @@ -0,0 +1,44 @@
    def set_initial_state(debug=False, seed=13):
I think this file should have a docstring explaining what it does. It is not immediately obvious to me at all.
I'll write a bit more in the docstring, but it just sets several initial states.
    return 0


    def clear_backend_state(debug = False):
This really should take no arguments and be used as

if debug:
    clear_backend_state()
| """ | ||
| if not debug: | ||
| print("Clearing session") | ||
| from keras import backend as K |
Is the import done internally for any particular reason? If so, please add a comment.
Ok, there was a reason: I cannot have it outside. In order to set the random seed I need to set the random and numpy seeds before I ever import keras.
But the fact that it took me a while to remember points clearly to the necessity of a comment :P
Turns out it is enough if I clear the session, so all points regarding this file are implemented.
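A sketch of the seed-before-import constraint discussed above, using the stdlib random module as a stand-in for the full set of seeds: everything must be seeded before the backend is first imported, which is why the keras import lives inside the function.

```python
import random

def set_initial_state(seed=13):
    # Seed the python RNG; in the real code the numpy and
    # tensorflow/keras seeds are set here too, *before* keras
    # is ever imported.
    random.seed(seed)

def clear_backend_state():
    # Function-local import: a module-level "from keras import backend"
    # would execute before the seeds above are set.
    # from keras import backend as K   # (real code; omitted here)
    pass

set_initial_state(13)
first = random.random()
set_initial_state(13)
second = random.random()  # identical draw: the state was reset
```

With the import kept local, calling set_initial_state first guarantees the backend initialises against the seeded state.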
    self.training["pos_multiplier"] = pos_multiplier

-    def _generate_pdf(self, params):
+    def _generate_pdf(self, nodes_per_layer, activation_per_layer, initializer, layer_type, dropout):
this is definitely a lot more readable!
Did @Zaharid tell you to write that comment? :P
(I just had a discussion with him about which version looked best)
Haha no, I was just thinking it. I guess I accidentally proved a point? :P
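The point being proved can be shown in miniature (values and the dictionary keys are illustrative): explicit keyword parameters document themselves, and a hyper-parameter dictionary can still be forwarded with ** unpacking.

```python
# Sketch contrasting the two signatures: explicit keywords vs. one dict.
def generate_pdf(nodes_per_layer, activation_per_layer, initializer,
                 layer_type, dropout):
    # Each hyper-parameter is now a named, checkable argument
    return {"nodes": nodes_per_layer, "dropout": dropout}

params = {
    "nodes_per_layer": [25, 20, 8],
    "activation_per_layer": ["tanh", "tanh", "linear"],
    "initializer": "glorot_normal",
    "layer_type": "dense",
    "dropout": 0.0,
}
spec = generate_pdf(**params)  # a typo'd or missing key now fails loudly
```

With the old `params` signature, a misspelled key would silently ride along in the dictionary; with explicit keywords it raises a TypeError at the call site.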
|
Good. I am going to merge this. Let's make improvements (which will hopefully be many but much smaller in size) against master. |
As requested during today's meeting. This PR supersedes #461 containing all the new code from n3fit in the form of 3 very elegant commits.
Note: the documentation won't compile automatically (because obviously this is not merged with doc_folders yet), but it should work by just adding the following line to the makefile (writing it here mainly for my own convenience).
The how to for n3fit can still be read in the text of PR #461.
If this PR works there will be a conda package in the server called n3pdf. If it doesn't work... then I'll be sad because the PR won't be so elegant anymore!

In the next PRs...
There are several things that are added in this PR because they work, but I don't think they are production-ready; this corresponds to Statistics.py and the hyperopt routines.
So after this one is merged there will be two more PRs:
Statistics
For lack of a better name. This will break what is now in the Statistics module into several different files, probably something along the lines of: stopping, animations, positivity checks.
Hyperopt
The hyperoptimization capabilities work perfectly fine, but the analysis routines are a bit of a mess since they were written as part of the analysis and were constantly changing. Now that we have a final version a lot of refactoring is needed to have them in some nice form.
Tests
Some nice 1-2 minute long regression test would be good, to know when things change (although I am not sure we would want to take a difference in a fit regression test as a failure).