
[WIP] Data Keyword #515

Closed
wilsonmr wants to merge 4 commits into master from datakeyword

Conversation

@wilsonmr
Contributor

I will add some more description here later

I basically realised straight away that I don't actually know a good way to parse both experiments and data and produce the same object, issuing a warning for the former, and, in the case that both are missing, raising the correct exception that the data keyword is missing.

I found that this does some of the right things, but having both a production rule and a parser for the same thing suggests I'm not using this in the way it was intended.

Any suggestions? Being able to process experiments is purely for backwards compatibility.

@wilsonmr wilsonmr requested a review from Zaharid July 18, 2019 17:14
@Zaharid
Contributor

Zaharid commented Jul 18, 2019

Do we need to have both parse_data and produce_data? Frankly, until we have reportengine 1.0, I'd consider what that does when you have complicated namespaces to be undefined behaviour (in that I don't really know what it does exactly).

I think that what we should do instead is to have a production rule that you need to put in front explicitly to get data from experiments.

def produce_data_from_experiments(experiments, <?>): -> namespace containing {data: <list of dataset_input>}

In your runcard you would do:

experiments: <...>

actions_:
    - data_from_experiments action_taking_data
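A minimal sketch of what such a production rule could look like. The function name comes from the comment above; the structure of each experiment (a mapping carrying a "datasets" list) is an assumption for illustration, not the real validphys Config API:

```python
def produce_data_from_experiments(experiments):
    """Flatten the datasets of all experiments into a single ``data``
    namespace, so actions written against ``data`` can still run on
    runcards that use the deprecated ``experiments`` key.

    Assumes each experiment is a mapping with a ``datasets`` list of
    dataset_input mappings (illustrative, not the actual API).
    """
    dataset_inputs = []
    for experiment in experiments:
        dataset_inputs.extend(experiment["datasets"])
    return {"data": dataset_inputs}
```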

@wilsonmr
Contributor Author

That works nicely at the level of a single action, but what about vp-comparefits? Is this just going to work on fits which have specified data in their runcards?

@wilsonmr
Contributor Author

So to expand further: we want to be able to use fits_chi2_table on two fits, one which used the data keyword and one which used the experiments keyword. I'm trying to work out if there is a way that either of those keywords could be specified in the fit runcard, such that in the end the table is produced for the data used in the fit.

@Zaharid
Contributor

Zaharid commented Jul 19, 2019

I'd say we could patch up vp-comparefits to apply the production rule automatically. We do something similar for the theory covmat anyway.

That said, one argument against the production rule approach is that you then cannot quietly rewrite all actions to work with data internally, but have to keep versions with experiments around for compatibility. Not sure how easy that rewrite would be anyway.

The argument in favour would be that then everything is nice and explicit.

@wilsonmr
Contributor Author

Ok, I only just read your comment and was already trying something a bit different this morning. I haven't fully finished it yet, but what about this approach?

Similar to use_cuts, you can do a 'fromfit' rather than using the from_ specification, and this should handle both experiments and data nicely for fits.

The only annoying thing is that the old dataset_input had dataset instead of name (I guess to work alongside experiment?). However, when you have a flat list of these mappings, I believe they conflict with produce_dataset, so I have tentatively changed the new dataset_input to use name, which was proposed in #356 anyway.
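A hedged sketch of how a 'fromfit' value could be resolved against a fit runcard, converting the deprecated experiments key with a warning as in use_cuts-style parsing. This is illustrative only; the function name and runcard layout are assumptions, not the actual validphys parser:

```python
import warnings


def data_from_fit_runcard(fit_name, runcard):
    """Return a list of dataset input mappings from a fit's runcard,
    accepting either the new ``data`` key or the deprecated
    ``experiments`` key (with a deprecation warning).

    ``runcard`` is assumed to be the mapping loaded from the fit's
    runcard file (hypothetical layout for illustration).
    """
    if "data" in runcard:
        return runcard["data"]
    if "experiments" in runcard:
        warnings.warn(
            f"{fit_name} uses the deprecated `experiments` key, "
            "converting to `data`"
        )
        # Flatten the per-experiment dataset lists into one flat list
        return [ds for exp in runcard["experiments"] for ds in exp["datasets"]]
    raise KeyError("Fit runcard specifies neither `data` nor `experiments`")
```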

@wilsonmr
Contributor Author

Ok, obviously this won't pass CI yet, because I changed dataset_input and would need to change a lot of other things, but it does the following:

In [3]: from validphys.api import API                                                                                                                                                                       

In [4]: a = API.data(data="fromfit", theoryid=163, use_cuts='nocuts', fit="190707-mw-001")                                                                                                                  
190707-mw-001 uses the deprecated `experiment` key, converting to `data`

In [5]: b = API.data(data=[{"name":"NMC"}], theoryid=163, use_cuts='nocuts')                                                                                                                                

In [6]: b                                                                                                                                                                                                   
Out[6]: ExperimentSpec(name='data', datasets=(DataSetSpec(name='NMC', commondata=CommonDataSpec(datafile=PosixPath('/Users/michael/conda/envs/n3fit/share/NNPDF/data/commondata/DATA_NMC.dat'), sysfile=PosixPath('/Users/michael/conda/envs/n3fit/share/NNPDF/data/commondata/systypes/SYSTYPE_NMC_DEFAULT.dat'), plotfiles=(PosixPath('/Users/michael/conda/envs/n3fit/share/NNPDF/data/commondata/PLOTTINGTYPE_DIS.yaml'), PosixPath('/Users/michael/conda/envs/n3fit/share/NNPDF/data/commondata/PLOTTING_NMC.yaml'))), fkspecs=(FKTableSpec(fkpath=PosixPath('/Users/michael/conda/envs/n3fit/share/NNPDF/data/theory_163/fastkernel/FK_NMC.dat'), cfactors=()),), thspec=TheoryIDSpec(id=163, path=PosixPath('/Users/michael/conda/envs/n3fit/share/NNPDF/data/theory_163')), cuts=None, frac=1, op='NULL', weight=1),))

In [7]: a.datasets                                                                                                                                                                                          
Out[7]: 
(DataSetSpec(name='NMCPD', commondata=CommonDataSpec(datafile=PosixPath('/Users/michael/conda/envs/n3fit/share/NNPDF/data/commondata/DATA_NMCPD.dat'), sysfile=PosixPath('/Users/michael/conda/envs/n3fit/share/NNPDF/data/commondata/systypes/SYSTYPE_NMCPD_DEFAULT.dat'), plotfiles=(PosixPath('/Users/michael/conda/envs/n3fit/share/NNPDF/data/commondata/PLOTTINGTYPE_DIS.yaml'), PosixPath('/Users/michael/conda/envs/n3fit/share/NNPDF/data/commondata/PLOTTING_NMCPD.yaml'))), fkspecs=(FKTableSpec(fkpath=PosixPath('/Users/michael/conda/envs/n3fit/share/NNPDF/data/theory_163/fastkernel/FK_NMCPD_D.dat'), cfactors=()), FKTableSpec(fkpath=PosixPath('/Users/michael/conda/envs/n3fit/share/NNPDF/data/theory_163/fastkernel/FK_NMCPD_P.dat'), cfactors=())), thspec=TheoryIDSpec(id=163, path=PosixPath('/Users/michael/conda/envs/n3fit/share/NNPDF/data/theory_163')), cuts=None, frac=0.5, op='RATIO', weight=1),
 DataSetSpec(name='NMC', commondata=CommonDataSpec(datafile=PosixPath('/Users/michael/conda/envs/n3fit/share/NNPDF/data/commondata/DATA_NMC.dat'), sysfile=PosixPath('/Users/michael/conda/envs/n3fit/share/NNPDF/data/commondata/systypes/SYSTYPE_NMC_DEFAULT.dat'), plotfiles=(PosixPath('/Users/michael/conda/envs/n3fit/share/NNPDF/data/commondata/PLOTTINGTYPE_DIS.yaml'), PosixPath('/Users/michael/conda/envs/n3fit/share/NNPDF/data/commondata/PLOTTING_NMC.yaml'))), fkspecs=(FKTableSpec(fkpath=PosixPath('/Users/michael/conda/envs/n3fit/share/NNPDF/data/theory_163/fastkernel/FK_NMC.dat'), cfactors=()),), thspec=TheoryIDSpec(id=163, path=PosixPath('/Users/michael/conda/envs/n3fit/share/NNPDF/data/theory_163')), cuts=None, frac=0.5, op='NULL', weight=1),
....

@wilsonmr
Contributor Author

@Zaharid sorry to bother you; since you were away last week I guess you didn't look at this, but I was wondering if you could have a look?

I don't feel comfortable doing any more work on it until then, since I'm concerned I'm going down the wrong path with it.

@Zaharid
Contributor

Zaharid commented Jul 29, 2019

I'm not quite sure how I'd go about all the details, but I'd say that one thing to improve over how experiments works is that, rather than having a monstrously complicated parser that takes many inputs, we should have many simpler objects that build upon each other.

So at some level there would be data, which represents the user input pretty much unmodified (just making sure the keys look sensible and the names exist). Then there is a (possibly complicated) production rule that adds defaults on top of that, then something else that adds cuts, something that specifies the fktables, then something else that specifies how the covariance matrix should be built, and so on. That would make these things much easier to compose in non-trivial ways than they are now, including ways we haven't anticipated.
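The layered design could be pictured as a chain of small functions, each depending only on the previous stage's output. All names and fields below are invented for illustration; the real validphys rules and defaults differ:

```python
def parse_data(raw):
    """Stage 1: user input pretty much unmodified, just sanity-checked."""
    return [{"name": entry["name"]} for entry in raw]


def add_defaults(data):
    """Stage 2: a production rule filling per-dataset defaults
    (hypothetical fields ``frac`` and ``weight``)."""
    return [{"frac": 1.0, "weight": 1, **ds} for ds in data]


def add_cuts(data, use_cuts="internal"):
    """Stage 3: attach a cuts policy on top of the defaults."""
    return [{**ds, "cuts": use_cuts} for ds in data]


# Further stages (fktables, covariance matrix construction, ...) would
# stack in the same way:
namespace = add_cuts(add_defaults(parse_data([{"name": "NMC"}])))
```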

In a sense the biggest problem is to come up with good names for all of these things.

@wilsonmr
Contributor Author

Ok, after having a conversation in Milan, writing this here for the record:

  • data is a list of dataset inputs (or similar)
  • data_plus_defaults is data with (data) defaults added
  • there will be a separate theory defaults

The production rule which adds these together will produce a new namespace with defaults filled in and save this to a lock file (#496).

Defaults will be added per dataset to the PLOTTING file. A new file shall be added with global defaults.

parse_data looks in the runcard for the data key and then constructs the data object.

parse_data_defaults takes data as input; it will first look in the runcard (in the case that it is a lock file) and fill in the defaults from there, and if the defaults are not specified in the runcard it shall take them from the set location.

produce_data_plus_defaults (might be renamed) takes the above functions, creates a new runcard object (lock file) updated with a defaults key if it didn't already exist, and returns the namespace with those defaults filled in the correct places.

parse_theory_defaults gets defaults specific to the theory, such as cfacs and cfac uncertainties, either from the runcard or the set location.

produce_data_plus_theory_defaults (also might be renamed) finally adds theory defaults to this namespace and updates the lock file. Now that all changes to the lock file have happened, it can be saved; this should happen before the code starts and could have uses in debugging, for example checking that people's defaults are up to date.
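A toy sketch of the parse/produce chain recorded above. The lock-file handling is reduced to updating a plain dict; the real implementation (#496) would write it to disk, and all names and defaults here are illustrative assumptions:

```python
def parse_data_defaults(runcard, fallback_defaults):
    """Take defaults from the runcard if present (the lock-file case),
    otherwise from the set location (here, ``fallback_defaults``)."""
    return runcard.get("defaults", fallback_defaults)


def produce_data_plus_defaults(runcard, fallback_defaults):
    """Fill defaults into the data namespace and record them in the
    runcard object, so it can later be saved as a lock file."""
    defaults = parse_data_defaults(runcard, fallback_defaults)
    # Record the resolved defaults on the runcard if not already there,
    # mimicking the lock-file update described above
    runcard.setdefault("defaults", defaults)
    data = [{**defaults, **ds} for ds in runcard["data"]]
    return {"data_plus_defaults": data}
```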

@wilsonmr wilsonmr mentioned this pull request Feb 17, 2020
@wilsonmr
Contributor Author

see #651

