
[WIP] Data Keyword #515

Closed
wilsonmr wants to merge 4 commits into master from datakeyword

Conversation

@wilsonmr
Contributor

I will add some more description here later

I basically realised straight away that I don't actually know a good way to parse both experiments and data and produce the same object, issuing a warning for the former, and, in the case that both are missing, raising the correct exception that the data keyword is missing.

I found that this does some of the right things, but having both a production rule and a parser for the same thing suggests I'm not using this in the way it was intended.

Any suggestions? Being able to process experiments is purely for backwards compatibility.

@wilsonmr wilsonmr requested a review from Zaharid July 18, 2019 17:14
@Zaharid
Contributor

Zaharid commented Jul 18, 2019

Do we need to have both parse_data and produce_data? Frankly, until we have reportengine 1.0, I'd consider what that does when you have complicated namespaces to be undefined behaviour (in that I don't really know what it does exactly).

I think that what we should do instead is to have a production rule that you need to put in front explicitly to get data from experiments.

def produce_data_from_experiments(experiments, <?>): -> namespace containing {data: <list of dataset_input>}

In your runcard you would do:

experiments: <...>

actions_:
    - data_from_experiments action_taking_data
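A minimal sketch of what such a production rule could look like. The function name comes from the comment above; the structure of each experiment (a mapping carrying a "datasets" list) is an assumption for illustration, not the real validphys Config API:

```python
def produce_data_from_experiments(experiments):
    """Flatten the datasets of all experiments into a single ``data``
    namespace, so actions written against ``data`` can still run on
    runcards that use the deprecated ``experiments`` key.

    Assumes each experiment is a mapping with a ``datasets`` list of
    dataset_input mappings (illustrative, not the actual API).
    """
    dataset_inputs = []
    for experiment in experiments:
        dataset_inputs.extend(experiment["datasets"])
    return {"data": dataset_inputs}
```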

@wilsonmr
Contributor Author

That works nicely at the level of a single action, but what about vp-comparefits? Is this just going to work on fits which have specified data in their runcards?

@wilsonmr
Contributor Author

So to expand further: we want to be able to use fits_chi2_table on two fits, one which used the data keyword and one which used the experiments keyword. I'm trying to work out if there is a way that either of those keywords could be specified in the fit runcard, such that in the end the table is produced for the data used in the fit.

@Zaharid
Contributor

Zaharid commented Jul 19, 2019

I'd say we could patch up vp-comparefits to apply the production rule automatically. We do something similar for the theory covmat anyway.

That said, one argument against the production rule approach is that you then cannot quietly rewrite all actions to work with data internally, but have to keep versions with experiments around for compatibility. Not sure how easy that rewrite would be anyway.

The argument in favour would be that then everything is nice and explicit.

@wilsonmr
Contributor Author

Ok, I only just read your comment and was already trying something a bit different this morning. I haven't fully finished it yet, but what about this approach?

Similar to use_cuts, you can do a 'fromfit' rather than using the from_ specification, and this should handle both experiments and data nicely for fits.

The only annoying thing is that the old dataset_input had dataset instead of name (I guess to work alongside experiment?). However, when you have a flat list of these mappings, I believe they conflict with produce_dataset, so I have tentatively changed the new dataset_input to use name, which was proposed in #356 anyway.
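A hedged sketch of how a 'fromfit' value could be resolved against a fit runcard, converting the deprecated experiments key with a warning as in use_cuts-style parsing. This is illustrative only; the function name and runcard layout are assumptions, not the actual validphys parser:

```python
import warnings


def data_from_fit_runcard(fit_name, runcard):
    """Return a list of dataset input mappings from a fit's runcard,
    accepting either the new ``data`` key or the deprecated
    ``experiments`` key (with a deprecation warning).

    ``runcard`` is assumed to be the mapping loaded from the fit's
    runcard file (hypothetical layout for illustration).
    """
    if "data" in runcard:
        return runcard["data"]
    if "experiments" in runcard:
        warnings.warn(
            f"{fit_name} uses the deprecated `experiments` key, "
            "converting to `data`"
        )
        # Flatten the per-experiment dataset lists into one flat list
        return [ds for exp in runcard["experiments"] for ds in exp["datasets"]]
    raise KeyError("Fit runcard specifies neither `data` nor `experiments`")
```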

@wilsonmr
Contributor Author

Ok, obviously this won't pass CI yet, because I changed dataset_input and would need to change a lot of other things, but it does the following:

In [3]: from validphys.api import API                                                                                                                                                                       

In [4]: a = API.data(data="fromfit", theoryid=163, use_cuts='nocuts', fit="190707-mw-001")                                                                                                                  
190707-mw-001 uses the deprecated `experiment` key, converting to `data`

In [5]: b = API.data(data=[{"name":"NMC"}], theoryid=163, use_cuts='nocuts')                                                                                                                                

In [6]: b                                                                                                                                                                                                   
Out[6]: ExperimentSpec(name='data', datasets=(DataSetSpec(name='NMC', commondata=CommonDataSpec(datafile=PosixPath('/Users/michael/conda/envs/n3fit/share/NNPDF/data/commondata/DATA_NMC.dat'), sysfile=PosixPath('/Users/michael/conda/envs/n3fit/share/NNPDF/data/commondata/systypes/SYSTYPE_NMC_DEFAULT.dat'), plotfiles=(PosixPath('/Users/michael/conda/envs/n3fit/share/NNPDF/data/commondata/PLOTTINGTYPE_DIS.yaml'), PosixPath('/Users/michael/conda/envs/n3fit/share/NNPDF/data/commondata/PLOTTING_NMC.yaml'))), fkspecs=(FKTableSpec(fkpath=PosixPath('/Users/michael/conda/envs/n3fit/share/NNPDF/data/theory_163/fastkernel/FK_NMC.dat'), cfactors=()),), thspec=TheoryIDSpec(id=163, path=PosixPath('/Users/michael/conda/envs/n3fit/share/NNPDF/data/theory_163')), cuts=None, frac=1, op='NULL', weight=1),))

In [7]: a.datasets                                                                                                                                                                                          
Out[7]: 
(DataSetSpec(name='NMCPD', commondata=CommonDataSpec(datafile=PosixPath('/Users/michael/conda/envs/n3fit/share/NNPDF/data/commondata/DATA_NMCPD.dat'), sysfile=PosixPath('/Users/michael/conda/envs/n3fit/share/NNPDF/data/commondata/systypes/SYSTYPE_NMCPD_DEFAULT.dat'), plotfiles=(PosixPath('/Users/michael/conda/envs/n3fit/share/NNPDF/data/commondata/PLOTTINGTYPE_DIS.yaml'), PosixPath('/Users/michael/conda/envs/n3fit/share/NNPDF/data/commondata/PLOTTING_NMCPD.yaml'))), fkspecs=(FKTableSpec(fkpath=PosixPath('/Users/michael/conda/envs/n3fit/share/NNPDF/data/theory_163/fastkernel/FK_NMCPD_D.dat'), cfactors=()), FKTableSpec(fkpath=PosixPath('/Users/michael/conda/envs/n3fit/share/NNPDF/data/theory_163/fastkernel/FK_NMCPD_P.dat'), cfactors=())), thspec=TheoryIDSpec(id=163, path=PosixPath('/Users/michael/conda/envs/n3fit/share/NNPDF/data/theory_163')), cuts=None, frac=0.5, op='RATIO', weight=1),
 DataSetSpec(name='NMC', commondata=CommonDataSpec(datafile=PosixPath('/Users/michael/conda/envs/n3fit/share/NNPDF/data/commondata/DATA_NMC.dat'), sysfile=PosixPath('/Users/michael/conda/envs/n3fit/share/NNPDF/data/commondata/systypes/SYSTYPE_NMC_DEFAULT.dat'), plotfiles=(PosixPath('/Users/michael/conda/envs/n3fit/share/NNPDF/data/commondata/PLOTTINGTYPE_DIS.yaml'), PosixPath('/Users/michael/conda/envs/n3fit/share/NNPDF/data/commondata/PLOTTING_NMC.yaml'))), fkspecs=(FKTableSpec(fkpath=PosixPath('/Users/michael/conda/envs/n3fit/share/NNPDF/data/theory_163/fastkernel/FK_NMC.dat'), cfactors=()),), thspec=TheoryIDSpec(id=163, path=PosixPath('/Users/michael/conda/envs/n3fit/share/NNPDF/data/theory_163')), cuts=None, frac=0.5, op='NULL', weight=1),
....

@wilsonmr
Contributor Author

@Zaharid sorry to bother you; since you were away last week I guess you didn't look at this, but I was wondering if you could have a look?

I don't feel comfortable doing any more work on it until then, since I'm concerned I'm going down the wrong path with it.

@Zaharid
Contributor

Zaharid commented Jul 29, 2019

I'm not quite sure how I'd go about all the details, but I'd say that one thing to improve over how experiments works is that, rather than having a monstrously complicated parser that takes many inputs, we should have many simpler objects that build upon each other.

So at some level there would be data, which represents the user input pretty much unmodified (just making sure the keys look sensible and the names exist). Then there is a (possibly complicated) production rule that adds defaults on top of that, then something else that adds cuts, something that specifies the fktables, then something else that specifies how the covariance matrix should be built, and so on. That would make these things much easier to compose in non-trivial ways than they are now, including ways we haven't anticipated.
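The layered design could be pictured as a chain of small functions, each depending only on the previous stage's output. All names and fields below are invented for illustration; the real validphys rules and defaults differ:

```python
def parse_data(raw):
    """Stage 1: user input pretty much unmodified, just sanity-checked."""
    return [{"name": entry["name"]} for entry in raw]


def add_defaults(data):
    """Stage 2: a production rule filling per-dataset defaults
    (hypothetical fields ``frac`` and ``weight``)."""
    return [{"frac": 1.0, "weight": 1, **ds} for ds in data]


def add_cuts(data, use_cuts="internal"):
    """Stage 3: attach a cuts policy on top of the defaults."""
    return [{**ds, "cuts": use_cuts} for ds in data]


# Further stages (fktables, covariance matrix construction, ...) would
# stack in the same way:
namespace = add_cuts(add_defaults(parse_data([{"name": "NMC"}])))
```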

In a sense the biggest problem is to come up with good names for all of these things.

@wilsonmr
Contributor Author

Ok, after having a conversation in Milan, writing this here for the record:

  • data is a list of dataset inputs (or similar)
  • data_plus_defaults is data with (data) defaults added
  • there will be a separate theory defaults

The production rule which adds these together will produce a new namespace with defaults filled in and save this to a lock file (#496).

Defaults will be added per dataset to the PLOTTING file. A new file shall be added with global defaults.

parse_data looks in the runcard for the data key and then constructs the data object.

parse_data_defaults takes data as input; it will first look in the runcard (in the case that it is a lock file) and fill in the defaults from there, and if the defaults are not specified in the runcard it shall take them from the set location.

produce_data_plus_defaults (might be renamed) takes the above functions, creates a new runcard object (lock file) updated with a defaults key if it didn't already exist, and returns the namespace with those defaults filled in the correct places.

parse_theory_defaults gets defaults specific to the theory, such as cfacs and cfac uncertainties, either from the runcard or the set location.

produce_data_plus_theory_defaults (also might be renamed) finally adds theory defaults to this namespace and updates the lock file. Now that all changes to the lock file have happened, it can be saved; this should happen before the code starts and could have uses in debugging, for example checking that people's defaults are up to date.
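A toy sketch of the parse/produce chain recorded above. The lock-file handling is reduced to updating a plain dict; the real implementation (#496) would write it to disk, and all names and defaults here are illustrative assumptions:

```python
def parse_data_defaults(runcard, fallback_defaults):
    """Take defaults from the runcard if present (the lock-file case),
    otherwise from the set location (here, ``fallback_defaults``)."""
    return runcard.get("defaults", fallback_defaults)


def produce_data_plus_defaults(runcard, fallback_defaults):
    """Fill defaults into the data namespace and record them in the
    runcard object, so it can later be saved as a lock file."""
    defaults = parse_data_defaults(runcard, fallback_defaults)
    # Record the resolved defaults on the runcard if not already there,
    # mimicking the lock-file update described above
    runcard.setdefault("defaults", defaults)
    data = [{**defaults, **ds} for ds in runcard["data"]]
    return {"data_plus_defaults": data}
```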

@wilsonmr wilsonmr mentioned this pull request Feb 17, 2020
@wilsonmr
Contributor Author

see #651

