[WIP]: Python commondata parser by RosalynLP · Pull Request #769 · NNPDF/nnpdf

RosalynLP · 2020-05-08T14:17:48Z

Addresses the first part of #379. In a similar vein to #404, we want to load the commondata from file and parse it into central values, systematics, etc., using pandas.
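Here is a minimal sketch of what "parse it into central values, systematics, etc." could look like with pandas. It assumes the usual commondata layout: a header line `SETNAME NSYS NDATA`, then one row per point with index, process type, three kinematic variables, data, statistical uncertainty, and an additive/multiplicative pair per systematic. The function name `parse_commondata_text` and the toy data are hypothetical; this is not the code in this PR.

```python
import io

import pandas as pd

# Hypothetical toy file: header "SETNAME NSYS NDATA", then one row per data point.
TOY_COMMONDATA = """TOYSET 2 2
1 DIS_NCE 0.1 10.0 0.5 1.20 0.01 0.02 1.6 0.03 2.5
2 DIS_NCE 0.2 20.0 0.5 1.10 0.02 0.01 0.9 0.04 3.6
"""


def parse_commondata_text(text):
    """Split commondata-style text into central values, stat and systematics."""
    header, body = text.split("\n", 1)
    setname, nsys, ndata = header.split()
    nsys = int(nsys)
    # Fixed columns first, then an (additive, multiplicative) pair per systematic
    names = ["entry", "process", "kin1", "kin2", "kin3", "data", "stat"]
    for i in range(1, nsys + 1):
        names += [f"sys.add.{i}", f"sys.mult.{i}"]
    table = pd.read_csv(
        io.StringIO(body), sep=r"\s+", header=None, names=names, index_col="entry"
    )
    return {
        "setname": setname,
        "ndata": int(ndata),
        "central_values": table["data"],
        "stat": table["stat"],
        "additive": table[[f"sys.add.{i}" for i in range(1, nsys + 1)]],
        "multiplicative": table[[f"sys.mult.{i}" for i in range(1, nsys + 1)]],
    }


parsed = parse_commondata_text(TOY_COMMONDATA)
print(parsed["central_values"])
```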


RosalynLP · 2020-05-20T13:52:25Z

So far I've implemented most of this, i.e.:

- Created a `CommonData` class
- `parse_commondata` parses a file into a `CommonData`
- `load_commondata` takes a `CommonDataSpec` and produces a `CommonData` using `parse_commondata`
- Made a test function in `test_commondataparser.py`
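A stripped-down sketch of the split described above: a container class, a `parse_commondata` that only knows about a file path, and a `load_commondata` that only knows how to get that path from a spec. The field names, the `ToySpec` stand-in for `CommonDataSpec` and its `datafile` attribute are assumptions for illustration; the real classes in validphys may differ.

```python
import dataclasses
import pathlib
import tempfile

import pandas as pd


@dataclasses.dataclass(frozen=True)
class CommonData:
    """Hypothetical container for one parsed commondata file."""

    setname: str
    ndata: int
    nsys: int
    commondata_table: pd.DataFrame


@dataclasses.dataclass(frozen=True)
class ToySpec:
    """Hypothetical stand-in for CommonDataSpec: only knows where the file is."""

    datafile: pathlib.Path


def parse_commondata(path):
    """Parse a file into a CommonData; knows nothing about specs."""
    with open(path) as f:
        # Header line assumed to be "SETNAME NSYS NDATA"
        setname, nsys, ndata = f.readline().split()
        # The rest is a whitespace-separated table; column 0 is the entry number
        table = pd.read_csv(f, sep=r"\s+", header=None, index_col=0)
    return CommonData(setname, int(ndata), int(nsys), table)


def load_commondata(spec):
    """Resolve the spec to a path and delegate to parse_commondata."""
    return parse_commondata(spec.datafile)


# Usage: write a one-systematic toy file to a temporary directory and load it.
with tempfile.TemporaryDirectory() as tmp:
    toy = pathlib.Path(tmp) / "DATA_TOY.dat"
    toy.write_text(
        "TOY 1 2\n"
        "1 DIS 0.1 10.0 0.5 1.2 0.01 0.02 1.6\n"
        "2 DIS 0.2 20.0 0.5 1.1 0.02 0.01 0.9\n"
    )
    cd = load_commondata(ToySpec(datafile=toy))

print(cd.setname, cd.ndata, cd.nsys, cd.commondata_table.shape)
```

Keeping the parser independent of the spec makes it easy to test against a throwaway file, as the usage above does.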

Still to do:

- Populate the `CommonDataInfo` object I created (do we actually need this, though? I'm not sure it's that useful)
- Error messages, e.g. raise a `BadCommonDataError` if parsing can't be done
- Similarly, parse the systype files
- Allow for more than 3 types of kinematics
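For the error-message item, one possible shape is to wrap low-level parsing failures in a dedicated exception and to check the parsed table against the header. `BadCommonDataError` is defined here only for illustration, and the expected column count assumes six fixed columns (process, three kinematics, data, stat) plus two per systematic.

```python
import io

import pandas as pd


class BadCommonDataError(Exception):
    """Raised when commondata text cannot be parsed or is inconsistent."""


def parse_checked(text):
    """Parse commondata-style text, raising BadCommonDataError on any problem."""
    try:
        header, body = text.split("\n", 1)
        setname, nsys, ndata = header.split()
        nsys, ndata = int(nsys), int(ndata)
        table = pd.read_csv(io.StringIO(body), sep=r"\s+", header=None, index_col=0)
    except ValueError as e:
        # Bad unpacking, non-integer header fields and pandas parser errors
        # (EmptyDataError, ParserError) are all ValueError subclasses.
        raise BadCommonDataError(f"Could not parse commondata: {e}") from e
    # Process, 3 kinematics, data and stat, plus (add, mult) per systematic
    expected = (ndata, 6 + 2 * nsys)
    if table.shape != expected:
        raise BadCommonDataError(
            f"{setname}: table has shape {table.shape}, header implies {expected}"
        )
    if table.isna().any().any():
        raise BadCommonDataError(f"{setname}: table contains missing entries")
    return table
```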

siranipour · 2020-06-08T11:51:13Z

There is currently a problem: the commondata files index points from 1, while cuts objects index from 0. This makes things a bit messy, so what's the verdict on indexing the dataframes from 0?

RosalynLP · 2020-06-08T13:09:04Z

> There is a problem at the moment that the commondata files start indexing from 1, while cuts objects start indexing from 0. This makes things a bit of a mess, so what's the verdict on indexing the dataframes from 0?

Although I like the idea of indexing from 0, I worry that if someone identified, say, point 224 and then wanted to find that entry in the commondata file, we'd want the two indices to match up. I think shifting the cuts would be best in this scenario.
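The trade-off can be seen on a toy frame. Keeping the 1-based index means label `224` in the DataFrame is point 224 in the file, at the cost of shifting 0-based cuts by one for label-based `.loc` lookups; pandas' positional `.iloc` accepts 0-based cuts directly. This is a sketch with made-up data, not code from the PR.

```python
import numpy as np
import pandas as pd

# Toy table indexed from 1, so label 3 is the third point in the file.
table = pd.DataFrame(
    {"data": [10.0, 20.0, 30.0, 40.0]},
    index=pd.RangeIndex(1, 5, name="entry"),
)

# Cuts are 0-based: here the first and third points pass.
cuts = np.array([0, 2])

# Label-based lookup needs the cuts shifted by one to match the file numbering...
by_label = table.loc[cuts + 1]
# ...whereas positional lookup can use the 0-based cuts unchanged.
by_position = table.iloc[cuts]

print(by_label)
```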

siranipour · 2020-06-08T13:27:32Z

Yeah, I worry both will be a bit of a mess, but your method less so. I'll do that in the next commit!

RosalynLP · 2020-06-08T13:30:27Z

Thanks @siranipour

Zaharid · 2020-06-08T14:09:07Z

+        Note if the first data point passes cuts, the first entry
+        of ``cuts`` should be ``0``.


This is rather unclear. It reads to me like it should be a list starting with zero.

That is what we mean, right? If data point 1 passes cuts, then `cuts = [0, ...]`.

We must do this as cuts indexing starts at 0 while commondata indexing starts at 1.

siranipour · 2020-06-17T09:35:01Z

What else is left for this PR?

Rosalyn added 4 commits May 5, 2020 10:49

- copying code from PR476 for data loading (4fbe489)
- altering load to use new function (d9b632b)
- Changing dataset -> name in config (6c0351c)
- Importing pandas (a840f2c)

RosalynLP added the destroyingc++ label May 8, 2020

RosalynLP self-assigned this May 8, 2020

Zaharid requested a review from scarrazza May 11, 2020 10:48

RosalynLP marked this pull request as draft May 13, 2020 10:19

Rosalyn and others added 15 commits May 13, 2020 12:03

- data container for commondata (2833664)
- moving coredata.py to correct loc (08ba1d4)
- adding commondata parser script (e31a47d)
- move load_data to commondataparser (971c535)
- remove pandas import from core (e99bcb3)
- adding CommonDataInfo class (3c702fb)
- reverting back to old behvaiour in core (3aa32e1)
- populating CommonData object (dd90cae)
- removing space in core (01ff097)
- separating structure into parse_commondata and load_commondata (7b38cba)
- changing structure of CommonData object (0756b3b)
- Update core.py (2dac35c)
- searhing for setname in file name (64a9645)
- Merge branch 'python-commondata-parser' of https://github.com/NNPDF/n…npdf into python-commondata-parser (a1d4ad5)
- adding test for commondata parser (e457a6c)

siranipour self-requested a review May 18, 2020 08:16

RosalynLP and others added 5 commits May 20, 2020 16:24

- changing name to dataset in config (fa32bd7)
- reverting to old config behaviour (1ac7492)
- adding class for SystypeData (26710ee)
- adding systypeinfo object (eda4a5d)
- parse systype files as well (b3ea44e)
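A minimal sketch of parsing a systype file with pandas, assuming a first line giving the number of systematics followed by one row per systematic with its index, treatment (`ADD`/`MULT`) and type name (for example `CORR`, `UNCORR`). `parse_systypes` and the toy content are hypothetical; this is not the `SystypeData` code from these commits.

```python
import io

import pandas as pd

# Hypothetical systype file: number of systematics, then "index treatment name".
TOY_SYSTYPE = """2
1 ADD CORR
2 MULT UNCORR
"""


def parse_systypes(text):
    """Return a DataFrame of systematic treatments and names, indexed from 1."""
    lines = text.splitlines()
    nsys = int(lines[0].split()[0])
    if nsys == 0:
        # Datasets without systematics: an empty table with the same columns
        return pd.DataFrame(
            columns=["treatment", "name"], index=pd.Index([], name="sys_index")
        )
    table = pd.read_csv(
        io.StringIO("\n".join(lines[1:])),
        sep=r"\s+",
        header=None,
        names=["sys_index", "treatment", "name"],
        index_col="sys_index",
    )
    if len(table) != nsys:
        raise ValueError(f"Header declares {nsys} systematics, found {len(table)}")
    return table


systypes = parse_systypes(TOY_SYSTYPE)
# Pick out which systematics are treated multiplicatively
mult_indices = systypes.index[systypes["treatment"] == "MULT"].tolist()
print(mult_indices)
```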

RosalynLP and others added 3 commits June 6, 2020 13:43

- changing comments (0e2bf01)
- Removing blank lines (a92b92e)
- loading empty systematics as empty dataframe (c11746e)

siranipour force-pushed the python-commondata-parser branch from d774074 to 3a08e67 June 6, 2020 12:43

RosalynLP added 5 commits June 6, 2020 13:43

- comment explaining try/except for sys load (5495638)
- remove dropna line (a3bd404)
- test for ds with no systematics (78e2010)
- separating parsing of systype files (84ed62e)
- checking table info against metadata from peek_commondata_metadata (c89c33e)
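The "empty systematics as empty dataframe" idea can be illustrated with positional column slicing, which naturally yields an n×0 DataFrame (keeping the row index) when a dataset has no systematics, so downstream code needs no special case. This sketch assumes the six fixed columns precede the (additive, multiplicative) pairs; it sidesteps the try/except mentioned in these commits rather than reproducing it, and `split_systematics` is a hypothetical name.

```python
import pandas as pd

# Assumed layout: columns 0-5 are process, 3 kinematics, data, stat;
# columns 6, 7, 8, ... alternate additive and multiplicative systematics.
N_FIXED = 6


def split_systematics(table, nsys):
    """Return (additive, multiplicative) DataFrames, each with nsys columns."""
    additive = table.iloc[:, N_FIXED::2]
    multiplicative = table.iloc[:, N_FIXED + 1 :: 2]
    if additive.shape[1] != nsys or multiplicative.shape[1] != nsys:
        raise ValueError(f"Expected {nsys} systematics of each type")
    return additive, multiplicative


# A dataset with no systematics still gives frames indexed like the data.
no_sys = pd.DataFrame([["DIS", 0.1, 10.0, 0.5, 1.2, 0.01]], index=[1])
add, mult = split_systematics(no_sys, nsys=0)
print(add.shape, mult.shape)  # (1, 0) (1, 0)
```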

siranipour force-pushed the python-commondata-parser branch 3 times, most recently from dba0f95 to 1cc8329 June 8, 2020 10:36

siranipour force-pushed the python-commondata-parser branch from 61f737f to 2325c4c June 8, 2020 13:31

Zaharid reviewed Jun 8, 2020


Comment thread on validphys2/src/validphys/coredata.py (outdated)

siranipour added 4 commits June 8, 2020 15:20

- Adding a with_cuts method (c5a39ec)
- Adding tests for with_cuts method (e79bd2d)
- Correcting consistency check (ad1ab6b)
- Incrementing cuts by 1 (11c4b5d)

  We must do this as cuts indexing starts at 0 while commondata indexing starts at 1.
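A minimal sketch of a `with_cuts` method in the spirit of these commits: it accepts 0-based cuts, shifts them by one onto the 1-based commondata index (as the last commit message describes), and returns a new object rather than mutating the original. The class layout and the handling of `cuts=None` are assumptions for illustration, not the merged implementation.

```python
import dataclasses

import numpy as np
import pandas as pd


@dataclasses.dataclass(frozen=True)
class CommonData:
    """Hypothetical minimal container; the table is indexed from 1 like the file."""

    setname: str
    ndata: int
    commondata_table: pd.DataFrame

    def with_cuts(self, cuts):
        """Return a new CommonData keeping only the points in ``cuts`` (0-based)."""
        if cuts is None:
            return self
        # Shift 0-based cuts onto the 1-based commondata entries.
        labels = np.asarray(cuts) + 1
        table = self.commondata_table.loc[labels]
        return dataclasses.replace(self, ndata=len(table), commondata_table=table)


cd = CommonData(
    setname="TOY",
    ndata=4,
    commondata_table=pd.DataFrame(
        {"data": [10.0, 20.0, 30.0, 40.0]}, index=pd.RangeIndex(1, 5, name="entry")
    ),
)
cut_cd = cd.with_cuts([0, 2])  # first and third points pass
print(cut_cd.ndata, cut_cd.commondata_table.index.tolist())  # 2 [1, 3]
```

Returning a new frozen instance via `dataclasses.replace` keeps the uncut data available alongside the cut version.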

siranipour force-pushed the python-commondata-parser branch from 2325c4c to 11c4b5d June 8, 2020 14:21

scarrazza merged commit 25056e9 into master Jun 24, 2020

scarrazza deleted the python-commondata-parser branch June 24, 2020 14:55

siranipour mentioned this pull request Jun 25, 2020

Python covariance matrix #813 (merged)

voisey mentioned this pull request Jul 15, 2020

Python parser for commondata #378 (closed)

siranipour mentioned this pull request Oct 2, 2020

Strategy for destroying C++ #952 (closed)

scarrazza merged 61 commits into master from python-commondata-parser


5 participants
