Python closure sampling (with the python-only commondata branch) #1660
Conversation
… create systype dir
…used for the generation of level-1 noise: filterseed. rngalgo and seed are no longer needed to run a closure test
…with cuts already applied
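The commits above describe the core change: a single `filterseed` drives the level-1 noise via numpy, replacing the old `rngalgo`/`seed` pair. A minimal sketch of how level-1 pseudodata can be sampled this way (the function name and signature are illustrative, not the actual validphys API):

```python
import numpy as np

def make_level1_data(central_values, covmat, filterseed):
    """Sample level-1 pseudodata: shift the level-0 central values by
    Gaussian noise drawn from the experimental covariance matrix.

    A single filterseed fully determines the noise, which is why rngalgo
    and seed are no longer needed (illustrative sketch, not the actual
    validphys implementation).
    """
    rng = np.random.default_rng(filterseed)
    noise = rng.multivariate_normal(np.zeros(len(central_values)), covmat)
    return np.asarray(central_values) + noise

# Reproducibility: the same filterseed yields identical pseudodata.
cv = np.array([1.0, 2.0, 3.0])
cov = np.diag([0.1, 0.2, 0.3])
assert np.allclose(make_level1_data(cv, cov, 42), make_level1_data(cv, cov, 42))
```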
Thank you very much @comane! I'm looking forward to merging this.
I'm already using a pip-only version of NNPDF for some fits, so we are already almost there for a pip-only installation of the code in master (and even a validphys-only installation separated from n3fit, which would be very useful!)
One thing I would like to link in this PR is some reports of closure tests done with this branch (I guess you are already using it for that). I'm guessing some of the last CTs in the server are done with this branch but wouldn't want to select the wrong ones.
As a final thing, it would be great to have a comparison between a closure test done with this branch and another one done with master, we already had some trouble reproducing the CT of the 4.0 paper so I wouldn't like to add another layer of changes.
On the other hand, I don't want to inflict CT on anybody so I won't insist if you don't feel like doing it.
Ps: I cannot "approve" this PR because I opened it, but -as you can see- all my comments are quite minor, and functionality- or structure-wise I'm happy with it (I don't think that moving the write_ functions to a different module counts as changing the structure).
        write_systype_to_file,
        write_commondata_to_file,
    )
|
|
I'd rather have all the write functions in another file so they don't create a circular import.
And since they are the only part of commondataparser.py that gets imported in filters.py, it won't change the structure. I also think that, name-wise, they do not belong in a "parser", since they are doing the opposite.
I disagree that they don't belong in the same place. The place I would look for write functions is the same where the read functions are. Besides, unless we have the write functions in coredata.py, it's not going to be trivial to avoid circular imports.
None of the lines added to commondataparser.py need to import anything. You can just create a commondatawriter.py and then the problem is fixed. It is trivial to avoid circular imports.
The place I would look for write functions ...
If there is a file called commondatawriter.py next to it I'm sure you will open the right file.
The reader functions need to import the data definitions they are reading into. Also I don't think there is anything wrong with circular imports in this case that justifies adding files with ten lines worth of code (not that it would avoid them, as said).
The reader functions need to import the data definitions they are reading into.
Which reader functions? I think you are thinking of a different thing?
I was answering to
None of the lines added to commondataparser.py need to import anything.
saying that the functions that take files and return data structures do need to import the data structure, and there would be a circular import if you want a convenience method to read.
Sorry, but I really cannot see which of the lines added to commondataparser.py is using anything imported from the outside. They are all self-contained functions.
There are no read methods in the changes introduced by this Pull Request. I just want to have a commondatawriter.py with those functions.
Edit: I just tried it out and the tests pass without a circular import
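The proposal boils down to a small, import-free module. A sketch of what such a commondatawriter.py could look like (the function names match the ones in the diff above, but the bodies are illustrative; in particular, the assumption that the object exposes pandas DataFrames as `commondata_table` and `systype_table` is mine, not the PR's exact code):

```python
def write_commondata_to_file(commondata, path):
    """Write the commondata table (central values and uncertainties) to path.

    Assumes commondata exposes a pandas DataFrame as commondata_table
    (an illustrative assumption).
    """
    commondata.commondata_table.to_csv(path, sep="\t", header=False)


def write_systype_to_file(commondata, path):
    """Write the systematics-type table to path.

    Assumes commondata exposes a pandas DataFrame as systype_table
    (an illustrative assumption).
    """
    commondata.systype_table.to_csv(path, sep="\t", header=False)
```

Because these functions only touch attributes of their argument and import nothing, placing them in their own module indeed cannot create a circular import.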
    def export(self, path):
        """Export the data, and error types
        Use the same format as libNNPDF:
|
|
Does this mean the format is no longer the same as libNNPDF and so they are not compatible?
        list containing commondata instances with cuts
        """
        return data.load_commondata_instance()
I'm very confused about this function, there is nothing enforcing the cuts here, right? How is this different from simply doing data.load_commondata_instance() inside make_level1_data?
Maybe there's something I've missed along the way?
Hi @scarlehoff, I think you are right. This is completely unnecessary. I changed it in the new commits
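The point being conceded here: a pass-through wrapper adds nothing if the loader already returns cut data. A toy illustration (all names hypothetical, not the validphys objects) of why the caller can just invoke the loader directly:

```python
import numpy as np

class CommonData:
    """Toy stand-in for a commondata object (hypothetical)."""
    def __init__(self, central_values):
        self.central_values = np.asarray(central_values)

    def with_cuts(self, cut_indices):
        """Return a new instance keeping only the points passing the cuts."""
        return CommonData(self.central_values[cut_indices])

def load_commondata_instance(data, cuts):
    """Load commondata with cuts already applied, so callers such as
    make_level1_data need no extra pass-through wrapper around this."""
    return data.with_cuts(cuts)

raw = CommonData([1.0, 2.0, 3.0, 4.0])
cut = load_commondata_instance(raw, [0, 2])
assert np.allclose(cut.central_values, [1.0, 3.0])
```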
Hi @Zaharid, @scarlehoff, following the link below you can find a comparison between a CT done with master and one done with this branch. Note that the initialisation seeds for level-1 noise are not the same. This is because the current branch does not depend on …
|
Thanks! Let me ping @RoyStegeman, who is much more experienced than I am at looking at CTs.
|
I'd have to look at this properly to understand it better, but all this does is replace the cpp CT sampling with the new python implementation, right? If so, I'm a bit surprised by the difference between the two fits - are the runcards exactly the same (up to interpretation of the seeds)? @comane would you mind uploading the fits to the nnpdf server? In general it's a good habit to upload the fit if you're sharing a corresponding result, even just for possible later reference.
|
@RoyStegeman the random numbers are not the same since one is using the cpp rng while the other is using numpy.
|
Thanks, that's what I thought. If that's the only difference in the code, the difference between the fits seems large to me, wouldn't you agree? I don't know, I'd probably have to compare this to an old L1 CT to say anything sensible.
|
@RoyStegeman I guess a lot of the studies were based on "multiclosure" to avoid the dependency on the L1 seed; Looking at https://vp.nnpdf.science/P0SSEuO2RdWW753vlYG_sA== and particularly at https://vp.nnpdf.science/P0SSEuO2RdWW753vlYG_sA==/plot_delta_chi2_report_report.html I can be easily convinced that the difference is just the seed. I am not sure if we have the detailed stats somewhere.
Hi @RoyStegeman, I just uploaded the fits; they can be found at https://data.nnpdf.science/fits/NNPDF40_nnlo_as_01180_CT_python.tar.gz and https://data.nnpdf.science/fits/NNPDF40_nnlo_as_01180_CT_master.tar.gz. I am currently running another closure test using the master branch but with a different seed. This will probably allow us to better understand how much the CT depends on the L1 seed.
|
Yes, I realised my thinking was wrong after sending that last message, hence the edit ;). You're right that, looking at MW's fits, the difference could well be explained by a different L1 seed, though his fits were also done using the NNPDF3.1 dataset. On top of that, fluctuations are relatively large, so I guess a single (or two) L1 CT fit is just not enough to confidently make any conclusions.
I suppose these would just be the estimators quoted in the nnpdf40 paper, or redetermined by Samuele. But they are all based on ~25 fits of 40 replicas...
|
@comane thanks. Indeed, it's probably a good idea to do at least one more fit. I don't know how much computational resources you have, but since we really don't want any surprises when it comes to CT, it may be worth simply doing a full CT so at least we have a reference point in case it gets messed up in the future.
|
Hi @Zaharid, @MaeveMadiganMM, @RoyStegeman and @scarlehoff, this is a comparison between two CTs, both performed with the master branch but with different L1 seeds: https://vp.nnpdf.science/t2Z6P2ctS5eQaSzbvUS2sg== The fit corresponding to the new seed can be found at https://data.nnpdf.science/fits/13_02_23_MNC_NNPDF40_nnlo_as_01180_CT_master_new_seed.tar.gz
|
Are "we" happy with the closure test result? From my side I would like these two points to be addressed: https://github.com/NNPDF/nnpdf/pull/1660/files#r1102438850 https://github.com/NNPDF/nnpdf/pull/1660/files#r1102438226
|
In my opinion, the deviation one sees between CT (python) and CT (master) is due to statistical fluctuations, meaning that if one were to repeat the CT with the new branch n times, then 0.68 * n of the times the CT (python) would lie within the uncertainty bands of CT (master). https://vp.nnpdf.science/jrWmrlYNTPikrEfm8BNOCg== https://data.nnpdf.science/fits/17_02_23_MNC_NNPDF40_nnlo_as_01180_CT_python_ns1.tar.gz
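The 0.68 figure is just the Gaussian one-sigma coverage: under purely statistical fluctuations, repeated independent draws land inside the one-sigma band about 68% of the time. A quick numpy check of that number (not NNPDF code):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
# n independent toy "closure test" outcomes fluctuating around the truth
draws = rng.normal(loc=0.0, scale=1.0, size=n)
# fraction landing within the one-sigma uncertainty band
frac = np.mean(np.abs(draws) < 1.0)
print(round(frac, 2))  # close to 0.68
```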
Hi @Zaharid @comane
As discussed yesterday during the Code Meeting, I've merged PR #1651 so that #1650 can be merged, since they are mostly orthogonal as previously mentioned. Indeed, the only difference is the export function, which can simply be removed so that it uses the new ones; the relevant commit is cc868fb. I would suggest the write functions are put elsewhere so that circular imports are not generated, but I guess for that you can discuss among yourselves where to put what. With this the two branches should be without conflict.
(unless people request changes in #1650, of course, but I'm not taking responsibility for that xd)