Python closure sampling (with the only python commondata branch)#1660

Merged
scarlehoff merged 52 commits into master from python_closure_sampling_merged on Feb 22, 2023
Conversation

@scarlehoff
Member

@scarlehoff scarlehoff commented Jan 12, 2023

Hi @Zaharid @comane

As discussed yesterday during the Code Meeting, I've merged PRs #1651 and #1650 so that #1650 can be merged, since they are mostly orthogonal as previously mentioned. Indeed, the only difference is the export function, which can simply be removed so that it uses the new ones; the relevant commit is cc868fb

I would suggest the write functions be put elsewhere so that circular imports are not generated, but I guess you can discuss among yourselves where to put what. With this, the two branches should be free of conflicts.

(unless people request changes in #1650, of course, but I'm not taking responsibility for that xd)

Mark Nestor Costantini and others added 30 commits December 13, 2022 14:04
…used for the generation of level1 noise: filterseed. rngalgo and seed are not needed anymore to run a closure test
Member Author

@scarlehoff scarlehoff left a comment


Thank you very much @comane! I'm looking forward to merging this.

I'm already using a pip-only version of NNPDF for some fits, so we're already almost there for a pip-only installation of the code in master (and even a validphys-only installation separated from n3fit, which would be very useful!)

One thing I would like to link in this PR is some reports of closure tests done with this branch (I guess you are already using it for that). I'm guessing some of the latest CTs on the server were done with this branch, but I wouldn't want to select the wrong ones.

As a final thing, it would be great to have a comparison between a closure test done with this branch and one done with master; we already had some trouble reproducing the CT of the 4.0 paper, so I wouldn't like to add another layer of changes.

On the other hand, I don't want to inflict CT on anybody so I won't insist if you don't feel like doing it.

PS: I cannot "approve" this PR because I opened it, but, as you can see, all my comments are quite minor, and functionality- or structure-wise I'm happy with it (I don't think that moving the write_ functions to a different module counts as changing the structure).

write_systype_to_file,
write_commondata_to_file,
)

Member Author


I'd rather have all the write functions in another file so they don't create a circular import.

And since they are the only part of commondataparser.py that gets imported in filters.py, it won't change the structure. I also think that, name-wise, they do not belong in a "parser", since they do the opposite.
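The proposed split can be sketched like this (a minimal illustration, not the actual validphys code: the function name `write_commondata_to_file` matches the PR, but the file layout and row shape shown here are made up):

```python
# commondatawriter.py -- hypothetical sketch of the proposed module.
# It imports nothing from validphys: it only consumes whatever rows it
# is handed (duck typing), so importing it from both
# commondataparser.py and filters.py cannot create an import cycle.

def write_commondata_to_file(rows, path):
    """Write each row as 'index value ...' on its own line."""
    with open(path, "w") as f:
        for i, row in enumerate(rows, start=1):
            f.write(" ".join(str(v) for v in (i, *row)) + "\n")

# Tiny usage example with made-up numbers:
write_commondata_to_file([(0.5, 0.01, 0.02)], "commondata_demo.txt")
```

Because the writer depends on nothing above it in the package, the dependency graph stays acyclic no matter which module imports it.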

Contributor


I disagree that they don't belong in the same place. The place I would look for write functions is the same place where the read functions are. Besides, unless we have the write functions in coredata.py, it's not going to be trivial to avoid circular imports.

Member Author


None of the lines added to commondataparser.py need to import anything. You can just create a commondatawriter.py and then the problem is fixed. It is trivial to avoid circular imports.

The place I would look for write functions ...

If there is a file called commondatawriter.py next to it I'm sure you will open the right file.

Contributor

@Zaharid Zaharid Feb 10, 2023


The reader functions need to import the data definitions they are reading into. Also, I don't think there is anything wrong with circular imports in this case that justifies adding files with ten lines' worth of code (not that it would avoid them, as said).

Member Author


The reader functions need to import the data definitions they are reading into.

Which reader functions? I think you are thinking of a different thing?

Contributor


I was replying to

None of the lines added to commondataparser.py need to import anything.

by saying that the functions that take files and return data structures do need to import the data structure, and there would be a circular import if you wanted a convenience method to read.

Member Author

@scarlehoff scarlehoff Feb 15, 2023


Sorry, but I really cannot see which of the lines added to commondataparser.py uses anything imported from the outside. They are all self-contained functions.

There are no read methods in the changes introduced by this pull request. I just want to have a commondatawriter.py with those functions.

Edit: I just tried it out and the tests pass without a circular import.

def export(self, path):
    """Export the data, and error types
    Use the same format as libNNPDF:

Member Author


Does this mean the format is no longer the same as libNNPDF and so they are not compatible?

Comment thread validphys2/src/validphys/filters.py Outdated
Comment thread validphys2/src/validphys/filters.py Outdated
Comment thread validphys2/src/validphys/filters.py Outdated
Comment thread validphys2/src/validphys/pseudodata.py Outdated
Comment thread validphys2/src/validphys/filters.py
Comment thread validphys2/src/validphys/pseudodata.py Outdated
Comment thread validphys2/src/validphys/pseudodata.py Outdated
Comment thread validphys2/src/validphys/results.py Outdated
list containing commondata instances with cuts

"""
return data.load_commondata_instance()
Member Author


I'm very confused about this function: there is nothing enforcing the cuts here, right? How is this different from simply doing data.load_commondata_instance() inside make_level1_data?

Maybe there's something I've missed along the way?

Member


Hi @scarlehoff, I think you are right. This is completely unnecessary; I changed it in the new commits.

@comane
Member

comane commented Feb 13, 2023

As a final thing, it would be great to have a comparison between a closure test done with this branch and another one done with master, we already had some trouble reproducing the CT of the 4.0 paper so I wouldn't like to add another layer of changes.

Hi @Zaharid, @scarlehoff, following the link below you can find a comparison between a CT done with master and one done with this branch. Note that the initialisation seeds for the level-1 noise are not the same; this is because the current branch no longer depends on filters.prepare_nnpdf_rng.

https://vp.nnpdf.science/jj9pitd5Q_KJ6zB7lBGKkg==
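For context, the seed difference comes down to how the level-1 shift is drawn. A rough numpy sketch of level-1 sampling (`filterseed` appears in the PR; the function name, covariance, and numbers here are illustrative, not the actual pseudodata code):

```python
import numpy as np

def sample_level1(central_values, covmat, filterseed):
    """Shift the central values by one multivariate-normal draw."""
    # numpy Generator seeded by filterseed, replacing the old GSL-based rng
    rng = np.random.default_rng(filterseed)
    shift = rng.multivariate_normal(np.zeros(len(central_values)), covmat)
    return central_values + shift

central = np.array([1.0, 2.0, 3.0])
covmat = np.diag([0.1, 0.2, 0.3])
level1 = sample_level1(central, covmat, filterseed=42)
```

The same filterseed always reproduces the same level-1 data within this branch, but a cpp/GSL generator seeded with the same integer yields a different stream, which is why the two fits cannot share the same noise.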

@scarlehoff
Member Author

Thanks! Let me ping @RoyStegeman, who is much more experienced than I am at looking at CTs.

@RoyStegeman
Member

RoyStegeman commented Feb 13, 2023

I'd have to look at this properly to understand it better, but all this does is replace the cpp CT sampling with the new python implementation, right? If so, I'm a bit surprised by the difference between the two fits: are the runcards exactly the same (up to interpretation of the seeds)?

@comane would you mind uploading the fits to the nnpdf server? In general it's a good habit to upload the fit if you're sharing a corresponding result, even just for possible later reference.

@Zaharid
Contributor

Zaharid commented Feb 13, 2023

@RoyStegeman the random numbers are not the same since one is using the cpp rng while the other is using numpy.

@RoyStegeman
Member

RoyStegeman commented Feb 13, 2023

Thanks, that's what I thought. If that's the only difference in the code, the difference between the fits seems large to me, wouldn't you agree?

I don't know, I'd probably have to compare this to an old L1 CT to say anything sensible.

@Zaharid
Contributor

Zaharid commented Feb 13, 2023

@RoyStegeman I guess a lot of the studies were based on "multiclosure" to avoid the dependency on the L1 seed. Looking at

https://vp.nnpdf.science/P0SSEuO2RdWW753vlYG_sA==

and particularly at

https://vp.nnpdf.science/P0SSEuO2RdWW753vlYG_sA==/plot_delta_chi2_report_report.html

I can be easily convinced that the difference is just the seed. I am not sure if we have the detailed stats somewhere.

@comane
Member

comane commented Feb 13, 2023

@comane would you mind uploading the fits to the nnpdf server? In general it's a good habit to upload the fit if you're sharing a corresponding result, even just for possible later reference.

Hi @RoyStegeman, I just uploaded the fits and they can be found at

https://data.nnpdf.science/fits/NNPDF40_nnlo_as_01180_CT_python.tar.gz

https://data.nnpdf.science/fits/NNPDF40_nnlo_as_01180_CT_master.tar.gz

I am currently running another closure test using the master branch but with a different seed. This will probably allow us to better understand how much the CT depends on the L1 seed.

@RoyStegeman
Member

RoyStegeman commented Feb 13, 2023

Yes, I realised my thinking was wrong after sending that last message, hence the edit ;). You're right that, looking at MW's fits, the difference could well be explained by a different L1 seed, though his fits were also done using the NNPDF3.1 dataset. On top of that, the fluctuations are relatively large, so I guess a single L1 CT fit (or two) is just not enough to confidently draw any conclusions.

I am not sure if we have the detailed stats somewhere.

I suppose these would just be the estimators quoted in the NNPDF4.0 paper, or as redetermined by Samuele. But they are all based on ~25 fits of 40 replicas...

@RoyStegeman
Member

@comane thanks.

Indeed, it's probably a good idea to do at least one more fit. I don't know how many computational resources you have, but since we really don't want any surprises when it comes to CTs, it may be worth simply doing a full CT so that at least we have a reference point in case it gets messed up in the future.

@Zaharid
Contributor

Zaharid commented Feb 13, 2023

cc @MaeveMadiganMM

@comane
Member

comane commented Feb 13, 2023

Hi @Zaharid, @MaeveMadiganMM, @RoyStegeman and @scarlehoff, this is a comparison between two CTs, both performed with the master branch but with different L1 seeds.

https://vp.nnpdf.science/t2Z6P2ctS5eQaSzbvUS2sg==

The fit corresponding to the new seed can be found at

https://data.nnpdf.science/fits/13_02_23_MNC_NNPDF40_nnlo_as_01180_CT_master_new_seed.tar.gz

@scarlehoff scarlehoff removed the run-fit-bot label ("Starts fit bot from a PR") Feb 15, 2023
@scarlehoff
Member Author

Are "we" happy with the closure test result?

From my side I would like these two points to be addressed:

https://github.com/NNPDF/nnpdf/pull/1660/files#r1102438850

https://github.com/NNPDF/nnpdf/pull/1660/files#r1102438226

@comane
Member

comane commented Feb 22, 2023

In my opinion, it looks like the deviation one sees between CT(python) and CT(master) is due to statistical fluctuations, meaning that if one repeated the CT with the new branch n times, then 0.68 * n of the time CT(python) would lie within the uncertainty bands of CT(master).
The report below compares a CT(master) against 4 CT(python), each run with a different seed.

https://vp.nnpdf.science/jrWmrlYNTPikrEfm8BNOCg==

https://data.nnpdf.science/fits/17_02_23_MNC_NNPDF40_nnlo_as_01180_CT_python_ns1.tar.gz
https://data.nnpdf.science/fits/17_02_23_MNC_NNPDF40_nnlo_as_01180_CT_python_ns2.tar.gz
https://data.nnpdf.science/fits/18_02_23_MNC_NNPDF40_nnlo_as_01180_CT_python_ns3.tar.gz
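The 0.68 coverage argument can be illustrated with a toy numpy check (entirely synthetic numbers, nothing to do with the actual fits): if repeats of CT(python) scatter around CT(master) with Gaussian fluctuations, about 68% of them land inside the one-sigma band.

```python
import numpy as np

# Toy check of one-sigma coverage for Gaussian fluctuations.
rng = np.random.default_rng(0)
truth, sigma, n_repeats = 0.0, 1.0, 100_000
draws = rng.normal(truth, sigma, size=n_repeats)
coverage = np.mean(np.abs(draws - truth) < sigma)
print(f"fraction inside one sigma: {coverage:.3f}")  # close to 0.68
```

With only 4 python CTs, the expected number inside the band is about 0.68 * 4 ≈ 2.7, so a single outlier would not be alarming on its own.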

Comment thread validphys2/src/validphys/commondataparser.py Outdated
Comment thread validphys2/src/validphys/coredata.py Outdated
@scarlehoff scarlehoff merged commit 97e0e13 into master Feb 22, 2023
@scarlehoff scarlehoff deleted the python_closure_sampling_merged branch February 22, 2023 11:03

4 participants