Python closure sampling (with the python-only commondata branch) #1660
Conversation
… create systype dir
…used for the generation of level-1 noise: filterseed. rngalgo and seed are no longer needed to run a closure test
…with cuts already applied
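The commits above describe the core change: a single `filterseed` drives the level-1 noise via numpy, replacing the old `rngalgo`/`seed` pair. A minimal sketch of how level-1 pseudodata can be sampled this way (the function name and signature are illustrative, not the actual validphys API):

```python
import numpy as np

def make_level1_data(central_values, covmat, filterseed):
    """Sample level-1 pseudodata: shift the level-0 central values by
    Gaussian noise drawn from the experimental covariance matrix.

    A single filterseed fully determines the noise, which is why rngalgo
    and seed are no longer needed (illustrative sketch, not the actual
    validphys implementation).
    """
    rng = np.random.default_rng(filterseed)
    noise = rng.multivariate_normal(np.zeros(len(central_values)), covmat)
    return np.asarray(central_values) + noise

# Reproducibility: the same filterseed yields identical pseudodata.
cv = np.array([1.0, 2.0, 3.0])
cov = np.diag([0.1, 0.2, 0.3])
assert np.allclose(make_level1_data(cv, cov, 42), make_level1_data(cv, cov, 42))
```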
Thank you very much @comane! I'm looking forward to merging this.
I'm already using a pip-only version of NNPDF for some fits, so we are already almost there for a pip-only installation of the code in master (and even a validphys-only installation separated from n3fit, which would be very useful!)
One thing I would like to link in this PR is some reports of closure tests done with this branch (I guess you are already using it for that). I'm guessing some of the last CTs in the server are done with this branch but wouldn't want to select the wrong ones.
As a final thing, it would be great to have a comparison between a closure test done with this branch and another one done with master, we already had some trouble reproducing the CT of the 4.0 paper so I wouldn't like to add another layer of changes.
On the other hand, I don't want to inflict CT on anybody so I won't insist if you don't feel like doing it.
Ps: I cannot "approve" this PR because I opened it, but -as you can see- all my comments are quite minor, and functionality- or structure-wise I'm happy with it (I don't think that moving the write_ functions to a different module counts as changing the structure).
        write_systype_to_file,
        write_commondata_to_file,
    )
|
|
I'd rather have all the write functions in another file so they don't create a circular import.
And since they are the only part of commondataparser.py that gets imported in filters.py, it won't change the structure. I also think that, name-wise, they do not belong in a "parser", since they are doing the opposite.
I disagree that they don't belong in the same place. The place I would look for write functions is the same where the read functions are. Besides, unless we have the write functions in coredata.py, it's not going to be trivial to avoid circular imports.
None of the lines added to commondataparser.py need to import anything. You can just create a commondatawriter.py and then the problem is fixed. It is trivial to avoid circular imports.
The place I would look for write functions ...
If there is a file called commondatawriter.py next to it I'm sure you will open the right file.
The reader functions need to import the data definitions they are reading into. Also I don't think there is anything wrong with circular imports in this case that justifies adding files with ten lines worth of code (not that it would avoid them, as said).
The reader functions need to import the data definitions they are reading into.
Which reader functions? I think you are thinking of a different thing?
I was answering to
None of the lines added to commondataparser.py need to import anything.
saying that the functions that take files and return data structures do need to import the data structure, and there would be a circular import if you want a convenience method to read.
Sorry, but I really cannot see which of the lines added to commondataparser.py is using anything imported from the outside. They are all self-contained functions.
There are no read methods in the changes introduced by this Pull Request. I just want to have a commondatawriter.py with those functions.
Edit: I just tried it out and the tests pass without a circular import
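The proposal boils down to a small, import-free module. A sketch of what such a commondatawriter.py could look like (the function names match the ones in the diff above, but the bodies are illustrative; in particular, the assumption that the object exposes pandas DataFrames as `commondata_table` and `systype_table` is mine, not the PR's exact code):

```python
def write_commondata_to_file(commondata, path):
    """Write the commondata table (central values and uncertainties) to path.

    Assumes commondata exposes a pandas DataFrame as commondata_table
    (an illustrative assumption).
    """
    commondata.commondata_table.to_csv(path, sep="\t", header=False)


def write_systype_to_file(commondata, path):
    """Write the systematics-type table to path.

    Assumes commondata exposes a pandas DataFrame as systype_table
    (an illustrative assumption).
    """
    commondata.systype_table.to_csv(path, sep="\t", header=False)
```

Because these functions only touch attributes of their argument and import nothing, placing them in their own module indeed cannot create a circular import.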
    def export(self, path):
        """Export the data, and error types
        Use the same format as libNNPDF:
|
|
Does this mean the format is no longer the same as libNNPDF and so they are not compatible?
        list containing commondata instances with cuts
        """
        return data.load_commondata_instance()
I'm very confused about this function, there is nothing enforcing the cuts here, right? How is this different from simply doing data.load_commondata_instance() inside make_level1_data?
Maybe there's something I've missed along the way?
Hi @scarlehoff, I think you are right. This is completely unnecessary. I changed it in the new commits
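The point being conceded here: a pass-through wrapper adds nothing if the loader already returns cut data. A toy illustration (all names hypothetical, not the validphys objects) of why the caller can just invoke the loader directly:

```python
import numpy as np

class CommonData:
    """Toy stand-in for a commondata object (hypothetical)."""
    def __init__(self, central_values):
        self.central_values = np.asarray(central_values)

    def with_cuts(self, cut_indices):
        """Return a new instance keeping only the points passing the cuts."""
        return CommonData(self.central_values[cut_indices])

def load_commondata_instance(data, cuts):
    """Load commondata with cuts already applied, so callers such as
    make_level1_data need no extra pass-through wrapper around this."""
    return data.with_cuts(cuts)

raw = CommonData([1.0, 2.0, 3.0, 4.0])
cut = load_commondata_instance(raw, [0, 2])
assert np.allclose(cut.central_values, [1.0, 3.0])
```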
Hi @Zaharid, @scarlehoff, following the link below you can find a comparison between a CT done with master and one done with this branch. Note that the initialisation seeds for level-1 noise are not the same. This is because the current branch does not depend on …
|
Thanks! Let me ping @RoyStegeman, who is much more experienced than I am at looking at CTs.
|
I'd have to look at this properly to understand it better, but all this does is replace the cpp CT sampling with the new python implementation, right? If so, I'm a bit surprised by the difference between the two fits - are the runcards exactly the same (up to interpretation of the seeds)? @comane would you mind uploading the fits to the nnpdf server? In general it's a good habit to upload the fit if you're sharing a corresponding result, even just for possible later reference.
|
@RoyStegeman the random numbers are not the same since one is using the cpp rng while the other is using numpy.
|
Thanks, that's what I thought. If that's the only difference in the code, the difference between the fits seems large to me, wouldn't you agree? I don't know, I'd probably have to compare this to an old L1 CT to say anything sensible.
|
@RoyStegeman I guess a lot of the studies were based on "multiclosure" to avoid the dependency on the L1 seed; Looking at https://vp.nnpdf.science/P0SSEuO2RdWW753vlYG_sA== and particularly at https://vp.nnpdf.science/P0SSEuO2RdWW753vlYG_sA==/plot_delta_chi2_report_report.html I can be easily convinced that the difference is just the seed. I am not sure if we have the detailed stats somewhere.
Hi @RoyStegeman, I just uploaded the fits; they can be found at https://data.nnpdf.science/fits/NNPDF40_nnlo_as_01180_CT_python.tar.gz and https://data.nnpdf.science/fits/NNPDF40_nnlo_as_01180_CT_master.tar.gz. I am currently running another closure test using the master branch but with a different seed. This will probably allow us to better understand how much the CT depends on the L1 seed.
|
Yes, I realised my thinking was wrong after sending that last message, hence the edit ;). You're right that, looking at MW's fits, the difference could well be explained by a different L1 seed, though his fits were also done using the NNPDF3.1 dataset. On top of that, fluctuations are relatively large, so I guess a single (or two) L1 CT fit is just not enough to confidently make any conclusions.
I suppose these would just be the estimators quoted in the nnpdf40 paper, or redetermined by Samuele. But they are all based on ~25 fits of 40 replicas...
|
@comane thanks. Indeed, it's probably a good idea to do at least one more fit. I don't know how much computational resources you have, but since we really don't want any surprises when it comes to CT, it may be worth simply doing a full CT so at least we have a reference point in case it gets messed up in the future.
|
Hi @Zaharid, @MaeveMadiganMM, @RoyStegeman and @scarlehoff, this is a comparison between two CTs, both performed with the master branch but with different L1 seeds: https://vp.nnpdf.science/t2Z6P2ctS5eQaSzbvUS2sg== The fit corresponding to the new seed can be found at https://data.nnpdf.science/fits/13_02_23_MNC_NNPDF40_nnlo_as_01180_CT_master_new_seed.tar.gz
|
Are "we" happy with the closure test result? From my side I would like these two points to be addressed: https://github.com/NNPDF/nnpdf/pull/1660/files#r1102438850 https://github.com/NNPDF/nnpdf/pull/1660/files#r1102438226
|
In my opinion, the deviation one sees between CT (python) and CT (master) is due to statistical fluctuations, meaning that if one were to repeat the CT with the new branch n times, then 0.68 * n of the times the CT (python) would lie within the uncertainty bands of CT (master). https://vp.nnpdf.science/jrWmrlYNTPikrEfm8BNOCg== https://data.nnpdf.science/fits/17_02_23_MNC_NNPDF40_nnlo_as_01180_CT_python_ns1.tar.gz
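The 0.68 figure is just the Gaussian one-sigma coverage: under purely statistical fluctuations, repeated independent draws land inside the one-sigma band about 68% of the time. A quick numpy check of that number (not NNPDF code):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
# n independent toy "closure test" outcomes fluctuating around the truth
draws = rng.normal(loc=0.0, scale=1.0, size=n)
# fraction landing within the one-sigma uncertainty band
frac = np.mean(np.abs(draws) < 1.0)
print(round(frac, 2))  # close to 0.68
```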
Hi @Zaharid @comane
As discussed yesterday during the Code Meeting, I've merged PR #1651 so that #1650 can be merged, since they are mostly orthogonal as previously mentioned. Indeed, the only difference is the export function, which can simply be removed so that it uses the new ones; the relevant commit is cc868fb. I would suggest the write functions are put elsewhere so that circular imports are not generated, but I guess for that you can discuss among yourselves where to put what. With this the two branches should be without conflict.
(unless people request changes in #1650, of course, but I'm not taking responsibility for that xd)