MakeReplica in python #866
```python
import numpy as np
import NNPDF
from validphys.covmats import make_replica
from validphys.loader import Loader

l = Loader()
for dataset in l.available_datasets:
    try:
        cd = l.check_commondata(dataset)
        ds = l.check_dataset(dataset, theoryid=53, cuts=None)
        exp = l.check_experiment('foo', [ds])
        ld = exp.load()
        # seed the C++ RNG, generate a replica, and check that the
        # python code reproduces it exactly
        NNPDF.RandomGenerator.InitRNG(0, 1)
        ld.MakeReplica()
        new_code = make_replica(cd)
        np.testing.assert_allclose(ld.get_cv(), new_code)
    except FileNotFoundError:
        continue
```

seems to be working
Will take a look soon! Out of interest, have you tested the difference in speed between the python and C++?
Yeah, I checked very briefly. It's difficult to make an apples-to-apples comparison because the question is what exactly you time. In general it's fairly comparable, but it's not blisteringly quicker. Btw, I may have been premature with the review requests; maybe just wait until the next commit, since I've yet to sort out the docstring.
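For anyone who wants to reproduce the comparison, a rough timing harness could look like the sketch below. It reuses `ld`, `cd`, and `make_replica` from the snippet earlier in the thread; the call count is arbitrary and the setup is an assumption, not the actual benchmark that was run.

```python
# Rough speed comparison between the C++ and python replica generation.
# Assumes `ld`, `cd`, and `make_replica` are set up as in the snippet above.
import timeit

cpp_time = timeit.timeit(ld.MakeReplica, number=100)
py_time = timeit.timeit(lambda: make_replica(cd), number=100)
print(f"C++: {cpp_time:.3f}s, python: {py_time:.3f}s (100 calls each)")
```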
Ok, this is ready for review now. I was wondering: what do people think about having some sort of temporary tests we can use to ensure the python replacement replicates the old C++ code? We could then delete these tests once we fully switch to python.
I don't think it would hurt to have something like that, so I'd be for it. We have something kind of similar here: https://github.com/NNPDF/nnpdf/blob/master/validphys2/src/validphys/tests/test_covmats.py in that we currently test that both the C++ and the python work, where one of the tests will eventually be deleted. I know what you suggest is slightly different, but the point is that we already have tests that we plan on deleting.
More tests don't hurt, especially in that they increase coverage.
I like how the code is nice and well commented. It will certainly help any effort to make it fast.
Ah yes, very good point! I forgot we implemented that test. Ok sure, I guess we can add a new test altogether that does all the comparisons between C++ and python. I'll do this in another PR so they're all kept together.
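For concreteness, such a temporary comparison test might look roughly like the sketch below. This is a hypothetical shape, not the actual contents of test_covmats.py; it reuses the loader calls from the snippet earlier in the thread and pins the dataset to NMC for speed.

```python
# Hypothetical temporary test: the python make_replica should reproduce the
# C++ MakeReplica exactly when both use the same RNG seed. To be deleted
# once the C++ code is gone.
import numpy as np
import NNPDF
from validphys.covmats import make_replica
from validphys.loader import Loader


def test_make_replica_matches_cpp():
    l = Loader()
    cd = l.check_commondata("NMC")
    ds = l.check_dataset("NMC", theoryid=53, cuts=None)
    ld = l.check_experiment("foo", [ds]).load()
    NNPDF.RandomGenerator.InitRNG(0, 1)
    ld.MakeReplica()
    np.testing.assert_allclose(ld.get_cv(), make_replica(cd))
```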
FWIW there is a 30-page paper (that I have no patience to read) on how to sample from a truncated Gaussian: https://www4.stat.ncsu.edu/~boos/library/papers/mimeo2649_Li.pdf On a somewhat related note, the fact that this is so complicated makes me think it's not such a great idea: if we are imposing positivity in the PDFs anyway, why are we also doing it in the sampling? More importantly, does it make any difference other than increasing the complexity and the surface for bugs?
I've quickly glanced at that paper and it certainly looks like it could be fun to take a look at. I just worry how easy it would be to accommodate some of the edge cases we have going on. Indeed, that's what forced me to give up on trying to understand the papers and to blindly translate the C++ code instead. I've also wondered whether having positive pseudodata is essential. It'd be interesting to see how many times the algorithm stumbles across a sample that fails this criterion. Even if it does, it could be that the fit doesn't really care too much. I guess the only way to answer that question would be to construct pseudodata that has negative entries and then run a fit to see.
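For context, the simple-minded alternative to the truncated-Gaussian machinery is plain rejection sampling: redraw the whole replica until every entry is positive, and count how often you had to retry. A toy sketch of that idea follows; the plain Gaussian draw stands in for the full shift machinery, and none of the names are from the actual code.

```python
import numpy as np


def sample_positive_replica(cv, covmat, rng, max_tries=10_000):
    """Toy rejection sampler: redraw correlated pseudodata until all
    entries are positive, and report how many draws were rejected."""
    for tries in range(1, max_tries + 1):
        replica = rng.multivariate_normal(cv, covmat)
        if (replica >= 0).all():
            return replica, tries - 1  # number of rejected draws
    raise RuntimeError("no positive replica found within max_tries")


rng = np.random.default_rng(0)
cv = np.array([1.0, 0.5, 0.1])  # toy central values, last one close to zero
covmat = np.diag([0.1, 0.1, 0.2]) ** 2
replica, rejected = sample_positive_replica(cv, covmat, rng)
print(replica, f"({rejected} draws rejected)")
```

Counting the rejections would directly answer the question of how often the algorithm stumbles across a sample that fails the criterion.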
Just a quick series of commits that makes
@tgiani would you be able to take another look at this, since you have experience with the replica generation in C++?
I double checked and what I was saying at the code meeting is true: we do not generate shifts for

Looking at datasets which performed badly on the KS test: ATLAS ZpT rapidity and WZ rapidity.

@RosalynLP how exactly did you generate the old data replicas? It occurs to me that because

Provided that's all fine, and that there were no errors in matching data point indices between new and old replicas, then one suggestion is that the purely multiplicative sets might require more replicas to get a stable result.
Yes sure, I'll look at this before the next code meeting.
Yeah, if you look at #1087 and line 43 of mc_gen_checks.py, a copy of the
Sure, I was more wondering if all of the results presented were definitely produced using that code or equivalent.
@wilsonmr I used adapted versions of those in
Great, just checking. The test isn't particularly simple here, so when the results look weird it seems worth looking for errors on both ends 😆

I just tried generating for just ATLASWZRAP with both codes and I get similar results (expected, since the ATLAS-wide systematic is hardly going to be dominant here). I'm trying with 10000 reps now, since it doesn't take as long as a global generation. I'm also producing the histograms for each bin. On the first pass with 1000 reps the histograms looked close enough to me. Results for 10000 reps should be ready in 15 mins.

EDIT: I should say I also picked 10000 because it uses a different method in auto mode of ks_2samp.
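For anyone reproducing this comparison, the check is essentially scipy's two-sample KS test applied data point by data point. A sketch with toy replica matrices; in practice `old` and `new` would be the (nreps, ndata) pseudodata arrays produced by the two codes.

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)
# toy stand-ins for the (nreps, ndata) replica matrices from the two codes
old = rng.normal(loc=1.0, scale=0.1, size=(10_000, 34))
new = rng.normal(loc=1.0, scale=0.1, size=(10_000, 34))

for i in range(old.shape[1]):
    # in "auto" mode scipy switches to the asymptotic distribution for
    # samples this large, hence the remark about 10000 reps above
    stat, pval = ks_2samp(old[:, i], new[:, i])
    print(f"datapoint {i}: KS = {stat:.4f}, p = {pval:.3f}")
```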
Huh, so the KS statistic with 10000 reps was about 0.025, which seems pretty low, but the p-value was pretty similar to your plot. Having said that, look at the distributions for each bin: I really don't think there is a problem here. So I think we can stop worrying. https://vp.nnpdf.science/J2wE4X3jRuWGaBjHZXKM-A==
@wilsonmr thanks, yeah, this looks good re the bin-by-bin distributions. To me this check makes more intuitive sense than the p-values do.
Is each histogram a bin? Forgive my ignorance, but I'm a bit confused about what the x axis refers to here, or what qualifies as a count for the histogram.
Each histogram is for a data point, which I was referring to as a bin. ATLASWZRAP11 has 34 data points, so there are 34 histograms. The histograms are made up of 10000 replicas, and the x axis is the raw value of that particular data point. So in the first histogram there were ~550 replicas whose first data point had a value of ~580000.

In other words, each histogram is an unnormalised approximation of the (marginalised*) distribution from which the pseudodata has been drawn.

*Marginalised because we are looking data point by data point, so each plot is as though we have integrated over all other data points.
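In code, the plots amount to roughly the following (a sketch using a toy replica matrix; the matplotlib layout is an assumption about how the report was produced, not the actual report code):

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
nreps, ndata = 10_000, 34
# toy pseudodata replicas; the real thing comes from make_replica/MakeReplica
replicas = rng.normal(loc=1.0, scale=0.1, size=(nreps, ndata))

fig, axes = plt.subplots(6, 6, figsize=(18, 18))
for i, ax in enumerate(axes.flat[:ndata]):
    # each panel: marginal distribution of one data point over all replicas
    ax.hist(replicas[:, i], bins=50)
    ax.set_title(f"datapoint {i}")
    ax.set_xlabel("value")
fig.tight_layout()
plt.show()
```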
Also, the variance estimated from the samples which were used to produce these histograms is the variance which Rosalyn calculated.
I was a bit sloppy with the heading and the x labels.
Ah, I follow!! In which case this is a very convincing comparison!! Weird about the p-value situation, though; I wonder what that's about. Thanks for these histograms!
Review thread on this excerpt of the new replica code:

```python
    1 + mult_corr_errors * rng.normal(size=(1, mult_corr_errors.shape[1])) / 100
).prod(axis=1)
...
mult_shifts.append(mult_shift)
```
I don't understand the logic here. I don't see how these things are being correlated across datasets. Random numbers for things like lumi should only be sampled once, and here it seems they are sampled once per dataset... That logic only seems to be there for the "special" systematics.
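(To spell out the concern: a cross-dataset systematic such as a luminosity must use a single draw shared by every dataset it touches, roughly as in this toy sketch; all names and numbers here are illustrative.)

```python
import numpy as np

rng = np.random.default_rng(0)
datasets = {"ATLAS_A": np.ones(5), "ATLAS_B": np.ones(7)}  # toy central values
lumi_err = 0.018  # toy 1.8% luminosity uncertainty

# one draw for the shared systematic, reused for every affected dataset,
# rather than a fresh draw per dataset
lumi_shift = rng.normal()
shifted = {name: cv * (1 + lumi_err * lumi_shift) for name, cv in datasets.items()}
```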
`mult_corr_errors = mult_errors.loc[:, mult_errors.columns == "CORR"].to_numpy()`

This only gets applied to systematics which have a systype row of MULT, CORR. Everything that needs to be correlated across datasets, i.e. lumi, will have a special name which is not CORR and, as you point out, is handled below.
Ah, I see. I had missed the `~` in `mult_errors.loc[:, ~mult_errors.columns.isin(INTRA_DATASET_SYS_NAME)]`.
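(To make the split explicit, the column selection behaves roughly like the toy sketch below; the contents of `INTRA_DATASET_SYS_NAME` and the example systematic names here are illustrative, not copied from the code.)

```python
import numpy as np
import pandas as pd

# illustrative: systype names that are only correlated within one dataset
INTRA_DATASET_SYS_NAME = ("CORR", "UNCORR", "THEORYCORR", "THEORYUNCORR")

# toy multiplicative errors for one dataset; columns are systype names
mult_errors = pd.DataFrame(np.ones((3, 3)), columns=["CORR", "UNCORR", "ATLASLUMI11"])

# correlated only within this dataset
intra = mult_errors.loc[:, mult_errors.columns == "CORR"]
# correlated across datasets: anything with a special name, e.g. a lumi
special = mult_errors.loc[:, ~mult_errors.columns.isin(INTRA_DATASET_SYS_NAME)]
print(special.columns.tolist())  # ['ATLASLUMI11']
```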
yeah, perhaps the comments could be made more obvious
I.e.:

```python
# all mult errors must be converted from percent to fraction
# generate all uncorrelated multiplicative shifts
mult_shift = (
    1 + mult_uncorr_errors * rng.normal(size=mult_uncorr_errors.shape) / 100
).prod(axis=1)
# generate shifts for systematics which are only correlated within this dataset
mult_corr_errors = mult_errors.loc[:, mult_errors.columns == "CORR"].to_numpy()
# store multiplicative shifts to apply after all additive shifts
mult_shifts.append(mult_shift)
```

etc.
Ok, I went over this, I tested it, and I think that the tests are sufficient.
Good, about time we merged this!
This is a direct translation of the C++ code for now; I will optimize it using numpy next, which hopefully should be fairly straightforward.

I've tried it with NMC so far, and if you initialize the RNG with the same seed it will generate exactly the same pseudodata. Also, I've not implemented the final check on the positivity of the pseudodata.