[Speculative] Modifications used for reweighting#120

Merged
nhartland merged 18 commits into master from covmat_mods on Mar 8, 2018
Conversation

@nhartland
Member

A bit of a speculative P.R this one, it contains the modifications I made to quickly get a reweighting exercise up and running using external predictions.

There are three modifications:

1. The addition of an 'empty' `ThPredictions` which can be filled manually:

   ```cpp
   ThPredictions::ThPredictions(std::string pdfname, std::string setname, int nPDF, int nDat, PDFSet::erType erty);
   ```

2. The extraction of the covariance matrix generation from `DataSet` into a separate function:

   ```cpp
   matrix<double> ComputeCovMat(CommonData const& cd, std::vector<double> const& t0);
   ```

3. Changing the computation of N_{effective replicas} in the vp2 reweighting code:

   ```diff
   -    return np.exp(np.nansum(w*np.log(N/w))/N)
   +    return np.exp(np.nansum(w*np.log(N)-w*np.log(w))/N)
   ```

   which handles w=0 a little better.
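For context, this quantity is the effective number of replicas after reweighting. A minimal Python sketch of the amended formula (the function name `n_eff` is mine, not from the vp2 code):

```python
import numpy as np

def n_eff(weights):
    """Effective number of replicas for reweighting weights (normalised to sum to N).

    Uses w*log(N) - w*log(w) rather than w*log(N/w): the nan produced by
    0*log(0) is dropped by nansum, so zero-weight replicas simply contribute
    nothing rather than going through the division N/w.
    """
    w = np.asarray(weights, dtype=float)
    N = len(w)
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.exp(np.nansum(w * np.log(N) - w * np.log(w)) / N)
```

Uniform weights give back N, while concentrating all weight on half the replicas gives N/2.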

Point (2) is the more controversial one, being a kind of half-assed solution to #21.
This is probably not in any state to be merged right now, but it might form the basis for actually getting a better solution to #21. The important point is that currently DataSet and Experiment handle their covariance matrices rather asymmetrically.

(Also, apologies for borking up the rebase; there are now a bunch of superfluous commits, so this'll have to be a squash merge if it ever does get merged.)

@nhartland nhartland requested review from Zaharid and scarrazza March 6, 2018 18:02
Contributor

@Zaharid Zaharid left a comment

In general I am strongly in favour of this change. Most of the problems are with existing code, but it would be good to improve them (especially the very inefficient covmat building).

The interface for ComputeCovMat is not ideal in that it doesn't enforce at compile time that the t0 predictions are actually compatible.
Can we take a t0pdf and compute the predictions in place?

Comment thread libnnpdf/src/NNPDF/chisquared.h Outdated
namespace NNPDF{
matrix<double> ComputeCovMat(CommonData const& cd, std::vector<double> const& t0);
matrix<double> ComputeSqrtMat(matrix<double> const& inmatrix);
void ComputeChi2_basic(int const& nDat, int const& nMem,
Contributor

Please do `int nDat` instead of `int const & nDat`, and similar for the others. There is no point in taking basic types by reference.

Contributor

Should this be public? Seems rather cumbersome.

Member Author

So for the case I was using it for, sort of yes. The problem with reweighting is that mostly you don't have FK tables and so can't build a DataSet. Making ComputeChi2_basic public (and indeed, separating it from ComputeChi2 at all) was so that I can call it without a DataSet.

The next steps would be figuring out a less tightly-coupled version of DataSet I guess.

Comment thread libnnpdf/src/chisquared.cc Outdated
const int ndat = cd.GetNData();
const int nsys = cd.GetNSys();
if (ndat <= 0)
throw LengthError("ComputeCovMat","invalid number of datapoints!");
Contributor

What is the use case for this check? Shouldn't this be taken care of somewhere else?

Member Author

uhh, ok, sure

throw RuntimeException("ComputeCovMat", "Inconsistent naming of systematics");
if (isys.name == "SKIP")
continue;
const bool is_correlated = ( isys.name != "UNCORR" && isys.name !="THEORYUNCORR");
Contributor

Can we somehow avoid these checks being run ndata**2 * nsys times? We have plots showing that this is a huge bottleneck for anything relying on these computations. Also it really looks like we could just look at the systematics of the first point and do away with the i==j check?

Member Author

Ah, that feels like a different battle to me, the battle for #25
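For the record, the hoisting suggested above could look roughly like this. A Python sketch (function and argument names are mine, not the libnnpdf API) that classifies each systematic once and then builds the matrix without per-(i,j) string comparisons:

```python
import numpy as np

def compute_covmat(stat, sys_errors, sys_names):
    """Sketch of covmat building with the name checks hoisted out of the loops.

    stat:       (ndat,) statistical errors
    sys_errors: (ndat, nsys) additive systematic errors
    sys_names:  (nsys,) systematic names; 'SKIP' is dropped, while 'UNCORR'
                and 'THEORYUNCORR' contribute only to the diagonal.
    """
    keep = np.array([n != "SKIP" for n in sys_names])
    corr = np.array([n not in ("UNCORR", "THEORYUNCORR") for n in sys_names]) & keep
    uncorr = keep & ~corr

    s = np.asarray(sys_errors, dtype=float)
    covmat = np.diag(np.asarray(stat) ** 2 + (s[:, uncorr] ** 2).sum(axis=1))
    covmat += s[:, corr] @ s[:, corr].T  # correlated part as one matrix product
    return covmat
```

The string comparisons happen nsys times instead of ndata**2 * nsys times, and the correlated contribution collapses to a single matrix product.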

template<class T>
void ComputeChi2(const T* set, int const& nMem, real *const& theory, real *chi2)
auto CovMat = NNPDF::matrix<double>(ndat, ndat);
for (int i = 0; i < ndat; i++)
Contributor

Sorry for the stylistic complaint (let's use clang-format!) but I really feel strongly that the outer for loop needs curly braces.

Member Author

Sure

Comment thread libnnpdf/src/chisquared.cc Outdated
throw LengthError("CholeskyDecomposition","attempting a decomposition of an empty matrix!");
matrix<double> sqrtmat(n,n);

gsl_matrix* mat = gsl_matrix_calloc(n, n);
Contributor

Maybe we can use `gsl_matrix_view` on `inmatrix.data()`?

Member Author

Oh nice, I didn't know about gsl_matrix_view. Could help with the CMA minimiser too.
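As a side note, what ComputeSqrtMat produces is just the Cholesky factor of the covariance matrix. A numpy stand-in for illustration (the real code uses GSL's Cholesky routines):

```python
import numpy as np

def compute_sqrt_mat(covmat):
    """Lower-triangular L with L @ L.T == covmat, i.e. a 'square root' of covmat.

    numpy sketch of what ComputeSqrtMat does; the libnnpdf version runs the
    equivalent GSL decomposition on a gsl_matrix built from the same data.
    """
    covmat = np.asarray(covmat, dtype=float)
    if covmat.size == 0:
        raise ValueError("attempting a decomposition of an empty matrix!")
    return np.linalg.cholesky(covmat)
```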

@Zaharid
Contributor

Zaharid commented Mar 6, 2018

Does the regression test pass with this change?

@nhartland
Member Author

The answer to

> The interface for ComputeCovMat is not ideal in that it doesn't enforce at compile time that the t0 predictions are actually compatible.
> Can we take a t0pdf and compute the predictions in place?

is the same as the answer re: ComputeChi2_basic. I need to compute it without an FK table.

@nhartland
Member Author

> Does the regression test pass with this change?

What regression test?

@Zaharid
Contributor

Zaharid commented Mar 7, 2018

https://github.com/NNPDF/nnpdf/blob/master/libnnpdf/tests/vp/test_regressions.py

Run with pytest in that directory. #113 is to make it execute automatically.

@nhartland
Member Author

Alright, I've changed over to using the matrix views (they are great), and with that your regression tests appear to work fine.


@nhartland
Member Author

Ideally I'd now like to get Experiment to also use these functions for covariance matrix generation, because the asymmetry there is gross. The most natural way I can see (and something I've wanted to do for a while) is to implement a CommonData constructor that 'merges' several CommonData instances. Something like

CommonData(std::vector<std::reference_wrapper<const CommonData>> const& subsets);

which would handle the management of named systypes etc. Experiment, while it still lives, would then have most of its data attributes replaced, and it would inherit from CommonData just as DataSet does (while that still lives too).
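The bookkeeping such a merging constructor needs is mostly about aligning named systematics across subsets. A rough Python sketch of the idea (not the proposed C++ interface): systematics sharing a name end up in the same column, so cross-subset correlations come for free when the merged table is fed to a covmat builder.

```python
def merge_systematics(subsets):
    """Merge per-dataset systematics tables into one table with aligned columns.

    subsets: list of (data, sys) pairs, where data is a list of central values
             and sys maps systematic name -> list of per-point errors.
    Shared names share a column; points missing a systematic get 0.0.
    Returns (merged_data, merged_sys) with merged_sys keyed by name.
    """
    names = []
    for _, sys in subsets:
        for n in sys:
            if n not in names:
                names.append(n)
    merged_data, merged_sys = [], {n: [] for n in names}
    for data, sys in subsets:
        merged_data.extend(data)
        for n in names:
            merged_sys[n].extend(sys.get(n, [0.0] * len(data)))
    return merged_data, merged_sys
```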

@nhartland
Member Author

This would naturally also be half-way to an Experiment- and DataSet-free future.

@Zaharid
Contributor

Zaharid commented Mar 7, 2018

I think Experiment shouldn't have a covariance matrix that it owns at all, for all the reasons discussed in the other issues. But that is far from a proposal.

return sqrtmat;
}

// TODO to sort this out, need to make data and theory vectors
Contributor

I'm actually starting to think that vectors everywhere is not such a good idea (though we could have it as a higher level interface), especially in view of wanting to use something like this:

https://arrow.apache.org/docs/python/plasma.html

Member Author

I don't know what that is, nor why we would want to use it.

What would you use other than vectors? Stick to plain old pointers?

Contributor

I think both validphys and nnfit would benefit a lot from sharing memory between processes (for things like FK tables). I think FK tables are big enough that you would not see the performance difference between masking the training/validation split and actually slicing the tables. For vp, the cost of initializing the same PDFs and FK tables in several processes often offsets the advantages of the parallel mode, so it would be good if these things were loaded into shared memory once. That thing seems like a convenient way to do just that, but then you must control the allocator, which is a pain with the std containers.
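The general pattern being described, allocating a large array once and attaching to it by name from each worker, can be sketched with the stdlib `multiprocessing.shared_memory` module (Python 3.8+) as a stand-in for Plasma. Function names here are mine:

```python
import numpy as np
from multiprocessing import shared_memory

def publish(name, arr):
    """Copy arr into a named shared-memory block other processes can attach to."""
    shm = shared_memory.SharedMemory(create=True, size=arr.nbytes, name=name)
    view = np.ndarray(arr.shape, dtype=arr.dtype, buffer=shm.buf)
    view[:] = arr
    return shm  # keep the handle alive; the owner unlinks when done

def attach(name, shape, dtype):
    """Attach to an existing block by name; the array data is not copied."""
    shm = shared_memory.SharedMemory(name=name)
    return shm, np.ndarray(shape, dtype=dtype, buffer=shm.buf)
```

Each worker would call `attach` with the table's name and shape instead of loading its own copy, which is the cost the parallel vp mode currently pays.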

@nhartland
Member Author

I was going to wait to merge this until I had a chance to sort out the asymmetry with Experiment, but I'm not sure how long that will really take. Do you think this is worth merging now? I.e. could you use the (potentially marginal) covmat-generation speed improvement?

@Zaharid
Contributor

Zaharid commented Mar 8, 2018

I think it would be good if the validphys actions used this and avoided copying the experiments like crazy.

@nhartland
Member Author

I'll take that as a yes

@nhartland nhartland merged commit 1243690 into master Mar 8, 2018
@nhartland nhartland deleted the covmat_mods branch March 8, 2018 12:19
voisey pushed a commit that referenced this pull request Aug 26, 2020