[Speculative] Modifications used for reweighting#120

Merged
nhartland merged 18 commits into master from covmat_mods on Mar 8, 2018
Conversation

@nhartland
Member

A bit of a speculative P.R this one, it contains the modifications I made to quickly get a reweighting exercise up and running using external predictions.

There are three modifications:

1. The addition of an 'empty' `ThPredictions` which can be filled manually:

   ```cpp
   ThPredictions::ThPredictions(std::string pdfname, std::string setname, int nPDF, int nDat, PDFSet::erType erty);
   ```

2. The extraction of the covariance matrix generation from `DataSet` into a separate function:

   ```cpp
   matrix<double> ComputeCovMat(CommonData const& cd, std::vector<double> const& t0);
   ```

3. Changing the computation of N_{effective replicas} in the vp2 reweighting code:

   ```diff
   -    return np.exp(np.nansum(w*np.log(N/w))/N)
   +    return np.exp(np.nansum(w*np.log(N)-w*np.log(w))/N)
   ```

   which handles w=0 a little better.
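For context, this quantity is the effective number of replicas after reweighting. A minimal Python sketch of the amended formula (the function name `n_eff` is mine, not from the vp2 code):

```python
import numpy as np

def n_eff(weights):
    """Effective number of replicas for reweighting weights (normalised to sum to N).

    Uses w*log(N) - w*log(w) rather than w*log(N/w): the nan produced by
    0*log(0) is dropped by nansum, so zero-weight replicas simply contribute
    nothing rather than going through the division N/w.
    """
    w = np.asarray(weights, dtype=float)
    N = len(w)
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.exp(np.nansum(w * np.log(N) - w * np.log(w)) / N)
```

Uniform weights give back N, while concentrating all weight on half the replicas gives N/2.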

Point (2) is the more controversial one, being a kind of half-assed solution to #21.
This is probably not in any state to be merged right now, but it might form the basis for actually getting a better solution to #21. The important point is that currently DataSet and Experiment handle their covariance matrices rather asymmetrically.

(Also, apologies for borking up the rebase; there are now a bunch of superfluous commits, so this'll have to be a squash merge if it ever does get merged.)

@nhartland nhartland requested review from Zaharid and scarrazza March 6, 2018 18:02
Contributor

@Zaharid Zaharid left a comment

In general I am strongly in favour of this change. Most of the problems are with existing code, but it would be good to improve them (especially the very inefficient covmat building).

The interface for ComputeCovMat is not ideal in that it doesn't enforce at compile time that the t0 predictions are actually compatible.
Can we take a t0pdf and compute the predictions in place?

Comment thread libnnpdf/src/NNPDF/chisquared.h Outdated
namespace NNPDF{
matrix<double> ComputeCovMat(CommonData const& cd, std::vector<double> const& t0);
matrix<double> ComputeSqrtMat(matrix<double> const& inmatrix);
void ComputeChi2_basic(int const& nDat, int const& nMem,
Contributor

Please do `int nDat` instead of `int const & nDat`, and similar for the others. There is no point in taking basic types by reference.

Contributor

Should this be public? Seems rather cumbersome.

Member Author

So for the case I was using it for, sort of yes. The problem with reweighting is that mostly you don't have FK tables and so can't build a DataSet. Making ComputeChi2_basic public (and indeed, separating it from ComputeChi2 at all) was so that I can call it without a DataSet.

The next steps would be figuring out a less tightly-coupled version of DataSet I guess.

Comment thread libnnpdf/src/chisquared.cc Outdated
const int ndat = cd.GetNData();
const int nsys = cd.GetNSys();
if (ndat <= 0)
throw LengthError("ComputeCovMat","invalid number of datapoints!");
Contributor

What is the use case for this check? Shouldn't this be taken care of somewhere else?

Member Author

uhh, ok, sure

throw RuntimeException("ComputeCovMat", "Inconsistent naming of systematics");
if (isys.name == "SKIP")
continue;
const bool is_correlated = ( isys.name != "UNCORR" && isys.name !="THEORYUNCORR");
Contributor

Can we somehow avoid these checks being run ndata**2 * nsys times? We have plots showing that this is a huge bottleneck for anything relying on these computations. Also it really looks like we could just look at the systematics of the first point and do away with the i==j check?

Member Author

Ah, that feels like a different battle to me, the battle for #25
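For the record, the hoisting suggested above could look roughly like this. A Python sketch (function and argument names are mine, not the libnnpdf API) that classifies each systematic once and then builds the matrix without per-(i,j) string comparisons:

```python
import numpy as np

def compute_covmat(stat, sys_errors, sys_names):
    """Sketch of covmat building with the name checks hoisted out of the loops.

    stat:       (ndat,) statistical errors
    sys_errors: (ndat, nsys) additive systematic errors
    sys_names:  (nsys,) systematic names; 'SKIP' is dropped, while 'UNCORR'
                and 'THEORYUNCORR' contribute only to the diagonal.
    """
    keep = np.array([n != "SKIP" for n in sys_names])
    corr = np.array([n not in ("UNCORR", "THEORYUNCORR") for n in sys_names]) & keep
    uncorr = keep & ~corr

    s = np.asarray(sys_errors, dtype=float)
    covmat = np.diag(np.asarray(stat) ** 2 + (s[:, uncorr] ** 2).sum(axis=1))
    covmat += s[:, corr] @ s[:, corr].T  # correlated part as one matrix product
    return covmat
```

The string comparisons happen nsys times instead of ndata**2 * nsys times, and the correlated contribution collapses to a single matrix product.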

template<class T>
void ComputeChi2(const T* set, int const& nMem, real *const& theory, real *chi2)
auto CovMat = NNPDF::matrix<double>(ndat, ndat);
for (int i = 0; i < ndat; i++)
Contributor

Sorry for the stylistic complaint (let's use clang-format!) but I really feel strongly that the outer for loop needs curly braces.

Member Author

Sure

Comment thread libnnpdf/src/chisquared.cc Outdated
throw LengthError("CholeskyDecomposition","attempting a decomposition of an empty matrix!");
matrix<double> sqrtmat(n,n);

gsl_matrix* mat = gsl_matrix_calloc(n, n);
Contributor

Maybe we can use `gsl_matrix_view` on `inmatrix.data()`?

Member Author

Oh nice, I didn't know about gsl_matrix_view. Could help with the CMA minimiser too.
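As a side note, what ComputeSqrtMat produces is just the Cholesky factor of the covariance matrix. A numpy stand-in for illustration (the real code uses GSL's Cholesky routines):

```python
import numpy as np

def compute_sqrt_mat(covmat):
    """Lower-triangular L with L @ L.T == covmat, i.e. a 'square root' of covmat.

    numpy sketch of what ComputeSqrtMat does; the libnnpdf version runs the
    equivalent GSL decomposition on a gsl_matrix built from the same data.
    """
    covmat = np.asarray(covmat, dtype=float)
    if covmat.size == 0:
        raise ValueError("attempting a decomposition of an empty matrix!")
    return np.linalg.cholesky(covmat)
```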

@Zaharid
Contributor

Zaharid commented Mar 6, 2018

Does the regression test pass with this change?

@nhartland
Member Author

The answer to

> The interface for ComputeCovMat is not ideal in that it doesn't enforce at compile time that the t0 predictions are actually compatible.
> Can we take a t0pdf and compute the predictions in place?

is the same as the answer re: ComputeChi2_basic. I need to compute it without an FK table.

@nhartland
Member Author

> Does the regression test pass with this change?

What regression test?

@Zaharid
Contributor

Zaharid commented Mar 7, 2018

https://github.com/NNPDF/nnpdf/blob/master/libnnpdf/tests/vp/test_regressions.py

Run with pytest in that directory. #113 is to make it execute automatically.

@nhartland
Member Author

Alright, I've changed over to using the matrix views (they are great), and with that your regression tests appear to work fine.


@nhartland
Member Author

Ideally I'd now like to get Experiment to also use these functions for covariance matrix generation, because the asymmetry there is gross. The most natural way I can see (and something I've wanted to do for a while) is to implement a CommonData constructor that 'merges' several CommonData instances. Something like

CommonData(std::vector<std::reference_wrapper<const CommonData>> const& subsets);

which would handle the management of named systypes etc. Experiment, while it still lives, would then have most of its data attributes replaced, and it would inherit from CommonData just as DataSet does (while that still lives too).
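The bookkeeping such a merging constructor needs is mostly about aligning named systematics across subsets. A rough Python sketch of the idea (not the proposed C++ interface): systematics sharing a name end up in the same column, so cross-subset correlations come for free when the merged table is fed to a covmat builder.

```python
def merge_systematics(subsets):
    """Merge per-dataset systematics tables into one table with aligned columns.

    subsets: list of (data, sys) pairs, where data is a list of central values
             and sys maps systematic name -> list of per-point errors.
    Shared names share a column; points missing a systematic get 0.0.
    Returns (merged_data, merged_sys) with merged_sys keyed by name.
    """
    names = []
    for _, sys in subsets:
        for n in sys:
            if n not in names:
                names.append(n)
    merged_data, merged_sys = [], {n: [] for n in names}
    for data, sys in subsets:
        merged_data.extend(data)
        for n in names:
            merged_sys[n].extend(sys.get(n, [0.0] * len(data)))
    return merged_data, merged_sys
```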

@nhartland
Member Author

This would naturally also be half-way to an Experiment- and DataSet-free future.

@Zaharid
Contributor

Zaharid commented Mar 7, 2018

I think Experiment shouldn't have a covariance matrix that it owns at all, for all the reasons discussed in the other issues. But that is far from a proposal.

return sqrtmat;
}

// TODO to sort this out, need to make data and theory vectors
Contributor

I'm actually starting to think that vectors everywhere is not such a good idea (though we could have it as a higher level interface), especially in view of wanting to use something like this:

https://arrow.apache.org/docs/python/plasma.html

Member Author

I don't know what that is, nor why we would want to use it.

What would you use other than vectors? Stick to plain old pointers?

Contributor

I think both validphys and nnfit would benefit a lot from sharing memory between processes (for things like FK tables). I think FK tables are big enough that you would not see the performance difference between masking the training/validation split and actually slicing the tables. For vp, the cost of initializing the same PDFs and FK tables in several processes often offsets the advantages of the parallel mode, so it would be good if these things were loaded into shared memory once. That thing seems like a convenient way to do just that, but then you must control the allocator, which is a pain with the std containers.
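The general pattern being described, allocating a large array once and attaching to it by name from each worker, can be sketched with the stdlib `multiprocessing.shared_memory` module (Python 3.8+) as a stand-in for Plasma. Function names here are mine:

```python
import numpy as np
from multiprocessing import shared_memory

def publish(name, arr):
    """Copy arr into a named shared-memory block other processes can attach to."""
    shm = shared_memory.SharedMemory(create=True, size=arr.nbytes, name=name)
    view = np.ndarray(arr.shape, dtype=arr.dtype, buffer=shm.buf)
    view[:] = arr
    return shm  # keep the handle alive; the owner unlinks when done

def attach(name, shape, dtype):
    """Attach to an existing block by name; the array data is not copied."""
    shm = shared_memory.SharedMemory(name=name)
    return shm, np.ndarray(shape, dtype=dtype, buffer=shm.buf)
```

Each worker would call `attach` with the table's name and shape instead of loading its own copy, which is the cost the parallel vp mode currently pays.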

@nhartland
Member Author

I was going to wait to merge this until I had a chance to sort out the asymmetry with Experiment, but I'm not sure how long that will really take. Do you think this is worth merging now? I.e. could you use the (potentially marginal) covmat-generation speed improvement?

@Zaharid
Contributor

Zaharid commented Mar 8, 2018

I think it would be good if the validphys actions used this and avoided copying the experiments like crazy.

@nhartland
Member Author

I'll take that as a yes

@nhartland nhartland merged commit 1243690 into master Mar 8, 2018
@nhartland nhartland deleted the covmat_mods branch March 8, 2018 12:19
voisey pushed a commit that referenced this pull request Aug 26, 2020