[Speculative] Modifications used for reweighting #120
Conversation
…ration of sqrt matrix
Zaharid
left a comment
In general I am strongly in favour of this change. Most of the problems are for existing code, but would be good to improve them (especially the very inefficient covmat building).
The interface for ComputeCovMat is not ideal in that it doesn't enforce at compile time that the t0 predictions are actually compatible.
Can we take a t0pdf and compute the predictions in place?
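One way to make the compatibility explicit at the API boundary is to pair the t0 predictions with the dataset they were computed for. This is a hedged sketch only; `T0Predictions` and `CheckT0` are hypothetical names, not part of the NNPDF API:

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical wrapper: the t0 predictions carry the name of the dataset
// they were computed for, so a mismatch can be caught at the API boundary
// rather than deep inside the covariance-matrix build.
struct T0Predictions {
    std::string setname;         // dataset these predictions belong to
    std::vector<double> values;  // one central prediction per data point
};

// Illustrative check a stricter ComputeCovMat could perform (or enforce
// via types): reject predictions built for a different dataset or size.
void CheckT0(const std::string& cd_setname, std::size_t ndata,
             const T0Predictions& t0) {
    if (t0.setname != cd_setname || t0.values.size() != ndata)
        throw std::runtime_error("t0 predictions incompatible with dataset");
}
```

This does not enforce compatibility at compile time, but it moves the failure to the call site instead of producing a silently wrong covariance matrix.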
```cpp
namespace NNPDF{
matrix<double> ComputeCovMat(CommonData const& cd, std::vector<double> const& t0);
matrix<double> ComputeSqrtMat(matrix<double> const& inmatrix);
void ComputeChi2_basic(int const& nDat, int const& nMem,
```
Please do `int nDat` instead of `int const& nDat`, and similarly for the others. There is no point in taking basic types by reference.
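The point about basic types can be shown with a minimal sketch (the function names here are illustrative, not from the codebase): a `const` reference to an `int` adds an indirection for no benefit, since copying an `int` is at least as cheap as passing its address.

```cpp
#include <cassert>

// Discouraged: a const reference to a small trivially-copyable type just
// forces an extra indirection (and can inhibit some optimisations).
int sum_by_ref(int const& a, int const& b) { return a + b; }

// Preferred: pass small basic types by value.
int sum_by_val(int a, int b) { return a + b; }
```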
Should this be public? Seems rather cumbersome.
So for the case I was using it for, sort of, yes. The problem with reweighting is that mostly you don't have FK tables and so can't build a DataSet. Making `ComputeChi2_basic` public (and indeed, separating it from `ComputeChi2` at all) was so that I can call it without a DataSet.
The next step would be figuring out a less tightly coupled version of DataSet, I guess.
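To illustrate why the "basic" entry point is useful without a DataSet: given only residuals and the Cholesky factor of the covariance matrix, the chi² is a forward substitution plus a dot product. This is a self-contained sketch, not the actual `ComputeChi2_basic` implementation:

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Sketch: chi2 = r^T C^{-1} r, given the lower-triangular Cholesky factor
// L of the covariance matrix C (so C = L L^T) and the residual vector r
// (data minus theory). Solve L y = r by forward substitution; then
// chi2 = y . y. No FK tables or DataSet machinery required.
double Chi2FromSqrtCov(const std::vector<std::vector<double>>& L,
                       std::vector<double> r) {
    const std::size_t n = r.size();
    double chi2 = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = 0; j < i; ++j)
            r[i] -= L[i][j] * r[j];  // r[j] already holds y[j] for j < i
        r[i] /= L[i][i];             // r[i] now holds y[i]
        chi2 += r[i] * r[i];
    }
    return chi2;
}
```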
```cpp
const int ndat = cd.GetNData();
const int nsys = cd.GetNSys();
if (ndat <= 0)
  throw LengthError("ComputeCovMat","invalid number of datapoints!");
```
What is the use case for this check? Shouldn't this be taken care of somewhere else?
```cpp
throw RuntimeException("ComputeCovMat", "Inconsistent naming of systematics");
if (isys.name == "SKIP")
  continue;
const bool is_correlated = ( isys.name != "UNCORR" && isys.name !="THEORYUNCORR");
```
Can we somehow avoid running these checks ndata² × nsys times? We have plots showing that this is a huge bottleneck for anything relying on these computations. Also, it really looks like we could inspect only the systematics of the first point, and do away with the i == j check?
Ah, that feels like a different battle to me, the battle for #25
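The hoisting suggested above could look something like the following sketch (illustrative names, not the actual libnnpdf code): classify each systematic once up front, so the O(ndata² × nsys) hot loop only reads a precomputed flag instead of doing string comparisons.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Classify each systematic exactly once, outside the double loop over
// data points. NB: in the real code "SKIP" systematics are skipped
// entirely via `continue`; here they are simply flagged false for brevity.
std::vector<bool> ClassifyCorrelated(const std::vector<std::string>& sysnames) {
    std::vector<bool> correlated(sysnames.size());
    for (std::size_t l = 0; l < sysnames.size(); ++l)
        correlated[l] = sysnames[l] != "UNCORR" &&
                        sysnames[l] != "THEORYUNCORR" &&
                        sysnames[l] != "SKIP";
    return correlated;
}
```

The inner covariance loop would then test `correlated[l]` rather than comparing `isys.name` against string literals for every (i, j) pair.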
```cpp
template<class T>
void ComputeChi2(const T* set, int const& nMem, real *const& theory, real *chi2)
auto CovMat = NNPDF::matrix<double>(ndat, ndat);
for (int i = 0; i < ndat; i++)
```
Sorry for the stylistic complaint (let's use clang-format!), but I really feel strongly that the outer for loop needs curly braces.
```cpp
throw LengthError("CholeskyDecomposition","attempting a decomposition of an empty matrix!");
matrix<double> sqrtmat(n,n);

gsl_matrix* mat = gsl_matrix_calloc(n, n);
```
Maybe we can use `gsl_matrix_view` on `inmatrix.data()`?
Oh nice, I didn't know about gsl_matrix_view. Could help with the CMA minimiser too.
Does the regression test pass with this change?
The answer to

is the same as the answer re: `ComputeChi2_basic`. I need to compute it without an FK table.

What regression test?
https://github.com/NNPDF/nnpdf/blob/master/libnnpdf/tests/vp/test_regressions.py

Run with
Alright, I've changed over to using the matrix views (they are great) and with that your regression tests appear to pass fine.
Ideally now I'd like to get `CommonData(vector<CommonData const&>& subsets);`, which would handle the management of named systypes etc.
This would naturally also be half-way to an
I think
```cpp
  return sqrtmat;
}

// TODO to sort this out, need to make data and theory vectors
```
I'm actually starting to think that vectors everywhere is not such a good idea (though we could have it as a higher-level interface), especially in view of wanting to use something like this:
I don't know what that is, nor why we would want to use it.
What would you use other than vectors? Stick to plain old pointers?
I think both validphys and nnfit would benefit a lot from sharing memory between processes (e.g. for FK tables). I think FK tables are big enough that you will not see the performance difference in masking the train/valid split as opposed to actually slicing the tables. For vp, the cost of initializing the same PDFs and FK tables in several processes often offsets the advantages of the parallel mode, so it would be good if these things were loaded into shared memory once. That thing seems like a convenient way to do just that, but then you must control the allocator, which is a pain to do with the std containers.
I was going to wait to merge this until I had a chance to sort out the asymmetry with
I think it would be good if the validphys actions used this and avoided copying the experiments like crazy.
I'll take that as a yes.
A bit of a speculative PR, this one: it contains the modifications I made to quickly get a reweighting exercise up and running using external predictions.
There are three modifications:

which handles w=0 a little better.

Point (2) is the more controversial one, it being a kind of half-assed solution to #21.

This is probably not in any state to be merged right now, but it might form a basis for actually getting a better solution to #21. The important thing being that here `DataSet` and `Experiment` have their covariance matrices handled rather asymmetrically.

(Also, apologies for borking up the rebase; now there are a bunch of superfluous commits, so this'll have to be a squash merge if it ever does get merged.)