Add tools to both fit and analyse fits with regularised covmats#541
Conversation
|
This is just to say I am doing a preliminary global fit, which appears to be running fine here in Edinburgh with regularized covmats (with a maximum condition number of 500, as per the hardcoded value). Obviously I should come up with a more robust way of being able to do this again with different acceptable condition numbers. |
|
Sorry, I find it difficult to understand what this is doing in the end. I'd say the thing as discussed applies to dataset covariance matrices (and then you somehow put together the experiment ones), but it seems to me this is working on the experiment ones directly. I haven't thought very much about how to do this precisely. One question I have is whether regularizing a block-diagonal matrix is the same as regularizing the individual blocks (up to possibly large numerical errors). If that is the case, then this is probably fine. In general I find this part of the code needs to be made clearer, but unfortunately we need things like #404 and #476 to be able to improve it meaningfully. In any case it seems to me that |
|
Also it should be trivial to pass runcard options to these functions... |
corr = (covmat/d)/d[:, np.newaxis]
e_val, e_vec = la.eigh(corr)
new_e_val = np.clip(e_val, a_min=max(e_val)/cond_num_threshold, a_max=None)
new_corr = e_vec@(np.diag(new_e_val)@e_vec.T)
Probably we don't need to construct a diagonal matrix, but we can use the matrix-vector * operator.
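That suggestion can be sketched as follows. This is a hypothetical wrapper around the snippet under review (the name `regularize_covmat` and the default threshold are assumptions, not the PR's actual API); the key change is that `(e_vec * new_e_val) @ e_vec.T` scales the eigenvector columns by broadcasting instead of constructing a diagonal matrix with `np.diag`.

```python
import numpy as np
from scipy import linalg as la

def regularize_covmat(covmat, cond_num_threshold=500):
    """Clip small eigenvalues of the correlation matrix so that its
    condition number does not exceed cond_num_threshold, then restore
    the original scale. Hypothetical sketch, not the PR's API."""
    d = np.sqrt(np.diag(covmat))
    # Normalize to a correlation matrix before clipping.
    corr = (covmat / d) / d[:, np.newaxis]
    e_val, e_vec = la.eigh(corr)
    new_e_val = np.clip(e_val, a_min=e_val.max() / cond_num_threshold, a_max=None)
    # Broadcasting scales each eigenvector column: no diagonal matrix needed.
    new_corr = (e_vec * new_e_val) @ e_vec.T
    # Restore covariance scale: cov_ij = corr_ij * d_i * d_j.
    return new_corr * d * d[:, np.newaxis]
```

After clipping, the largest eigenvalue is unchanged and the smallest is floored at `max / threshold`, so the regularized correlation matrix satisfies the condition-number bound by construction.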
|
I think regularizing the blocks should indeed be the same. In the end it is the union of independent subspaces. |
ah yeah, you're right, I didn't think this through; it indeed needs to be done at the level of the datasets. I don't think that doing it on the block diagonal can possibly be the same, because it will set the clip value according to the max eigenvalue of the experiment rather than of each dataset, which means the fit I'm running is not right. Also I was wondering whether we really need to regularize before data generation? Tommaso pointed out to me that the data generation relies on a Cholesky decomposition, so it might still be affected by instabilities. Perhaps to do this properly I really need to do this in libnnpdf, or wait for the PRs you mentioned |
|
Indeed. Talking about block matrices, the eigenvalue decomposition is the union (with suitable direct sums of subspaces), but the condition number is different in that it picks the largest eigenvalue in any subspace and compares it to the smallest. This is then too pessimistic, because we don't expect fluctuations rotating across the blocks. |
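The point can be illustrated with a small numerical example (a sketch; `clip_eigenvalues` is a hypothetical helper following the clipping prescription from the snippet above). Two well-conditioned blocks with very different scales are untouched when clipped individually, but clipping the assembled block-diagonal matrix floors the smaller block's spectrum against the *global* maximum eigenvalue:

```python
import numpy as np
from scipy import linalg as la

def clip_eigenvalues(corr, cond_num_threshold):
    # Floor the spectrum at (max eigenvalue / threshold), as in the PR snippet.
    e_val, e_vec = la.eigh(corr)
    new_e_val = np.clip(e_val, a_min=e_val.max() / cond_num_threshold, a_max=None)
    return (e_vec * new_e_val) @ e_vec.T

# Two blocks with well-separated scales: eigenvalues {10, 1} and {0.1, 0.01}.
block_a = np.diag([10.0, 1.0])
block_b = np.diag([0.1, 0.01])
whole = la.block_diag(block_a, block_b)

threshold = 50
per_block = la.block_diag(clip_eigenvalues(block_a, threshold),
                          clip_eigenvalues(block_b, threshold))
global_clip = clip_eigenvalues(whole, threshold)
# Each block has condition number 10, so per-block clipping changes nothing,
# but the global clip floors everything at 10 / 50 = 0.2, distorting block_b.
```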
|
Regarding the sampling, I am less worried, because it does not depend on inverse matrices. So the sampling from the original and the regularized covmats should not change much. This is of course not to say that we shouldn't do it consistently if we could, but I wouldn't stop running things because of it. |
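For reference, this is why the sampling is less sensitive: pseudodata generation only needs the Cholesky factor of the covmat, never its inverse. A minimal sketch (the function name is hypothetical; this is not the libnnpdf implementation):

```python
import numpy as np

def sample_pseudodata(central, covmat, n_replicas, rng):
    """Draw n_replicas Gaussian pseudodata vectors around `central`.

    Uses only the Cholesky factor L (covmat = L @ L.T); no matrix
    inverse appears, so mild ill-conditioning is less of a concern."""
    L = np.linalg.cholesky(covmat)
    noise = rng.standard_normal((n_replicas, len(central)))
    # central + L @ z has covariance L @ L.T = covmat.
    return central + noise @ L.T
```

The Cholesky factorization itself still requires the matrix to be (numerically) positive definite, which is the instability mentioned above; but no small eigenvalue is ever amplified by an inversion.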
|
Although the original prescription concerned dataset covmats, I would say that from a PDF fitter's perspective regularizing the experiment covmats is not necessarily incorrect, because these are the things we invert in the fit. From a practical point of view I want the thing I'm inverting to have a condition number < 500; I don't see a problem with this, it's just a bit different from what I was aiming to do |
…ovmat functions in rsults, improved documentation
…d in as many functions, added comment in n3fit to help others understand why it is hardcoded to be true
|
Ok, since it made sense to do something about #532, I split the covariance matrix function into two separate actions. Let me know what you think |
|
Oh I should say that half the time before the libnnpdf cholesky was being used; now I'm using the scipy.linalg one. I ran pytest and it didn't appear to make a difference according to the tests, so I think this is fine? |
|
I think that at some point I convinced myself that they end up running essentially the same code, after many indirections. Except that they might use different LAPACK providers.
|
…ct at least) added collect over datasets covmats
|
Ok I removed the changes I made in
Happy for review/merge |
|
So this does not affect the (n3)fit at all at the moment, right? |
|
As far as I know it shouldn't affect any fitting code.
N3fit loading of data is orthogonal to this (since I undid the changes to use validphys.results).
|
|
Is this waiting on anything? |
After @Zaharid's talk we want to scale up studies of unstable covmats. This will involve being able to regularize the covariance matrices using the proposed method on a more industrial scale.
Actions can be added for various analyses; for now just the base-level function has been added to calcutils.
Also, since the tools will be available in the validphys structure, it is pretty trivial to add the possibility of modifying the covmats at the level of a fit (in the new fitting code).
The implementation of regularizing covmats in the fit needs to be done in a more flexible/sensible way, not hardcoded as it is now.