Add tools to both fit and analyse fits with regularised covmats #541

Merged
scarrazza merged 8 commits into master from reg_corr_mat on Oct 2, 2019
Conversation

@wilsonmr
Contributor

@wilsonmr wilsonmr commented Sep 3, 2019

After @Zaharid 's talk we want to escalate studies into unstable covmats. This will involve being able to regularize the covariance matrices using the proposed method on a more industrial scale.

Actions can be added for various analyses; for now just the base-level function has been added to calcutils.

Also since the tools will be available in the validphys structure, it is pretty trivial to add the possibility of modifying the covmats at the level of a fit (in the new fitting code)

The implementation of regularizing covmats in the fit needs to be done in a more flexible/sensible way, not hardcoded as it is now.

@wilsonmr wilsonmr requested a review from Zaharid September 3, 2019 15:27
@wilsonmr
Contributor Author

wilsonmr commented Sep 3, 2019

This is just to say I am doing a preliminary global fit, which appears to be running fine here in Edinburgh with regularized covmats (with max condition number 500 as per the hardcoded value)

Obviously I should come up with a more robust way of being able to do this again with different acceptable condition numbers
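As a sketch, the eigenvalue-clipping regularisation with the condition-number threshold exposed as a parameter might look like the following (the function name is hypothetical and the default of 500 mirrors the hardcoded value mentioned above; the body follows the calcutils snippet quoted later in this thread):

```python
import numpy as np
import scipy.linalg as la

def regularize_covmat(covmat, cond_num_threshold=500):
    """Clip the small eigenvalues of the correlation matrix so that its
    condition number is at most ``cond_num_threshold``, then restore the
    original variances. Sketch only; name and default are illustrative."""
    d = np.sqrt(np.diag(covmat))
    # work on the correlation matrix so the clipping is scale independent
    corr = (covmat / d) / d[:, np.newaxis]
    e_val, e_vec = la.eigh(corr)
    new_e_val = np.clip(e_val, a_min=e_val.max() / cond_num_threshold, a_max=None)
    new_corr = e_vec @ (np.diag(new_e_val) @ e_vec.T)
    # rescale back to a covariance matrix
    return (new_corr * d) * d[:, np.newaxis]
```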

@Zaharid
Contributor

Zaharid commented Sep 3, 2019

Sorry, I find it difficult to understand what this thing is doing in the end. I'd say the thing as discussed applies to dataset covariance matrices (and then you somehow put together the experiment ones), but it seems to me this is working on the experiment ones directly. I haven't thought very much about how to do this precisely. I guess one question I have is whether regularizing a block diagonal matrix is the same as regularizing the individual blocks (up to possibly large numerical errors). If that is the case, then this is probably fine.

In general I find this part of the code needs to be made clearer, but unfortunately we need things like #404 and #476 to be able to improve it meaningfully. In any case it seems to me that common_data_reader_experiment and common_data_reader_dataset do something completely different from common_data_reader, and so they should be called differently.

@Zaharid
Contributor

Zaharid commented Sep 3, 2019

Also it should be trivial to pass runcard options to these functions...

Comment thread validphys2/src/validphys/calcutils.py Outdated
corr = (covmat/d)/d[:, np.newaxis]
e_val, e_vec = la.eigh(corr)
new_e_val = np.clip(e_val, a_min=max(e_val)/cond_num_threshold, a_max=None)
new_corr = e_vec@(np.diag(new_e_val)@e_vec.T)
Contributor


Probably we don't need to construct a diagonal matrix, but we can use the matrix-vector * operator.
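For illustration, the two reconstructions agree: broadcasting the eigenvalues over the eigenvector columns gives the same matrix without materialising the dense diagonal matrix (numbers here are illustrative):

```python
import numpy as np
import scipy.linalg as la

corr = np.array([[1.0, 0.9],
                 [0.9, 1.0]])
e_val, e_vec = la.eigh(corr)
new_e_val = np.clip(e_val, a_min=e_val.max() / 500, a_max=None)

# As in the snippet: build an explicit diagonal matrix.
via_diag = e_vec @ (np.diag(new_e_val) @ e_vec.T)

# Broadcasting scales each eigenvector column by its eigenvalue,
# which is the same matrix-vector product without the n x n diagonal.
via_broadcast = (e_vec * new_e_val) @ e_vec.T

assert np.allclose(via_diag, via_broadcast)
```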

@Zaharid
Contributor

Zaharid commented Sep 3, 2019

I think regularizing the blocks should indeed be the same. In the end it is the union of independent subspaces.

@wilsonmr
Contributor Author

wilsonmr commented Sep 3, 2019

Sorry, I find it difficult to understand what this thing is doing in the end. I'd say the thing as discussed applies to dataset covariance matrices (and then you somehow put together the experiment ones), but it seems to me this is working on the experiment ones directly. I haven't thought very much about how to do this precisely. I guess one question I have is whether regularizing a block diagonal matrix is the same as regularizing the individual blocks (up to possibly large numerical errors). If that is the case, then this is probably fine.

Ah yeah, you're right, I didn't think this through; it indeed needs to be done at the level of the datasets. I don't think that doing it on the block diagonal can possibly be the same, because it sets the clip value according to the max eigenvalue of the experiment rather than of each dataset, which means the fit I'm running is not right.
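A toy example (with illustrative numbers and a hypothetical helper) of why clipping the full block-diagonal matrix differs from clipping each block: the clipping floor is set by the global maximum eigenvalue, so a large well-conditioned block can force unrelated eigenvalues of a smaller block to be clipped:

```python
import numpy as np
import scipy.linalg as la

def clip_eigenvalues(corr, threshold):
    # hypothetical helper: clip eigenvalues so cond(corr) <= threshold
    e_val, e_vec = la.eigh(corr)
    new_e_val = np.clip(e_val, e_val.max() / threshold, None)
    return (e_vec * new_e_val) @ e_vec.T

block_a = np.diag([100.0, 100.0])       # large scale, well conditioned
block_b = np.array([[1.0, 0.99],
                    [0.99, 1.0]])       # eigenvalues ~1.99 and ~0.01

full = la.block_diag(block_a, block_b)
threshold = 50

per_block = la.block_diag(clip_eigenvalues(block_a, threshold),
                          clip_eigenvalues(block_b, threshold))
whole = clip_eigenvalues(full, threshold)

# Per block, block_b's floor is 1.99/50; on the full matrix the floor
# is 100/50 = 2, so both of block_b's eigenvalues get clipped to 2.
assert not np.allclose(per_block, whole)
```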

Also, I was wondering whether we really need to regularize before data generation, since Tommaso pointed out to me that the data generation relies on a Cholesky decomposition and so might still be affected by instabilities?

Perhaps to do this properly I really need to do this in libnnpdf, or wait for the PRs you mentioned

@Zaharid
Contributor

Zaharid commented Sep 3, 2019

Indeed. Talking about block matrices, the eigenvalue decomposition is the union (with suitable direct sums of subspaces), but the condition number is different in that it picks the largest eigenvalue in any subspace and compares it to the smallest.

This is then too pessimistic because we don't expect fluctuations rotating across the blocks.

@Zaharid
Contributor

Zaharid commented Sep 3, 2019

Regarding the sampling, I am less worried because it does not depend on inverse matrices. So the sampling from the original and regularized covmats should not change much. This is of course not to say that we shouldn't do it consistently if we could, but I wouldn't stop running things because of it.
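A minimal sketch of the point about sampling (names and numbers here are illustrative, not the actual replica-generation code): generating pseudodata only needs the Cholesky factor of the covmat, never its inverse, so mild ill-conditioning matters less here than in a chi-squared computation:

```python
import numpy as np
import scipy.linalg as la

rng = np.random.default_rng(0)
covmat = np.array([[2.0, 0.5],
                   [0.5, 1.0]])
central = np.array([10.0, 20.0])

# Only the Cholesky factor appears; no matrix inverse is required,
# unlike the chi^2, which needs the inverse covmat.
chol = la.cholesky(covmat, lower=True)
replicas = central + rng.standard_normal((50000, 2)) @ chol.T
```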

@wilsonmr
Contributor Author

wilsonmr commented Sep 3, 2019

Although the original prescription concerned dataset covmats, I would say that from a PDF fitter's perspective regularizing the experiment covmats is not necessarily incorrect, because these are the things we are inverting in the fit.

From a practical point of view I want the thing I'm inverting to have a condition number < 500; I don't see a problem with this, it's just a bit different from what I was aiming to do.

…ovmat functions in rsults, improved documentation
…d in as many functions, added comment in n3fit to help others understand why it is hardcoded to be true
@wilsonmr
Contributor Author

wilsonmr commented Sep 4, 2019

Ok, it made sense to do something about #532: I split the covariance matrix function into two separate actions.
I then added validphys.results to the n3fit providers so I can use experiments_covariance_matrix to collect up the covmats. I checked that they definitely are loading t0; since this doesn't affect replica generation, I managed to remove a bunch of t0sets which were being passed around some n3fit functions.
I added the regularize function to both the vp covmat providers so that we can control them at the runcard level and use them in both fitting and analyses.

Let me know what you think

@wilsonmr
Contributor Author

wilsonmr commented Sep 4, 2019

Oh, I should say that before, the libnnpdf Cholesky was being used about half the time; now I'm using the scipy.linalg one. I ran pytest and it didn't appear to make a difference according to the tests, so I think this is fine?

@Zaharid
Contributor

Zaharid commented Sep 4, 2019 via email

…ct at least) added collect over datasets covmats
@wilsonmr wilsonmr requested a review from enocera September 4, 2019 14:59
@wilsonmr wilsonmr changed the title [WIP] Add tools to both fit and analyse fits with regularised covmats Add tools to both fit and analyse fits with regularised covmats Sep 19, 2019
@wilsonmr
Contributor Author

Ok I removed the changes I made in n3fit and distilled this down to:

  • a base-level function which performs regularization on a symmetric matrix
  • a config parser which allows one to specify in a validphys runcard that this be done on all covmats in a report
  • improved covmat-related providers in results, so they actually do something similar to what their names suggest

Happy for review/merge

@Zaharid
Contributor

Zaharid commented Sep 19, 2019

So this does not affect the (n3)fit at all at the moment, right?

@wilsonmr
Contributor Author

wilsonmr commented Sep 19, 2019 via email

@wilsonmr
Contributor Author

wilsonmr commented Oct 2, 2019

Is this waiting on anything?

@scarrazza scarrazza merged commit 2c6bd6e into master Oct 2, 2019
@Zaharid Zaharid deleted the reg_corr_mat branch June 5, 2020 13:20