Add tools to both fit and analyse fits with regularised covmats #541

Merged
scarrazza merged 8 commits into master from reg_corr_mat on Oct 2, 2019
Conversation

@wilsonmr
Contributor

@wilsonmr wilsonmr commented Sep 3, 2019

After @Zaharid 's talk we want to escalate studies into unstable covmats. This will involve being able to regularize the covariance matrices using the proposed method on a more industrial scale.

Actions can be added for various analyses; for now just the base-level function has been added to calcutils.

Also since the tools will be available in the validphys structure, it is pretty trivial to add the possibility of modifying the covmats at the level of a fit (in the new fitting code)

The implementation of regularizing covmats in the fit needs to be done in a more flexible/sensible way, not hardcoded as it is now.

@wilsonmr wilsonmr requested a review from Zaharid September 3, 2019 15:27
@wilsonmr
Contributor Author

wilsonmr commented Sep 3, 2019

This is just to say I am doing a preliminary global fit, which appears to be running fine here in Edinburgh with regularized covmats (with max condition number 500 as per the hardcoded value)

Obviously I should come up with a more robust way of being able to do this again with different acceptable condition numbers
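As a sketch, the eigenvalue-clipping regularisation with the condition-number threshold exposed as a parameter might look like the following (the function name is hypothetical and the default of 500 mirrors the hardcoded value mentioned above; the body follows the calcutils snippet quoted later in this thread):

```python
import numpy as np
import scipy.linalg as la

def regularize_covmat(covmat, cond_num_threshold=500):
    """Clip the small eigenvalues of the correlation matrix so that its
    condition number is at most ``cond_num_threshold``, then restore the
    original variances. Sketch only; name and default are illustrative."""
    d = np.sqrt(np.diag(covmat))
    # work on the correlation matrix so the clipping is scale independent
    corr = (covmat / d) / d[:, np.newaxis]
    e_val, e_vec = la.eigh(corr)
    new_e_val = np.clip(e_val, a_min=e_val.max() / cond_num_threshold, a_max=None)
    new_corr = e_vec @ (np.diag(new_e_val) @ e_vec.T)
    # rescale back to a covariance matrix
    return (new_corr * d) * d[:, np.newaxis]
```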

@Zaharid
Contributor

Zaharid commented Sep 3, 2019

Sorry, I find it difficult to understand what this thing is doing in the end. I'd say the thing as discussed applies to dataset covariance matrices (and then you somehow put together the experiment ones), but it seems to me this is working on the experiment ones directly. I haven't thought very much about how to do this precisely. I guess one question I have is whether regularizing a block diagonal matrix is the same as regularizing the individual blocks (up to possibly large numerical errors). If that is the case, then this is probably fine.

In general I find this part of the code needs to be made clearer, but unfortunately we need things like #404 and #476 to be able to improve it meaningfully. In any case it seems to me that common_data_reader_experiment and common_data_reader_dataset do something completely different from common_data_reader, and so they should be called differently.

@Zaharid
Contributor

Zaharid commented Sep 3, 2019

Also it should be trivial to pass runcard options to these functions...

Comment thread validphys2/src/validphys/calcutils.py Outdated
corr = (covmat/d)/d[:, np.newaxis]
e_val, e_vec = la.eigh(corr)
new_e_val = np.clip(e_val, a_min=max(e_val)/cond_num_threshold, a_max=None)
new_corr = e_vec@(np.diag(new_e_val)@e_vec.T)
Contributor


Probably we don't need to construct a diagonal matrix, but we can use the matrix-vector * operator.
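For illustration, the two reconstructions agree: broadcasting the eigenvalues over the eigenvector columns gives the same matrix without materialising the dense diagonal matrix (numbers here are illustrative):

```python
import numpy as np
import scipy.linalg as la

corr = np.array([[1.0, 0.9],
                 [0.9, 1.0]])
e_val, e_vec = la.eigh(corr)
new_e_val = np.clip(e_val, a_min=e_val.max() / 500, a_max=None)

# As in the snippet: build an explicit diagonal matrix.
via_diag = e_vec @ (np.diag(new_e_val) @ e_vec.T)

# Broadcasting scales each eigenvector column by its eigenvalue,
# which is the same matrix-vector product without the n x n diagonal.
via_broadcast = (e_vec * new_e_val) @ e_vec.T

assert np.allclose(via_diag, via_broadcast)
```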

@Zaharid
Contributor

Zaharid commented Sep 3, 2019

I think regularizing the blocks should indeed be the same. In the end it is the union of independent subspaces.

@wilsonmr
Contributor Author

wilsonmr commented Sep 3, 2019

Sorry, I find it difficult to understand what this thing is doing in the end. I'd say the thing as discussed applies to dataset covariance matrices (and then you somehow put together the experiment ones), but it seems to me this is working on the experiment ones directly. I haven't thought very much about how to do this precisely. I guess one question I have is whether regularizing a block diagonal matrix is the same as regularizing the individual blocks (up to possibly large numerical errors). If that is the case, then this is probably fine.

Ah yeah, you're right, I didn't think this through; it indeed needs to be done at the level of the datasets. I don't think that doing it on the block diagonal can possibly be the same, because it sets the clip value according to the max eigenvalue of the experiment rather than of each dataset, which means the fit I'm running is not right.
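A toy example (with illustrative numbers and a hypothetical helper) of why clipping the full block-diagonal matrix differs from clipping each block: the clipping floor is set by the global maximum eigenvalue, so a large well-conditioned block can force unrelated eigenvalues of a smaller block to be clipped:

```python
import numpy as np
import scipy.linalg as la

def clip_eigenvalues(corr, threshold):
    # hypothetical helper: clip eigenvalues so cond(corr) <= threshold
    e_val, e_vec = la.eigh(corr)
    new_e_val = np.clip(e_val, e_val.max() / threshold, None)
    return (e_vec * new_e_val) @ e_vec.T

block_a = np.diag([100.0, 100.0])       # large scale, well conditioned
block_b = np.array([[1.0, 0.99],
                    [0.99, 1.0]])       # eigenvalues ~1.99 and ~0.01

full = la.block_diag(block_a, block_b)
threshold = 50

per_block = la.block_diag(clip_eigenvalues(block_a, threshold),
                          clip_eigenvalues(block_b, threshold))
whole = clip_eigenvalues(full, threshold)

# Per block, block_b's floor is 1.99/50; on the full matrix the floor
# is 100/50 = 2, so both of block_b's eigenvalues get clipped to 2.
assert not np.allclose(per_block, whole)
```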

Also, I was wondering whether we really need to regularize before data generation, since Tommaso pointed out to me that the data generation relies on a Cholesky decomposition and so might still be affected by instabilities?

Perhaps to do this properly I really need to do this in libnnpdf, or wait for the PRs you mentioned

@Zaharid
Contributor

Zaharid commented Sep 3, 2019

Indeed. Talking about block matrices, the eigenvalue decomposition is the union (with suitable direct sums of subspaces), but the condition number is different in that it picks the largest eigenvalue in any subspace and compares it to the smallest.

This is then too pessimistic because we don't expect fluctuations rotating across the blocks.

@Zaharid
Contributor

Zaharid commented Sep 3, 2019

Regarding the sampling, I am less worried because it does not depend on inverse matrices. So the sampling from the original and regularized covmats should not change much. This is of course not to say that we shouldn't do it consistently if we could, but I wouldn't stop running things because of it.
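A minimal sketch of the point about sampling (names and numbers here are illustrative, not the actual replica-generation code): generating pseudodata only needs the Cholesky factor of the covmat, never its inverse, so mild ill-conditioning matters less here than in a chi-squared computation:

```python
import numpy as np
import scipy.linalg as la

rng = np.random.default_rng(0)
covmat = np.array([[2.0, 0.5],
                   [0.5, 1.0]])
central = np.array([10.0, 20.0])

# Only the Cholesky factor appears; no matrix inverse is required,
# unlike the chi^2, which needs the inverse covmat.
chol = la.cholesky(covmat, lower=True)
replicas = central + rng.standard_normal((50000, 2)) @ chol.T
```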

@wilsonmr
Contributor Author

wilsonmr commented Sep 3, 2019

Although the original prescription concerned dataset covmats, I would say that from a PDF fitter's perspective regularizing the experiment covmats is not necessarily incorrect, because these are the things we are inverting in the fit.

From a practical point of view I want the thing I'm inverting to have a condition number < 500; I don't see a problem with this, it's just a bit different from what I was aiming to do.

…ovmat functions in rsults, improved documentation
…d in as many functions, added comment in n3fit to help others understand why it is hardcoded to be true
@wilsonmr
Contributor Author

wilsonmr commented Sep 4, 2019

Ok, it made sense to do something about #532: I split the covariance matrix function into two separate actions.
I then added validphys.results to the n3fit providers so I can use experiments_covariance_matrix to collect up the covmats. I checked that they definitely are loading t0; since this doesn't affect replica generation, I managed to remove a bunch of t0sets which were being passed around some n3fit functions.
I added the regularize function to both the vp covmat providers so that we can control them at the runcard level and use them in both fitting and analyses.

Let me know what you think

@wilsonmr
Contributor Author

wilsonmr commented Sep 4, 2019

Oh, I should say that before, the libnnpdf Cholesky was being used about half the time; now I'm using the scipy.linalg one. I ran pytest and it didn't appear to make a difference according to the tests, so I think this is fine?

@Zaharid
Contributor

Zaharid commented Sep 4, 2019 via email

…ct at least) added collect over datasets covmats
@wilsonmr wilsonmr requested a review from enocera September 4, 2019 14:59
@wilsonmr wilsonmr changed the title [WIP] Add tools to both fit and analyse fits with regularised covmats Add tools to both fit and analyse fits with regularised covmats Sep 19, 2019
@wilsonmr
Contributor Author

Ok I removed the changes I made in n3fit and distilled this down to:

  • a base-level function which performs regularization on a symmetric matrix
  • a config parser which allows one to specify in a validphys runcard that this be done on all covmats in a report
  • improved covmat-related providers in results, so they actually do something similar to what their names suggest

Happy for review/merge

@Zaharid
Contributor

Zaharid commented Sep 19, 2019

So this does not affect the (n3)fit at all at the moment, right?

@wilsonmr
Contributor Author

wilsonmr commented Sep 19, 2019 via email

@wilsonmr
Contributor Author

wilsonmr commented Oct 2, 2019

Is this waiting on anything?

@scarrazza scarrazza merged commit 2c6bd6e into master Oct 2, 2019
@Zaharid Zaharid deleted the reg_corr_mat branch June 5, 2020 13:20