Skip to content

Fix covmat reg tests#779

Merged
wilsonmr merged 3 commits into
masterfrom
fix_covmat_test
May 18, 2020
Merged

Fix covmat reg tests#779
wilsonmr merged 3 commits into
masterfrom
fix_covmat_test

Conversation

@wilsonmr
Copy link
Copy Markdown
Contributor

I actually kept the old test and added another because I think that the procedure before wasn't robust and the test actually helped find this issues. To (hopefully) fix that I added in an abs() around the correlation matrix eigenvalue in case it is close to zero but negative

Then I added another test which checks with a slightly more sane matrix. Also using numpy.testing.assert_allclose whereever possible in that test module for better errors

closes #777

@wilsonmr wilsonmr requested review from Zaharid and scarlehoff May 13, 2020 10:14
@wilsonmr
Copy link
Copy Markdown
Contributor Author

I guess the true test is whether the CI passes because I can't easily test this locally

Copy link
Copy Markdown
Member

@scarlehoff scarlehoff left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It works for me. I think we should merge this soonish (go Travis!)

just out of curiosity tried it in my computer to see whether it happens for me and it seems to be a numerical precision thing:

>>> a = np.arange(9, dtype=np.float32).reshape(3,3) ; eigvalsh(a@a.T)[0]
3.1273037e-06
>>> a = np.arange(9, dtype=np.float64).reshape(3,3) ; eigvalsh(a@a.T)[0]
-3.771475733269901e-15

I wonder what made it start failing though.

Comment thread validphys2/src/validphys/calcutils.py Outdated
if 1 / e_val[0] <= sqr_threshold: # eigh gives eigenvals in ascending order
# if eigenvalues are close to zero, can be negative which messes this up
# take abs here to be safe
if abs(1 / e_val[0]) <= sqr_threshold: # eigh gives eigenvals in ascending order
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe rather than an absolute value a + epsilon where epsilon=1e-12 or something would be better? That way if it is not 0 but actually negative it will "correctly" fail.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm I'm not sure what the best behaviour is if the method is supplied with a covariance matrix which is actually not positive semidefinite.

So with your suggestion if min eigenval was like -10 then (1/-10 + epsilon) < sqr_threshold
and it just wouldn't do anything to the covmat but also wouldn't raise any errors

I think that this method assumes that the covmat is semi-positive definite and it would be better to raise an error if e_val[0] + epsilon < 0 but I'm not sure exactly what we should set that threshold to be

Copy link
Copy Markdown
Member

@scarlehoff scarlehoff May 13, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, float64 is 16 digits and it seems that the problem is around that bound so 1e-14 seems fair. If the precision problems are greater than that we might want to start thinking about using a different library.

But yeah, I also don't know what the best behavior would be. Personally I like failures when things are not what they are supposed to be, isn't there a @check_covmat validphys decorator anywhere?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess if we did a bit of both then we at least know when it fails

PRECISION_THREHSOLD = 1e-12
...
if e_val[0] < PRECISION_THREHSOLD:
    raise SomeError("print what the value was so we can see whether its numerical or a non PSD covmat")
if 1 / (e_val[0] + PRECISION_THREHSOLD) <= sqr_threshold:
    ...

…l if smallest eigenvalue is more negative than -tolerance
@wilsonmr
Copy link
Copy Markdown
Contributor Author

thoughts on that? I quite like it as is because it fails loudly if the covmat is not positive definite within some tolerance we set.

Seeing as it relates to a project which @Zaharid was leading I wouldn't mind getting his opinion on it before merge

@scarlehoff
Copy link
Copy Markdown
Member

I'm happy with it. I agree, we can talk during the CM today.

@Zaharid
Copy link
Copy Markdown
Contributor

Zaharid commented May 14, 2020

I'd say that rather than picking arbitrary thresholds, we should treat negative eigenvalues returned by eigh as zero, possibly with a big warning. These are "truly" either really zero or very small meaning that that matrix is not positive definite but only semipositive (which indeed is a case we should handle), but the eigenvalues can never actually be negative.

Note that it doesn't check if the matrix is symmetric and simply assumes it is, so under that assumption the above is exactly true.

Then for the purposes of regularization, zero or close to zero (as in less than the threshold) doesn't make any difference, which is kind of the point.

@Zaharid
Copy link
Copy Markdown
Contributor

Zaharid commented May 14, 2020

I guess a shorter way of saying the same thing is that we always want to regularize on the condition e_val[0] <= 0 so we could just add that.

@wilsonmr
Copy link
Copy Markdown
Contributor Author

ok in practice we can change the if to be

if e_val[0] > 1/sqr_threshold:
    return ...

right? Only positive numbers which are big enough cause matrix to not be regularized. Then raise a warning if e_val[0] < 0?

@Zaharid
Copy link
Copy Markdown
Contributor

Zaharid commented May 14, 2020

Looks OK to me, but I was never very good with inequalities. Indeed we want an inequality that is not equivalent to the previous one!

@scarlehoff
Copy link
Copy Markdown
Member

Should we merge this?

@voisey voisey mentioned this pull request May 18, 2020
@wilsonmr
Copy link
Copy Markdown
Contributor Author

I think that would be good

@wilsonmr wilsonmr merged commit 7694bb5 into master May 18, 2020
@wilsonmr wilsonmr deleted the fix_covmat_test branch May 18, 2020 10:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

covmat regularization test fails

3 participants