Disambiguate the sign (phase) of the mSSA PCs #110
Nice changes! I've also put these changes through a standard set of tests, including a run on a 'real research' project, and they produced all expected outputs.
Users might notice the time hit by default if they are running production values, but we can address this in the documentation. To provide some context: on a research problem, I saw approximately a doubling in run time with Sign : true vs Sign : false. Of course this is problem dependent, but it's probably roughly true in general.
The only reason I can see for not merging this immediately is if we think there will be any more algorithmic changes?
Agreed. I tried subsampling to reduce the rank, but doing the strided computation row-by-column rather than the full matrix multiply was dreadfully slow. I think it breaks the optimization heuristics built into Eigen3. However, there are ways of speeding it up. For example, making two new data arrays with every nth row and every nth column and then doing the matrix multiplies should get around the matrix arithmetic bottleneck that is slowing it down. Not sure whether that's a good or bad idea, however. It would still be 'deterministic' so I wouldn't be terribly worried. But I'm not sure how best to test that either.
I am not planning any more work on this in the near future, unless we have an entirely new idea for a heuristic sign-choice algorithm.
Problem
The left and right singular vectors of an SVD have ambiguous signs. Subsequent SVD invocations in expMSSA can result in PCs with different signs (i.e. pi radians out of phase).
Fix
Choose the signs of the vectors in each eigen triple based on their squared-norm-weighted projections onto the original data matrix (either the trajectory or covariance matrix, in our case).
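As a rough illustration of the idea (not the actual SvdSignChoice code), here is a minimal pure-Python sketch. It assumes one particular reading of the description above: each projection of a data column onto u_k, and of a data row onto v_k, contributes its square with its own sign, and the pair (u_k, v_k) is flipped together when the combined sum is negative. The names dot, signed_square_sum and sign_choice are hypothetical and exist only for this example.

```python
def dot(a, b):
    """Plain dot product of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))


def signed_square_sum(projections):
    """Sum of sign(p) * p**2 over all projections p (p * |p| is the same thing)."""
    return sum(p * abs(p) for p in projections)


def sign_choice(X, U, V):
    """Pick a data-driven sign for each singular-vector pair.

    X is an m x n matrix stored as a list of rows.
    U is a list of K left vectors (each of length m).
    V is a list of K right vectors (each of length n).
    Returns new lists (U_out, V_out); u_k and v_k are always flipped
    together, so every outer product u_k v_k^T is unchanged.
    """
    columns = list(zip(*X))
    U_out, V_out = [], []
    for u, v in zip(U, V):
        # Left vectors are compared with the columns of X,
        # right vectors with the rows of X.
        s_left = signed_square_sum(dot(u, c) for c in columns)
        s_right = signed_square_sum(dot(v, r) for r in X)
        # Assumption: the side with the larger magnitude decides,
        # which is the same as taking the sign of the sum.
        flip = -1.0 if s_left + s_right < 0.0 else 1.0
        U_out.append([flip * a for a in u])
        V_out.append([flip * a for a in v])
    return U_out, V_out


# Rank-1 example: X = 5 * u v^T with u = (0.6, 0.8), v = (1, 0).
X = [[3.0, 0.0], [4.0, 0.0]]
a = sign_choice(X, [[0.6, 0.8]], [[1.0, 0.0]])
b = sign_choice(X, [[-0.6, -0.8]], [[-1.0, 0.0]])
print(a == b)  # both sign conventions map to the same vectors
```

Because the decision depends only on the data matrix and the vectors themselves, two SVD calls that return the same pair with opposite signs end up with identical output after this step.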
Comments
The algorithm is implemented in a helper function called SvdSignChoice. This routine has been checked by decomposing a run of random matrices and confirming that the algorithm reconstructs the input data to machine precision after the sign choice is applied.
There is one big downside: this literal algorithm is slow since it passes through the entire trajectory matrix. I have tried to speed this up by striding the sampling of the trajectory so that the number of samples equals the rank, rather than doing a full computation. But the resulting inefficiency was large, so it wasn't helpful.
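The reconstruction check mentioned above relies on a simple invariant, sketched here in pure Python under stated assumptions: flipping u_k and v_k together leaves the sum of sigma_k u_k v_k^T unchanged, while flipping only one of them does not. The helper names reconstruct and max_abs_error are hypothetical, and the small hand-built decomposition stands in for the random matrices used in the real test.

```python
def reconstruct(U, S, V):
    """Return sum_k S[k] * u_k v_k^T as a list of rows.

    U is a list of left vectors, V a list of right vectors,
    and S the matching list of singular values.
    """
    m, n = len(U[0]), len(V[0])
    return [
        [sum(s * u[i] * v[j] for u, s, v in zip(U, S, V)) for j in range(n)]
        for i in range(m)
    ]


def max_abs_error(A, B):
    """Largest element-wise absolute difference between two matrices."""
    return max(abs(a - b) for ra, rb in zip(A, B) for a, b in zip(ra, rb))


# Hand-built decomposition with orthonormal vectors (illustrative values).
U = [[0.6, 0.8], [-0.8, 0.6]]
V = [[1.0, 0.0], [0.0, 1.0]]
S = [5.0, 2.0]
X = reconstruct(U, S, V)

# Joint flip of the first pair: a valid sign choice.
U_joint = [[-a for a in U[0]], U[1]]
V_joint = [[-a for a in V[0]], V[1]]
err_joint = max_abs_error(X, reconstruct(U_joint, S, V_joint))

# One-sided flip: not a valid sign choice, the reconstruction breaks.
err_one_sided = max_abs_error(X, reconstruct(U_joint, S, V))

print(err_joint, err_one_sided)
```

A check of this kind catches a sign routine that updates the left and right vectors inconsistently, which is the failure mode that would corrupt the reconstructed series.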
The addition to expMSSA is then trivial. I have included a config flag, Sign, to turn this algorithm on and off, if need be. It is now on by default. Code has been tested on the pyEXP-examples/Tutorials/mSSA notebooks.