Conversation

@alexrockhill (Contributor) commented Jun 4, 2020

Addresses #7862.

Passes all tests and matches the example from https://github.com/alberto-ara/Surface-Laplacian, but the testing data needs to be added to mne-testing-data, and some code review would be appreciated. I think the for loops could mostly be replaced with matrix multiplications or calls into numpy/scipy, but I'll need to look into that. Thanks @mservantufc for bringing this to attention; it appears the previous version was not working.

Maybe @larsoner and/or @jasmainak might have a minute to review since they were nice enough to help me with the original PR :) ?

  • Figure out units (don't necessarily use unit sphere)
  • Figure out cosine similarity versus 1 minus half the squared Euclidean distance
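For context on that second checklist item: the two quantities coincide exactly when the positions sit on the unit sphere. A quick sketch of the identity (my code, not MNE's):

```python
import numpy as np

rng = np.random.default_rng(0)
pos = rng.normal(size=(5, 3))
pos /= np.linalg.norm(pos, axis=1, keepdims=True)  # project to unit sphere

cos_sim = pos @ pos.T  # cosine similarity: dot products of unit vectors

# 1 minus half the squared Euclidean distance between the same points
sq_dist = ((pos[:, None] - pos[None, :]) ** 2).sum(-1)
alt = 1.0 - sq_dist / 2.0

# For unit vectors |x - y|^2 = 2 - 2 * dot(x, y), so the two agree
# exactly; off the unit sphere they do not.
np.testing.assert_allclose(cos_sim, alt)
```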

@agramfort (Member) commented Jun 4, 2020

@alexrockhill can you touch the example file so we see how it looks now on the circleci artifact?

@larsoner (Member) commented Jun 4, 2020

Happy to help make it more efficient, etc. but we need to get the test files over to mne-testing-data first. If you want, you can just email them to me and I can do it

@larsoner (Member) commented Jun 4, 2020

Oh wait, I see ... actually these are small enough that they can probably just stay here. I'll fix the code

@agramfort (Member) commented Jun 4, 2020 via email

@larsoner (Member) commented Jun 4, 2020

can you compress the files a bit? maybe ship them as npz?

Oof actually I didn't realize that test_eeg.csv is 30 MB. This should indeed be moved to mne-testing-data. I can do this quickly
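For anyone reproducing this, `np.savez_compressed` is the kind of thing being suggested. A minimal round-trip sketch with made-up data (the array shape here is hypothetical, not the real test file's):

```python
import os
import tempfile

import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(size=(64, 1000))  # stand-in for the 30 MB EEG array

with tempfile.TemporaryDirectory() as tmp:
    fname = os.path.join(tmp, 'test_eeg.npz')
    # Compressed binary is far smaller than CSV and loads faster
    np.savez_compressed(fname, eeg=data)
    loaded = np.load(fname)['eeg']

np.testing.assert_allclose(loaded, data)
```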

@larsoner (Member) commented Jun 4, 2020

(I will probably also just save it as .mat for ease of cross-platform loading)

@agramfort (Member) commented Jun 4, 2020 via email

@larsoner (Member) commented Jun 4, 2020

@alexrockhill I pushed a commit to:

  1. Vectorize everything
  2. Move test files to .mat in mne-testing-data
  3. Touch the CSD example

The CSD example looks bad now. Also, I noticed that the norm of the matrix G is about six orders of magnitude different between the EEG-file test and the FIF-file test, which suggests that there is some scale-factor problem going on (maybe a units problem: everything radius/pos should be done in meters, but maybe something is in mm at some point?).

On master the norm of G appears to be around 0.001, whereas on this PR it's much larger. This effectively changes the usefulness of lambda2 quite a bit. We're going to need to fix this -- perhaps we should always use lambda2 * linalg.norm(G) or so, which will make it more invariant to these sorts of scale changes. We'll also need to adjust the default from 1e-5 to something much higher (0.05? maybe set by eye) to match the new norm levels.
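A minimal sketch of that scale-invariant regularization idea (the function name and signature are mine, not MNE's API):

```python
import numpy as np

def regularize(G, lambda2=1e-5, scale_invariant=True):
    """Add a smoothing term to the diagonal of G (sketch only)."""
    G = G.copy()
    # Scaling lambda2 by ||G|| makes the relative smoothing the same
    # no matter what units the positions were in
    lam = lambda2 * np.linalg.norm(G) if scale_invariant else lambda2
    G[np.diag_indices_from(G)] += lam
    return G

G = np.random.default_rng(0).normal(size=(8, 8))
# Rescaling G by 1e6 rescales the effective regularizer identically,
# so the regularized matrix is just a rescaled copy:
a = regularize(G)
b = regularize(G * 1e6)
np.testing.assert_allclose(b, a * 1e6)
```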

@larsoner (Member) commented Jun 4, 2020

... one thing that might help would be also saving the G and H matrices so we can check intermediate-step equivalence against them as we used to do.

@alexrockhill (Contributor Author)

> @alexrockhill I pushed a commit to:
>
>   1. Vectorize everything
>   2. Move test files to .mat in mne-testing-data
>   3. Touch the CSD example
>
> The CSD example looks bad now. Also, I noticed that the norm of the matrix G is about six orders of magnitude different between the EEG-file test and the FIF-file test, which suggests that there is some scale-factor problem going on (maybe a units problem: everything radius/pos should be done in meters, but maybe something is in mm at some point?).
>
> On master the norm of G appears to be around 0.001, whereas on this PR it's much larger. This effectively changes the usefulness of lambda2 quite a bit. We're going to need to fix this -- perhaps we should always use lambda2 * linalg.norm(G) or so, which will make it more invariant to these sorts of scale changes. We'll also need to adjust the default from 1e-5 to something much higher (0.05? maybe set by eye) to match the new norm levels.

Thanks so much, I am definitely not as skilled at vectorizing.

I think maybe `pos * 1e-3` can be added, with `sphere` updated accordingly, and it should still pass the tests. I can check.

@larsoner (Member) commented Jun 4, 2020

Pushed one more efficiency commit: no more redundant `lpn` calculations, and no unnecessary diagonal ones.

This might make it more difficult to debug, though, so feel free to go back to your old code for _calc_G_H for testing and iteration if you need to. I can easily restore the more efficient version later (or feel free to copy-paste it back in once you have everything working).

@larsoner (Member) commented Jun 4, 2020

Ahh indeed pos appears to be in mm as does sphere, I think adjusting this plus lambda2 might fix it ...

@larsoner (Member) commented Jun 4, 2020

But radius also shows up in a later calculation. I guess this means that it's not scale invariant...?

@larsoner (Member) commented Jun 4, 2020

BTW you can see in the output of this example that something is broken (and that lambda2 does not currently matter):

https://20396-1301584-gh.circle-artifacts.com/0/dev/auto_examples/preprocessing/plot_eeg_csd.html#sphx-glr-auto-examples-preprocessing-plot-eeg-csd-py

@alexrockhill (Contributor Author)

Hmmm, yeah, changing the units and changing the 85 radius to 0.085 didn't work, so I agree it's not scale invariant. Not sure what's going on here, though.

@alexrockhill (Contributor Author)

Although the tests are passing and the evoked data is the same, when I went back and plotted this example https://github.com/alberto-ara/Surface-Laplacian/blob/master/surface%20laplacian%20example.ipynb, the topomap was uniform... I think the tests aren't covering the issue for some reason.

[image: csd_example]

@larsoner (Member) commented Jun 4, 2020

Maybe the atol is wrong/meaningless. Does the topomap look okay with your code?

Also, is it possible that the electrode locations are supposed to be projected onto a sphere? They currently are not, unless by happenstance you are using a spherical montage with zero origin.

@larsoner (Member) commented Jun 4, 2020

... based on code comments for the old _calc_g, indeed it looks like cos_dist should be between 0 and 1 because it's meant to be "cosine of angles between pairs of points on a spherical surface". So really it shouldn't be calculated using the squared euclidean distance, but rather directly via the dot product of unit vectors. I'll see if it fixes the FIF data...
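What that change amounts to, roughly (a sketch, not the actual PR diff):

```python
import numpy as np

rng = np.random.default_rng(0)
pos = rng.normal(size=(10, 3)) * 0.085  # positions not on a unit sphere

# Project onto the unit sphere; the Gram matrix of the unit vectors is
# then the cosine of the angle between each pair of electrodes
unit = pos / np.linalg.norm(pos, axis=1, keepdims=True)
cos_dist = unit @ unit.T

# Values are bounded cosines, with an exact 1 on the diagonal
assert np.allclose(np.diag(cos_dist), 1.0)
assert cos_dist.min() >= -1.0 - 1e-12
```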

@larsoner (Member) commented Jun 4, 2020

Okay this might have fixed things, let me push...

@alexrockhill (Contributor Author)

Ahh, I think you're right that the error tolerance is meaningless. The scale of the example data is 1e-6 to 1e-7, so atol would have to be much lower.
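A sketch of why the tolerance has to track the data scale (synthetic numbers, not the real test data):

```python
import numpy as np
from numpy.testing import assert_allclose

rng = np.random.default_rng(0)
data = rng.normal(size=(4, 5)) * 1e-6               # CSD-scale values
noisy = data + rng.normal(size=data.shape) * 1e-10  # tiny numerical error

# An absolute tolerance of 1e-7 exceeds much of the data itself, so a
# comparison with atol=1e-7 can barely fail. Scale the tolerance to
# the data instead:
atol = 1e-3 * np.abs(data).max()
assert_allclose(noisy, data, rtol=0, atol=atol)
```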

@larsoner (Member) commented Jun 4, 2020

If I go back to c2586be (when you opened the PR), fix the path problems, and use a sensible atol, you can see things are already broken there:

>       assert_allclose(epochs_csd.average().data, csd, atol=1e-7)
E       AssertionError: 
E       Not equal to tolerance rtol=1e-07, atol=1e-07
E       
E       Mismatched elements: 38831 / 40960 (94.8%)
E       Max absolute difference: 1.23793879e-05
E       Max relative difference: 9.19420201
E        x: array([[-3.616917e-10, -7.107747e-10, -6.617863e-10, ..., -7.082218e-10,
E               -6.186906e-10, -7.052339e-10],
E              [-3.351042e-10, -7.403796e-10, -7.674658e-10, ...,  1.577782e-10,...
E        y: array([[-1.023747e-06, -5.399796e-07, -1.163600e-06, ..., -1.681479e-06,
E               -1.770453e-06, -1.318637e-06],
E              [-8.542492e-07, -5.915384e-07, -1.396437e-06, ...,  9.873015e-07,...

So maybe go back to that commit and fix things first? Then you can force-push and I can cherry-pick my commits on top...

@alexrockhill (Contributor Author)

Totally my fault: I just realized the test data is likely the before data, not the after-Laplacian data, because of a very silly swapped-return error. My bad!!!

@larsoner (Member) commented Jun 4, 2020

We can update the testing data to fix it; if you want to work from c2586be and force-push over my changes, I can fix the testing data. But we should wait until we have something that works with a reasonable atol first.

@alexrockhill (Contributor Author)

Ok, I plotted and double-checked; that should be the right comparison CSD data, and everything should hopefully make more sense in a minute.

@larsoner (Member) commented Jun 4, 2020

If the CSV you pushed is the one to compare against and it passes on c2586be, I can take care of the rest of the annoying steps.

@larsoner (Member) commented Jun 4, 2020

... with that file on top of c2586be I get:

mne/preprocessing/tests/test_csd.py:84: in test_csd_matlab
    assert_allclose(epochs_csd.average().data, csd, atol=1e-10)
E   AssertionError: 
E   Not equal to tolerance rtol=1e-07, atol=1e-10
E   
E   Mismatched elements: 40960 / 40960 (100%)
E   Max absolute difference: 4.65739658e-05
E   Max relative difference: 4.92637861
E    x: array([[-3.616917e-10, -7.107747e-10, -6.617863e-10, ..., -7.082218e-10,
E           -6.186906e-10, -7.052339e-10],
E          [-3.351042e-10, -7.403796e-10, -7.674658e-10, ...,  1.577782e-10,...
E    y: array([[ 3.122692e-06,  6.683897e-06,  9.506682e-06, ..., -2.452742e-05,
E           -2.426614e-05, -2.801882e-05],
E          [ 7.973345e-06,  9.962245e-06,  1.211932e-05, ..., -9.860589e-06,...

So hopefully there is more to come and I just should have been more patient :)

@alexrockhill (Contributor Author)

I can go back and fix the tests; sorry, I may need a bit of time to clean it all up from where it worked.

@larsoner (Member) commented Jun 4, 2020

Sounds good. As I said, feel free to force-push over my changes; it's possible that my calculations are not equivalent, since I was assuming the tests would tell me if something broke. Once you have something that matches, given the efficiency bits and the project-to-sphere-for-cosine-distance changes, I think I can reincorporate my stuff fairly easily!

@larsoner (Member) commented Jun 5, 2020

[images: PR vs master comparison]

@alexrockhill (Contributor Author) commented Jun 5, 2020

Here is the relevant code from the original MATLAB source, as far as I can tell:

% CSD - Current Source Density (CSD) transformation based on spherical spline
%       surface Laplacian as suggested by Perrin et al. (1989, 1990)
%
% (published in appendix of Kayser J, Tenke CE, Clin Neurophysiol 2006;117(2):348-368)
%
% Usage: [X, Y] = CSD(Data, G, H, lambda, head);
%
% Implementation of algorithms described by Perrin, Pernier, Bertrand, and
% Echallier in Electroenceph Clin Neurophysiol 1989;72(2):184-187, and 
% Corrigenda EEG 02274 in Electroenceph Clin Neurophysiol 1990;76:565.
% 
% Input parameters:
%   Data = surface potential electrodes-by-samples matrix
%      G = g-function electrodes-by-electrodes matrix
%      H = h-function electrodes-by-electrodes matrix
% lambda = smoothing constant lambda (default = 1.0e-5)
%   head = head radius (default = no value for unit sphere [µV/m²])
%          specify a value [cm] to rescale CSD data to smaller units [µV/cm²]
%          (e.g., use 10.0 to scale to more realistic head size)
%
% Output parameter:
%      X = current source density (CSD) transform electrodes-by-samples matrix
%      Y = spherical spline surface potential (SP) interpolation electrodes-
%          by-samples matrix (only if requested)
%
% Copyright (C) 2003 by Jürgen Kayser (Email: kayserj@pi.cpmc.columbia.edu)
% GNU General Public License (http://www.gnu.org/licenses/gpl.txt)
% Updated: $Date: 2005/02/11 14:00:00 $ $Author: jk $
%        - code compression and comments 
% Updated: $Date: 2007/02/07 11:30:00 $ $Author: jk $
%        - recommented rescaling (unit sphere [µV/m²] to realistic head size [µV/cm²])
%   Fixed: $Date: 2009/05/16 11:55:00 $ $Author: jk $
%        - memory claim for output matrices used inappropriate G and H dimensions
%   Added: $Date: 2009/05/21 10:52:00 $ $Author: jk $
%        - error checking of input matrix dimensions
%
function [X, Y] = CSD(Data, G, H, lambda, head)
[nElec,nPnts] = size(Data);            % get data matrix dimensions
if ~(size(G,1) == size(G,2)) | ...     % matrix dimension error checking: 
   ~(size(H,1) == size(H,2)) | ...     % G and H matrix must be nElec-by-nElec
   ~(size(G,1) == nElec) | ...         
   ~(size(H,1) == nElec)
   X = NaN; Y = NaN;                   % default to invalid output 
   help CSD
   disp(sprintf(strcat('*** Error: G (%d-by-%d) and H (%d-by-%d) matrix', ...
                       ' dimensions must match rows (%d) of data matrix'), ...
                       size(G),size(H),nElec));
   return 
end   
mu = mean(Data);                       % get grand mean
Z = (Data - repmat(mu,nElec,1));       % compute average reference
Y = Data;  X = Y;                      % claim memory for output matrices
if nargin < 5; head = 1.0; end;        % initialize scaling variable [µV/m²]
head = head * head;                    % or rescale data to head sphere [µV/cm²]
if nargin < 4; lambda = 1.0e-5; end;   % initialize smoothing constant
for e = 1:size(G,1);                   % add smoothing constant to diagonale
  G(e,e) = G(e,e) + lambda; 
end; 
Gi = inv(G);                           % compute G inverse
for i = 1:size(Gi,1);                  % compute sums for each row
  TC(i) = sum(Gi(i,:));
end;
sgi = sum(TC);                         % compute sum total
for p = 1:nPnts
  Cp = Gi * Z(:,p);                    % compute preliminary C vector
  c0 = sum(Cp) / sgi;                  % common constant across electrodes
  C = Cp - (c0 * TC');                 % compute final C vector
  for e = 1:nElec;                     % compute all CSDs ...
    X(e,p) = sum(C .* H(e,:)') / head; % ... and scale to head size
  end;
  if nargout > 1; for e = 1:nElec;     % if requested ...
    Y(e,p) = c0 + sum(C .* G(e,:)');   % ... compute all SPs
  end; end;
end;
% function [G,H] = GetGH (M, m)
% 
% Modification of GetCSD.m to obtain the G- and H-matrices, which are 
% invariant for a given EEG montage, which are then used as input with
% the EEG/ERP data to the CSD.m routine
%
% Usage: [G,H] = GetGH (M, m)
% 
% Input parameter:
%         M = a MatLab structure for an EEG montage consisting of <n> sites
%             as returned by the function ExtractEEGMontage.m;
%             M consists of:  <n> cell string array 'lab'
%                             <n> double 'theta'
%                             <n> double 'phi'
%                             <n * 2> double matrix 'xy'
%                             ('xy' values are not used in this function)
%         m = positive integer constant that affects spherical spline flexibility
%             values of m can be:    2 - most flexible interpolation
%                                    3 - medium flexibility
%                                    4 - more rigid splines (default)
%                                5..10 - increasingly rigid splines  
%
% Output parameters:
%         G = G electrodes-by-electrodes matrix (SP spline)
%         H = H electrodes-by-electrodes matrix (CSD spline)
%
% Notes: 1) GetGH.m differs from GetCSD.m by removing the first three lines
%           in the GetCSD.m code because spherical angles are defined through
%           <M>, and by removing the last line handling the EEG/ERP data
%        2) In contrast to GetCSD.m, the <m> constant may be optionally
%           specified as an input parameter (default: m = 4)
%        3) Using the G and H matrices with data not conforming to the original
%           EEG montage will not produce valid CSDs
%        4) The orientation of Theta and Phi is defined in the appendix of
%           Kayser J, Tenke CE, Clin Neurophysiol 2006;117(2):348-368), and
%           is not identical to other spherical notifications using the
%           same spherical angle names (e.g., as in EEGlab)
%
% -----------------------------------------------------------------------
% GetCSD - Compute Current Source Density (CSD) estimates using the spherical spline
%          surface Laplacian algorithm suggested by Perrin et al. (1989, 1990)
%
% (published in appendix of Kayser J, Tenke CE, Clin Neurophysiol 2006;117(2):348-368)
%
% Usage: [CE] = GetCSD(EEGmont, ERPdata);
%
% Implementation of algorithms described by Perrin, Pernier, Bertrand, and
% Echallier in Electroenceph Clin Neurophysiol 1989;72(2):184-187, and 
% Corrigenda EEG 02274 in Electroenceph Clin Neurophysiol 1990;76:565.
% 
% Input parameters:
%   EEGmont = name of EEG montage spherical coordinates file in ASCII format,
%             with rows consisting of six columns: electrode label, two 
%             spherical angles (theta, phi), and Cartesian coordinates x,y,z
%   ERPdata = name of ERP data file in ASCII format stored as 
%             electrodes-by-samples (rows-by-columns) matrix
%
% Output parameter:
%        CE = CSD estimates as electrodes-by-samples matrix
%
% Copyright (C) 2003 by Jürgen Kayser (Email: kayserj@pi.cpmc.columbia.edu)
% GNU General Public License (http://www.gnu.org/licenses/gpl.txt)
% Updated: $Date: 2005/02/11 14:00:00 $ $Author: jk $
% -----------------------------------------------------------------------
% Modified: $Date: 2009/05/12 19:12:00 $ $Author: jk $
%        - eliminate <ERPdata> input argument and direct use of CSD.m
%        - return G and H transformation matrices
%    Added: $Date: 2009/05/14 17:20:00 $ $Author: jk $
%        - input argument <m> to specify constant for spline flexibility
%    Added: $Date: 2009/05/22 11:28:00 $ $Author: jk $
%        - progress bar to indicate status of iteration 
%  Changed: $Date: 2010/04/11 19:46:00 $ $Author: jk $
%        - allow more rigid values for spline constant m [2..10] 
%
function [G,H] = GetGH (M, m)
ThetaRad = (2 * pi * M.theta) / 360;     % convert Theta and Phi to radians ...
PhiRad = (2 * pi * M.phi) / 360;         % ... and Cartesian coordinates ...
[X,Y,Z] = sph2cart(ThetaRad,PhiRad,1.0); % ... for optimal resolution
nElec = length(M.lab);                   % determine size of EEG montage
EF(nElec,nElec) = 0;                     % initialize interelectrode matrix ...
for i = 1:nElec; for j = 1:nElec;        % ... and compute all cosine distances
  EF(i,j) = 1 - ( ( (X(i) - X(j))^2 + ...
    (Y(i) - Y(j))^2 + (Z(i) - Z(j))^2 ) / 2 );
end; end;
if nargin < 2
   m = 4;                                % set m constant default
end
if ~ismember(m,[2:10])                    % verify m constant
%   disp(sprintf('Error: Invalid m = %d [use 2, 3, or 4]',[m]));
   disp(sprintf('Error: Invalid m = %d [use an integer between 2 and 10]',[m]));
   G = NaN;
   H = NaN;
   return
end
disp(sprintf('Spline flexibility:  m = %d',[m])); 
N = 50;                                  % set N iterations
G(nElec,nElec) = 0; H(nElec,nElec) = 0;  % claim memory for G- and H-matrices
fprintf('%d iterations for %d sites [',N,nElec); % intialize progress bar
for i = 1:nElec; for j = 1:nElec;
  P = zeros(N);                          % compute Legendre polynomial
  for n = 1:N;
    p = legendre(n,EF(i,j));
    P(n) = p(1);
  end;
  g = 0.0; h = 0.0;                      % compute h- and g-functions
  if j == 1; fprintf('*'); end;          % show progress
  for n = 1:N;
    g = g + ( (( 2.0*n+1.0) * P(n)) / ((n*n+n)^m    ) );
    h = h + ( ((-2.0*n-1.0) * P(n)) / ((n*n+n)^(m-1)) );
  end;
  G(i,j) =  g / 4.0 / pi;                % finalize cell of G-matrix
  H(i,j) = -h / 4.0 / pi;                % finalize cell of H-matrix
end; end;
disp(']');                               % finalize progress bar

Just so that this is here as reference.
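For readers following along, the MATLAB above maps fairly directly onto NumPy/SciPy. Below is only a rough sketch (the function names `calc_g_h` and `apply_csd` are mine, not MNE's), using `scipy.special.eval_legendre` to vectorize the Legendre terms:

```python
import numpy as np
from scipy.special import eval_legendre

def calc_g_h(cosang, stiffness=4, n_legendre_terms=50):
    """Sketch of GetGH.m: G and H matrices from the cos-angle matrix."""
    n = np.arange(1, n_legendre_terms + 1)
    # P_n(cos angle) for every degree n and electrode pair at once
    P = eval_legendre(n[:, None, None], cosang[None])
    gf = ((2 * n + 1) / (n * (n + 1)) ** stiffness)[:, None, None]
    hf = ((2 * n + 1) / (n * (n + 1)) ** (stiffness - 1))[:, None, None]
    G = (gf * P).sum(0) / (4 * np.pi)
    H = (hf * P).sum(0) / (4 * np.pi)
    return G, H

def apply_csd(data, G, H, lambda2=1e-5, head=1.0):
    """Sketch of CSD.m: apply the spherical-spline surface Laplacian."""
    n_chan = data.shape[0]
    Z = data - data.mean(0)                           # average reference
    Gi = np.linalg.inv(G + lambda2 * np.eye(n_chan))  # smoothed inverse
    TC = Gi.sum(1)
    sgi = TC.sum()
    Cp = Gi @ Z
    c0 = Cp.sum(0) / sgi              # common constant across electrodes
    C = Cp - np.outer(TC, c0)
    return (H @ C) / head ** 2        # scale to head size

# Tiny smoke test on a random spherical montage
rng = np.random.default_rng(0)
pos = rng.normal(size=(8, 3))
pos /= np.linalg.norm(pos, axis=1, keepdims=True)
G, H = calc_g_h(pos @ pos.T)
data = rng.normal(size=(8, 10)) * 1e-6
csd = apply_csd(data, G, H)
# A signal identical on all electrodes has zero surface Laplacian
flat = np.ones((8, 10))
assert np.allclose(apply_csd(flat, G, H), 0.0)
```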

@larsoner (Member) commented Jun 5, 2020

(FYI, fortunately we got permission from Jurgen a couple of years ago to relicense his implementation under BSD, otherwise we could not use / look at this snippet because of the GPL license)

@alexrockhill (Contributor Author)

> (FYI, fortunately we got permission from Jurgen a couple of years ago to relicense his implementation under BSD, otherwise we could not use / look at this snippet because of the GPL license)

Great, I think Dennis had mentioned that earlier. Thanks so much for finishing everything, @larsoner!

It might be worth also citing:

  • Cohen, M.X. (2014). Surface Laplacian In Analyzing neural time series data: theory and practice
    (pp. 275-290). London, England: The MIT Press.

The formatting for that looks complex; I'll have to look into the documentation on how citations are done now.

@larsoner (Member) commented Jun 5, 2020

Just add entries like I did already in this PR. They're standard BibTeX entries, so it should be easier rather than more difficult.

@alexrockhill (Contributor Author) commented Jun 5, 2020

Ok, doing now. Where are you on projecting onto the sphere or not? Is the commenting-out just preliminary?

# from scipy.spatial.distance import squareform, pdist
# cos_dist = 1 - squareform(pdist(pos, 'sqeuclidean')) / 2.
# Project onto the sphere and do the distance (effectively)
pos /= np.linalg.norm(pos, axis=1, keepdims=True)
Review comment (Member):

@alexrockhill this implicitly projects to a unit sphere. It preserves the phi/theta but sets r=1, which is the same thing as doing cart2sph, set r=1, then sph2cart...
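A quick check of that equivalence (the `cart2sph`/`sph2cart` helpers below are hand-rolled stand-ins for the MATLAB functions, using elevation-style phi):

```python
import numpy as np

def cart2sph(xyz):
    x, y, z = xyz.T
    r = np.linalg.norm(xyz, axis=1)
    theta = np.arctan2(y, x)     # azimuth
    phi = np.arcsin(z / r)       # elevation
    return theta, phi, r

def sph2cart(theta, phi, r):
    return np.column_stack([r * np.cos(phi) * np.cos(theta),
                            r * np.cos(phi) * np.sin(theta),
                            r * np.sin(phi)])

rng = np.random.default_rng(0)
pos = rng.normal(size=(6, 3))

# Dividing by the norm preserves theta/phi and sets r = 1 ...
direct = pos / np.linalg.norm(pos, axis=1, keepdims=True)
# ... exactly like cart2sph -> set r = 1 -> sph2cart
theta, phi, _ = cart2sph(pos)
roundtrip = sph2cart(theta, phi, 1.0)
np.testing.assert_allclose(direct, roundtrip, atol=1e-12)
```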

Review comment (Contributor Author):

Ok so ideally the sphere would have the radius of the head? Is that what's done in the original?

Review comment (Member):

In Mike's and Jurgen's code they both effectively do this (at least for a spherical montage it's equivalent), and based on it computing a cosine similarity, I think it's the right thing to do.

Mike never uses an actual radius, but Jurgen does (he effectively divides by head * head), which conceptually makes sense to me.

@larsoner larsoner changed the title [WIP, BUG] Updated Current Source Density MRG, BUG: Updated Current Source Density Jun 5, 2020
@larsoner (Member) commented Jun 5, 2020

Okay @alexrockhill I think I'm happy with our code now, having read through Mike's and Jurgen's code.

Alex and others added 6 commits June 5, 2020 12:24
@larsoner (Member) commented Jun 5, 2020

Sorry @alexrockhill I had to rebase, so while I was in there I squashed our commits, hope that's okay. This one can be rebase+merged to preserve history if whoever hits the green button can remember to do so...

@alexrockhill (Contributor Author)

> Sorry @alexrockhill I had to rebase, so while I was in there I squashed our commits, hope that's okay. This one can be rebase+merged to preserve history if whoever hits the green button can remember to do so...

Of course, that's great. Thanks for doing the hard parts :)

@agramfort (Member) commented Jun 5, 2020 via email

@larsoner larsoner merged commit a6ea90c into mne-tools:master Jun 5, 2020
@larsoner (Member) commented Jun 5, 2020

Thanks @alexrockhill !

@alexrockhill alexrockhill deleted the csd2 branch June 5, 2020 18:26
@agramfort (Member)

@larsoner you merged this without squashing, so we now have the big CSV files that @alexrockhill pushed in the first commits in the history.... grrrrr

I saw this because it took 15 s for me to fetch master :(

Rewriting history to remove these files may make a mess of the current PRs...

@larsoner wdyt?

@agramfort (Member)

0f8788f is the commit now in master

@larsoner (Member) commented Jun 7, 2020

Argh, sorry, I forgot about those. The rebase-and-merge was intentional to preserve code history/blame, but I forgot about the file commits. Thanks for force-pushing to fix it.

@agramfort (Member)

@cbrnr @larsoner @drammock @alexrockhill @hoechenberger @GuillaumeFavelier FYI, I force-pushed to master to remove the 60 MB of CSV files that were committed by mistake in this PR. I hope it doesn't create too much of a mess on your PRs....
