
Conversation

@nbara
Contributor

@nbara nbara commented Sep 11, 2017

This is an attempt to implement inverse coefficient computation according to [1], as discussed in #3884. This is mostly useful for visualising the weights of backward models (a.k.a. stimulus reconstruction).

At the moment the computation is done in ReceptiveField.fit(), so I had to expose the data covariance x_xt_ in TimeDelayingRidge(). I can compute the inverse coefficients directly inside TDR if you prefer.

These are specific points where your input would be appreciated:

  • Proper unit tests. I was thinking of computing a forward model and comparing its coefficients to an inverse-transformed backward model, but do we have any idea how close these two sets of coefficients should be numerically?
  • API compatibility with the decoding framework in MNE (cf. ENH: More features for ReceptiveField #3884 (comment)). Since the inverse coefs are computed by default here, is it just a matter of renaming inverse_coefs_ to patterns_?
  • Do I need to handle the intercept explicitly before the final dot product?
  • Update the example.

Thanks in advance.

[1] Haufe, S., Meinecke, F., Görgen, K., Dähne, S., Haynes, J.-D., Blankertz, B., & Bießmann, F. (2014). On the interpretation of weight vectors of linear models in multivariate neuroimaging. NeuroImage, 87, 96–110. doi:10.1016/j.neuroimage.2013.10.067
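
For reference, the transformation in [1] reduces to a couple of matrix products. A minimal sketch with illustrative names (not the PR's exact code; np.cov handles the centering):

import numpy as np
from scipy import linalg

def haufe_patterns(X, y, coef):
    # X: (n_times, n_feats) predictors; y: (n_times, n_outputs) targets;
    # coef: (n_feats, n_outputs) fitted backward-model weights.
    # NB: [1] normalizes by the covariance of the reconstructed outputs
    # X @ coef; the PR uses the measured y, which agrees for a well-fit model.
    cov_X = np.atleast_2d(np.cov(X.T))  # predictor covariance
    cov_y = np.atleast_2d(np.cov(y.T))  # output covariance
    return cov_X @ coef @ linalg.pinv(cov_y)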

@codecov-io

codecov-io commented Sep 11, 2017

Codecov Report

Merging #4546 into master will decrease coverage by 1.19%.
The diff coverage is 100%.

@@            Coverage Diff            @@
##           master    #4546     +/-   ##
=========================================
- Coverage   87.05%   85.85%   -1.2%     
=========================================
  Files         349      349             
  Lines       65780    65830     +50     
  Branches    10203    10208      +5     
=========================================
- Hits        57263    56518    -745     
- Misses       5634     6456    +822     
+ Partials     2883     2856     -27

@kingjr
Member

kingjr commented Sep 11, 2017

I would rename x_xt_ to cov_.

I can compute the inverse coefficients directly inside TDR if you prefer.

+1 for computing them directly. How much longer does it take?

I was thinking to compute a forward model, and compare the coefficients to an inverse-transformed backward model, but do we have any idea how close these two sets of coefficient should be numerically?

Should be identical if no regularization is applied

API compatibility with the decoding framework in MNE (cf #3884 (comment)). Since the inverse coefs are computed by default here, is it just a matter of renaming inverse_coefs_ to patterns_ ?

yes

Do I need to handle intercept explicitly before the final dot product?

I would just center X and Y a priori, same as in LinearModel

Proper unit tests: what do you think they should be? (a simulation-based sketch follows below)

  • Check X and Y data type and dims
  • Check error handling when not enough data (pinv may not be solvable if p >> n and no regularization is applied)
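
For the unit-test question, one self-contained option is a simulation where the ground-truth pattern is known; a sketch with illustrative names (not the PR's tests):

import numpy as np

rng = np.random.RandomState(42)
n_times, n_ch = 10000, 8
a = rng.randn(n_ch)                                  # true spatial pattern
s = rng.randn(n_times)                               # latent stimulus
X = np.outer(s, a) + 0.1 * rng.randn(n_times, n_ch)  # simulated "EEG"
w = np.linalg.lstsq(X, s, rcond=None)[0]             # unregularized decoder
s_hat = X @ w                                        # reconstructed stimulus
A = np.cov(X.T) @ w / np.cov(s_hat)                  # Haufe transformation
assert np.corrcoef(A, a)[0, 1] > 0.99                # pattern aligns with a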

kingjr
kingjr previously requested changes Sep 11, 2017
                                    np.linalg.inv(np.dot(y.T, y))])
else:
    inv_coef = np.linalg.multi_dot([self.estimator_.x_xt_, coef,
                                    np.linalg.inv(np.dot(y.T, y))])
Member

You need to ensure that y is centered to 0.

Prefer using np.cov when possible, to simplify the equation.
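
To illustrate the suggestion: once X is centered, the explicit product and np.cov agree (a quick sketch):

import numpy as np

X = np.random.randn(1000, 4)
X -= X.mean(0)                       # center first, as noted above
cov_manual = X.T @ X / (len(X) - 1)  # explicit normalization
cov_np = np.cov(X.T)                 # same result, clearer intent
np.testing.assert_allclose(cov_manual, cov_np)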

@larsoner
Member

+1 for computing them directly. How much longer does it take?

The autocorrelation calculation can be very slow, especially for long recordings.

@nbara
Contributor Author

nbara commented Sep 11, 2017

Thanks for the comments @kingjr.

+1 for computing them directly. How much longer does it take?

It's exactly the same computation; the only difference is that I wouldn't expose x_xt outside of TimeDelayingRidge() (it would still be computed either way).

@kingjr
Member

kingjr commented Sep 11, 2017

@larsoner OK. @nbara, could you list the exact use cases in the PR description, so we can decide whether the object should have an option to compute the inverse automatically, notably to ensure CV compatibility?

coef = np.reshape(self.coef_, (n_feats * n_delays, n_outputs))
if not isinstance(self.estimator_, TimeDelayingRidge):
    inv_coef = np.linalg.multi_dot([np.dot(X.T, X), coef,
                                    np.linalg.inv(np.dot(y.T, y))])
Member

Use linalg.inv from scipy.linalg, not numpy.

@nbara nbara force-pushed the inverse-transform-rf branch from 0f5f9f5 to 72df78f on September 12, 2017 at 07:34
@nbara
Contributor Author

nbara commented Sep 12, 2017

OK, I rebased after #4543. I fixed your first few comments, but I still need more tests.

I'm still unclear on what the consensus is regarding the API: are you OK with the way things are now, or should patterns_ only be computed upon request?

If it's the latter, should it be an extra parameter, ReceptiveField(patterns=True), or a separate class method altogether, such as rf.inverse_transform(X, y)? (A sketch of both spellings follows below.)

The current code doesn't add more than a couple of seconds at most, but I guess this can add up when doing CV and such.
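
To make the two spellings concrete, a minimal sketch with random data; the patterns=True flag and patterns_ attribute are the proposal under discussion, and inverse_transform is hypothetical:

import numpy as np
from mne.decoding import ReceptiveField

rng = np.random.RandomState(0)
X = rng.randn(1000, 3)  # (n_times, n_features), e.g. EEG channels
y = rng.randn(1000, 1)  # (n_times, n_outputs), e.g. stimulus envelope

# Spelling 1: a fit-time flag; a float estimator means ridge with that alpha
rf = ReceptiveField(tmin=-0.1, tmax=0.2, sfreq=100., estimator=1.,
                    patterns=True)
rf.fit(X, y)
print(rf.patterns_.shape)

# Spelling 2 (hypothetical alternative): a separate method, e.g.
# patterns = rf.inverse_transform(X, y)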

@nbara nbara force-pushed the inverse-transform-rf branch from 8ba21f8 to e6ced48 on September 12, 2017 at 14:24
@larsoner
Member

I'd rather compute them, because I worry about the speed of recalculating. But @kingjr understands the sklearn ramifications better; will it make things worse there?

@larsoner
Member

Sorry, I mean I'd rather store the XtX and Xy than recompute them.

@larsoner
Member

Can some example be updated to show the utility of this?

@nbara
Contributor Author

nbara commented Sep 12, 2017

Can some example be updated to show the utility of this?

Yep, sure. I can update the existing example to show backward modelling + inverse patterns on the same data.

@kingjr
Member

kingjr commented Sep 12, 2017

I'm in favour of ReceptiveField(patterns=True)

@larsoner
Member

Default False to save time or True for convenience?

@kingjr
Member

kingjr commented Sep 12, 2017

Default False for time

@nbara nbara force-pushed the inverse-transform-rf branch from a776fcd to a402d19 on September 14, 2017 at 13:41
@larsoner
Member

@nbara the positive-lags PR was merged so feel free to rebase

@larsoner
Member

(and update the examples/tutorials as necessary)

@nbara nbara force-pushed the inverse-transform-rf branch from a402d19 to 9405aea on September 15, 2017 at 17:36
@nbara
Contributor Author

nbara commented Sep 15, 2017

@larsoner I've rebased again and updated the example. Let me know what you think.

FWIW I can't reproduce figure 5 in Crosse et al. I've tried using sklearn's Ridge regression, as well as different hyperparameter values to match the paper, but it doesn't look similar. Maybe I've missed something.

-Y, _ = raw[:]  # Outputs for the model
-Y = Y.T
+EEG, _ = raw[:]  # Outputs for the model
+EEG = EEG.T
Member

Please don't rename variables, as the PR can become controversial for little gain.

We usually use data (not EEG), and here it is written Y as it is the standard name for the predicted target variable in ML.

Contributor Author

OK. The EEG is not the predicted variable in the backward case, so I thought it'd make more sense.

@nbara
Contributor Author

nbara commented Sep 18, 2017

If you have an idea of a good test to check that the inverse coefficients have the right values, besides checking that np.dot(coef.T, patterns) == np.eye(...), please let me know. (A minimal sketch of that identity check is below.)

PS: I squeezed PR #4205 into whats_new as I'd forgotten it back then. Let me know if that's not kosher.
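
As a side note, the np.dot(coef.T, patterns) identity mentioned above does hold exactly for the Haufe transform, so it is at least a cheap sanity check; a sketch with illustrative names:

import numpy as np

rng = np.random.RandomState(0)
X = rng.randn(500, 4)
X -= X.mean(0)
W = rng.randn(4, 2)                    # arbitrary decoding weights
S_hat = X @ W                          # reconstructed outputs
A = np.cov(X.T) @ W @ np.linalg.inv(np.cov(S_hat.T))  # patterns
# W.T @ cov(X) @ W equals cov(S_hat), so W.T @ A is the identity
np.testing.assert_allclose(W.T @ A, np.eye(2), atol=1e-10)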

@nbara nbara force-pushed the inverse-transform-rf branch 4 times, most recently from 87eb7a7 to cac4223 on September 18, 2017 at 13:38
Member

@larsoner larsoner left a comment

-linear encoding model using the continuously-varying speech envelope to
-predict activity of a 128 channel EEG system.
+can perform a similar function along with scikit-learn. We will first fit a
+linear encoding (or forward) model using the continuously-varying speech
Member

You are going to get an angry GIF from @choldgraf; I had a similar "encoding (or forward)" in my PR and it did not go over well.

Contributor

@choldgraf choldgraf Sep 18, 2017

Actually I am fine with it if they're both specified, as long as encoding/decoding come first ;-)

Contributor

also now I feel like a curmudgeonly old academic


# Initialize the model. Here the features are the EEG data. We also specify
# ``patterns=True`` to compute forward-transformed coefficients during model
# fitting (cf. next section). We'll use a Ridge regression estimator with a
Member

ridge regression (no caps)

# recreate `figure 5`_ from [1]_. The backward model weights reflect the
# channels that contribute most toward reconstructing the stimulus signal, but
# are not directly interpretable in a neurophysiological sense. Here we also
# look at the inversed model weights, which are easier to understand [2]_.
Member

This last sentence should probably start another section (so the plot is separate) and also give some explanation of what you mean by "easier to understand" -- one or two sentences is fine.

Contributor

agree - we want one figure per section

    y = y.reshape(-1, y.shape[-1], order='F')
else:
    X = X - X.mean(0, keepdims=True)
    cov_ = np.cov(X.T)
Member

Can you change the TimeDelayingRidge class to have a patterns option? Then this can be stored instead of being recomputed.

Contributor

You could create a _compute_patterns(model, X, y) that would be shared between them.

Contributor Author

@nbara nbara Sep 19, 2017

@larsoner Not sure I understand. What is being recomputed here? I'm only computing the pattern once either way...

Do you want the pattern/inverse coef to be computed inside TDR if pattern=True? That means there's going to be some code redundancy between receptive_field.py and TDR.py; is that OK?

Contributor Author

@nbara nbara Sep 20, 2017

@larsoner Sorry if I'm being slow, but can you elaborate on what it is you want to see?

Or are you happy with the current state of the code?

Member

It looks like you've already changed the code to be fast for TDR so I think we're good

del X

# Inverse output covariance
inv_Y = 1. / float(n_times * n_epochs - 1)
Member

this should go in an else clause because it's not used in the y.ndim == 2 and y.shape[1] != 1 case

# assert_allclose(inv_rf.coef_, np.transpose(rf.patterns_, (1, 0, 2)),
#                 rtol=0.1)
# assert_allclose(rf.coef_, np.transpose(inv_rf.patterns_, (1, 0, 2)),
#                 atol=0.1)
Member

This might only be true if regularization is 0.0; can you check?

Contributor

Yeah, my sense is that the regularization will do some strange things. Plus we're including extra information in there when we do the inverse transform. If we just do encoding-model ridge, we're getting weights normalized with the stimulus covariance matrix; if we do an inverted decoding model, we're getting weights normalized with the channel covariance matrix, then normalized again by the stimulus covariance matrix, no?
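
In symbols, a sketch of that double normalization (S is the stimulus design, X the channel data, \lambda the ridge penalty):

% Encoding (forward) ridge: normalized on the stimulus side only
A_{\mathrm{enc}} = (S^\top S + \lambda I)^{-1} S^\top X

% Decoding ridge, then the Haufe inversion: the channel covariance
% \Sigma_X enters through w, and \Sigma_{\hat{s}} normalizes once more
w = (X^\top X + \lambda I)^{-1} X^\top s, \qquad
A_{\mathrm{dec}} = \Sigma_X \, w \, \Sigma_{\hat{s}}^{-1}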

Contributor Author

@larsoner Yeah, I checked, and this also fails without regularization. Like @choldgraf, I tend to think there's something going on with the regularisation, although I can't put my finger on what exactly.

mne.viz.plot_topomap(np.mean(mean_patterns[:, ix_plot], axis=1),
                     pos=info, axes=ax[1],
                     show=False, vmin=-max_patterns, vmax=max_patterns)
ax[1].set(title="Forward-transformed coefficients")
Member

Can you add the "for delay ..." so there is a better correspondence with the earlier section?

@nbara nbara force-pushed the inverse-transform-rf branch from 786983e to 6d1b6e4 on September 20, 2017 at 07:25
@choldgraf
Contributor

@kingjr left another blocking review on this PR... I'm hesitant to merge until he unblocks it...

@choldgraf
Contributor

LGTM - my only question is the delays in the final plot (-.14 and -.125). I think they're negative because the stimulus leads the brain activity, which in the model will show up as positive delays (i.e., from the perspective of the stimulus, brain activity in the future is predicting stimulus activity now). I think we should mention this so that it doesn't confuse people.

@nbara
Contributor Author

nbara commented Sep 20, 2017

@choldgraf How about this as a comment just prior to specifying the lags:

We use the same lags as in [1]. Negative lags now index the relationship between the neural response and the speech envelope earlier in time, whereas positive lags would index how a unit change in the amplitude of the EEG would affect later stimulus activity (obviously this should have an amplitude of zero).

@kingjr
Member

kingjr commented Sep 20, 2017

LGTM - my only question is the delays in the final plot (-.14 and -.125) ...

? IMO it should always be the past.

@kingjr
Member

kingjr commented Sep 20, 2017

@kingjr left another block on this PR...I'm hesitant to merge until he unblocks it...

I can't (because it's been rebranched?), but my comment has been addressed

@larsoner larsoner dismissed kingjr’s stale review September 20, 2017 17:49

He said it's okay

@larsoner
Member

@nbara I like the proposed change; let's see if @choldgraf agrees, then merge.

@larsoner larsoner added this to the 0.15 milestone Sep 20, 2017

nbara and others added 8 commits September 21, 2017 09:03
only works with estimator=Ridge() at the moment
- demean prior to computing coefficients
- rename inverse_coef_ to patterns_
- scipy.linalg
- some tests (incomplete)
Can't reproduce Lalor's figures.
+ squeezed in GDF support in whatsnew
- replace `backward` by `decoding`
- rewrite example
- fix section headers in example

TODO: add patterns option to TimeDelayingRidge
@nbara nbara force-pushed the inverse-transform-rf branch 2 times, most recently from 3a2310d to 9d39873 on September 21, 2017 at 07:08
@nbara nbara force-pushed the inverse-transform-rf branch from 9d39873 to 4bb7955 on September 21, 2017 at 07:11
@nbara
Contributor Author

nbara commented Sep 21, 2017

@larsoner OK, it's pushed, but CircleCI fails. I'm not sure why. Let me know if there's anything to do on my end.

@larsoner
Member

@larsoner larsoner merged commit ce040f0 into mne-tools:master Sep 21, 2017
@larsoner
Member

Thanks @nbara !

@choldgraf
Contributor

woooo way to go! nice addition!

@nbara nbara deleted the inverse-transform-rf branch September 22, 2017 07:13
@nbara nbara restored the inverse-transform-rf branch November 22, 2018 14:30