Validated new algorithm for ICA MEG reference channel method #7212

jshanna100 · 2020-01-15T13:15:40Z

This revisits the use of ICA decompositions on reference channel data to remove noise from the main channels. This was first implemented in #5807. Adding a second algorithm was discussed in #5959, but ultimately postponed until the method was validated. A validation using simulations has now been successfully carried out, and the results will be published shortly in the Journal of Neuroscience Methods (preprint: https://arxiv.org/abs/2001.03397).

This PR has the following changes:

The addition of the "together" algorithm.
The "together" algorithm required that mne.preprocessing.bads.find_outliers be able to restrict itself to one tail of the distribution. This is now possible.
Not strictly related to the new algorithm, but preprocessing.ICA.find_bads_ch now has a bad_measure parameter that selects either "zscore" (default), which is the iterated Z-score method, or "cor", which is the traditional raw correlation threshold method.
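
For readers unfamiliar with the iterated Z-score method and the one-tailed restriction mentioned above, here is a minimal pure-Python sketch. It is not the code in mne.preprocessing.bads: the function name find_outliers_sketch, its parameter names, and its defaults are assumptions chosen for illustration.

```python
import statistics


def find_outliers_sketch(values, threshold=3.0, max_iter=2, tail=0):
    """Iterated z-score outlier search (illustrative only, not MNE's code).

    tail=0 flags large |z|, tail=1 only large positive z,
    tail=-1 only large negative z.
    """
    if tail not in (-1, 0, 1):
        raise ValueError("tail must be -1, 0 or 1")
    masked = set()  # indices already flagged as outliers
    for _ in range(max_iter):
        # Recompute mean and standard deviation without the flagged values.
        kept = [v for i, v in enumerate(values) if i not in masked]
        if len(kept) < 2:
            break
        mean = statistics.fmean(kept)
        std = statistics.pstdev(kept)
        if std == 0:
            break
        new = set()
        for i, v in enumerate(values):
            if i in masked:
                continue
            z = (v - mean) / std
            # Two-tailed: compare |z|; one-tailed: compare the signed score.
            score = abs(z) if tail == 0 else z * tail
            if score > threshold:
                new.add(i)
        if not new:
            break  # nothing above threshold remains
        masked |= new
    return sorted(masked)
```

Calling it with tail=1 on a list that contains one large positive and one large negative value flags only the positive one, which is the behaviour the "together" algorithm is described as needing.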

agramfort · 2020-01-15T14:22:00Z

Do we have an MNE dataset that could illustrate the benefit of this code in an example?

larsoner

As for datasets, it looks like we use gradient compensation with this dataset:

raw_fname = op.join(data_path, 'MEG', 'bst_resting',
                    'subj002_spontaneous_20111102_01_AUX.ds')

Can you see if there are some artifacts that can be removed with your method?

If not, maybe one of the other brainstorm datasets?

mne/preprocessing/bads.py

mne/preprocessing/ica.py

jshanna100 · 2020-01-17T09:57:38Z

Regarding datasets, I've tried all of the Brainstorm datasets, as well as the BTI Phantom. Unfortunately, they're all too clean. There may be an inherent selection bias problem here: the method focuses on intermittent external noise that isn't corrected by fixed-weight solutions, but a dataset with such intermittent noise would presumably be considered unsuitable as an example dataset.

agramfort

It would be really nice to have an example that shows how to use this starting from raw data. Worst case, can we use a simulation if we don't have a suitable public dataset?

mne/preprocessing/bads.py

agramfort · 2020-01-18T09:41:49Z

mne/preprocessing/ica.py

-                      stop=None, l_freq=None, h_freq=None,
-                      reject_by_annotation=True, verbose=None):
+                      stop=None, l_freq=None, h_freq=None, method='together',
+                      reject_by_annotation=True, bad_measure="zscore",


bad_measure -> measure? function already has bads in its name

Sure. Should I also extend this option to EOG and ECG?

+1 to do it for consistency

mne/preprocessing/ica.py

jshanna100 · 2020-01-21T11:30:29Z

I'll see what I can do with simulated data. There should be a good solution there, somewhere...

codecov · 2020-01-21T13:43:22Z

Codecov Report

Merging #7212 into master will decrease coverage by 0.01%.
The diff coverage is 61.9%.

@@            Coverage Diff             @@
##           master    #7212      +/-   ##
==========================================
- Coverage   89.77%   89.76%   -0.02%     
==========================================
  Files         445      445              
  Lines       79873    79897      +24     
  Branches    12773    12781       +8     
==========================================
+ Hits        71703    71716      +13     
- Misses       5372     5379       +7     
- Partials     2798     2802       +4

jshanna100 · 2020-01-21T13:43:48Z

Hopefully these cover all the issues in review. The example should follow in a few days.

bloyl · 2020-02-26T20:52:31Z

Artemis data has a reference array that we are currently not using (certainly not effectively) that might be useful for an example.

There is a 1-second HPI phantom run in the testing data that might be useful.

raw_fname = op.join(mne.datasets.testing.data_path(), 'ARTEMIS123', 'Artemis_Data_2017-04-14-10h-38m-59s_Phantom_1k_HPI_1s.txt')

The HPIs are at 140 Hz, 150 Hz, and 160 Hz; everything else is 'noise'.

@jshanna100 if that looks promising, I may be able to find a subject whose data I can share, either offline or via some mne.dataset mechanism.

larsoner · 2020-03-04T15:12:24Z

@jshanna100 any chance to get to this in the next week or two to make it into the 0.20 release?

jshanna100 · 2020-03-04T15:42:16Z

Possibly within two weeks, definitely not in one. Sorry, a bunch of deadlines with work threw me off track here!

jshanna100 · 2020-03-04T16:19:42Z

@larsoner @agramfort doing simulations of external noise that also manifests in the reference channels will also require small modifications to forward and simulate. I already made these a while ago for the simulations I needed to validate the methods. They are contained in the refsim branch on my MNE fork.

master...jshanna100:refsim

Should we go forward with this? Alternatively, we have a real MEG dataset that demonstrates the technique very well. I'm not sure how conservative you are with adding new datasets...

larsoner · 2020-03-04T16:22:09Z

Alternatively, we have a real MEG dataset that demonstrates the technique very well. I'm not sure how conservative you are with adding new datasets...

Real data would be nicer than simulations I think. You could take your data file and crop it to some suitable length (30 sec? 60 sec?) to demonstrate the method's utility, then we add it as a new mne.datasets dataset.

larsoner · 2020-03-04T16:22:56Z

Here is an example of what's needed at the MNE end:

50f4d64#diff-3d9673bf0e7407f5bb6bf482aada4fb2

jshanna100 · 2020-05-20T14:13:44Z

Let me know what you think. The dataset is unfortunately fairly large at around 90 MB. Also, the two full ICA decompositions that run in the example are not quick. I spent some time trying to get it down as much as I could, but the "together" algorithm does not seem to function well with very short stretches of data.

I was able to reduce it considerably, however, by downsampling to 100 Hz and discarding every other magnetometer.

larsoner · 2020-05-20T15:42:34Z

The dataset is unfortunately fairly large at around 90MB

sample is > 1 GB so I would not worry about < 100 MB...

the two full ICA decompositions that run in the example do not get done so quickly

Have you tried method='picard' to see if it can achieve the same or better results with fewer iterations (or the same number of iterations but faster)?

larsoner · 2020-05-20T15:44:58Z

/home/circleci/project/doc/auto_examples/preprocessing/plot_find_ref_artifacts.rst:14: WARNING: Title overline too short.
/home/circleci/project/mne/preprocessing/ica.py:docstring of mne.preprocessing.ICA.find_bads_ref:93: WARNING: Too many autonumbered footnote references: only 0 corresponding footnotes available.
/home/circleci/project/mne/preprocessing/ica.py:docstring of mne.preprocessing.ICA.find_bads_ref:93: WARNING: Unknown target name: "hannaetal2020".
/home/circleci/project/doc/overview/datasets_index.rst:412: WARNING: Title underline too short.

and

181.67 sec   1884.7 MB

This is going to be our longest-running example, and memory > 1.5 GB is typically problematic because it can cause CircleCI to crash on a full run. Okay if I take a look and see if there is some way to speed it up?

and

E   mne.preprocessing.ica.ICA.find_bads_ecg : GL03 : Double line break found; please use only one blank line to separate sections or paragraphs, and do not leave blank lines at the end of docstrings

examples/preprocessing/plot_find_ref_artifacts.py

mne/preprocessing/ica.py

larsoner · 2020-05-20T15:47:37Z

mne/preprocessing/ica.py

+            Method to use to identify reference channel related components.
+            Defaults to "together." See notes.
+
+            .. versionadded:: 0.20


Update version

mne/preprocessing/ica.py

larsoner · 2020-05-20T15:48:39Z

mne/preprocessing/ica.py

+        measure : {'zscore', "cor"}
+            Which method to use for finding outliers. "zscore" (default) is
+            the iterated Z-scoring method, and "cor" is an absolute raw
+            correlation threshold with a range of 0 to 1.


versionadded
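
As a contrast to the iterated Z-score option, the "cor" measure described in the docstring above can be sketched as a fixed threshold on the absolute correlation between each component and a reference signal. This is a pure-Python illustration under my own assumptions, not the MNE implementation: pearson and flag_by_correlation are hypothetical helper names, and the 0.5 default threshold is arbitrary.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def flag_by_correlation(components, reference, threshold=0.5):
    """Indices of components whose |r| with the reference exceeds threshold.

    The threshold lives on the 0 to 1 scale of absolute correlation, so it
    does not adapt to the distribution of scores as a z-score would.
    """
    return [i for i, comp in enumerate(components)
            if abs(pearson(comp, reference)) > threshold]
```

Because the absolute value is taken, a component that is a sign-flipped copy of the reference is flagged just like a positively correlated one.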

jshanna100 · 2020-05-20T16:38:58Z

One of the two ICA decompositions is actually superfluous, so that now nearly cuts it in half. Picard does not converge, even with the default number of iterations. Presently I have it set at the default "fastica." Feel free to have a go at speeding it up.

larsoner · 2020-05-20T16:46:02Z

Style run failed

jshanna100 · 2020-05-20T16:51:23Z

Hmm. It's passing on my side. What could be causing the discrepancy?

larsoner · 2020-05-20T16:54:34Z

examples/preprocessing/plot_find_ref_artifacts.py

+# external magnetic noise
+ref_comps = ica_ref.get_sources(raw_sep)
+for c in ref_comps.ch_names: # they need to have REF_ prefix to be recognised
+    ref_comps.rename_channels({c:"REF_" + c})


For example, this is definitely E231 (missing whitespace after ':').

Did you forget to push? Does make flake actually run on your machine?
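
A style-clean variant of the rename step could look like the sketch below. It uses a plain list of names in place of the MNE object, and the component names are hypothetical values for illustration only.

```python
# Hypothetical component names standing in for ref_comps.ch_names.
ch_names = ["ICA000", "ICA001", "ICA002"]

# Build the whole mapping in one expression, with a space after ':' (E231).
# The REF_ prefix is what lets the components be recognised later.
mapping = {name: "REF_" + name for name in ch_names}
```

In the example script, a dict like this could be passed to rename_channels in a single call instead of renaming one channel per loop iteration.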

larsoner · 2020-05-20T16:55:35Z

Presently I have it set at the default "fastica." Feel free to have a go at speeding it up.

I see method='infomax' in the example in two places, is it meant to be like this / necessary?

jshanna100 · 2020-05-20T16:56:56Z

Oh, sorry, I see now I forgot to stage the new plot_find_ref_artifacts.py. Here it comes...

larsoner · 2020-05-20T17:09:04Z

I can't push commits because edits by maintainers are not allowed. I also can't seem to open a PR to your repo for some reason, so you can cherry-pick this if you want

larsoner@85bda0b

jshanna100 · 2020-05-20T17:19:54Z

That's about a five-fold speed up on my machine.

larsoner · 2020-05-20T17:30:46Z

@jshanna100 in the future you can do:

git remote add larsoner https://github.com/larsoner/mne-python.git
git fetch larsoner
git cherry-pick 4799f17
git push

And it would have preserved the history instead of the lines becoming yours. Not a problem here really but can be useful / easier than manual copy-paste + commit of the changes.

larsoner · 2020-05-20T19:55:42Z

mne/preprocessing/ica.py

+        ----------
+        .. footbibliography::
+
        .. versionadded:: 0.18


versionadded should stay in Notes section, not move to References

larsoner · 2020-05-20T20:50:49Z

.circleci/config.yml

                      python -c "import mne; print(mne.datasets.limo.data_path(subject=1, update_path=True))";
                    fi;
+                    if [[ $(cat $FNAME | grep -x ".*datasets.*megref_noise.*" | wc -l) -gt 0 ]]; then
+                      python -c "import mne; print(mne.datasets.megref_noise.data_path(update_path=True))";


These two lines must be wrong because CircleCI has the download status:

https://20202-1301584-gh.circle-artifacts.com/0/dev/auto_examples/preprocessing/plot_find_ref_artifacts.html#sphx-glr-auto-examples-preprocessing-plot-find-ref-artifacts-py

Looks like megref_noise -> refmeg_noise
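
To see why the dataset is never pre-fetched with the pattern as written, the whole-line match that grep -x performs can be mimicked with re.fullmatch. The import line below is an assumption about what the example script contains, used only for illustration.

```python
import re

# Assumed import line from the example script (hypothetical).
example_line = "from mne.datasets import refmeg_noise"

# grep -x only succeeds when the entire line matches, like re.fullmatch.
pattern_in_pr = re.compile(r".*datasets.*megref_noise.*")
pattern_fixed = re.compile(r".*datasets.*refmeg_noise.*")

matches_in_pr = pattern_in_pr.fullmatch(example_line) is not None
matches_fixed = pattern_fixed.fullmatch(example_line) is not None
print(matches_in_pr, matches_fixed)  # prints: False True
```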

Can you check the output and see if it's otherwise how you want it to look?

The topomaps look a bit crazy but that's a separate plotting bug I think (feel free to open an issue), not something you changed here

The first plot now looks more cramped than it did before, but I could live with it. Otherwise it seems fine.

I noticed the topomaps as well. My guess is that this is somehow related to throwing out every other channel to reduce the size of the dataset.

larsoner

Looks reasonable to me. @drammock do you want to look?

drammock · 2020-05-21T18:26:46Z

doc/references.bib

  year = {1989}
 }

+@article{HannaEtAl2020,


drammock · 2020-05-21T18:35:02Z

examples/preprocessing/plot_find_ref_artifacts.py

+.. _ex-megnoise_processing:
+
+====================================
+Find MEG reference channel artefacts


throughout the rest of the docstrings / documentation we use the "artifact" spelling variant (not "artefact"); please change to be consistent.

drammock · 2020-05-21T18:37:39Z

examples/preprocessing/plot_find_ref_artifacts.py

+
+Use ICA decompositions of MEG reference channels to remove intermittent noise.
+
+Many MEG systems have an array of reference channels which are used to remove


Suggested change

-Many MEG systems have an array of reference channels which are used to remove
+Many MEG systems have an array of reference channels which are used to detect

drammock · 2020-05-21T18:39:22Z

examples/preprocessing/plot_find_ref_artifacts.py

+Use ICA decompositions of MEG reference channels to remove intermittent noise.
+
+Many MEG systems have an array of reference channels which are used to remove
+external magnetic noise. However, standard removal techniques often fail when


Suggested change

-external magnetic noise. However, standard removal techniques often fail when
+external magnetic noise. However, standard techniques that use reference
+channels to remove noise from standard channels often fail when

drammock · 2020-05-21T18:40:25Z

examples/preprocessing/plot_find_ref_artifacts.py

+
+Many MEG systems have an array of reference channels which are used to remove
+external magnetic noise. However, standard removal techniques often fail when
+noise is intermittent. This technique often succeeds where the standard


Suggested change

-noise is intermittent. This technique often succeeds where the standard
+noise is intermittent. The technique described here (using ICA on the reference
+channels) often succeeds where the standard

drammock · 2020-05-21T20:19:36Z

mne/preprocessing/ica.py

+        is similar to an EOG/ECG, with reference components replacing the
+        EOG/ECG channels. Recommended procedure is to perform ICA separately
+        on reference channels, extract them using .get_sources(), and then
+        append them to the inst using .add_channels(), preferably with the


Suggested change

-append them to the inst using .add_channels(), preferably with the
+append them to the inst using :meth:`~mne.io.Raw.add_channels`,
+preferably with the

drammock · 2020-05-21T20:19:59Z

mne/preprocessing/ica.py

+        EOG/ECG channels. Recommended procedure is to perform ICA separately
+        on reference channels, extract them using .get_sources(), and then
+        append them to the inst using .add_channels(), preferably with the
+        prefix REF_ICA so that they can be automatically detected.


Suggested change

-prefix REF_ICA so that they can be automatically detected.
+prefix ``REF_ICA`` so that they can be automatically detected.

drammock · 2020-05-21T20:20:25Z

mne/preprocessing/ica.py

+        prefix REF_ICA so that they can be automatically detected.
+
+        Thresholding in both cases is based on adaptive z-scoring:
+        The above threshold components will be masked and the z-score will be


Suggested change

-The above threshold components will be masked and the z-score will be
+The above-threshold components will be masked and the z-score will be

drammock · 2020-05-21T20:21:05Z

mne/preprocessing/ica.py

+        recomputed until no supra-threshold component remains.
+
+        Validation and further documentation for this technique can be found
+        in :footcite:`HannaEtAl2020`


Suggested change

-in :footcite:`HannaEtAl2020`
+in :footcite:`HannaEtAl2020`.

drammock · 2020-05-21T20:22:02Z

mne/preprocessing/ica.py

            If True, data annotated as bad will be omitted. Defaults to True.

            .. versionadded:: 0.14.0
+        measure : {'zscore', "cor"}


Suggested change

-measure : {'zscore', "cor"}
+measure : 'zscore' | 'cor'

jshanna100 · 2020-05-22T10:03:18Z

Thanks @drammock, the raw.plot in particular looks much better now.

larsoner · 2020-05-22T14:09:03Z

CircleCI failure is unrelated. Last idea: raw.plot at the end to mirror the one at the beginning? It's another way of showing how the data has been cleaned. Still can't direct push or open a PR to your fork, so feel free to cherry-pick 4c3ae06 from larsoner:icaref if you agree

larsoner · 2020-05-22T14:13:18Z

Coverage actually looks okay so I think we can ignore that one, too

larsoner · 2020-05-22T14:17:48Z

Commit updated on larsoner:icaref to be 5952c8a

jshanna100 · 2020-05-22T14:44:51Z

I'm away from my computer for the day. I've tried again enabling contributions. If it works, feel free to try any changes. Otherwise I'll add the plot tomorrow.

larsoner · 2020-05-22T15:44:47Z

To github.com:/jshanna100/mne-python.git
 ! [remote rejected]     icaref -> icaref (permission denied)

Does not seem to work unfortunately

larsoner · 2020-05-22T15:46:45Z

I opened #7810, will merge if CIs come back happy and everything looks okay. Thanks in advance @jshanna100 !

jeff added 2 commits January 15, 2020 13:35

reference ica "together" algorithm, tests, add possibility for direct correlation thresholds instead of Z score

047f32f

docs for ICA.find_bads_ch method parameter bad_measure

c235eca

jshanna100 requested a review from larsoner January 15, 2020 13:15

larsoner reviewed Jan 15, 2020

agramfort reviewed Jan 18, 2020

doc changes, "tail" parameter conformed to general usage, "measure" parameter expanded to EOG/ECG

4104a79

larsoner added this to the 0.21 milestone Mar 4, 2020

jeff added 2 commits May 19, 2020 19:36

Merge remote-tracking branch 'upstream/master' into icaref

025b5f6

Add dataset "refmeg-noise" and example "find_ref_artifacts"

bd158f9