Skip to content

Conversation

@jshanna100
Copy link

This revists the use of ICA decompositions on reference channel data to remove noise from the main channels. This was first implemented in #5807. Adding a 2nd algorithm was discussed in #5959, but ultimately postponed until the method was validated. A validation using simulations has now been successfully carried out, and the results will be published shortly in the Journal of Neuroscience Methods (preprint: https://arxiv.org/abs/2001.03397).

This PR has the following changes:

  1. the addition of the "together" algorithm
  2. The "together" algorithm required that mne.preprocessing.bads.find_outliers could restrict itself to one-tail distributions. This is now possible.
  3. Not strictly related to the new algorithm, but preprocessing.ICA.find_bads_ch now has a bad_measure parameter, where one can specify whether to use "zscore" (default), which is the iterated Z score method, or "cor", which is the traditional, raw correlation threshold method.

@jshanna100 jshanna100 requested a review from larsoner January 15, 2020 13:15
@agramfort
Copy link
Member

do we have an MNE dataset that allows to illustrate the benefit of this code in an example?

Copy link
Member

@larsoner larsoner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As for datasets, it looks like we use gradient compensation with this dataset:

raw_fname = op.join(data_path, 'MEG', 'bst_resting',
                    'subj002_spontaneous_20111102_01_AUX.ds')

Can you see if there are some artifacts that can be removed with your method?

If not, maybe one of the other brainstorm datasets?

@jshanna100
Copy link
Author

Regarding datasets, I've tried all of the brainstorm, as well as the BTI Phantom. Unfortunately, they're all too clean. There may be an inherent selection bias problem, here: The method focusses on intermittent external noise which isn't corrected by fixed weights solutions, but presumably a dataset which has such intermittent noise would be considered not suitable as an example dataset.

Copy link
Member

@agramfort agramfort left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be really nice to have an example to show how to use this from raw data. Worse case can we use a simulation if we don't have an adapted public dataset we can use?

stop=None, l_freq=None, h_freq=None,
reject_by_annotation=True, verbose=None):
stop=None, l_freq=None, h_freq=None, method='together',
reject_by_annotation=True, bad_measure="zscore",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

bad_measure -> measure? function already has bads in its name

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure. Should I also extend this option to EOG and ECG?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 to do it for consistency

@jshanna100
Copy link
Author

I'll see what I can do with simulated data. There should be a good solution there, somewhere...

@codecov
Copy link

codecov bot commented Jan 21, 2020

Codecov Report

Merging #7212 into master will decrease coverage by 0.01%.
The diff coverage is 61.9%.

@@            Coverage Diff             @@
##           master    #7212      +/-   ##
==========================================
- Coverage   89.77%   89.76%   -0.02%     
==========================================
  Files         445      445              
  Lines       79873    79897      +24     
  Branches    12773    12781       +8     
==========================================
+ Hits        71703    71716      +13     
- Misses       5372     5379       +7     
- Partials     2798     2802       +4

@jshanna100
Copy link
Author

Hopefully these cover all the issues in review. The example should follow in a few days.

@bloyl
Copy link
Contributor

bloyl commented Feb 26, 2020

Artemis data has a reference array that we are currently not using (certainly not effectively) that might be useful for an example.

There is 1 second hpi phantom run in the testing_data that might be useful.

raw_fname = op.join(mne.datasets.testing.data_path(), 'ARTEMIS123', 'Artemis_Data_2017-04-14-10h-38m-59s_Phantom_1k_HPI_1s.txt')

hpis are at 140hz, 150hz, 160hz everything else is 'noise'.

@jshanna100 if that looks promising, I may be able to find subject who I can share, either offline or via some mne.dataset mechanism.

@larsoner
Copy link
Member

larsoner commented Mar 4, 2020

@jshanna100 any chance to get to this in the next week or two to make it into the 0.20 release?

@jshanna100
Copy link
Author

Possibly within two weeks, definitely not in one. Sorry, a bunch of deadlines with work threw me off track here!

@larsoner larsoner added this to the 0.21 milestone Mar 4, 2020
@jshanna100
Copy link
Author

@larsoner @agramfort doing simulations of external noise that also manifest in the reference channels will require also making small modifications to forward and simulate. I already did these a while ago for doing the simulations I needed to validate the methods. They are contained in the refsim branch on my MNE fork.

master...jshanna100:refsim

Should we go forward with this? Alternatively, we have a real MEG dataset that demonstrates the technique very well. I'm not sure how conservative you are with adding new datasets...

@larsoner
Copy link
Member

larsoner commented Mar 4, 2020

Alternatively, we have a real MEG dataset that demonstrates the technique very well. I'm not sure how conservative you are with adding new datasets...

Real data would be nicer than simulations I think. You could take your data file and crop it to some suitable length (30 sec? 60 sec?) to demonstrate the method's utility, then we add it as a new mne.datasets dataset.

@larsoner
Copy link
Member

larsoner commented Mar 4, 2020

Here is an example of what's needed at the MNE end:

50f4d64#diff-3d9673bf0e7407f5bb6bf482aada4fb2

@jshanna100
Copy link
Author

Let me know what you think. The dataset is unfortunately fairly large at around 90MB - also the two full ICA decompositions that run in the example do not get done so quickly. I spent some time trying to get it down as much as I could, but the "together" algorithm does not seem to function well with very short stretches of data.

I was able to reduce it by a lot however by downsampling to 100Hz, and throwing out every other magnetometer.

@larsoner
Copy link
Member

The dataset is unfortunately fairly large at around 90MB

sample is > 1 GB so I would not worry about < 100 MB...

the two full ICA decompositions that run in the example do not get done so quickly

Have you tried method='picard' to see if it can achieve the same or better results with fewer iterations (or the same number of iterations but faster)?

@larsoner
Copy link
Member

/home/circleci/project/doc/auto_examples/preprocessing/plot_find_ref_artifacts.rst:14: WARNING: Title overline too short.
/home/circleci/project/mne/preprocessing/ica.py:docstring of mne.preprocessing.ICA.find_bads_ref:93: WARNING: Too many autonumbered footnote references: only 0 corresponding footnotes available.
/home/circleci/project/mne/preprocessing/ica.py:docstring of mne.preprocessing.ICA.find_bads_ref:93: WARNING: Unknown target name: "hannaetal2020".
/home/circleci/project/doc/overview/datasets_index.rst:412: WARNING: Title underline too short.

and

181.67 sec   1884.7 MB

This is going to be our longest-running example, and memory > 1.5 GB is typically problematic because it can cause CircleCI to crash on a full run. Okay if I take a look and see if there is some way to speed it up?

and

E   mne.preprocessing.ica.ICA.find_bads_ecg : GL03 : Double line break found; please use only one blank line to separate sections or paragraphs, and do not leave blank lines at the end of docstrings
3353

Method to use to identify reference channel related components.
Defaults to "together." See notes.

.. versionadded:: 0.20
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Update version

measure : {'zscore', "cor"}
Which method to use for finding outliers. "zscore" (default) is
the iterated Z-scoring method, and "cor" is an absolute raw
correlation threshold with a range of 0 to 1.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

versionadded

@jshanna100
Copy link
Author

One of the two ICA decompositions is actually superfluous, so that now nearly cuts it in half. Picard does not converge, even with the default number of iterations. Presently I have it set at the default "fastica." Feel free to have a go at speeding it up.

@larsoner
Copy link
Member

Style run failed

@jshanna100
Copy link
Author

Hmm. It's passing on my side. What could be causing the discrepancy?

# external magnetic noise
ref_comps = ica_ref.get_sources(raw_sep)
for c in ref_comps.ch_names: # they need to have REF_ prefix to be recognised
ref_comps.rename_channels({c:"REF_" + c})
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For example this is definitely E231 missing whitespace after :

Did you forget to push? Does make flake actually run on your machine?

@larsoner
Copy link
Member

Presently I have it set at the default "fastica." Feel free to have a go at speeding it up.

I see method='infomax' in the example in two places, is it meant to be like this / necessary?

@jshanna100
Copy link
Author

Oh, sorry, I see now I forgot to stage the new plot_find_ref_artifacts.py. Here it comes...

@larsoner
Copy link
Member

I can't push commits because edits by maintainers are not allowed. I also can't seem to open a PR to your repo for some reason, so you can cherry-pick this if you want

larsoner@85bda0b

@jshanna100
Copy link
Author

That's about a five-fold speed up on my machine.

@larsoner
Copy link
Member

@jshanna100 in the future you can do:

git remote add larsoner https://github.com/larsoner/mne-python.git
git fetch larsoner
git cherry-pick 4799f17
git push

And it would have preserved the history instead of the lines becoming yours. Not a problem here really but can be useful / easier than manual copy-paste + commit of the changes.

----------
.. footbibliography::

.. versionadded:: 0.18
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

versionadded should stay in Notes section, not move to References

python -c "import mne; print(mne.datasets.limo.data_path(subject=1, update_path=True))";
fi;
if [[ $(cat $FNAME | grep -x ".*datasets.*megref_noise.*" | wc -l) -gt 0 ]]; then
python -c "import mne; print(mne.datasets.megref_noise.data_path(update_path=True))";
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These two lines must be wrong because CircleCI has the download status:

https://20202-1301584-gh.circle-artifacts.com/0/dev/auto_examples/preprocessing/plot_find_ref_artifacts.html#sphx-glr-auto-examples-preprocessing-plot-find-ref-artifacts-py

Looks like megref_noise -> refmeg_noise

Can you check the output and see if it's otherwise how you want it to look?

The topomaps look a bit crazy but that's a separate plotting bug I think (feel free to open an issue), not something you changed here

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The first plot now looks more cramped than it did before, but I could live with it. Otherwise it seems fine.

I noticed the topomaps as well. My guess is that this is somehow related to throwing out every other channel to reduce the size of the dataset.

Copy link
Member

@larsoner larsoner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks reasonable to me. @drammock do you want to look?

year = {1989}
}

@article{HannaEtAl2020,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

needs DOI

.. _ex-megnoise_processing:

====================================
Find MEG reference channel artefacts
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

throughout the rest of the docstrings / documentation we use the "artifact" spelling variant (not "artefact"); please change to be consistent.


Use ICA decompositions of MEG reference channels to remove intermittent noise.

Many MEG systems have an array of reference channels which are used to remove
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Many MEG systems have an array of reference channels which are used to remove
Many MEG systems have an array of reference channels which are used to detect

Use ICA decompositions of MEG reference channels to remove intermittent noise.

Many MEG systems have an array of reference channels which are used to remove
external magnetic noise. However, standard removal techniques often fail when
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
external magnetic noise. However, standard removal techniques often fail when
external magnetic noise. However, standard techniques that use reference
channels to remove noise from standard channels often fail when


Many MEG systems have an array of reference channels which are used to remove
external magnetic noise. However, standard removal techniques often fail when
noise is intermittent. This technique often succeeds where the standard
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
noise is intermittent. This technique often succeeds where the standard
noise is intermittent. The technique described here (using ICA on the reference
channels) often succeeds where the standard

is similar to an EOG/ECG, with reference components replacing the
EOG/ECG channels. Recommended procedure is to perform ICA separately
on reference channels, extract them using .get_sources(), and then
append them to the inst using .add_channels(), preferably with the
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
append them to the inst using .add_channels(), preferably with the
append them to the inst using :meth:`~mne.io.Raw.add_channels`,
preferably with the

EOG/ECG channels. Recommended procedure is to perform ICA separately
on reference channels, extract them using .get_sources(), and then
append them to the inst using .add_channels(), preferably with the
prefix REF_ICA so that they can be automatically detected.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
prefix REF_ICA so that they can be automatically detected.
prefix ``REF_ICA`` so that they can be automatically detected.

prefix REF_ICA so that they can be automatically detected.

Thresholding in both cases is based on adaptive z-scoring:
The above threshold components will be masked and the z-score will be
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
The above threshold components will be masked and the z-score will be
The above-threshold components will be masked and the z-score will be

recomputed until no supra-threshold component remains.

Validation and further documentation for this technique can be found
in :footcite:`HannaEtAl2020`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
in :footcite:`HannaEtAl2020`
in :footcite:`HannaEtAl2020`.

If True, data annotated as bad will be omitted. Defaults to True.

.. versionadded:: 0.14.0
measure : {'zscore', "cor"}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
measure : {'zscore', "cor"}
measure : 'zscore' | 'cor'

@jshanna100
Copy link
Author

Thanks @drammock, the raw.plot in particular looks much better now.

@larsoner
Copy link
Member

CircleCI failure is unrelated. Last idea: raw.plot at the end to mirror the one at the beginning? It's another way of showing how the data has been cleaned. Still can't direct push or open a PR to your fork, so feel free to cherry-pick 4c3ae06 from larsoner:icaref if you agree

@larsoner
Copy link
Member

Coverage actually looks okay so I think we can ignore that one, too

@larsoner
Copy link
Member

Commit updated on larsoner:icaref to be 5952c8a

@jshanna100
Copy link
Author

I'm away from my computer for the day. I've tried again enabling contributions. If it works, feel free to try any changes. Otherwise I'll add the plot tomorrow.

@larsoner
Copy link
Member

To github.com:/jshanna100/mne-python.git
 ! [remote rejected]     icaref -> icaref (permission denied)

Does not seem to work unfortunately

@larsoner
Copy link
Member

I opened #7810, will merge if CIs come back happy and everything looks okay. Thanks in advance @jshanna100 !

@jshanna100 jshanna100 deleted the icaref branch May 22, 2020 17:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants