
Conversation

@KristijanArmeni
Contributor

Reference issue

Work in progress for #11874

What does this implement/fix?

  • Implements a draft of the RawNeuralynx() class, subclassing BaseRaw(), and the read_raw_neuralynx() wrapper.
  • Relies on functionality in the NeuralynxIO class from the Neo package for reading header information.

To-Do

  • Testing data. We are missing testing data to test this implementation and move forward. I can ask if there's a way to get a sample recording on our end. I can also ask on the forums (MNE, Fieldtrip, etc.) for sample data.

@welcome

welcome bot commented Sep 8, 2023

Hello! 👋 Thanks for opening your first pull request here! ❤️ We will try to get back to you soon. 🚴

@larsoner
Member

larsoner commented Sep 8, 2023

Looks like a good start! I left some initial comments. Let me know if they don't make sense and I can clarify further. Or if you want me to try pushing some commits I can.

A good next step would be to get ~1 sec of raw data (can be junk!) added to mne-testing-data:

https://github.com/mne-tools/mne-testing-data#add-or-change-files-in-the-repo-for-use-with-mne

@larsoner larsoner changed the title from "initial commit, add module to support Neuralynx data" to "ENH: Add support for reading Neuralynx data" Sep 8, 2023
@KristijanArmeni
Contributor Author

Thanks for the detailed comments! All makes sense to me! I'll have a look and patch these in. In the meantime, I've asked around in the lab if there's a way to get the tiny sample recording done. I'll report back sometime next week once I hear back.

@KristijanArmeni
Contributor Author

Just pushed an update to my remote branch. Seeing the commit history chaos, I should probably stick to rebasing my local development branch onto mne:main as opposed to merging.

We now have the testing dataset (dummy recording) available and I am trying to implement mne.io.read_raw_neuralynx(neuralynx_testing_data) (via RawNeuralynx() and _read_segments_file()) and plot the outcome. Currently, it will obtain the right number of channels and samples, but the values appear faulty (and are the same across channels). See below for details on how I benchmark.

I don't have a good understanding of _read_segments_file() yet, but I guess something goes wrong before np.fromfile is called to obtain the file (.ncs) contents. Perhaps I am doing something wrong in my call to _read_segments_file?

Values read by read_raw_neuralynx() don't match those obtained by neo.NeuralynxIO().read()

As a first attempt, I tried using _read_segments_file and setting the n_channels argument to 1 (in neuralynx.py), because every .ncs file on disk is 1 channel. This does make the code run, but the outcome is faulty. For now, I benchmark against the neo.NeuralynxIO() reader (test_neuralynx.py; see figure below for 3 channels):

  • the numbers of samples/timepoints and channels match well
  • the actual sample values are different, and most samples obtained by _read_segments_file() are actually 0 (not shown in figure)
  • the values obtained by neo seem to make more sense to me (for an unconnected recording)

(figure: test_samples, comparing sample values from the two readers for 3 channels)
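For readers following along, the comparison above can be sketched with a small helper (hypothetical, not part of the PR; it assumes both readers return arrays of shape (n_channels, n_samples)):

```python
import numpy as np

# Hedged sketch: summarize agreement between two readers' outputs. The zero
# fraction is the statistic that flagged the bug described above.
def compare_readers(mne_data, neo_data):
    return dict(
        shapes_match=mne_data.shape == neo_data.shape,
        max_abs_diff=float(np.abs(mne_data - neo_data).max()),
        frac_zero_mne=float((mne_data == 0).mean()),
    )
```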

@larsoner
Member

Wasn't sure the easiest way to fix the history so I did:

git checkout main
git fetch upstream
git reset --hard upstream/main
git checkout KristijanArmeni/nlx mne/io/neuralynx
# hack in a few tweaks
git commit --amend -a
# modify commit message
git push origin -u nlx

This gave the commit credit to you but created this branch / commit:

If you think this commit is okay you can do locally:

git remote add larsoner git@github.com:larsoner/mne-python.git
git fetch larsoner
git checkout nlx
git checkout -b nlx-bak  # create a local backup!
git checkout nlx
git reset --hard larsoner/nlx
git commit --amend -a
git push origin --force nlx

and the one commit should be shown in this PR.

I'll look into the data structures soon.

@larsoner
Member

Okay, I got tests to pass here (still on my nlx branch):

larsoner@5cb32a4

(via RawNeuralynx() and _read_segments_file())

Keep in mind _read_segments_file is a very dumb / simple function -- it assumes data in the filename are stored starting at byte zero with no metadata. So it will only work if that's the case for Neuralynx files. Is it?

You can see in my commit I just use neo for reading the data currently. It uses a very inefficient algorithm -- read all data from disk then subselect -- that we should improve upon. But hopefully it makes conceptually clear how things should be structured, and what self._read_segment_file must do internally.
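For context on why the byte-zero assumption fails here: .ncs files begin with a 16 KiB ASCII header followed by fixed-size records, so a flat np.fromfile from byte zero reads header bytes as samples. The record layout below is an assumption based on publicly documented Neuralynx format descriptions, not code from this PR:

```python
import numpy as np

# Sketch of the .ncs layout: a 16 KiB ASCII header, then 1044-byte records.
# The field layout is an assumption based on public format descriptions.
NCS_HEADER_NBYTES = 16 * 1024
NCS_RECORD_DTYPE = np.dtype(
    [
        ("timestamp", "<u8"),        # record start time in microseconds
        ("channel_id", "<u4"),
        ("sample_rate", "<u4"),
        ("n_valid", "<u4"),          # number of valid samples in this record
        ("samples", "<i2", (512,)),  # 512 int16 samples per record
    ]
)

def read_ncs_samples(path):
    """Read all valid samples from one .ncs file (sketch, no gap handling)."""
    records = np.fromfile(path, dtype=NCS_RECORD_DTYPE, offset=NCS_HEADER_NBYTES)
    # keep only the valid portion of each 512-sample record
    return np.concatenate([rec["samples"][: rec["n_valid"]] for rec in records])
```

A correct reader would also use each record's timestamp to detect gaps between records; this sketch ignores that.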

@KristijanArmeni
Contributor Author

Thanks for suggesting the fix for the commit history! Will try this in a bit and see how it goes. The fixes in the last commit on your branch all make sense to me.

  • re: header info in .ncs. Right, good call: indeed, each .ncs file does have metadata, I think, so there'd need to be a separate reader for the header info, which is probably what causes the mess.
  • re: neo reading: I used neo to have something to benchmark against. I'm fine using it for reading in the data, and I'm almost certain it has a way of reading in data after samples and channels are specified. I'll have a look once I get this up and running.

@KristijanArmeni KristijanArmeni force-pushed the nlx branch 2 times, most recently from 7273499 to 5dbce2a Compare October 19, 2023 16:36
@KristijanArmeni
Contributor Author

Ok, so 5dbce2a is a first attempt at more efficient loading into memory. Tests are failing, but see below for an update and the main idea of how this could work via neo.

Test failing

Right now, test_neuralynx.py fails on my end at _test_raw_reader (l. 43). I tried stepping through it. It seems _mult_cal_one loops over _read_segment_file multiple times (?), and while on some loops idx looks fine (slice(0, 10, None), indexing 10 channels), on one iteration it looks off (array([7, 2, 1, 4, 8, 6, 3, 0, 5, 9])), data_view and one have inconsistent sample dimension sizes, and the assertion on l. 83 (assert data_view.shape[1] == one.shape[1]) fails.

I don't have good insight into _mult_cal_one, so I'm not sure what goes on there.

Update

In short, in 5dbce2a I defer reading data into memory and only fetch the header info and empty data structures (bl = nlx_reader.read(lazy=True)). Based on the start and stop samples given to _read_segment_file(), I find the data chunks in neo.block.segments that contain them and then call neo.Segment.AnalogSignalProxy.load() only on the selected chunks/segments between start and (including) stop. I tested some options with different picks and start/stop values, and it seems to be on the right path.

# quick and dirty plotting
raw = read_raw_neuralynx(fname=testing_path, preload=True)
d1, t1 = raw.get_data(start=0, stop=1000, return_times=True)
raw_orig = raw.copy()

raw3 = raw.pick(picks=[0, 1, 5])
labels3 = raw3.ch_names
d3, t3 = raw3.get_data(start=400, stop=1200, return_times=True)

raw4 = raw_orig.pick(picks=[1, 2, 8])
labels4 = raw4.ch_names
d4, t4 = raw4.get_data(start=200, stop=1000, return_times=True)
# then plot d1, d3, d4 etc.

(figure: test_get_data, plotting d1, d3, d4)
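The chunk selection described in the update can be sketched with plain NumPy (a hypothetical helper, not the PR's actual code): given per-segment sample counts, find the first and last segments overlapping the requested [start, stop) range.

```python
import numpy as np

def segments_to_read(seg_n_samples, start, stop):
    """Return (first, last) indices of segments overlapping [start, stop)."""
    bounds = np.concatenate([[0], np.cumsum(seg_n_samples)])  # segment edges
    first = int(np.searchsorted(bounds, start, side="right")) - 1
    last = int(np.searchsorted(bounds, stop, side="left")) - 1
    return first, last
```

Only the segments in this range would then be passed to AnalogSignalProxy.load(), keeping memory use bounded.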

Comment on line:

block = all_data  # shape = (len(idx), n_samples)
Member


Perhaps somewhat confusingly, whatever you pass to _mult_cal_one needs to be shape (n_channels, n_samples), not (len(idx), n_samples). I'll push a quick commit to show what I mean

@larsoner
Member

With my commit now I see:

mne/io/neuralynx/tests/test_neuralynx.py:43: in test_neuralynx
    _test_raw_reader(read_raw_neuralynx, fname=testing_path)
mne/io/tests/test_raw.py:168: in _test_raw_reader
    data2, times2 = other_raw[picks, sl_time]
mne/io/base.py:851: in __getitem__
    return self._getitem(item)
mne/io/base.py:858: in _getitem
    data = self._read_segment(start=start, stop=stop, sel=sel)
<decorator-gen-311>:12: in _read_segment
    ???
mne/io/base.py:479: in _read_segment
    _ReadSegmentFileProtector(self)._read_segment_file(
mne/io/base.py:2511: in _read_segment_file
    return self.__raw.__class__._read_segment_file(
mne/io/neuralynx/neuralynx.py:162: in _read_segment_file
    block[idx] = all_data  # shape = (n_channels, n_samples)
E   ValueError: shape mismatch: value array of shape (10,6783) could not be broadcast to indexing result of shape (10,2000)

So the all_data that you create does not have enough samples; there must be some bug in _find_first_last_segment or how it's used.
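The ValueError above is NumPy's fancy-indexing broadcast rule: assigning into block[idx] requires the value array to match the indexed shape exactly along the sample axis. A minimal reproduction (synthetic shapes taken from the traceback):

```python
import numpy as np

block = np.zeros((10, 2000))
idx = np.array([7, 2, 1, 4, 8, 6, 3, 0, 5, 9])
block[idx] = np.ones((10, 2000))  # fine: value matches indexed shape
try:
    block[idx] = np.ones((10, 6783))  # too many samples, as in the traceback
except ValueError:
    print("shape mismatch")  # raises ValueError, mirroring the error above
```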

Comment on lines 98 to 105
# shape = (n_segments, 2) where 2nd dim is (start_time, stop_time)
onst_offt = np.array(
    [
        (signal.t_start.item(), signal.t_stop.item())
        for segment in segments
        for signal in segment.analogsignals
    ]
)
Member


Does this mean that there can be gaps and/or overlaps?

For gaps there is a way to handle it (insert zeros) but for overlaps I'm not even sure what the right behavior would be

Contributor Author


Oh, good point. I'd assume no gaps/overlaps, as it should just be a chunked representation of a continuous recording by the acquisition system (for some reason). That is, each segment's stop time should be the next segment's start time, etc. Our test dataset only has two segments (over a total of ~3 s of recording). I can perhaps test this on my local datasets, which have on the order of tens of segments.

Member


If it's assumed that they are contiguous then I'd just construct an array of shape (n_samples,) of int type that gives you the mapping from sample numbers into segment numbers in __init__ and store it in _raw_extras[0] = dict(segment_map=segment_map) or so. Then use this to figure out which segments to read in _read_segments_file. If this isn't clear I can push a quick commit to show what I mean.
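The suggested mapping can be sketched in a few lines (segment sizes here are hypothetical; np.repeat builds the per-sample lookup):

```python
import numpy as np

# hypothetical samples per neo segment
seg_n_samples = np.array([1000, 1000, 500])
# one int per sample, giving that sample's segment index
segment_map = np.repeat(np.arange(len(seg_n_samples)), seg_n_samples)
# at __init__ time this would be stored once, e.g.:
#   self._raw_extras[0] = dict(segment_map=segment_map)
# and in _read_segments_file the segments covering [start, stop) are:
start, stop = 950, 1200
needed = np.unique(segment_map[start:stop])
```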

Contributor Author


Thanks! Makes total sense. Will implement.

@KristijanArmeni
Contributor Author

KristijanArmeni commented Oct 20, 2023

OK, added a smp2seg dict in _raw_extras[0] in df66e3.

I think this will work as a strategy to keep track of which samples belong to which segments. Seems like the tests are failing at the moment, but I can't see exactly where from the bot messages here (stylistic things?).

In any case, if I manually evaluate test_neuralynx() locally it now runs with no errors. But if I try reading one of my bigger datasets I have locally I get some inconsistent sample numbers between all_data and block variables in RawNeuralynx._read_segment_file().

I suspect something fishy is going on with chunk reading based on time slices via AnalogSignal.load(time_slice=(start_time, stop_time)). It seems the concatenated arrays it loads based on (start_time, stop_time) don't have the same number of samples as an array constructed with np.zeros((n_channels, stop - start)).

Will explore and report here.

@KristijanArmeni
Contributor Author

In e251d98 I implemented a way to read in segments via neo based on sample and channel indices (Segment.AnalogSignal.load(channel_indexes=idx).magnitude[:, segment_start:segment_stop]) as opposed to converting start/stop to time units and using t_start/t_stop in AnalogSignal.load() (which, for some reason, resulted in array inconsistencies on larger datasets). This code still only loads into memory the data segments between start and stop and the channels specified in idx. See figures below for extra info.

Filtering by channel type is needed in read_raw_neuralynx().
After testing my implementation on local datasets that have several GBs worth of real data (and multiple channel types, e.g. seeg, eog, etc.), I learned that channel types can have different numbers of segments, which causes NeuralynxIO.read() to throw a (meaningful) error. To deal with this, NeuralynxIO() has the exclude_filename param, a list of files to ignore, such that the user can initiate separate readers for seeg, eog, etc. (see this GitHub issue).

For now, my hack is to include an exclude_fnames_pattern param containing file patterns to ignore (e.g. "EOG*") in __init__. These patterns are passed to glob.glob, and the returned lists of files are stored in raw._raw_extras and passed to NeuralynxIO() as the exclude_filename input arg. Not sure if there's a way of doing this that's more aligned with the MNE API, but this worked for me locally.
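A minimal sketch of that glob-based exclusion (hypothetical helper name; the idea is that the result would be forwarded to NeuralynxIO(exclude_filename=...)):

```python
import glob
import os

def files_to_exclude(dirname, patterns):
    """Expand each pattern in dirname and collect matching file names."""
    excluded = []
    for pattern in patterns:
        excluded += sorted(
            os.path.basename(path)
            for path in glob.glob(os.path.join(dirname, pattern))
        )
    return excluded
```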

Extra information: raw.plot() on testing dataset

from mne.datasets.testing import data_path, requires_testing_data
from mne.io import read_raw_neuralynx
testing_path = data_path(download=False) / "neuralynx"
raw = read_raw_neuralynx(testing_path, preload=True)
raw.plot()
(figure: test_raw_plot1)

And an actual (non-dummy) dataset:

This is a larger dataset (about 5GB in size).
(figure: test_raw_plot2)

@KristijanArmeni
Contributor Author

Yes if you click blue "details" to the right of the failing Tests/Style to get to https://github.com/mne-tools/mne-python/actions/runs/6852318442/job/18630579970?pr=11969 you'll see:

Just commenting to point out that 0c552cb was pushed here yesterday to fix docstrings. This makes the Tests/Style check pass, though I see some CIs fail, but it looks like some conda-specific test on mne:main?

(No rush on my end; I forgot to comment yesterday and am just making sure it's seen. I can't tell if a pushed commit alone also sends a notification to everyone involved in the conversation.)

@larsoner
Member

can't tell if a pushed commit alone also sends a notification to all involved in the conversation

It can, but most maintainers I know have disabled this feature because it generates way too many emails, so yes, please do comment when it's time to look. The failure is unrelated, so you can ignore it; I'll merge main into this branch to fix it. Can you take it out of draft status if you're ready for it to be merged (assuming CIs are green)? Then I'll take one last look and merge if it's green and I have no further comments!

@KristijanArmeni KristijanArmeni marked this pull request as ready for review November 14, 2023 17:04
@KristijanArmeni
Contributor Author

@larsoner It looks all green now. Please see if there's anything missing or fishy in the reader.

It should be a minimally working reader. I guess other improvements can follow on top of this, like bookkeeping any additional info from the Neuralynx headers (meas_date etc.). I'll be using this on real data in the coming weeks, so I can open issues as they pop up.

Oh and we'll also need a doc/changes/devel.rst update to mention this improvement!

This can be done in a separate PR? Once it's merged, I'd like to also acknowledge the researcher who acquired the testing dataset, provided the .mat files, and helped troubleshoot Neuralynx specifics (if that's common practice).

@larsoner
Member

This can be done in a separate PR?

It should be part of this one if possible. Feel free to push.

Yes you can credit whoever you think should get co-author credit for the PR by adding names at the end of the changelog line (or multiple :newcontrib:s for people who are new contributors).

@larsoner larsoner added this to the 1.6 milestone Nov 14, 2023
@KristijanArmeni
Contributor Author

Thanks @larsoner! I updated doc/changes/devel.rst and names.ini, hopefully in good order. All checks seem to be passing.

@larsoner
Member

Thanks @KristijanArmeni !

@larsoner larsoner merged commit e2b3c03 into mne-tools:main Nov 14, 2023
@KristijanArmeni KristijanArmeni deleted the nlx branch November 14, 2023 21:29
hoechenberger added a commit to hoechenberger/mne-python that referenced this pull request Nov 22, 2023
An incorrect type hint sneaked in via mne-tools#11969, causing confusion for static type checkers.

Since we don't use type hints in MNE yet (except for some very
rare exceptions), I simply removed the faulty one
instead of fixing it.
@eduardosand

eduardosand commented Nov 28, 2023

@KristijanArmeni This is awesome! I was wondering how, in the end, you deal with missing samples, i.e. the array-size mismatch (when making an array using the first and last timepoints vs. the actual data present). For me, having to use the exclude-file parameter seems to disregard the power of having all the data in the directory at once; a warning is nice, but not if there isn't much I can do about it.

On my end, I have two opinions on the matter.

The sample rate itself is not exactly 32 kHz or whatever ncs_reader.get_signal_sampling_rate() spits out, but something slightly smaller. This could contribute to the slight mismatch and to why it increases on larger datasets.

Segments are generated in neo not just when the experimenter presses play/pause or an event annotation is made, but also when Neuralynx fails to record certain samples. I also think this is the reason for the difference in sample counts across channels, and as the GitHub issue above points out, there's also some weirdness with cutting off samples at the beginning and end of a file. Hence, Neuralynx will drop more samples in the 32 kHz channels than in the 8 kHz channels, and asymmetrically (e.g. Neuralynx drops 3-4 samples in the 32 kHz case while no samples are dropped in the 8 kHz case, so a segment is generated in the 32 kHz channels but not in the 8 kHz channels).

See below for more:
https://www.fieldtriptoolbox.org/faq/discontinuous_neuralynx/
fieldtrip/fieldtrip#1128

My workaround has thus been to assume uniform sampling within a segment and then to interpolate with cubic splines for any data 'in between' segments. I do this instead of Fieldtrip's filling with NaNs because the signal processing is easier downstream (without having to recast the NaN values to something suitable for scipy.signal.sosfiltfilt). Outside of spike sorting, I don't care about 32 kHz or 8 kHz sampling rates, so I then downsample to 2 kHz and save as .npy files for downstream analysis.
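The spline workaround can be sketched as follows (hypothetical times and values; assumes scipy is available):

```python
import numpy as np
from scipy.interpolate import CubicSpline

# samples we do have, with a gap between t = 2 s and t = 5 s
t_have = np.array([0.0, 1.0, 2.0, 5.0, 6.0, 7.0])
x_have = np.sin(t_have)

# fit a cubic spline to the available samples and evaluate it in the gap
spline = CubicSpline(t_have, x_have)
t_missing = np.array([3.0, 4.0])
x_filled = spline(t_missing)
```

As mentioned above, the interpolated spans should be remembered so they can be discarded after filtering.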

@larsoner
Member

For me, having to use the exclude file parameter seems like it disregards the power of having all the data in the directory at once and a warning is nice, but not if there isn't much I can do about it.

Assuming you have two streams to align / resample (though it generalizes to more), naively my approach would be to use the exclude parameter to create two Raw instances (the first instance excluding the second file, and the second instance excluding the first file) and then resample them.

If we have gaps in some recordings we should make sure these are in there properly with zeros (or NaNs) and annotated with BAD_ACQ_SKIP so that time is continuous in the instance. Then we "just" need to work out #11408 to resample the data to the same sample rate.
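The gap-filling idea can be sketched like this (hypothetical helper: segments are (n_channels, n_samples) arrays, seg_start_samps their first sample index on the continuous time axis, and the returned spans are candidates for BAD_ACQ_SKIP annotations):

```python
import numpy as np

def fill_gaps(segments, seg_start_samps):
    """Concatenate segments on a continuous axis, zero-filling any gaps."""
    n_channels = segments[0].shape[0]
    total = seg_start_samps[-1] + segments[-1].shape[1]
    out = np.zeros((n_channels, total))
    skip_spans = []  # candidate BAD_ACQ_SKIP spans, in samples
    prev_end = 0
    for seg, s0 in zip(segments, seg_start_samps):
        if s0 > prev_end:
            skip_spans.append((prev_end, s0))
        out[:, s0 : s0 + seg.shape[1]] = seg
        prev_end = s0 + seg.shape[1]
    return out, skip_spans
```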

@eduardosand

If we have gaps in some recordings we should make sure these are in there properly with zeros (or NaNs) and annotated with BAD_ACQ_SKIP so that time is continuous in the instance. Then we "just" need to work out #11408 to resample the data to the same sample rate.

I agree with the annotation, but I was under the impression that scipy.signal doesn't work well with NaNs? I'm also worried that 0s might cause edge effects when computing power analyses, although maybe that's not as big a concern with wavelets?
Interpolation is nice because it allows us to use any filters we want, and as long as we remember that the values themselves are interpolated, we can discard them later after pre-processing.

@larsoner
Member

I agree with annotation, but I was under the impression that scipy.signal doesn't work well with NaNs? Also worried that 0s might cause edge effects when computing power analyses, although maybe not as big of a concern with wavelets?

Correct, it doesn't handle NaN well -- it will turn the whole signal NaN. But the idea isn't to operate on the whole signal at once. See the original issue -- the idea is to resample each non-NaN segment separately using skip_by_annotation to find segments that should be resampled. Similarly we can improve our tfr_* functions to also respect skip_by_annotation and thereby (again) process each segment separately.
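Locating the non-NaN stretches that would each be processed separately can be sketched as (hypothetical helper):

```python
import numpy as np

def nan_free_runs(x):
    """Yield (start, stop) index pairs of contiguous non-NaN samples."""
    good = ~np.isnan(x)
    edges = np.flatnonzero(np.diff(good.astype(int)))  # run boundaries
    bounds = np.concatenate([[0], edges + 1, [x.size]])
    for b0, b1 in zip(bounds[:-1], bounds[1:]):
        if good[b0]:
            yield int(b0), int(b1)
```

Each yielded slice could then be filtered or resampled on its own, mirroring the skip_by_annotation idea.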

But thinking about it more, getting the segment-by-segment resampling right seems like a difficult problem. Also, reading through your description of Neuralynx and samples just missing at the beginning and end of a recording (instead of someplace in the middle), I'm not totally sure you need all this. So maybe not the right option here 🤷

@KristijanArmeni
Contributor Author

Thanks for raising this @eduardosand! You mention two sources/forms of array-size mismatch:

  1. Across channels, due to different sampling rates and/or asymmetric Neuralynx sample trimming (e.g. 8 kHz vs. 32 kHz). This I was aware of, and, as @larsoner said, exclude_fname_pattern, propagated to the neo reader, should take care of it such that the user can construct two Raw instances if need be (setting aside whether the two can eventually be realigned).
  2. Gaps across segments, i.e. a discontinuous recording within a channel (the number of valid samples is smaller than the total number of samples). Indeed, the current code assumes the segments are continuous and temporally adjacent.

Regarding 2 and reading in the Fieldtrip doc you linked:

The low-level read_neuralynx_ncs function detects the presence of gaps in the .ncs file and issues a warning.

It sounds like mne.io.read_raw_neuralynx() should minimally check for temporal gaps between neo segments and issue a warning? For example, it should check that each neo.Segment[i] object starts when neo.Segment[i-1] ended and raise a warning if this is not the case (i.e. there's a temporal gap, assuming the information in neo is accurate)? And potentially also reconstruct/fill/mark missing samples such that the time axis (i.e. raw.times) is continuous and valid.

If this is on track, happy to open a separate issue and work on this.
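That check could look something like this (hypothetical helper; the per-segment start/stop times would come from neo, e.g. AnalogSignalProxy.t_start/t_stop):

```python
import numpy as np

def gap_boundaries(starts, stops, sfreq):
    """Return indices i where segment i+1 does not start when segment i ends.

    starts/stops are per-segment times in seconds; a one-sample tolerance
    absorbs rounding in the recorded timestamps.
    """
    tol = 1.0 / sfreq
    gaps = np.asarray(starts[1:]) - np.asarray(stops[:-1])
    return np.flatnonzero(np.abs(gaps) > tol).tolist()
```

read_raw_neuralynx() could then warn whenever the returned list is non-empty.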

@larsoner
Member

Sounds good, let's move discussion to #12247

snwnde pushed a commit to snwnde/mne-python that referenced this pull request Mar 20, 2024
Co-authored-by: Eric Larson <larson.eric.d@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
