ENH: Add support for reading Neuralynx data #11969
Conversation
|
Hello! 👋 Thanks for opening your first pull request here! ❤️ We will try to get back to you soon. 🚴 |
|
Looks like a good start! I left some initial comments. Let me know if they don't make sense and I can clarify further. Or if you want me to try pushing some commits, I can. A good next step would be to get ~1 sec of raw data (can be junk!) added to mne-testing-data: …
|
Thanks for the detailed comments! All makes sense to me! I'll have a look and patch these in. In the meantime, I've asked around in the lab if there's a way to get the tiny sample recording done. I'll report back sometime next week once I hear back.
|
Just pushed an update to my remote branch. Seeing the commit history chaos, I should probably stick to rebasing mne:main into my local development branch as opposed to merging it. We now have the testing dataset (dummy recording) available and I am trying to implement … I don't have a good understanding of …

Values read by read_raw_neuralynx() don't match those obtained by neo.NeuralynxIO().read()

As a first attempt, I tried using …
|
|
Wasn't sure the easiest way to fix the history, so I did: … This gave the commit credit to you but created this branch/commit: … If you think this commit is okay, you can do locally: … and the one commit should be shown in this PR. I'll look into the data structures soon.
|
Okay, I got tests to pass here (still on my … branch). Keep in mind … You can see in my commit I just use …
|
Thanks for suggesting the fix for the commit history! Will try this in a bit and see how it goes. The fixes in the last commit on your branch all make sense to me.
|
Force-pushed from 7273499 to 5dbce2a
|
Ok, so 5dbce2a is a first attempt at more efficient loading into memory. Tests are failing, but see below for an update and the main idea on how this could work via …

Test failing

Right now, I don't have a good insight into …

Update

In short, in 5dbce2a I defer reading data into memory and only fetch the header info and empty data structures (…):

# quick and dirty plotting
raw = read_raw_neuralynx(fname=testing_path, preload=True)
d1, t1 = raw.get_data(start=0, stop=1000, return_times=True)
raw_orig = raw.copy()
raw3 = raw.pick(picks=[0, 1, 5])
labels3 = raw3.ch_names
d3, t3 = raw3.get_data(start=400, stop=1200, return_times=True)
raw4 = raw_orig.pick(picks=[1, 2, 8])
labels4 = raw4.ch_names
d4, t4 = raw4.get_data(start=200, stop=1000, return_times=True)
# then plot d1, d3, d4 etc.
mne/io/neuralynx/neuralynx.py
Outdated
    ]
).T
…
block = all_data  # shape = (len(idx), n_samples)
Perhaps somewhat confusingly, whatever you pass to _mult_cal_one needs to be shape (n_channels, n_samples), not (len(idx), n_samples). I'll push a quick commit to show what I mean
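To illustrate the shape contract described here, a plain-NumPy sketch (sizes and variable names are made up for illustration; this is not MNE's internal code): the data read from disk for the picked channels has shape (len(idx), n_samples), but it must be scattered into a full (n_channels, n_samples) block before being handed off.

```python
import numpy as np

# Suppose the file has 4 channels and the caller picked channels idx = [0, 2].
n_channels, n_samples = 4, 10
idx = np.array([0, 2])

# Data as read from disk for just the picked channels: (len(idx), n_samples).
picked = np.arange(len(idx) * n_samples, dtype=float).reshape(len(idx), n_samples)

# The full (n_channels, n_samples) block, with picked rows placed at their
# original channel positions and unpicked rows left untouched.
one = np.zeros((n_channels, n_samples))
one[idx] = picked

assert one.shape == (n_channels, n_samples)
assert np.array_equal(one[2], picked[1])
assert one[1].sum() == 0 and one[3].sum() == 0
```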
|
With my commit now I see: … So the …
mne/io/neuralynx/neuralynx.py
Outdated
# shape = (n_segments, 2) where 2nd dim is (start_time, stop_time)
onst_offt = np.array(
    [
        (signal.t_start.item(), signal.t_stop.item())
        for segment in segments
        for signal in segment.analogsignals
    ]
)
Does this mean that there can be gaps and/or overlaps?
For gaps there is a way to handle it (insert zeros) but for overlaps I'm not even sure what the right behavior would be
Oh, good point. I'd assume no gaps/overlaps, as it should just be a chunked representation of a continuous recording by the acquisition system (for some reason). That is, each segment's stop time should be the next segment's start time, etc. Our test dataset only has two segments (over a total of ~3 sec recording). I can perhaps test this on my local datasets, which have on the order of 10's of segments.
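Under that assumption, contiguity can be checked directly from an (n_segments, 2) array of per-segment (start_time, stop_time) pairs like the one built from Neo's segments above — a small illustrative sketch with made-up values:

```python
import numpy as np

# Hypothetical (n_segments, 2) array of (start_time, stop_time) per segment.
onst_offt = np.array([[0.0, 1.5], [1.5, 3.1]])

# Contiguous means each segment's stop time equals the next segment's start time.
gaps = onst_offt[1:, 0] - onst_offt[:-1, 1]
assert np.allclose(gaps, 0.0)  # no gaps ...
assert not np.any(gaps < 0)    # ... and no overlaps
```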
If it's assumed that they are contiguous, then I'd just construct an array of shape (n_samples,) of int type that gives you the mapping from sample numbers to segment numbers in __init__ and store it in _raw_extras[0] = dict(segment_map=segment_map) or so. Then use this to figure out which segments to read in _read_segments_file. If this isn't clear I can push a quick commit to show what I mean.
Thanks! Makes total sense. Will implement.
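For reference, a minimal NumPy sketch of that sample-to-segment mapping idea (segment sizes and variable names here are hypothetical, not the PR's actual code):

```python
import numpy as np

# Number of samples in each of three segments (hypothetical).
samples_per_segment = np.array([100, 250, 150])

# (n_samples,) int array mapping each sample to its segment index.
segment_map = np.repeat(np.arange(len(samples_per_segment)), samples_per_segment)
assert segment_map.shape == (500,)

# Which segments does the requested slice [start, stop) touch?
start, stop = 90, 360
needed = np.unique(segment_map[start:stop])
assert needed.tolist() == [0, 1, 2]
```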
|
OK, added a … I think this will work as a strategy to keep track of which samples belong to which segments. Seems like atm the tests are failing, but I can't see exactly where from the bot messages on here (stylistic things?). In any case, if I manually evaluate … I suspect something fishy is going on with chunk reading based on time slices via … Will explore and report here.
- use in _read_segment_file instead of _find_first_last_segment
for more information, see https://pre-commit.ci
|
In e251d98 I implemented a way to read in segments via …

Filtering by channel type needed in read_raw_neuralynx(). For now, my hack is to include …

Extra information: raw.plot() on the testing dataset:

from mne.datasets.testing import data_path, requires_testing_data
from mne.io import read_raw_neuralynx
testing_path = data_path(download=False) / "neuralynx"
raw = read_raw_neuralynx(testing_path, preload=True)
raw.plot()
And an actual (non-dummy) dataset: …
Just commenting to point out that 0c552cb was pushed yesterday to fix docstrings. This makes the Tests/Style checks pass, though I see some CIs fail, but it looks like it's some conda-specific test on …

(No rush on my end, forgot to comment yesterday and just making sure it's seen; can't tell if a pushed commit alone also sends a notification to all involved in the conversation.)
It can, but most maintainers I know have disabled this feature because it generates way too many emails, so yes, please do comment when it's time to look. The failure is unrelated, so you can ignore it; I'll merge.
|
@larsoner It looks all green? See if there's anything missing or fishy in the reader. It should be a minimally working reader. I guess other improvements can follow on top of this, like bookkeeping any additional info from nlx headers (meas_date etc.). I'll be using this on real data in the coming weeks, so I can open issues as they pop up.

This can be done in a separate PR? Once it's merged, I'd like to also acknowledge the researcher who acquired the testing dataset, provided the …
It should be part of this one if possible. Feel free to push. Yes, you can credit whoever you think should get co-author credit for the PR by adding names at the end of the changelog line (or multiple …).
|
Thanks @larsoner! I updated …
|
Thanks @KristijanArmeni ! |
An incorrect type hint sneaked in via mne-tools#11969, causing confusion to static type checkers. Since we don't use type hints in MNE yet (except for some very rare exceptions), I simply removed the faulty one instead of fixing it.
|
@KristijanArmeni This is awesome! I was wondering, in the end, how you deal with missing samples, i.e. the array-size mismatch (when making an array using first and last timepoints vs. the actual data present). For me, having to use the exclude-file parameter seems like it disregards the power of having all the data in the directory at once, and a warning is nice, but not if there isn't much I can do about it. On my end, I have two opinions on the matter:

1. The sample rate itself is not exactly 32K, or whatever ncs_reader.get_signal_sampling_rate() spits out, but something slightly smaller than that. This could contribute to the reason why there is a slight mismatch, and why this increases on larger datasets.

2. Segments are generated in Neo not just when the experimenter presses play/pause or an event annotation is made, but also when Neuralynx fails to record certain samples. I also think this is the reason for the difference in samples across channels, and as the GitHub issue above points out, there's also some weirdness with cutting off samples at the beginning and end of a file. Hence, Neuralynx will drop more samples in the 32K-sampled channels than the 8K-sampled channels, but asymmetrically (e.g., Neuralynx drops 3-4 samples in the 32K case while no samples are dropped in the 8K case, so a segment is generated in the 32K channels but not the 8K channels). See below for more: …

My workaround has thus been to assume uniform sampling within a segment, and then interpolate with cubic splines for any data "in between" segments. I do this instead of Fieldtrip's filling with NaNs because the signal processing is easier downstream (without having to then recast the NaN values to something suitable for scipy.signal.sosfiltfilt). Outside of spike sorting, I don't care about the 32K or 8K sampling rate, so I then downsample to 2K and save as .npy files for downstream analysis.
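A toy sketch of that spline workaround (not the commenter's actual code; segment times and signal are synthetic): fit scipy.interpolate.CubicSpline to the samples that do exist, then evaluate on a uniform grid spanning the whole recording so downstream filtering sees no NaNs.

```python
import numpy as np
from scipy.interpolate import CubicSpline

sfreq = 1000.0
# Two hypothetical segments with a gap between t=0.010 s and t=0.015 s.
t1 = np.arange(10) / sfreq           # 0.000 .. 0.009 s
t2 = 0.015 + np.arange(10) / sfreq   # 0.015 .. 0.024 s
x1 = np.sin(2 * np.pi * 50 * t1)
x2 = np.sin(2 * np.pi * 50 * t2)

# Fit a cubic spline to the samples we do have ...
cs = CubicSpline(np.concatenate([t1, t2]), np.concatenate([x1, x2]))

# ... and evaluate on a uniform grid over the full span, gap included.
t_full = np.arange(25) / sfreq
x_full = cs(t_full)

assert x_full.shape == t_full.shape
assert not np.isnan(x_full).any()
```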
Assuming you have two streams to align/resample (though it generalizes to more), naively my approach would be to use the …

If we have gaps in some recordings, we should make sure these are in there properly with zeros (or NaNs) and annotated with …
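A NumPy-only sketch of the zero-filling bookkeeping (the annotation name above is elided, so only the sample arithmetic is shown; segment lengths and the gap duration are made up):

```python
import numpy as np

sfreq = 100.0
# Two segments whose header timestamps reveal a 0.5 s gap between them.
seg1 = np.ones(100)      # t = 0.0 .. 1.0 s
seg2 = 2 * np.ones(100)  # t = 1.5 .. 2.5 s
gap_samples = int(round(0.5 * sfreq))

# Insert zeros for the missing span so the output stays uniformly sampled;
# the zeroed region would then be marked with a "bad" annotation.
full = np.concatenate([seg1, np.zeros(gap_samples), seg2])

assert full.size == 250
assert (full[100:150] == 0).all()
```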
I agree with the annotation, but I was under the impression that scipy.signal doesn't work well with NaNs? I'm also worried that 0s might cause edge effects when computing power analyses, although maybe that's not as big of a concern with wavelets?
Correct, it doesn't handle NaN well -- it will turn the whole signal … But thinking about it more, getting the segment-by-segment resampling right seems like a difficult problem. Also, reading through your description of Neuralynx and samples just missing at the beginning and end of a recording (instead of someplace in the middle), I'm not totally sure you need all this. So maybe not the right option here 🤷
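This NaN behavior is easy to demonstrate: with a forward-backward IIR filter such as scipy.signal.sosfiltfilt, a single NaN contaminates the entire output (forward pass propagates it to the end, backward pass to the start) — hence the need to fill gaps before filtering.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

sos = butter(4, 0.2, output="sos")
x = np.random.default_rng(0).standard_normal(1000)
x[500] = np.nan  # one missing sample in the middle

y = sosfiltfilt(sos, x)
# The whole filtered signal becomes NaN, not just the neighborhood of x[500].
assert np.isnan(y).all()
```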
|
Thanks for raising this @eduardosand! You mention two sources/forms of array-size mismatch: …

Regarding 2, and reading in the Fieldtrip doc you linked: …

It sounds like … If this is on track, I'm happy to open a separate issue and work on this.
|
Sounds good, let's move discussion to #12247 |
Co-authored-by: Eric Larson <larson.eric.d@gmail.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>




Reference issue
Work in progress for #11874
What does this implement/fix?
- RawNeuralynx() class, subclassing BaseRaw(), and the read_raw_neuralynx() wrapper
- NeuralynxIO class from the Neo package for reading header information

To-Do