Conversation

@bloyl
Contributor

@bloyl bloyl commented May 31, 2018

The current CTF IO doesn't read bad channels or bad segments from the *.ds/BadChannels and *.ds/bad.segments files.

This PR should address that.

I stuck the new methods in info.py without a strong reason, so if anyone thinks they make more sense somewhere else I'm happy to move them.

How should I test this?

  1. I could make a new version of MNE-testing-data/CTF/testdata_ctf_short.ds with a bad segment and a bad channel marked.

  2. I could alter an existing ds folder in MNE-testing-data/CTF/ to have a bad segment and channel marked.

The downside to option 1 is that it'll add ~10 MB to MNE-testing-data. The downside to option 2 is that it could break/alter existing tests.
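For reference, a rough sketch of the parsing involved (the function names here are my own illustration, assuming the usual CTF text formats: BadChannels lists one channel name per line; bad.segments has one whitespace-separated "label start stop" row per segment, in seconds):

import os.path as op


def _read_bad_channels(directory):
    """Read channel names from a .ds/BadChannels file (sketch)."""
    fname = op.join(directory, 'BadChannels')
    if not op.exists(fname):
        return []
    with open(fname, 'r') as fid:
        return [line.strip() for line in fid if line.strip()]


def _read_bad_segments_table(directory):
    """Read (label, start, stop) rows in seconds from bad.segments (sketch)."""
    fname = op.join(directory, 'bad.segments')
    if not op.exists(fname):
        return []
    segments = []
    with open(fname, 'r') as fid:
        for line in fid:
            parts = line.strip().split()
            if len(parts) < 3:  # skip blank/malformed lines
                continue
            label, start, stop = parts[:3]
            segments.append((label, float(start), float(stop)))
    return segments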

@agramfort
Member

I would go with 2 and see if it actually breaks any test. Make the bad segments really short and make only one channel bad?

@bloyl bloyl mentioned this pull request Jun 4, 2018
@bloyl
Contributor Author

bloyl commented Jun 4, 2018

It seems that mne-python also doesn't read in the time of the first data sample from CTF data.
For instance, DataEditor (the CTF program), Brainstorm, and FieldTrip all think the first data sample of 'MNE-testing-data/CTF/testdata_ctf_short.ds' is at 2 seconds; mne-python has it at 0.

From

mne-python/mne/io/base.py

Lines 629 to 633 in ab4571a

def _update_times(self):
    """Update times."""
    self._times = np.arange(self.n_times) / float(self.info['sfreq'])
    # make it immutable
    self._times.flags.writeable = False

it seems like adjusting this time is not possible. Is that correct?
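To make the gap concrete in plain numbers (illustrative values only, not an existing API):

import numpy as np

sfreq = 1200.0  # sampling rate in Hz (illustrative values)
n_times = 4801
s_time = 2.0    # first-sample time that the CTF tools report

times = np.arange(n_times) / sfreq  # what _update_times produces; starts at 0
ctf_times = times + s_time          # what the other CTF tools would show
print(times[0], ctf_times[0])       # 0.0 vs. 2.0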

@codecov

codecov bot commented Jun 5, 2018

Codecov Report

Merging #5255 into master will increase coverage by 0.03%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master    #5255      +/-   ##
==========================================
+ Coverage   88.22%   88.25%   +0.03%     
==========================================
  Files         358      358              
  Lines       66228    66284      +56     
  Branches    11254    11268      +14     
==========================================
+ Hits        58429    58502      +73     
+ Misses       4981     4959      -22     
- Partials     2818     2823       +5

# Add measurement id
if info['meas_id'] is not None:
    write_id(fid, FIFF.FIFF_PARENT_BLOCK_ID, info['meas_id'])

Contributor Author

Is this check needed or wanted?

mne-python/mne/io/write.py

Lines 224 to 226 in ab4571a

def write_id(fid, kind, id_=None):
    """Write fiff id."""
    id_ = _generate_meas_id() if id_ is None else id_

will put in the default (DATE_NONE) id.

I guess the question is: do you always want a meas_id in the fiff structure?

Member

The FIF file itself will need one, but I don't know if meas_info needs one. Can you try writing a file with an info without meas_id in it and see if the C tools can read it?

Contributor Author

I don't have the C tools running or any experience with them.

For what it's worth, without this line (i.e., the info['meas_id'] is None case) it would fall back to the previous behavior of the write_info method, so if that played nicely with the C code I imagine this would be fine as well.

In those cases (i.e., info['meas_id'] is None), read_info in mne-python returned an info['meas_id'] with the DATE_NONE usecs in it.

Also, the file gets the default (None/DATE_NONE) id.

    assert_array_equal(raw.info['meas_id']['machid'],
                       raw3.info['meas_id']['machid'])
else:
    logger.warn('Reader does not set meas_id')
Contributor Author

Do we want this to raise an error? It would catch IO readers that don't set meas_id.

Currently it only raises errors when the reader doesn't set meas_id and in tests that expect the warning messages to be a specific thing.

Member

We probably don't need to warn. I don't think we are guaranteed that all formats will have it.

Contributor Author

Ok I can pull it. What is info['meas_id'] used for... anything?


# Add bad segments as Annotations (correct for start time)
s_time = -res4['pre_trig_pts'] / float(info['sfreq'])
self.annotations = _read_bad_segments(directory, s_time)
Contributor Author

Ideally this s_time would be the start of the 'raw.times' array. Is that possible with the current fif structures?
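For concreteness, a minimal sketch of what that correction does, using the public mne.Annotations API (the segments list here is made-up example data; the parsing itself is as in the diff above):

from mne import Annotations

# made-up parsed rows from bad.segments: (label, start, stop) in seconds
segments = [('1', 1.25, 1.75)]
s_time = -2.0  # -res4['pre_trig_pts'] / sfreq, as in the diff above

onset = [start - s_time for _, start, _ in segments]
duration = [stop - start for _, start, stop in segments]
description = ['bad_%s' % label for label, _, _ in segments]
annotations = Annotations(onset=onset, duration=duration,
                          description=description)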

Member

raw.times always starts at 0, but there is raw.first_samp, which is an offset. Is that what you mean?

Contributor Author

From what I saw, raw.first_samp doesn't change raw.times[0], but it does skip the first raw.first_samp samples of the data array. Is that the correct understanding?

Member

I'm not sure what it does in the context of CTF. But for Neuromag data it is the offset between when the acquisition started and the recording began. In MNE we use it when cropping files, too.

Contributor Author

The problem with setting raw.first_samp is that it skips those samples as stored on disk, so while it makes sense for the timing issue, it needs to stay set to 0 to read in all of the data.

Member

first_samp should not impact the reading of the first samples on disk. It's basically an offset for events, and it lets you know when samples were recorded w.r.t. the original absolute time.

Contributor Author

That doesn't match what I am seeing.
For instance, running this test code

import os.path as op
from mne.io import read_raw_ctf
from mne.datasets import testing

ctf_dir = op.join(testing.data_path(download=True), 'CTF')
ctf_fname = op.join(ctf_dir, 'testdata_ctf_short.ds')
raw = read_raw_ctf(ctf_fname, preload=True)

print('first_samp - %d' % raw.first_samp)
print('_data.shape', raw._data.shape)
print('_data[3,0]', raw._data[3, 0])
print('_data[3,5]', raw._data[3, 5])

print('len(raw.times) - %d' % len(raw.times))
print('raw.times.min() - %f' % raw.times.min())
print('raw.times.max() - %f' % raw.times.max())

on the current master (not this branch) returns

first_samp - 0
('_data.shape', (356, 4801))
('_data[3,0]', 16442.588333333333)
('_data[3,5]', 16442.592499999999)
len(raw.times) - 4801
raw.times.min() - 0.000000
raw.times.max() - 4.000000

if I change

first_samps = [0] * len(last_samps)

to
first_samps = [5] * len(last_samps)

returns

first_samp - 5
('_data.shape', (356, 4796))
('_data[3,0]', 16442.592499999999)
('_data[3,5]', 16442.596666666665)
len(raw.times) - 4796
raw.times.min() - 0.000000
raw.times.max() - 3.995833

so it certainly seems like setting first_samp skips the opening samples of the data array.

@agramfort
Member

@larsoner ok for you?

Can someone replicate the Travis failure?

@larsoner
Member

larsoner commented Jun 9, 2018

test_cov.py has been timing out on all PRs for the last week-ish. I have not had time to try to replicate it yet.

Member

@larsoner larsoner left a comment

LGTM aside from the seemingly unresolved first_samp issue.

assert_equal(raw.info['meas_date'], raw_read.info['meas_date'])

for key in ['secs', 'usecs', 'version']:
    assert_equal(raw.info['meas_id'][key], raw_read.info['meas_id'][key])
Member

FYI you can now do the simpler assert raw.info['meas_id'][key] == raw_read.info['meas_id'][key]. The codebase is slowly migrating toward this pytest way of doing things (except for comparisons that require numpy.testing). Feel free to change it here or not, I'm happy either way
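That is, the same check in the two styles (raw, raw_read, and key as in the snippet above):

from numpy.testing import assert_equal

# numpy.testing style (as written above)
assert_equal(raw.info['meas_id'][key], raw_read.info['meas_id'][key])

# plain pytest style (suggested)
assert raw.info['meas_id'][key] == raw_read.info['meas_id'][key]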

@agramfort
Member

agramfort commented Jun 10, 2018 via email

@bloyl
Contributor Author

bloyl commented Jun 10, 2018

All the raw.times stuff makes sense.

My concern is the differences in raw._data: when first_samp=5 the data array is 5 samples shorter (4796) than when first_samp=0 (4801).

Similarly, the first data sample ('_data[3,0]', 16442.592499999999) when first_samp=5 is equal to the 5th data sample ('_data[3,5]', 16442.592499999999) when first_samp=0.

@bloyl
Contributor Author

bloyl commented Jun 11, 2018

This is where first_samp is used to skip the opening samples of the data on disk.

mne-python/mne/io/base.py

Lines 506 to 520 in da1d329

offset = 0
for fi in np.nonzero(files_used)[0]:
    start_file = self._first_samps[fi]
    # first iteration (only) could start in the middle somewhere
    if offset == 0:
        start_file += start - cumul_lens[fi]
    stop_file = np.min([stop - cumul_lens[fi] + self._first_samps[fi],
                        self._last_samps[fi] + 1])
    if start_file < self._first_samps[fi] or stop_file < start_file:
        raise ValueError('Bad array indexing, could be a bug')
    n_read = stop_file - start_file
    this_sl = slice(offset, offset + n_read)
    self._read_segment_file(data[:, this_sl], idx, fi,
                            int(start_file), int(stop_file),
                            cals, mult)

Readers then implement _read_segment_file.

I assume the fif IO sorts this all out for you when you use crop?
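For reference, a reader-side _read_segment_file typically looks something like this schematic sketch (not MNE's actual CTF implementation; the header size, dtype, calibration shape, and _raw_extras contents are assumptions for illustration):

import numpy as np


def _read_segment_file(self, data, idx, fi, start, stop, cals, mult):
    """Read and scale samples [start, stop) of file `fi` (schematic)."""
    extras = self._raw_extras[fi]  # per-file metadata (assumed layout)
    n_chan = extras['n_channels']
    with open(self._filenames[fi], 'rb') as fid:
        # skip the header plus the first `start` samples of this file
        fid.seek(extras['header_size'] + start * n_chan * 4)
        block = np.fromfile(fid, dtype='>i4', count=(stop - start) * n_chan)
    block = block.reshape(-1, n_chan).T  # -> (n_chan, n_samples)
    # pick the requested channels and apply per-channel calibration
    data[:] = block[idx] * cals[idx][:, np.newaxis]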

@agramfort
Member

agramfort commented Jun 11, 2018 via email

@bloyl
Contributor Author

bloyl commented Jun 15, 2018

Do we want to add keywords to read_raw_ctf that activate/deactivate this behavior? If so, what should the defaults be?

I would vote for adding keywords that default to the old behavior (ignoring bad channels and segments), and maybe plan to change the default at some later release, as sketched below.
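Concretely, the proposal would look something like this (the read_bads keyword is purely hypothetical, named here just to make the discussion concrete; the rest of the signature matches the current reader):

def read_raw_ctf(directory, system_clock='truncate', preload=False,
                 read_bads=False, verbose=None):  # read_bads is hypothetical
    """Read a CTF .ds directory.

    read_bads=False would keep the old behavior (ignore BadChannels and
    bad.segments); flipping the default in a later release would make
    reading them standard.
    """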

@larsoner
Member

I see it as more of a bugfix / something we should have been doing all along. Do you think it's likely that people had bad channels and/or segments marked during acq that they wouldn't want marked during offline analysis?

@bloyl
Contributor Author

bloyl commented Jun 15, 2018

I agree that this is something that should have been done the whole time, so it should become the default behavior at some point. I am perhaps being overly conservative after the comp channels stuff :). I am fine either way.

I do think it's possible that there are people who didn't know there were bad channels/segments marked in the data they were using, and now their pipelines will behave slightly differently.

@agramfort
Member

agramfort commented Jun 15, 2018 via email

@agramfort
Member

agramfort commented Jul 9, 2018

@bloyl can you resurrect this?

@bloyl
Contributor Author

bloyl commented Jul 9, 2018

+1 for merging on my end

I'll be away the next two weeks with limited email so feel free to make any changes you feel are needed.

@larsoner
Member

@bloyl have you tested this with some of your own files and found it to work okay?

I'll probably test it locally just by using the testing dataset and visually inspecting the Annotations.

@bloyl
Contributor Author

bloyl commented Jul 10, 2018 via email

Member

@larsoner larsoner left a comment

LGTM +1 for merge

@larsoner larsoner changed the title CTF support for ds bad channels and bad segments [MRG+1] CTF support for ds bad channels and bad segments Jul 11, 2018
start_block(fid, FIFF.FIFFB_MEAS)
write_id(fid, FIFF.FIFF_BLOCK_ID)
if info['meas_id'] is not None:
    write_id(fid, FIFF.FIFF_PARENT_BLOCK_ID, info['meas_id'])
Member

why this?

Member

@agramfort I know @bloyl is out for a bit so I'm working on this now.

Maybe we should just revert this bit for now?

Member

Reverted

@agramfort
Member

@bloyl can you rebase to fix CIs?

@larsoner
Member

Green

@agramfort agramfort merged commit 1aea53a into mne-tools:master Jul 20, 2018
@agramfort
Member

thx @bloyl and @larsoner
