Skip to content

Conversation

@swy7ch
Copy link
Contributor

@swy7ch swy7ch commented Jun 26, 2020

Reference issue

#7926, in this comment.

What does this implement/fix?

Add NIRSport (v1) systems data processing in MNE:

  • ability to pick which data file to load (*.wlX vs *.nosatflags_wlX)
  • NaN values annotation
  • more ?

Additional information

The two versions of NIRSport (1 & 2) do not provide data the same way, so v2 support will be handled later.

As said here by @cbrnr, NIRSport v1 and NIRScout data seem similar enough to use the same reader. We can of course debate this, and I'd make another one if needed.

@swy7ch
Copy link
Contributor Author

swy7ch commented Jun 26, 2020

Wow, that's a lot of errors for such a small patch.

Regarding circleci it looks like I have to link my repo with them. If it's the case I'll have to contact my tutor, as he's in charge of the repo.

Don't know about the code coverage though.

I aslo have to deal with tests, but I'll wait for the approval of a NIRSport v1 dataset. Nevermind that

@swy7ch swy7ch force-pushed the addNIRSport branch 2 times, most recently from 474ec68 to 965f2e3 Compare June 26, 2020 14:41
@drammock
Copy link
Member

It's actually not that bad:

Style errors:

mne/io/nirx/tests/test_nirx.py:58:1: E302 expected 2 blank lines, found 1
mne/io/nirx/tests/test_nirx.py:64:33: W503 line break before binary operator
mne/io/nirx/tests/test_nirx.py:71:1: E302 expected 2 blank lines, found 1
mne/io/nirx/tests/test_nirx.py:60 in public function `test_nirx_nosatflags_v1_warn`:
        D400: First line should end with a period (not 'a')

Typos:

mne\io\nirx\tests\test_nirx.py:66: in test_nirx_nosatflags_v1_warn
    with pytest.raised(RuntimeWarning, match='You provided saturated'):
E   AttributeError: module 'pytest' has no attribute 'raised'

(should be pytest.raises, lines 65 and 67)

@rob-luke
Copy link
Member

rob-luke commented Jun 26, 2020

Secondly we need to decide what is the better option to read and provide to users. The unsaturated data, with NaNs. Or data that is known to have saturated values in it, which we know aren't valid but don't know which values are bad. I'm not sure there is a correct answer here, it probably depends on the particular study. So perhaps it should be a flag that's passed in to the read function. I think providing a warning as you have done is a good idea.

I think when reading data I would prefer to have the data with NaNs in it. This may break downstream processing (filtering?), but I would like to handle the saturation myself. Whereas if I read the data with saturated values I would never know which values were untrustworthy. For example I usually cut my data in to epochs, then I could just throw away epochs that had Nans in them because I know the machine saturated then. Or I could crop segments out of the original raw file that are +- a few seconds of saturated segments.

The third option may be to return the saturated data and an index/mask to which data is invalid. Not sure how this would actually be implemented. But just throwing all the options out there.

@drammock
Copy link
Member

The third option may be to return the saturated data and an index/mask to which data is invalid. Not sure how this would actually be implemented.

numpy.ma.masked_array is a natural choice for the data container in that case, but it could take some real work to make sure that actually works as expected with the rest of the instance's API.

@rob-luke
Copy link
Member

numpy.ma.masked_array is a natural choice for the data container in that case, but it could take some real work to make sure that actually works as expected with the rest of the instance's API.

Oh very interesting! I didn't know about this. But yeah sounds like lots of work. And I guess we can't expand info to have a mask.

But I think its worth having these API discussions now before @swy7ch puts too much work in. We need to decide which approach to take (return NaNs, return invalid data, or return data+mask), then what the defaults should be. @larsoner any suggestions here?

Don't know about the code coverage though.

Once we get your test sets in this should fix itself.

read_raw_nirx(fname, preload=True)

@requires_testing_data
def test_nirx_nosatflags_v1_warn(tmpdir):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great. Lets update with your real data once collected.

@rob-luke
Copy link
Member

if "NIRScout" not in hdr['GeneralInfo']['Device']:
warn("Only import of data from NIRScout devices have been "
"thoroughly tested. You are using a %s device. " %
hdr['GeneralInfo']['Device'])

You can also add NIRSport1 to this somehow (depends how encoded in header file, hopefully the headers for NIRSport1 are similar format).

@swy7ch
Copy link
Contributor Author

swy7ch commented Jun 29, 2020

You can also add NIRSport1 to this somehow (depends how encoded in header file, hopefully the headers for NIRSport1 are similar format).

Looks like the general header structure is the same, only some variables differ. For instance, in [GeneralInfo], 'APD' (for a NIRScout dataset) are replaced with 'Dual-Tip' (for a NIRSport one). Don't know if they are the same thing with a differnt name, but they're booth booleans.

Style errors:

@drammock Fixed it, except for

mne/io/nirx/tests/test_nirx.py:64:33: W503 line break before binary operator

Not quite used to write Python (C dev here), and PEP8 recommends to break before a binary operator, for readability purpose. Since the choice to break before or after binary operator is a project convention, which path did you chose to follow ?

Force-pushing in the meantime to review other changes

@larsoner
Copy link
Member

We still abide by the old standard (we had a lot of code before they changed it in 2016), see our custom config here

https://github.com/mne-tools/mne-python/blob/master/setup.cfg#L19

I'd be okay with removing this requirement altogether. We can't just switch easily -- flake8 --count says that removing W504 from our ignore list means we'd need to change 575 lines all at once, which is rough...

@swy7ch
Copy link
Contributor Author

swy7ch commented Jun 29, 2020

No problem, just wanted to stick with PEP8 recommendations before knowing what your were :). I've fixed it.

By the way @larsoner, I assume you already now about it but the ci/circleci job is broken:

pyvista.plotting.QtDeprecationError: `BackgroundPlotter` has moved to pyvistaqt.

You can install this from PyPI with: `pip install pyvistaqt`

----------
fname : str
Path to the NIRX data folder or header file.
nosatflags : bool
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we try to find a better parameter name? Something without any abbreviations, such as saturated=False or flag_saturated=False would be much better IMO.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I figured I could use the name of the file, but your request does make sense. Maybe use_saturated ? To be more specific than just saturated. @rob-luke, any idea ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need to establish the preferred behaviour of the function first, then answer these detail questions. But my vote is for return_saturated or use_saturated.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO flag_saturated is clearer, because it implies that saturated values are flagged somehow (with NaNs in this case). Even if we don't flag/mask saturated values, we will always return saturated values (either masked/flagged or not), so use_saturated or return_saturated does not tell me what they do. So I'd vote for either flag_saturated or mask_saturated, both of which return NaNs for saturated values if True and the raw values if False.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lets go with mask_saturated then.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My bad, I actually always agreed with @larsoner but I wrote 'annotate' instead of 'nan'. So yes, I think we now all agree on the solution:

  • saturated='ignore' (the default) returns the original values (contained in .nosatflags_wl* files)
  • saturated='nan' replaces saturated values with NaNs (contained in .wl* files)
  • saturated='annotate' will return the original values (just like 'ignore') and in addition will create annotations at the locations of saturated values - however, this option is for another PR

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok we are all on the same page now. Thanks for clarifying cbrnr.

however, this option is for another PR

Can we do this all in this PR? There is no rush to get this merged as we don't even have a test set committed to the repo yet. I don't want to have incomplete support for NIRSport merged, then we are just going to have triage more issues. Let's add complete support in one hit. If annotate is too much work for one person, then multiple people can commit to this branch.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also thanks for bearing with us Swy7ch. Let us know if there are any other decisions to be made that are blocking your progress.

Copy link
Contributor Author

@swy7ch swy7ch Jul 3, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't worry, I actually think the process is quite active, which is a good think as far as I'm concerned :)

Can we do this all in this PR?

Definitely ! You already suggested it in #7926, so I didn't ask again. I'll see what I can do concerning the annotation. A few questions in this regard:

  • the data should be used as is (i.e. with NaN values), but the channels must be marked(like the 'bad' channels markin the tutorial), right ? I don't fully understand @drammock's solution
  • any idea of the name of the marking ? 'saturated', I assume ?
  • do you know if marking an object twice could lead to troubles ? I'll try to find a dataset with channels that are both of bad quality and containing NaN anyway, it's just to have a heads-up.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the data should be used as is (i.e. with NaN values), but the channels must be marked (like the 'bad' channels markin the tutorial), right ? I don't fully understand @drammock's solution

TL;DR: you don't need to understand my solution, because it looks like we're not going to do that anyway (which is fine).

Full explanation: My suggestion of using numpy.ma.masked_array was an alternative to doing separate masked / non-masked options in the reader function. In a masked array approach, the data includes all the saturated values, but the nan-masked values are also read from disk and used to create a numpy mask over the data array. This is nice because it gives users the option to change their mind about using/not using the saturated values later on (without having to reload from disk) because masked arrays make it easy to ignore the mask and see the underlying data if you want to. But the downside is that most of the downstream code (analysis and plotting functions that operate on Raw objects) is not tested to work with masked arrays and I expect several things will break. The good news is, by having string options like ignore and nan and annotate, it leaves open the possibility to add a masked option later if we find that we need/want Raw objects to support masked arrays for other reasons.

@swy7ch swy7ch force-pushed the addNIRSport branch 6 times, most recently from e99491d to c274112 Compare July 1, 2020 14:02
@swy7ch swy7ch force-pushed the addNIRSport branch 5 times, most recently from d7b0ee8 to a5d5751 Compare July 9, 2020 07:55
@rderollepot
Copy link
Contributor

Just finished the PR with the three NIRSport (v1) datasets: mne-tools/mne-testing-data#78. Thanks @rob-luke for the example you linked above, it helped me a lot in the process !

I made some slight changes for this second acquisition:

  • I used a proper probeInfo.mat file made with NIRSite this time
  • All three datasets were recorded in a row using the same calibration, which was good (green) for all channels
  • I added two triggers in each recording

Let me know if anything is missing or can be improved, or if any other testing data is needed.

@rob-luke
Copy link
Member

rob-luke commented Dec 6, 2020

Looks great @rderollepot. I cut a new release at https://github.com/mne-tools/mne-testing-data/releases/tag/0.111

So you will need to update this line to point to 0.111, then it will give you a hash error, so you will need to update the hash a few lines below. Let us know if you get stuck.

releases = dict(testing='0.110', misc='0.7')

@rderollepot
Copy link
Contributor

Sorry @rob-luke but there is something I am not sure to understand : at which point will I have the hash error ?

Is it when I will push the new commit with the updated release number ?

David JULIEN added 4 commits December 9, 2020 15:29
When the probes get saturated (i.e. when the value they return is "too
high"), NIRStar replaces the value with 'NaN' in the standard .wlX file.
The true value is kept in a .nosatflags_wlX file, which is a copy of a
.wlX file with the true probed values.

It is unclear how NIRstar decides that a probe got satured, so we added
a flag to `read_raw_nirx()` and `RawNIRX.__init__` to let the user chose
which file to use. Providing two *.wl1 or two *.wl2 files still raises
an error, as we check for the existence and unicity of the corresponding
*.nosatflags_wlX file.

Note that the directory structure check doesn't yet discriminate between
*.wlX and *.nosatflags_wlX (as it checks for any file ending with 'wlX').
Changing the check could be done in another PR to make sure the user did
not provide only *.nosatflags_wlX files and maybe warn them.
Some devices register 'NaN' values when the probes saturate. Since 'NaN'
values can cause unexpected behaviour, we added the ability to annotate
them so that they can be filtered out during the process.
@larsoner
Copy link
Member

larsoner commented Dec 9, 2020

Is it when I will push the new commit with the updated release number ?

Yes but also I just downloaded the .tar.gz and the hash I got was this if you want to try to avoid the error:

e7ece4615882b99026edb76fb708a3ce

Put this hash in when you update the release number and you should be good

@rderollepot
Copy link
Contributor

Ok thank you @larsoner !

@larsoner larsoner modified the milestones: 0.22, 0.23 Dec 15, 2020
Base automatically changed from master to main January 23, 2021 18:26
@larsoner
Copy link
Member

Where are we on this one? We're going to try to release 0.23 soon and it would be nice to get this in!

@rob-luke
Copy link
Member

@rderollepot if you don't currently have the resources to complete this I can assist by pushing a few commits. But only if you wish, no pressure.

@larsoner larsoner modified the milestones: 0.23, 0.24 Apr 23, 2021
@rob-luke
Copy link
Member

rob-luke commented Apr 24, 2021

@larsoner I just reviewed this code. Its great, but a decent way off finished. The docs don't reflect the code (I think later code modifications just weren't reflected in the docs) and the tests don't really test anything. The authors kindly contributed data to the testing repo, but it is not used here. I am going to put some effort in to tidying up the great work started by @rderollepot and @swy7ch .

I cant figure out how to do the right git magic, so I have continued the PR over at #9348. But we must ensure that the original authors are credited appropriately.

@larsoner
Copy link
Member

Sounds good!

@rderollepot
Copy link
Contributor

Oh, sorry for being late here, I'll catch up here and on the new PR.

@rderollepot if you don't currently have the resources to complete this I can assist by pushing a few commits. But only if you wish, no pressure.

I'll gladly accept your help on this @rob-luke. Since the end of @swy7ch's internship I've indeed been lacking some resources and I wasn't able to consistently spend some time on MNE myself. He also was much more used to the collaborative quality development process than I am, and I must confess I didn't even realize the job on this PR was not finished, sorry for that.

So thanks a lot for helping here ! I'll jump to #9348

@larsoner
Copy link
Member

@rob-luke want to make a push in the next week or so to get this in, or do we bump the milestone again?

@rob-luke
Copy link
Member

Geez, I'd actually thought this was merged @larsoner. I will take a look at the code this week and evaluate the effort required and if its possible with my schedule in the next two weeks. After changing the ppf, this is my next top priority.

@rob-luke
Copy link
Member

@larsoner I think this was completed in #9348 and then I didn't come back and close this one.

@larsoner
Copy link
Member

Great !

@larsoner larsoner closed this Oct 24, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants