Skip to content

Conversation

@adam2392
Copy link
Member

@adam2392 adam2392 commented Aug 4, 2021

Reference issue

Closes: #9566

What does this implement/fix?

Enables exporting of EDF+ and BDF files, which support annotations.

Note that the montage (i.e. info['dig']) is technically lost upon export.

Additional information

If you look at the Jupyter notebook, then I'm not getting lossless conversion. I think I am perhaps setting the physical max/min and digital max/min incorrectly.

@adam2392
Copy link
Member Author

adam2392 commented Aug 4, 2021

Hi @Teuniz and mne-python maintainers

I took a stab at implementing export functionality, but am still not getting lossless conversion when I re-read the data. I think perhaps there is an issue with digital max/min and physical max/min?

The main part of the file is: https://github.com/mne-tools/mne-python/blob/aa85bae98415f12da1710de25955c8a33352ffa0/mne/export/_edf.py

The Jupiter notebooks are temporary to show the process I am taking.

@adam2392 adam2392 marked this pull request as draft August 4, 2021 15:44
@Teuniz
Copy link

Teuniz commented Aug 4, 2021

Hi @Teuniz and mne-python maintainers

I took a stab at implementing export functionality, but am still not getting lossless conversion when I re-read the data. I think perhaps there is an issue with digital max/min and physical max/min?

The main part of the file is: https://github.com/mne-tools/mne-python/blob/aa85bae98415f12da1710de25955c8a33352ffa0/mne/export/_edf.py

The Jupiter notebooks are temporary to show the process I am taking.

I had a quick look at the code but everything looks fine to me.
Lossless conversion is impossible when you supply samples of type float.
Resolution of EDF when using +/-3000 uV range, is approx. 0.09 uV (6000 / (2^16)).

@cbrnr
Copy link
Contributor

cbrnr commented Aug 5, 2021

In addition to what @Teuniz has already said, EDF files also have a record size, which I believe is 1s by default. Therefore, if your signal is not an integer multiple of the record size, you will have additional samples at the end of the file. AFAIK it is not possible to change this value in pyEDFlib at the moment, but it might be possible in edflib_python.

Two comments regarding your implementation:

  1. Why do you hardcode the physical minimum/maximum?
  2. Why do you have to loop over each second of data when writing to the file? Isn't there a function that dumps everything in one go? See my implementation using pyEDFlib here: https://github.com/cbrnr/mnelab/blob/main/mnelab/io/writers.py#L46-L90

@Teuniz
Copy link

Teuniz commented Aug 5, 2021

2\. Why do you have to loop over each second of data when writing to the file? Isn't there a function that dumps everything in one go?

Nope, there isn't. Sample data needs to be written per datarecord.

@cbrnr
Copy link
Contributor

cbrnr commented Aug 5, 2021

Nope, there isn't. Sample data needs to be written per datarecord.

OK, so unlike https://pyedflib.readthedocs.io/en/latest/ref/edfwriter.html#pyedflib.EdfWriter.writeSamples then.

@Teuniz
Copy link

Teuniz commented Aug 5, 2021

but it might be possible in edflib_python.

Yes, the datarecord duration has a default value of 1 second but can be changed using "setDataRecordDuration()".

@cbrnr
Copy link
Contributor

cbrnr commented Aug 5, 2021

Yes, the datarecord duration has a default value of 1 second but can be changed using "setDataRecordDuration()".

Is there any caveat setting it to a value of 1 / sfreq if you really wanted to produce a file with exactly the same length as the data (assuming that longer records are not possible)?

@Teuniz
Copy link

Teuniz commented Aug 5, 2021

Is there any caveat setting it to a value of 1 / sfreq

Good question and yes there are.
First, the datarecord duration is written into the header in plain readable ascii with space for 8 characters, e.g. "0.123456".
This can create rounding errors which results in the samplerate being off.
Second, EDFlib can write max. 1 annotation per datarecord. This tradeoff has been made because the storage space for
annotations needs to beknown in advance, it can not be changed on the fly.
So, lowering the datarecord duration will increase the max. number of annotations that can be written for a given recording duration, increasing it will lower the storage space for annotations.
On a side note, EDFlib writes one annotation channel by default but it can be set to multiple channels if more storage space is needed.

edit: a datarecord duration longer than 1 second is possible.

@Teuniz
Copy link

Teuniz commented Aug 5, 2021

In my experience, it's better to set the datarecord duration to one of the following values, if possible:
1, 0.5, 0.4, 0.25, 0.2, 0.125, 0.1, 0.0625 second. If the last datarecord can not be completely filled, simply don't write that datarecord
or fill it up with the baseline or zero.
Ofcourse, for samplerates below 1 Hz, one needs to use a datarecord duration longer than 1 second.
Unfortunately, there's no perfect solution that catches (almost) all use cases.
The application needs to use some intelligence/algorithm to decide what's the best value for the datarecord duration
depending on the samplerate of the signals and the number of annotations to write.
This is not part of EDFlib because it's practically impossible to catch and predict all possible use cases.

@cbrnr
Copy link
Contributor

cbrnr commented Aug 5, 2021

Thanks @Teuniz, these are some important insights. I'd go for a simple approach then, namely using a 1s data record duration and filling any remaining samples with zero (of course if that is the case I'd issue a warning to inform the user what happened).

@adam2392 adam2392 marked this pull request as ready for review August 5, 2021 15:29
@adam2392
Copy link
Member Author

adam2392 commented Aug 5, 2021

  1. Why do you hardcode the physical minimum/maximum?

@cbrnr I'm not very familiar with the physical min/max stuff. My understanding is that these are the "ranges" at which the actual analog physical values (i.e. voltage) were capable of having. I followed @Teuniz 's example test files in EDFlib-Python.

Is there a good way I could proceed with setting these based on the data?

Is there an issue w/ hardcoding it in this way?

@cbrnr
Copy link
Contributor

cbrnr commented Aug 5, 2021

See how I do it in the link above.

@Teuniz
Copy link

Teuniz commented Aug 5, 2021

See how I do it in the link above.

This is not as intended by the EDF format.
Pys max and phys min should be the clipping levels of the ADC input and they should be the same for all channels.
If this information is not available, for example when converting from another format, use a safe value for all EEG channels
which is, for example, +3000 uV and -3000 uV.
For example, Nihon Kohden uses +3200 uV and -3200 uV for all EEG channels (which are the actual clipping levels
of their input amplifiers & ADC).
In other words, the phys max and phys min values in the EDF header must NOT be used as an indication of the actual
peak values in that signal in that file.

A similar issue was going on here: sccn/eeglab#246

@adam2392
Copy link
Member Author

adam2392 commented Aug 5, 2021

@Teuniz Any thoughts on how to test before/after writing data and improving the matching of the exported data?

I'm only getting matches up to 1-2 decimal places after exporting the data. Wouldn't/shouldn't the resolution improve if I use a smaller range (physical min/max) based on your calculation above?

@cbrnr
Copy link
Contributor

cbrnr commented Aug 5, 2021

I'd still implement a safety check to make sure signal maxima/minima are not exceeding the hard-coded values.

@agramfort
Copy link
Member

@rob-luke it's a good question but a hard question for software engineering....

I think this question is targeted for the downstream projects we rely on for exporting. I have been reluctant
for many years to add export support in mne and we have recently agreed to create the mne.export submodule but not to directly support the writing software code. I think we don't have the bandwidth for this.

@Teuniz
Copy link

Teuniz commented Aug 22, 2021

...I tried to read a file written using this PR with https://github.com/sam81/BDF.jl and it failed to read with the following error (reading the original file worked fine).

ERROR: LoadError: ArgumentError: invalid base 10 digit '.' in "-15573.1"
for physMin and physMax

It complains because it expects an integer value but physical max/min can also be a real number.
That Julia module is a very sloppy implementation of BDF also because it doesn't support different samplerates.
It seems to be compatible only with genuine Biosemi recordings because Biosemi hardware writes integer values
(-262144/262143) for physical max/min and Biosemi hardware always uses the same samplerate for all channels.
Not because it's a requirement of the format (which isn't) but because their hardware has these properties.

So, if Fieldtrip, or any other software for that matter, can only handle a subset of BDF (e.g. all channels must
have the same samplerate, as used by Biosemi), then yes, you can expect some problems with that software.
If that is the case (I don't say it is!), then it's up to the maintainers of that software to fix it or not.

Rationale:
BDF is a 24-bits version of EDF. The only difference is that the sample size is changed from two to three
bytes. So, if EDF says that different samplerates are allowed, it's also allowed in BDF. The same applies to
real numbers for phys max/min.

@rob-luke
Copy link
Member

Thanks for the information @Teuniz and @agramfort

That Julia module is a very sloppy implementation of BDF

I think this is an unnecessarily harsh criticism. That software is very good. The package does exactly what it claims "to read biosemi BDF files", and I have found it to be extremely robust, I've read literally hundreds of files with it from different labs. Which brings me to my second point...

I assumed (and I guess others will to), that if you can read a file from a biosemi device in the biosemi data format, then you can read any bdf. Most people don't care to read the file format specification, and based on the file extension name this behaviour isn't intuitive.

I suggest to add a very prominent note to the exporter highlighting what you've told me. That despite the name being BIOSEMI data format, that BDF is a more encompassing specification and that these files may not be compatible with other software that is able to read biosemi data files.

@adam2392
Copy link
Member Author

Perhaps in the physical min max auto setting we just round up to the nearest integer and this would solve this issue?

@Teuniz
Copy link

Teuniz commented Aug 22, 2021

Perhaps in the physical min max auto setting we just round up to the nearest integer and this would solve this issue?

It's not the only issue. There can be an issue with some software that expect equal samplerates for all signals,
there can be an issue with software that expects the datarecord duration to be always 1 second,
and so on, and so on. I have seen all these problems happen before, with EDF.
So, my suggestion is to do nothing, because in my experience, it will sort out by itself, like it did with EDF.

@cbrnr
Copy link
Contributor

cbrnr commented Aug 23, 2021

I would simply disable BDF export. This file format is likely not meant to be an extension for EDF, but rather something that Biosemi created to store their data with more precision. If someone needs a better format than EDF (i.e. not int16 data samples), they could use other formats that we already support (e.g. BrainVision, EEGLAB). If they want to stick with EDF-like formats, GDF was designed exactly to overcome the limitations of EDF/BDF (Biosig for Python can export to GDF I think, but until this package provides binary wheels for all platforms on PyPI it is rather difficult to build).

I don't want this thread turn into a file format discussion. Let's just acknowledge that there are several widely used file formats around that we should support. We already support reading many popular formats, but we don't have to do that for exporting. I'd restrict MNE-Python export support to the most popular formats only, which IMO are EDF, BrainVision, and EEGLAB.

@adam2392
Copy link
Member Author

Alright, removing BDF support and just re-pushing.

@adam2392 adam2392 requested a review from agramfort August 23, 2021 22:32
Copy link
Member

@agramfort agramfort left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cbrnr feel free to MRG if happy

@adam2392 adam2392 changed the title [MRG] Enable export of EDF and BDF files [MRG] Enable export of EDF files Aug 24, 2021
Copy link
Member

@sappelhoff sappelhoff left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice work @adam2392 and all the reviewers

Co-authored-by: Stefan Appelhoff <stefan.appelhoff@mailbox.org>
Copy link
Contributor

@cbrnr cbrnr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very nice work @adam2392!

Comment on lines 158 to 159
raise RuntimeError(f"Setting Patient Birth Date to {birthday} "
f"returned an error")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we be more specific here, e.g. mention the error or at least offer a hint what could have gone wrong?

Suggested change
raise RuntimeError(f"Setting Patient Birth Date to {birthday} "
f"returned an error")
raise RuntimeError(f"Setting patient birth date to {birthday} "
f"returned an error")

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See: #9643 (comment)

I don't think this error would ever actually occur, unless someone hacked MNE to make the birthdate non-compliant in the first place. MNE has checks for these types of metadata already in place in Raw.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So why are we re-raising the error then if we cannot be more specific? I would not check for this error if it is already checked elsewhere.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So the issue is that EDFLib-Python doesn't actually raise an error. It will just return 0 for error and 1 for success.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Assuming we meet the specifications of EDF when setting up our data structures, then everything should work. I guess, I meant to say, these error checks are there to make sure that at least the user is aware that there was some buggy implementation change on our end that strayed away from EDF, and thus export EDF stopped working.

Else just returning 0 would not result in a Python error.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lmk wyt?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it would be interesting to see what happens. Does the EDF file still get created?

In any case, why don't you use _try_to_set_value() in these cases? This already raises an error if the return value is not equal to zero.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_try_to_set_value only works on one argument functions. I wasn't sure how to generalize that function to make it for these 2 cases :/.

I'll add some tests for faulty annotations/birthdates.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can test birth date and measurement date with non-compliant EDF metadata, but Annotations implicitly only has errors that are already checked and controlled for in Raw, so I can't actually even create an error:

    # add a bad annotation
    annots = Annotations(onset=-1, duration=0, description='test')
    raw = RawArray(data, info)
    raw.set_annotations(annots)
    with pytest.raises(RuntimeError, match='writeAnnotation()'):
        raw.export(temp_fname)

^ won't even work cuz the annotation is dropped.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've updated the test file to reflect these two additional tests

Comment on lines 174 to 175
raise RuntimeError(f"Setting Start Date Time {meas_date} "
f"returned an error")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here, can we be more specific?

Suggested change
raise RuntimeError(f"Setting Start Date Time {meas_date} "
f"returned an error")
raise RuntimeError(f"Setting start date time {meas_date} "
f"returned an error")

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See: #9643 (comment)

I don't think this error would ever actually occur, unless someone hacked MNE. MNE has checks for these types of metadata already in place in Raw. If any of these errors ever did occur, it would be some weird runtime bug I presume.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here.

adam2392 and others added 4 commits August 24, 2021 11:12
@adam2392 adam2392 requested a review from cbrnr August 24, 2021 17:05
@cbrnr cbrnr merged commit 59c3b92 into mne-tools:main Aug 24, 2021
@cbrnr
Copy link
Contributor

cbrnr commented Aug 24, 2021

Thanks @adam2392!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Enable exporting EDF files

8 participants