[MRG] Enable export of EDF files #9643

adam2392 · 2021-08-04T15:31:17Z

Reference issue

Closes: #9566

What does this implement/fix?

Enables exporting of EDF+ ~~and BDF files,~~ which support annotations.

Note that the montage (i.e. info['dig']) is technically lost upon export.

Additional information

If you look at the Jupyter notebook, then I'm not getting lossless conversion. I think I am perhaps setting the physical max/min and digital max/min incorrectly.

adam2392 · 2021-08-04T15:38:30Z

Hi @Teuniz and mne-python maintainers

I took a stab at implementing export functionality, but am still not getting lossless conversion when I re-read the data. I think perhaps there is an issue with digital max/min and physical max/min?

The main part of the file is: https://github.com/mne-tools/mne-python/blob/aa85bae98415f12da1710de25955c8a33352ffa0/mne/export/_edf.py

The Jupyter notebooks are temporary to show the process I am taking.

Teuniz · 2021-08-04T18:17:30Z

Hi @Teuniz and mne-python maintainers

I took a stab at implementing export functionality, but am still not getting lossless conversion when I re-read the data. I think perhaps there is an issue with digital max/min and physical max/min?

The main part of the file is: https://github.com/mne-tools/mne-python/blob/aa85bae98415f12da1710de25955c8a33352ffa0/mne/export/_edf.py

The Jupyter notebooks are temporary to show the process I am taking.

I had a quick look at the code but everything looks fine to me.
Lossless conversion is impossible when you supply samples of type float.
Resolution of EDF when using +/-3000 uV range, is approx. 0.09 uV (6000 / (2^16)).
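To make the resolution argument concrete, here is a minimal sketch of the 16-bit quantization round trip. It is not the code from this PR: the function names and the synthetic signal are assumptions made for illustration, and the step is computed over the 65535 intervals between digital min and max, which gives nearly the same value as the 6000 / 2^16 approximation above.

```python
import numpy as np

DIG_MIN, DIG_MAX = -32768, 32767  # signed 16-bit range used for EDF samples


def edf_resolution(phys_min, phys_max):
    """Physical units per digital step for a 16-bit signal."""
    return (phys_max - phys_min) / (DIG_MAX - DIG_MIN)


def quantize_roundtrip(x, phys_min, phys_max):
    """Map physical values to 16-bit codes and back to physical values."""
    gain = edf_resolution(phys_min, phys_max)
    digital = np.round((x - phys_min) / gain) + DIG_MIN
    digital = np.clip(digital, DIG_MIN, DIG_MAX)  # values outside the range clip
    return (digital - DIG_MIN) * gain + phys_min


# Synthetic data in uV (hypothetical, only to exercise the functions).
rng = np.random.default_rng(0)
signal_uv = rng.uniform(-100.0, 100.0, size=1000)

restored = quantize_roundtrip(signal_uv, -3000.0, 3000.0)
max_err = float(np.max(np.abs(restored - signal_uv)))
print(edf_resolution(-3000.0, 3000.0), max_err)
```

The worst-case round-trip error is half a step, so float data written with a +/-3000 uV range can only be expected to match to roughly one decimal place in uV.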

cbrnr · 2021-08-05T06:41:50Z

In addition to what @Teuniz has already said, EDF files also have a record size, which I believe is 1s by default. Therefore, if your signal is not an integer multiple of the record size, you will have additional samples at the end of the file. AFAIK it is not possible to change this value in pyEDFlib at the moment, but it might be possible in edflib_python.

Two comments regarding your implementation:

Why do you hardcode the physical minimum/maximum?
Why do you have to loop over each second of data when writing to the file? Isn't there a function that dumps everything in one go? See my implementation using pyEDFlib here: https://github.com/cbrnr/mnelab/blob/main/mnelab/io/writers.py#L46-L90

Teuniz · 2021-08-05T08:05:57Z

2\. Why do you have to loop over each second of data when writing to the file? Isn't there a function that dumps everything in one go?

Nope, there isn't. Sample data needs to be written per datarecord.

cbrnr · 2021-08-05T08:09:32Z

Nope, there isn't. Sample data needs to be written per datarecord.

OK, so unlike https://pyedflib.readthedocs.io/en/latest/ref/edfwriter.html#pyedflib.EdfWriter.writeSamples then.

Teuniz · 2021-08-05T08:11:02Z

but it might be possible in edflib_python.

Yes, the datarecord duration has a default value of 1 second but can be changed using "setDataRecordDuration()".

cbrnr · 2021-08-05T08:14:38Z

Yes, the datarecord duration has a default value of 1 second but can be changed using "setDataRecordDuration()".

Is there any caveat setting it to a value of 1 / sfreq if you really wanted to produce a file with exactly the same length as the data (assuming that longer records are not possible)?

Teuniz · 2021-08-05T08:31:48Z

Is there any caveat setting it to a value of 1 / sfreq

Good question and yes there are.
First, the datarecord duration is written into the header in plain readable ascii with space for 8 characters, e.g. "0.123456".
This can create rounding errors which result in the samplerate being off.
Second, EDFlib can write max. 1 annotation per datarecord. This tradeoff has been made because the storage space for annotations needs to be known in advance; it cannot be changed on the fly.
So, lowering the datarecord duration will increase the max. number of annotations that can be written for a given recording duration, increasing it will lower the storage space for annotations.
On a side note, EDFlib writes one annotation channel by default but it can be set to multiple channels if more storage space is needed.

edit: a datarecord duration longer than 1 second is possible.
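The rounding caveat can be illustrated with a small sketch. The formatting below is an assumption about how a duration might be squeezed into an 8-character ASCII field; it is not taken from EDFlib, and `header_duration_field` is a hypothetical helper.

```python
def header_duration_field(duration_s):
    """Format a duration so that it fits an 8-character ASCII field.

    Assumption: six decimals, truncated to 8 characters. The real library
    may format the value differently.
    """
    return f"{duration_s:.6f}"[:8]


sfreq = 300.0
written = header_duration_field(1.0 / sfreq)  # one sample per datarecord
implied_sfreq = 1.0 / float(written)          # rate a reader would derive

print(written, implied_sfreq)
```

With one sample per datarecord at 300 Hz, the stored duration no longer equals 1/300 s exactly, so a reader that derives the samplerate from the header ends up slightly above 300 Hz. Durations such as 0.125 s fit the field exactly and do not have this problem.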

Teuniz · 2021-08-05T08:44:33Z

In my experience, it's better to set the datarecord duration to one of the following values, if possible:
1, 0.5, 0.4, 0.25, 0.2, 0.125, 0.1, 0.0625 second. If the last datarecord can not be completely filled, simply don't write that datarecord
or fill it up with the baseline or zero.
Of course, for samplerates below 1 Hz, one needs to use a datarecord duration longer than 1 second.
Unfortunately, there's no perfect solution that catches (almost) all use cases.
The application needs to use some intelligence/algorithm to decide what's the best value for the datarecord duration
depending on the samplerate of the signals and the number of annotations to write.
This is not part of EDFlib because it's practically impossible to catch and predict all possible use cases.
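One possible shape for such an application-side heuristic is sketched below. It is an illustration under stated assumptions, not something EDFlib or this PR provides: `pick_record_duration` is a hypothetical helper, it assumes a single annotation channel (so at most one annotation per datarecord), and it counts a partially filled last datarecord as a full one.

```python
from fractions import Fraction

# Candidate durations from the list above, longest first.
CANDIDATES = [
    Fraction(1), Fraction(1, 2), Fraction(2, 5), Fraction(1, 4),
    Fraction(1, 5), Fraction(1, 8), Fraction(1, 10), Fraction(1, 16),
]


def pick_record_duration(sfreq, n_samples, n_annotations):
    """Return the longest candidate duration that fits, or None.

    A candidate fits if it holds a whole number of samples and yields at
    least as many datarecords as there are annotations to store.
    """
    sfreq = Fraction(sfreq).limit_denominator(1000)
    for duration in CANDIDATES:
        per_record = sfreq * duration
        if per_record.denominator != 1:
            continue  # non-integer number of samples per datarecord
        n_records = -(-n_samples // int(per_record))  # ceiling division
        if n_records >= n_annotations:
            return float(duration)
    return None


print(pick_record_duration(256, 2560, 5))    # few annotations
print(pick_record_duration(250, 2500, 30))   # more annotations than seconds
```

Samplerates below 1 Hz and recordings with more annotations than any candidate can hold return `None` here and would need a different strategy, such as a longer datarecord or additional annotation channels.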

cbrnr · 2021-08-05T08:53:04Z

Thanks @Teuniz, these are some important insights. I'd go for a simple approach then, namely using a 1s data record duration and filling any remaining samples with zero (of course if that is the case I'd issue a warning to inform the user what happened).
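A minimal sketch of that approach, assuming data shaped `(n_channels, n_samples)`; `pad_to_full_records` is a hypothetical helper name, not a function in MNE-Python or EDFlib:

```python
import warnings

import numpy as np


def pad_to_full_records(data, sfreq, record_duration=1.0):
    """Zero-pad (n_channels, n_samples) data to a whole number of datarecords."""
    samples_per_record = int(round(sfreq * record_duration))
    remainder = data.shape[1] % samples_per_record
    if remainder == 0:
        return data  # already an integer multiple of the record size
    n_pad = samples_per_record - remainder
    warnings.warn(
        f"Data length is not a multiple of the {record_duration} s record "
        f"size; appending {n_pad} zero samples.",
        RuntimeWarning,
    )
    return np.pad(data, ((0, 0), (0, n_pad)), mode="constant")


# 2.5 s of data at 100 Hz becomes 3 s after padding.
padded = pad_to_full_records(np.ones((2, 250)), sfreq=100.0)
print(padded.shape)
```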

adam2392 · 2021-08-05T16:33:27Z

Why do you hardcode the physical minimum/maximum?

@cbrnr I'm not very familiar with the physical min/max stuff. My understanding is that these are the "ranges" that the actual analog physical values (i.e. voltage) could take. I followed @Teuniz's example test files in EDFlib-Python.

Is there a good way I could proceed with setting these based on the data?

Is there an issue w/ hardcoding it in this way?

cbrnr · 2021-08-05T16:36:32Z

See how I do it in the link above.

Teuniz · 2021-08-05T19:06:47Z

See how I do it in the link above.

This is not as intended by the EDF format.
Phys max and phys min should be the clipping levels of the ADC input and they should be the same for all channels.
If this information is not available, for example when converting from another format, use a safe value for all EEG channels
which is, for example, +3000 uV and -3000 uV.
For example, Nihon Kohden uses +3200 uV and -3200 uV for all EEG channels (which are the actual clipping levels
of their input amplifiers & ADC).
In other words, the phys max and phys min values in the EDF header must NOT be used as an indication of the actual
peak values in that signal in that file.

A similar issue was going on here: sccn/eeglab#246

adam2392 · 2021-08-05T19:31:20Z

@Teuniz Any thoughts on how to test before/after writing data and improving the matching of the exported data?

I'm only getting matches up to 1-2 decimal places after exporting the data. Wouldn't/shouldn't the resolution improve if I use a smaller range (physical min/max) based on your calculation above?

cbrnr · 2021-08-05T20:21:13Z

I'd still implement a safety check to make sure signal maxima/minima are not exceeding the hard-coded values.
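Such a check could look roughly like the sketch below. The +/-3000 uV defaults follow the example values mentioned earlier in this thread; `check_within_physical_range` is a hypothetical name, and raising (rather than warning or clipping) is just one possible choice.

```python
import numpy as np


def check_within_physical_range(data_uv, phys_min=-3000.0, phys_max=3000.0):
    """Raise if any sample lies outside the hard-coded physical range."""
    lo, hi = float(np.min(data_uv)), float(np.max(data_uv))
    if lo < phys_min or hi > phys_max:
        raise ValueError(
            f"Signal range [{lo}, {hi}] uV exceeds the physical range "
            f"[{phys_min}, {phys_max}] uV; samples would be clipped on export."
        )
    return lo, hi


print(check_within_physical_range(np.array([[-50.0, 120.0], [10.0, -30.0]])))
```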

agramfort · 2021-08-22T14:14:41Z

@rob-luke it's a good question but a hard question for software engineering....

I think this question is targeted for the downstream projects we rely on for exporting. I have been reluctant
for many years to add export support in mne and we have recently agreed to create the mne.export submodule but not to directly support the writing software code. I think we don't have the bandwidth for this.

Teuniz · 2021-08-22T17:03:53Z

...I tried to read a file written using this PR with https://github.com/sam81/BDF.jl and it failed to read with the following error (reading the original file worked fine).
ERROR: LoadError: ArgumentError: invalid base 10 digit '.' in "-15573.1"
for physMin and physMax

It complains because it expects an integer value but physical max/min can also be a real number.
That Julia module is a very sloppy implementation of BDF also because it doesn't support different samplerates.
It seems to be compatible only with genuine Biosemi recordings because Biosemi hardware writes integer values
(-262144/262143) for physical max/min and Biosemi hardware always uses the same samplerate for all channels.
Not because it's a requirement of the format (which isn't) but because their hardware has these properties.
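The same failure mode is easy to reproduce in Python: a reader that parses the physical min/max header fields as integers rejects a value like the one in the error message, while parsing them as real numbers accepts both forms. `parse_physical_limit` is a hypothetical helper for illustration only.

```python
def parse_physical_limit(field):
    """Parse a physical min/max header field as a real number."""
    return float(field.strip())


# Integer-only parsing fails on a real-valued field.
strict_failed = False
try:
    int("-15573.1")
except ValueError:
    strict_failed = True

print(strict_failed)
print(parse_physical_limit("-15573.1"))
print(parse_physical_limit("-262144 "))  # integer-valued fields still parse
```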

So, if Fieldtrip, or any other software for that matter, can only handle a subset of BDF (e.g. all channels must
have the same samplerate, as used by Biosemi), then yes, you can expect some problems with that software.
If that is the case (I don't say it is!), then it's up to the maintainers of that software to fix it or not.

Rationale:
BDF is a 24-bits version of EDF. The only difference is that the sample size is changed from two to three
bytes. So, if EDF says that different samplerates are allowed, it's also allowed in BDF. The same applies to
real numbers for phys max/min.

rob-luke · 2021-08-22T21:24:41Z

Thanks for the information @Teuniz and @agramfort

That Julia module is a very sloppy implementation of BDF

I think this is an unnecessarily harsh criticism. That software is very good. The package does exactly what it claims "to read biosemi BDF files", and I have found it to be extremely robust, I've read literally hundreds of files with it from different labs. Which brings me to my second point...

I assumed (and I guess others will too), that if you can read a file from a biosemi device in the biosemi data format, then you can read any bdf. Most people don't care to read the file format specification, and based on the file extension name this behaviour isn't intuitive.

I suggest to add a very prominent note to the exporter highlighting what you've told me. That despite the name being BIOSEMI data format, that BDF is a more encompassing specification and that these files may not be compatible with other software that is able to read biosemi data files.

adam2392 · 2021-08-22T21:37:30Z

Perhaps in the physical min max auto setting we just round up to the nearest integer and this would solve this issue?

Teuniz · 2021-08-22T22:44:44Z

Perhaps in the physical min max auto setting we just round up to the nearest integer and this would solve this issue?

It's not the only issue. There can be an issue with some software that expect equal samplerates for all signals,
there can be an issue with software that expects the datarecord duration to be always 1 second,
and so on, and so on. I have seen all these problems happen before, with EDF.
So, my suggestion is to do nothing, because in my experience, it will sort out by itself, like it did with EDF.

cbrnr · 2021-08-23T05:48:29Z

I would simply disable BDF export. This file format is likely not meant to be an extension for EDF, but rather something that Biosemi created to store their data with more precision. If someone needs a better format than EDF (i.e. not int16 data samples), they could use other formats that we already support (e.g. BrainVision, EEGLAB). If they want to stick with EDF-like formats, GDF was designed exactly to overcome the limitations of EDF/BDF (Biosig for Python can export to GDF I think, but until this package provides binary wheels for all platforms on PyPI it is rather difficult to build).

I don't want this thread to turn into a file format discussion. Let's just acknowledge that there are several widely used file formats around that we should support. We already support reading many popular formats, but we don't have to do that for exporting. I'd restrict MNE-Python export support to the most popular formats only, which IMO are EDF, BrainVision, and EEGLAB.

adam2392 · 2021-08-23T14:35:03Z

Alright, removing BDF support and just re-pushing.

Co-authored-by: Alexandre Gramfort <alexandre.gramfort@m4x.org>

agramfort

@cbrnr feel free to MRG if happy

sappelhoff

nice work @adam2392 and all the reviewers

doc/changes/latest.inc

Co-authored-by: Stefan Appelhoff <stefan.appelhoff@mailbox.org>

cbrnr

Very nice work @adam2392!

mne/export/_edf.py

cbrnr · 2021-08-24T14:55:44Z

mne/export/_edf.py

+                raise RuntimeError(f"Setting Patient Birth Date to {birthday} "
+                                   f"returned an error")


Can we be more specific here, e.g. mention the error or at least offer a hint what could have gone wrong?

Suggested change

-                raise RuntimeError(f"Setting Patient Birth Date to {birthday} "
-                                   f"returned an error")
+                raise RuntimeError(f"Setting patient birth date to {birthday} "
+                                   f"returned an error")

See: #9643 (comment)

I don't think this error would ever actually occur, unless someone hacked MNE to make the birthdate non-compliant in the first place. MNE has checks for these types of metadata already in place in Raw.

So why are we re-raising the error then if we cannot be more specific? I would not check for this error if it is already checked elsewhere.

So the issue is that EDFLib-Python doesn't actually raise an error. It will just return 0 for error and 1 for success.

Assuming we meet the specifications of EDF when setting up our data structures, then everything should work. I guess, I meant to say, these error checks are there to make sure that at least the user is aware that there was some buggy implementation change on our end that strayed away from EDF, and thus export EDF stopped working.

Else just returning 0 would not result in a Python error.

Yes, it would be interesting to see what happens. Does the EDF file still get created?

In any case, why don't you use _try_to_set_value() in these cases? This already raises an error if the return value is not equal to zero.

_try_to_set_value only works on one argument functions. I wasn't sure how to generalize that function to make it for these 2 cases :/.

I'll add some tests for faulty annotations/birthdates.

I can test birth date and measurement date with non-compliant EDF metadata, but Annotations implicitly only has errors that are already checked and controlled for in Raw, so I can't actually even create an error:

    # add a bad annotation
    annots = Annotations(onset=-1, duration=0, description='test')
    raw = RawArray(data, info)
    raw.set_annotations(annots)
    with pytest.raises(RuntimeError, match='writeAnnotation()'):
        raw.export(temp_fname)

^ won't even work cuz the annotation is dropped.

I've updated the test file to reflect these two additional tests

cbrnr · 2021-08-24T14:56:45Z

mne/export/_edf.py

+            raise RuntimeError(f"Setting Start Date Time {meas_date} "
+                               f"returned an error")


Same here, can we be more specific?

Suggested change

-            raise RuntimeError(f"Setting Start Date Time {meas_date} "
-                               f"returned an error")
+            raise RuntimeError(f"Setting start date time {meas_date} "
+                               f"returned an error")

See: #9643 (comment)

I don't think this error would ever actually occur, unless someone hacked MNE. MNE has checks for these types of metadata already in place in Raw. If any of these errors ever did occur, it would be some weird runtime bug I presume.

mne/export/_edf.py

mne/utils/docs.py

Co-authored-by: Clemens Brunner <clemens.brunner@gmail.com>

cbrnr · 2021-08-24T17:18:34Z

Thanks @adam2392!

adam2392 added 6 commits August 4, 2021 10:14

Adding export edf.

5e8612d

Adding updated edf exporter.

223ba53

Adding export edf.

2c26a47

Adding updated edf exporter.

f1c04d5

Merge branch 'edf' of github.com:adam2392/mne-python into edf

b8ef6b9

Fix.

aa85bae

Cleanup.

4681613

adam2392 marked this pull request as draft August 4, 2021 15:44

adam2392 added 2 commits August 4, 2021 15:42

Fixing export.

cef450c

Adding updated export test.

75b01fb

adam2392 marked this pull request as ready for review August 5, 2021 15:29

Adding updated edf tests.

f22f2e5

Fix export.

f1f2546

Fixing physical min and max.

b814f40

Clean up.

a1847f1

adam2392 and others added 4 commits August 23, 2021 10:38

Apply suggestions from code review

c465759

Co-authored-by: Alexandre Gramfort <alexandre.gramfort@m4x.org>

merge

cdfe6e6

Merge branch 'edf' of github.com:adam2392/mne-python into edf

b03c19d

increase coverage.

6feb5a2

drammock approved these changes Aug 23, 2021


adam2392 requested a review from agramfort August 23, 2021 22:32

agramfort approved these changes Aug 24, 2021


adam2392 changed the title ~~[MRG] Enable export of EDF and BDF files~~ [MRG] Enable export of EDF files Aug 24, 2021

sappelhoff reviewed Aug 24, 2021


doc/changes/latest.inc

Update doc/changes/latest.inc

312fe85

Co-authored-by: Stefan Appelhoff <stefan.appelhoff@mailbox.org>

cbrnr reviewed Aug 24, 2021


adam2392 and others added 4 commits August 24, 2021 11:12

Apply suggestions from code review

fd4efce

Co-authored-by: Clemens Brunner <clemens.brunner@gmail.com>

remove

c218229

Remove noqa

3fa8017

Add additional tests

46c5468

adam2392 requested a review from cbrnr August 24, 2021 17:05

cbrnr merged commit 59c3b92 into mne-tools:main Aug 24, 2021

adam2392 mentioned this pull request Aug 24, 2021

Support exporting to EDF format mne-tools/mne-bids#863

Closed

adam2392 deleted the edf branch August 25, 2021 01:31

This was referenced Aug 25, 2021

[MRG, BUG] Fixed bug with EDF channel types and minor bug where other types of channels were not written and bug where annotations were not written to correct scale #9694

Merged

Improving EDF reading in line with specification #9704

Closed

hofaflo mentioned this pull request Mar 17, 2024

Resolution issue with EDF export #12493

Closed
