Annotations do not support Unicode characters during I/O roundtrip #11684

@hoechenberger

Description of the problem

Raw data (and possibly epochs, which I haven't tested) with Annotations that contain Unicode descriptions cannot be saved.

This was initially reported at https://mne.discourse.group/t/saving-filtered-data-and-epochs/6848/3

Steps to reproduce

# %%
import mne

sample_dir = mne.datasets.sample.data_path()
sample_fname = sample_dir / 'MEG' / 'sample' / 'sample_audvis_raw.fif'

raw = mne.io.read_raw_fif(sample_fname, preload=True)
raw.crop(tmax=60)

annots = mne.Annotations(onset=0, duration=1, description='🙃')
raw.set_annotations(annots)

raw.save('/tmp/foo-raw.fif', overwrite=True)

Link to data

No response

Expected results

File is saved

Actual results

Writing /tmp/foo-raw.fif
Traceback (most recent call last):
  File "/private/tmp/mwe.py", line 13, in <module>
    raw.save('/tmp/foo-raw.fif', overwrite=True)
  File "<decorator-gen-242>", line 12, in save
  File "/Users/hoechenberger/Development/mne-python/mne/io/base.py", line 1691, in save
    _write_raw(
  File "/Users/hoechenberger/Development/mne-python/mne/io/base.py", line 2604, in _write_raw
    cals = _start_writing_raw(
  File "/Users/hoechenberger/Development/mne-python/mne/io/base.py", line 2872, in _start_writing_raw
    _write_annotations(fid, annotations)
  File "/Users/hoechenberger/Development/mne-python/mne/annotations.py", line 1084, in _write_annotations
    write_name_list_sanitized(
  File "/Users/hoechenberger/Development/mne-python/mne/io/write.py", line 149, in write_name_list_sanitized
    write_string(fid, kind, _safe_name_list(lst, "write", name))
  File "/Users/hoechenberger/Development/mne-python/mne/io/write.py", line 130, in write_string
    str_data = str(data).encode("latin1")
UnicodeEncodeError: 'latin-1' codec can't encode character '\U0001f643' in position 0: ordinal not in range(256)
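The last frame of the traceback is a plain `str.encode("latin1")` call, so the failure can be reproduced without MNE or any data file. Below is a minimal sketch that checks which descriptions would trigger it. The helper name `is_latin1_encodable` is hypothetical and not part of MNE.

```python
def is_latin1_encodable(text: str) -> bool:
    """Return True if ``text`` can be encoded as latin-1 (hypothetical helper)."""
    try:
        # Same encoding step that fails in the traceback above.
        text.encode("latin1")
    except UnicodeEncodeError:
        return False
    return True


# "caf\u00e9" contains U+00E9, which lies inside the latin-1 range;
# the emoji U+1F643 lies far outside it.
descriptions = ["BAD_segment", "caf\u00e9", "🙃"]
unwritable = [d for d in descriptions if not is_latin1_encodable(d)]
print(unwritable)
```

Only the emoji description ends up in `unwritable`. Accented Latin characters such as `é` pass, so the limitation affects anything beyond code point 255 rather than all non-ASCII text.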

Additional information

Since Unicode Annotations can be set, we should be able to write them. Hence I'm reporting this as a bug and not a feature request.

Proposals by @cbrnr, @drammock, and me:

  • prevent using non-latin1 characters in Annotations (@hoechenberger) – but this is actually not acceptable in 2023 (@drammock, and I fully agree with that)
  • amend FIFF definition to allow Unicode character storage (@cbrnr, @drammock)
  • provide HDF5 export for all affected file types (@drammock)
  • sneak the Unicode in via an ASCII encoding, think base64, but this sounds rather ugly (@hoechenberger)
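To make the last proposal concrete, here is a rough sketch of what an ASCII round trip via base64 could look like. It is an illustration only: `encode_description` and `decode_description` are hypothetical names, and nothing here reflects how MNE does or will store annotations.

```python
import base64


def encode_description(desc: str) -> str:
    """Turn an arbitrary Unicode string into ASCII-only text (hypothetical helper)."""
    # UTF-8 bytes -> base64 bytes -> ASCII str
    return base64.b64encode(desc.encode("utf-8")).decode("ascii")


def decode_description(stored: str) -> str:
    """Invert encode_description (hypothetical helper)."""
    return base64.b64decode(stored.encode("ascii")).decode("utf-8")


original = "🙃"
stored = encode_description(original)

# The stored form is pure ASCII, so the latin-1 encoding step succeeds.
stored.encode("latin1")

# The original description is recovered unchanged.
assert decode_description(stored) == original
```

The drawbacks are visible even in this small example: the stored text is opaque to any other reader of the file, and a reader would need some way to tell encoded descriptions from plain ones, which this sketch does not address.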

Labels: BUG, sprint-2023 (Issues reserved for the 2023 Intermediate Dev Training)