[MRG+1] Add CSV to read annotations #5699
Conversation
I think the idea is that this is not just for internal use, but rather so that people can export and have something human readable/editable/re-readable.
Codecov Report
@@ Coverage Diff @@
## master #5699 +/- ##
=========================================
+ Coverage 86.71% 88.61% +1.9%
=========================================
Files 369 369
Lines 69319 69463 +144
Branches 11663 11678 +15
=========================================
+ Hits 60108 61557 +1449
+ Misses 6388 5055 -1333
- Partials 2823 2851 +28
You went the header route while I was suggesting to have onsets as timestamps. If you have onsets as ISO timestamps you can do annot = mne.read_annot('annot.csv') and raw.annotations.orig_time should be recovered without any custom hack on the csv file.
Yes, I think we decided to use the ISO timestamp approach as the optimal solution.
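The recovery described above can be sketched in a few lines (a minimal illustration, not MNE's actual parser; the timestamp and expected value are taken from the test parameters quoted later in this thread):

```python
from datetime import datetime

# Minimal sketch: an onset stored as an ISO 8601 string can be parsed back
# and converted to seconds since the Unix epoch, so orig_time is recovered
# from the file contents alone, with no custom header hack.
stamp = '2002-12-03 19:01:11.720100'
dt = datetime.strptime(stamp, '%Y-%m-%d %H:%M:%S.%f')
seconds = (dt - datetime(1970, 1, 1)).total_seconds()
print(seconds)  # 1038942071.7201
```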
... basically @agramfort's solution plus:
I think this gets us the cleanest solution.
We just discussed with @massich that we will have both the .txt format done so far here and a proper .csv with onsets as ISO timestamps.
Based on #5640 (comment) I thought we wanted to avoid the conversion on the onsets. I also see the confusion of setting stuff in the header that MS Excel would then not understand. Maybe the solution proposed here is only good for a … In the end, my purpose with the PR was to try
Agreed. I think we have a plan!
mne/tests/test_annotations.py
Outdated
pytest.param(None, 0, id='None'),
pytest.param(42, 42.0, id='Scalar'),
pytest.param(3.14, 3.14, id='Float'),
# pytest.param((3, 1400000), 3.14, id='Scalar touple')
@massich can I remove this?
It had an extra 0.
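For context, assuming the tuple form of meas_date is interpreted as (seconds, microseconds), the extra zero changes the result (a hypothetical arithmetic sketch, not MNE code):

```python
# Assuming the (seconds, microseconds) convention for meas_date tuples:
sec, usec = 3, 140000
print(round(sec + usec / 1e6, 6))  # 3.14 -- the value the test expects

# With the extra 0 from the commented-out param, the result is wrong:
sec, usec = 3, 1400000
print(round(sec + usec / 1e6, 6))  # 4.4, not 3.14
```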
mne/annotations.py
Outdated
| """Convert meas_date to seconds.""" | ||
| """Convert meas_date to seconds. | ||
|
|
||
| if meas_date is string, it should conform to '%Y-%m-%d %H:%M:%S.%f' |
Capitalize "If". There are a couple of typos; here's my suggestion:
"""Convert meas_date to seconds.
If `meas_date` is a string, it should conform to the ISO8601 format, i.e.
'%Y-%m-%d %H:%M:%S.%f'. Otherwise, this function returns 0. Note that
ISO8601 allows for ' ' or 'T' as delimiters between date and time.
"""
However, I'm not sure if you meant to write `'\t'` instead of `'T'` in the last sentence - could you please check?
I see that the regex does have [ T] so T is OK.
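For illustration, a simplified stand-in for such a pattern (the actual regex in the PR is more elaborate; this just shows the [ T] delimiter class):

```python
import re

# Simplified stand-in for the ISO8601 regex discussed above; the character
# class [ T] accepts either a space or 'T' between date and time.
ISO8601 = r'^\d{4}-\d{2}-\d{2}[ T]\d{2}:\d{2}:\d{2}\.\d{6}$'
print(bool(re.match(ISO8601, '2002-12-03 19:01:11.720100')))  # True
print(bool(re.match(ISO8601, '2002-12-03T19:01:11.720100')))  # True
print(bool(re.match(ISO8601, '2002-12-03x19:01:11.720100')))  # False
```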
Yes, the regex conforms to the standard, which allows ' ' or 'T'. But so far I have only implemented ' '. That's why the test says invalid. It should be straightforward to implement if needed, but for now YAGNI.
Then you should clarify in the comment that 'T' is not supported ATM.
meas_date = _handle_meas_date(annot.orig_time)
orig_dt = datetime.utcfromtimestamp(meas_date)
content += "# orig_time : %s \n" % orig_dt
content += "# onset, duration, description\n"
It is fairly standard to have a header line with the column headers in the first line. All readers should support this header line, so I'd remove the comment here.
I don't understand your comment. Are you proposing this:
- content += "# onset, duration, description\n"
+ content += "onset, duration, description\n"
This complicates the parsing of the .txt using np.loadtxt quite a lot. And in the end, anyone who loads the .txt should be able to see it anyway.
I'm not sure about this change. Anyone else?
skiprows=1
@cbrnr what you want is a csv file you can read with pandas. It's no longer a txt file you can read with np.loadtxt or MATLAB without understanding the file specifics.
I would therefore stick to the current proposal. Any other opinion anyone?
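A quick sketch of why the '#' header is convenient for np.loadtxt (illustrative data only; a numeric stand-in is used for the description column, since loadtxt alone does not handle strings):

```python
import io
import numpy as np

# Lines starting with '#' are skipped by np.loadtxt by default
# (comments='#'), so the commented header costs nothing on read.
txt = ("# orig_time : 2002-12-03 19:01:11.720100\n"
       "# onset, duration, description\n"
       "1.0, 2.0, 1.0\n"
       "3.0, 1.5, 2.0\n")
data = np.loadtxt(io.StringIO(txt), delimiter=',')
print(data.shape)  # (2, 3)
```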
No idea which is more standard. It would be good to check if pandas still works without the # on the first line. If it does not, then having # on the first line is probably worth it even if it's less standard. (On readers that can support a no-#-first-line-header, the data fields will presumably just appear as "column 1/2/3" or something similar, which seems less than ideal but livable.)
With the # you'll need to specify column names on read. pandas will treat the # as a comment and ignore it, like np.loadtxt or a MATLAB reader of floats in txt files would.
Yes, basically I wanted a CSV file but without the timestamps. The idea was to create a file that can be easily read by the user, and this means pd.read_csv and not np.loadtxt for me. Either way, I can live with the #, which means I have to manually specify the column names.
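What that looks like in practice (a sketch with made-up data, not the PR's actual file format): pd.read_csv drops the '#' lines via comment='#', so the column names have to be supplied manually with names=.

```python
import io
import pandas as pd

# With '#' headers, pandas skips the commented lines and the caller has
# to provide the column names explicitly.
txt = ("# orig_time : 2002-12-03 19:01:11.720100\n"
       "# onset, duration, description\n"
       "1.0, 2.0, BAD_segment\n")
df = pd.read_csv(io.StringIO(txt), comment='#', skipinitialspace=True,
                 names=['onset', 'duration', 'description'])
print(list(df.columns))      # ['onset', 'duration', 'description']
print(df['description'][0])  # BAD_segment
```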
mne/annotations.py
Outdated
This function reads a .fif, .fif.gz, .vrmk, .edf or .set file and makes an
:class:`mne.Annotations` object.
This function reads a .fif, .fif.gz, .vrmk, .edf, .txt or .set file and
makes an :class:`mne.Annotations` object.
.csv is missing
mne/annotations.py
Outdated
def _read_annotations_csv(fname):
    """Write annotations to csv.
Read annotations from CSV.
return re.compile(ISO8601).match(candidate_str) is not None

def _read_annotations_txt_parse_header(fname):
Why a separate function for this? Will this be used elsewhere? Otherwise, I'd include that in _read_annotations_txt.
I made it a separate function so that I can get orig_time. I'm silently converging to onset, duration, desc = _read_annotations_xxx to unify read_annotations, avoid importing Annotations in mne/io/XXXX/XXX.py, and always create the Annotations object in read_annotations.
But I haven't decided yet what's best.
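The unification idea described above could look roughly like this (a hypothetical sketch, not MNE's actual code; a plain dict stands in for the Annotations object):

```python
import csv

def _read_annotations_csv(fname):
    # Hypothetical simplified reader: return plain sequences, so the
    # format-specific module never needs to import Annotations.
    onset, duration, description = [], [], []
    with open(fname) as f:
        for row in csv.DictReader(f):
            onset.append(float(row['onset']))
            duration.append(float(row['duration']))
            description.append(row['description'])
    return onset, duration, description

def read_annotations(fname):
    # Single public entry point: dispatch on extension and build the
    # final object in one place (dict used as a stand-in here).
    readers = {'.csv': _read_annotations_csv}
    for ext, reader in readers.items():
        if fname.endswith(ext):
            onset, duration, description = reader(fname)
            return dict(onset=onset, duration=duration,
                        description=description)
    raise ValueError('Unsupported annotation file: %s' % fname)
```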
pytest.param((3, 140000), 3.14, id='Scalar touple'),
pytest.param('2002-12-03 19:01:11.720100', 1038942071.7201,
             id='valid iso8601 string'),
pytest.param('2002-12-03T19:01:11.720100', 0,
Why is this an invalid string? I thought it'd be good to also test date strings with the funny T character separating date from time (which should be valid according to the regex).
I've only implemented ' '. That's why I made this test, to show that it does not work. It is valid ISO8601, but we don't parse it.
This is the set of minimum versions I found to work.
I don't get it. Does anyone see a difference between my system and the CI?
~/code/mne-python read_annotations_csv*
(mne27_xx) ❯ conda list
# packages in environment at /home/sik/miniconda3/envs/mne27_xx:
#
# Name Version Build Channel
atomicwrites 1.2.1 py27_0
attrs 18.2.0 py27h28b3542_0
blas 1.0 mkl
ca-certificates 2018.03.07 0
cairo 1.12.18 6
certifi 2018.10.15 py27_0
coverage 4.5.1 py27h14c3975_0
fontconfig 2.11.1 6
freetype 2.5.5 1
funcsigs 1.0.2 py27_0
icu 58.2 h9c2bf20_1
intel-openmp 2019.0 118
libedit 3.1.20170329 h6b74fdf_2
libffi 3.2.1 hd88cf55_4
libgcc-ng 8.2.0 hdf63c60_1
libgfortran 1.0 0
libgfortran-ng 7.3.0 hdf63c60_0
libpng 1.6.17 0
libstdcxx-ng 8.2.0 hdf63c60_1
libxml2 2.9.8 h26e45fe_1
matplotlib 1.4.3 np19py27_2
mkl 2019.0 118
mne 0.17.dev0 <pip>
more-itertools 4.3.0 py27_0
ncurses 6.1 hf484d3e_0
nose 1.3.7 py27_2
numpy 1.9.3 py27h1b885b7_7
numpy-base 1.9.3 py27hdbf6ddf_7
openssl 1.1.1 h7b6447c_0
pandas 0.16.2 np19py27_0
pathlib2 2.3.2 py27_0
pip 18.1 py27_0
pixman 0.32.6 0
pluggy 0.8.0 py27_0
py 1.7.0 py27_0
py2cairo 1.10.0 py27_2
pyparsing 2.0.3 py27_0
pyqt 4.11.4 py27_4
pytest 3.10.0 py27_0
pytest-cov 2.6.0 py27_0
python 2.7.15 h9bab390_2
python-dateutil 2.7.5 py27_0
pytz 2018.7 py27_0
qt 4.8.7 2
readline 7.0 h7b6447c_5
scandir 1.9.0 py27h14c3975_0
scikit-learn 0.15.2 np19py27_0
scipy 0.14.0 np19py27_0
setuptools 40.5.0 py27_0
sip 4.18 py27_0
six 1.11.0 py27_1
sqlite 3.25.2 h7b6447c_0
tk 8.6.8 hbc83047_0
wheel 0.32.2 py27_0
xz 5.2.4 h14c3975_4
zlib 1.2.11 ha838bed_2
~/code/mne-python read_annotations_csv*
(mne27_xx) ❯ pytest mne/tests/test_annotations.py -k io -s
================================================================= test session starts =================================================================
platform linux2 -- Python 2.7.15, pytest-3.10.0, py-1.7.0, pluggy-0.8.0
rootdir: /home/sik/code/mne-python, inifile: setup.cfg
plugins: cov-2.6.0
collected 25 items
mne/tests/test_annotations.py .........................
============================================================== slowest 20 test durations ==============================================================
0.88s call mne/tests/test_annotations.py::test_crop
0.21s call mne/tests/test_annotations.py::test_crop_more
0.18s call mne/tests/test_annotations.py::test_raw_reject
0.13s call mne/tests/test_annotations.py::test_annotation_filtering
0.10s call mne/tests/test_annotations.py::test_basics
0.04s call mne/tests/test_annotations.py::test_events_from_annot_in_raw_objects
0.02s call mne/tests/test_annotations.py::test_annotation_epoching
0.01s call mne/tests/test_annotations.py::test_annotation_omit
0.01s call mne/tests/test_annotations.py::test_read_brainstorm_annotations
0.01s call mne/tests/test_annotations.py::test_event_id_function_default
0.01s call mne/tests/test_annotations.py::test_io_annotation_csv
0.01s call mne/tests/test_annotations.py::test_events_from_annot_onset_alingment
0.01s call mne/tests/test_annotations.py::test_event_id_function_using_custom_function
(0.00 durations hidden. Use -vv to show these durations.)
============================================================== 25 passed in 1.98 seconds ==============================================================
mne/annotations.py
Outdated
def _read_annotations_csv(fname):
    """Write annotations to csv.
    """Read annotations to csv.
to -> from
The CI is broken, unrelated to your changes:
(This happens sometimes with conda packages)
.travis.yml
Outdated
- os: linux
  env: PYTHON_VERSION=2.7
  CONDA_DEPENDENCIES="numpy=1.8 scipy=0.12 matplotlib=1.3 pandas=0.13 scikit-learn=0.15 pytest pytest-cov"
  CONDA_DEPENDENCIES="numpy=1.9 scipy=0.14 matplotlib=1.4 pandas=0.16 scikit-learn=0.15 pytest pytest-cov"
ahh you changed the deps. If you follow the .travis.yml order of operations locally you can probably reproduce the failure.
But we shouldn't do this. We should leave numpy and scipy where they are so we continue to test them
Yes, :) that's what I was referring to here #5699 (comment): Series.dt was only introduced in 0.16, so if we want to use it here, we require pandas 0.16.
I've been tempted to just skip the breaking tests and wait for the release where we drop 2.7.
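For reference, this is the accessor in question (available from pandas 0.16 on); the data here is made up for illustration:

```python
import pandas as pd

# Series.dt exposes datetime components on a whole Series at once; it was
# added in pandas 0.16, hence the minimum-version discussion above.
onsets = pd.Series(pd.to_datetime(['2002-12-03 19:01:11.720100',
                                   '2002-12-03 19:01:14.720100']))
print(onsets.dt.year.tolist())    # [2002, 2002]
print(onsets.dt.second.tolist())  # [11, 14]
```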
a8257b6 to d6394ce Compare
OK, I reverted the bump of deps versions.
@requires_pandas
@requires_version('pandas', '0.16')
Now it can happen that someone has pandas <0.16 installed; _check_pandas_installed(strict=True) would pass and they would get a weird message.
I'm fine with leaving it as it is and waiting for someone to complain in the issue tracker.
The other option is to add a version parameter and raise a meaningful error.
True, but I prefer this to bumping the versions for a so-far niche use case. I can live with the way it is now. We'll drop Python 2 after the release anyway.
Round trip fails https://travis-ci.org/mne-tools/mne-python/jobs/454185426#L2867
(on the old dependencies)
ee812c8 to 760640e Compare
ALL GREEN!!! Who does the honors?
@massich can you update
* Add tutorial
* illustrate orig_time
* Add orig_time in annotations __repr__
* clean up
* clean up
* more clean up
* Add whatsnew
* fix broken link
* remove old link reference
* pep8
* a deeper explanation of the tripplets
* remove meas_date from the explanation, and point where to read about it
* more
* format tutorial
* more
* FIX: Minor fixes
* pass on text
* Add str support to _handle_meas_date
* allow passing orig_time in ISO8601 format
* Migrate whatever was missing in _handle_meas_date form #5699
* print 6 decimals
@massich you have merge conflicts to fix first + what's new
I can update … Then we can release, I think!
Actually not quite, we still need #5707 |
760640e to 05b5c79 Compare
Okay, rebased and force-pushed. @massich can you make sure it looks okay still?
I was on it. But no worries. The result is the same.
It LGTM, coverage complains about orig_time.