Skip to content

Conversation

@massich
Copy link
Contributor

@massich massich commented Nov 7, 2018

  • Without orig_time
  • With orig_time

@massich massich changed the title Add CSV to read annotations [WIP] Add CSV to read annotations Nov 7, 2018
@massich
Copy link
Contributor Author

massich commented Nov 7, 2018

Commenting np.loadtxt with @jorisvandenbossche and @ogrisel they suggested that if its an internal format. We might be better off using json

@larsoner
Copy link
Member

larsoner commented Nov 8, 2018

if its an internal format. We might be better off using json

I think the idea is that this is not just for internal use, but rather so that people can export and have something human readable/editable/re-readable.

@larsoner larsoner added this to the 0.17 milestone Nov 8, 2018
@codecov
Copy link

codecov bot commented Nov 8, 2018

Codecov Report

Merging #5699 into master will increase coverage by 1.9%.
The diff coverage is 100%.

@@            Coverage Diff            @@
##           master    #5699     +/-   ##
=========================================
+ Coverage   86.71%   88.61%   +1.9%     
=========================================
  Files         369      369             
  Lines       69319    69463    +144     
  Branches    11663    11678     +15     
=========================================
+ Hits        60108    61557   +1449     
+ Misses       6388     5055   -1333     
- Partials     2823     2851     +28

@agramfort
Copy link
Member

you went the header route while I was suggesting to have onsets as timestamps. If you have onsets as ISO timestamps you can do:

annot = mne.read_annot('annot.csv')
raw.set_annotations(annot)

and raw.annotations.orig_time should be recovered without any custom hack on the csv file.

@larsoner
Copy link
Member

larsoner commented Nov 8, 2018

Yes I think we decided to use the ISO timestamp approach as the optimal solution

@larsoner
Copy link
Member

larsoner commented Nov 8, 2018

... basically @agramfort's solution plus:

You can add the ORIG_TIME line, too, if you want (so you can do time deltas, etc. easily)

I think this gets us the cleanest solution.

@agramfort
Copy link
Member

agramfort commented Nov 8, 2018 via email

@massich
Copy link
Contributor Author

massich commented Nov 8, 2018

based on #5640 (comment) I thought we wanted to avoid the conversion on the onsets.

I also see the confusion of setting stuff in the header, that then MS excel would not understand.

Maybe the solution here proposed is only good for a .txt and then txt and csv are different things.

By the end, my purpose with the PR was to try np.loadtxt. And if we have to use np.loadtxt + datatime I would strongly recommend against it. Pandas handles dates properly.

@agramfort
Copy link
Member

agramfort commented Nov 8, 2018 via email

pytest.param(None, 0, id='None'),
pytest.param(42, 42.0, id='Scalar'),
pytest.param(3.14, 3.14, id='Float'),
# pytest.param((3, 1400000), 3.14, id='Scalar touple')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@massich can I remove this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It had an extra 0.

"""Convert meas_date to seconds."""
"""Convert meas_date to seconds.

if meas_date is string, it should conform to '%Y-%m-%d %H:%M:%S.%f'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Captitalize "If". There are a couple of typos, here's my suggestion:

    """Convert meas_date to seconds.

    If `meas_date` is a string, it should conform to the ISO8601 format, i.e.
    '%Y-%m-%d %H:%M:%S.%f'. Otherwise, this function returns 0. Note that
    ISO8601 allows for ' ' or 'T' as delimiters between date and time.
    """

However, I'm not sure if you meant to write `'\t'` instead of `'T'` in the last sentence - could you please check?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see that the regex does have [ T] so T is OK.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes the regex conforms to the standard, which is with ' ' or 'T'. But until now I did only implemted the ' '. Thats why the test says invalid. It should be straight forward to implement if needed but for now YAGNI.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then you should clarify in the comment that 'T' is not supported ATM.

meas_date = _handle_meas_date(annot.orig_time)
orig_dt = datetime.utcfromtimestamp(meas_date)
content += "# orig_time : %s \n" % orig_dt
content += "# onset, duration, description\n"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is fairly standard to have a header line with the column headers in the first line. All readers should support this header line, so I'd remove the comment here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand your comment. Are you proposing this:

-    content += "# onset, duration, description\n"
+    content += "onset, duration, description\n"

This complicates the parsing of the .txt using np.loadtxt quite a lot. And by the end if anyone loads a .txt should be able to see it anyway.

I'm not sure about this change. Anyone else?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

skiprows=1

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cbrnr what you want is a csv file you can read with pandas. It's not anymore a txt file you can read with np.loadtxt or matlab without understanding the file specifics.

I would therefore stick to the current proposal. Any other opinion anyone?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No idea which is more standard. It would be good to check if pandas still works without the # on the first line. If it does not, then having # on the first line is probably worth it even if it's less standard. (On readers that can support a no-#-first-line-header, the data fields will presumably just appear as "column 1/2/3" or something similar, which seems less than ideal but livable.)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

with the # you'll need to specify column names on read. It will treat the # as a comment and will ignore it as np.loadtxt or a matlad reader of floats in txt files

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, basically I wanted a CSV file but without the timestamps. The idea was to create a file that can be easily read by the user, and this means pd.read_csv and not np.loadtxt for me. Either way, I can live with the #, which means I have to manually specify the column names.

This function reads a .fif, .fif.gz, .vrmk, .edf or .set file and makes an
:class:`mne.Annotations` object.
This function reads a .fif, .fif.gz, .vrmk, .edf, .txt or .set file and
makes an :class:`mne.Annotations` object.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

.csv is missing



def _read_annotations_csv(fname):
"""Write annotations to csv.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Read annotations from CSV.

return re.compile(ISO8601).match(candidate_str) is not None


def _read_annotations_txt_parse_header(fname):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why a separate function for this? Will this be used elsewhere? Otherwise, I'd include that in _read_annotations_txt.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made it a separate function so that I can get orig_time. I'm silently converging to onset, duration, desc = _read_annotations_xxx to unify read_annotations, avoid import Annotations in mne/io/XXXX/XXX.py and always create the annotations object in read_annotations.

But I've not decided yet what's the best.

pytest.param((3, 140000), 3.14, id='Scalar touple'),
pytest.param('2002-12-03 19:01:11.720100', 1038942071.7201,
id='valid iso8601 string'),
pytest.param('2002-12-03T19:01:11.720100', 0,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this an invalid string? I thought it'd be good to also test date strings with the funny T character separating date from time (which should be valid according to the regex).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've only implemented ' '. Thats why I made this test, to show that does not work. It is a valid ISO, but we don't parse it.

@massich
Copy link
Contributor Author

massich commented Nov 12, 2018

Series.dt was introduced in pandas-dev/pandas@c2939ea So we need to update the minimum version to 0.16

This is the set of minimum version I found to work.

conda install numpy=1.9 scipy=0.14 matplotlib=1.4 pandas=0.16 scikit-learn=0.15 pytest pytest-cov

@massich
Copy link
Contributor Author

massich commented Nov 12, 2018

I don't get it. Does anyone see a difference between my system and the CI?

~/code/mne-python read_annotations_csv*
(mne27_xx) ❯ conda list                  
# packages in environment at /home/sik/miniconda3/envs/mne27_xx:
#
# Name                    Version                   Build  Channel
atomicwrites              1.2.1                    py27_0  
attrs                     18.2.0           py27h28b3542_0  
blas                      1.0                         mkl  
ca-certificates           2018.03.07                    0  
cairo                     1.12.18                       6  
certifi                   2018.10.15               py27_0  
coverage                  4.5.1            py27h14c3975_0  
fontconfig                2.11.1                        6  
freetype                  2.5.5                         1  
funcsigs                  1.0.2                    py27_0  
icu                       58.2                 h9c2bf20_1  
intel-openmp              2019.0                      118  
libedit                   3.1.20170329         h6b74fdf_2  
libffi                    3.2.1                hd88cf55_4  
libgcc-ng                 8.2.0                hdf63c60_1  
libgfortran               1.0                           0  
libgfortran-ng            7.3.0                hdf63c60_0  
libpng                    1.6.17                        0  
libstdcxx-ng              8.2.0                hdf63c60_1  
libxml2                   2.9.8                h26e45fe_1  
matplotlib                1.4.3                np19py27_2  
mkl                       2019.0                      118  
mne                       0.17.dev0                 <pip>
more-itertools            4.3.0                    py27_0  
ncurses                   6.1                  hf484d3e_0  
nose                      1.3.7                    py27_2  
numpy                     1.9.3            py27h1b885b7_7  
numpy-base                1.9.3            py27hdbf6ddf_7  
openssl                   1.1.1                h7b6447c_0  
pandas                    0.16.2               np19py27_0  
pathlib2                  2.3.2                    py27_0  
pip                       18.1                     py27_0  
pixman                    0.32.6                        0  
pluggy                    0.8.0                    py27_0  
py                        1.7.0                    py27_0  
py2cairo                  1.10.0                   py27_2  
pyparsing                 2.0.3                    py27_0  
pyqt                      4.11.4                   py27_4  
pytest                    3.10.0                   py27_0  
pytest-cov                2.6.0                    py27_0  
python                    2.7.15               h9bab390_2  
python-dateutil           2.7.5                    py27_0  
pytz                      2018.7                   py27_0  
qt                        4.8.7                         2  
readline                  7.0                  h7b6447c_5  
scandir                   1.9.0            py27h14c3975_0  
scikit-learn              0.15.2               np19py27_0  
scipy                     0.14.0               np19py27_0  
setuptools                40.5.0                   py27_0  
sip                       4.18                     py27_0  
six                       1.11.0                   py27_1  
sqlite                    3.25.2               h7b6447c_0  
tk                        8.6.8                hbc83047_0  
wheel                     0.32.2                   py27_0  
xz                        5.2.4                h14c3975_4  
zlib                      1.2.11               ha838bed_2  

~/code/mne-python read_annotations_csv*
(mne27_xx) ❯ pytest mne/tests/test_annotations.py -k io -s
================================================================= test session starts =================================================================
platform linux2 -- Python 2.7.15, pytest-3.10.0, py-1.7.0, pluggy-0.8.0
rootdir: /home/sik/code/mne-python, inifile: setup.cfg
plugins: cov-2.6.0
collected 25 items                                                                                                                                    

mne/tests/test_annotations.py .........................

============================================================== slowest 20 test durations ==============================================================
0.88s call     mne/tests/test_annotations.py::test_crop
0.21s call     mne/tests/test_annotations.py::test_crop_more
0.18s call     mne/tests/test_annotations.py::test_raw_reject
0.13s call     mne/tests/test_annotations.py::test_annotation_filtering
0.10s call     mne/tests/test_annotations.py::test_basics
0.04s call     mne/tests/test_annotations.py::test_events_from_annot_in_raw_objects
0.02s call     mne/tests/test_annotations.py::test_annotation_epoching
0.01s call     mne/tests/test_annotations.py::test_annotation_omit
0.01s call     mne/tests/test_annotations.py::test_read_brainstorm_annotations
0.01s call     mne/tests/test_annotations.py::test_event_id_function_default
0.01s call     mne/tests/test_annotations.py::test_io_annotation_csv
0.01s call     mne/tests/test_annotations.py::test_events_from_annot_onset_alingment
0.01s call     mne/tests/test_annotations.py::test_event_id_function_using_custom_function

(0.00 durations hidden.  Use -vv to show these durations.)
============================================================== 25 passed in 1.98 seconds ==============================================================


def _read_annotations_csv(fname):
"""Write annotations to csv.
"""Read annotations to csv.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

to -> from

@larsoner
Copy link
Member

The CI is broken, unrelated to your changes:

$ python -c "import mne; mne.sys_info()"
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "mne/__init__.py", line 22, in <module>
    from .utils import (set_log_level, set_log_file, verbose, set_config,
  File "mne/utils.py", line 41, in <module>
    from scipy import linalg, sparse
  File "/home/travis/miniconda/envs/test/lib/python2.7/site-packages/scipy/linalg/__init__.py", line 160, in <module>
    from .basic import *
  File "/home/travis/miniconda/envs/test/lib/python2.7/site-packages/scipy/linalg/basic.py", line 15, in <module>
    from .lapack import get_lapack_funcs
  File "/home/travis/miniconda/envs/test/lib/python2.7/site-packages/scipy/linalg/lapack.py", line 215, in <module>
    from scipy.linalg import _flapack
ImportError: /home/travis/miniconda/envs/test/lib/python2.7/site-packages/scipy/linalg/_flapack.so: undefined symbol: _gfortran_copy_string

@larsoner
Copy link
Member

(This happens sometimes with conda packages)

.travis.yml Outdated
- os: linux
env: PYTHON_VERSION=2.7
CONDA_DEPENDENCIES="numpy=1.8 scipy=0.12 matplotlib=1.3 pandas=0.13 scikit-learn=0.15 pytest pytest-cov"
CONDA_DEPENDENCIES="numpy=1.9 scipy=0.14 matplotlib=1.4 pandas=0.16 scikit-learn=0.15 pytest pytest-cov"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ahh you changed the deps. If you follow the .travis.yml order of operations locally you can probably reproduce the failure.

But we shouldn't do this. We should leave numpy and scipy where they are so we continue to test them

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, :)

thats what I was referring here #5699 (comment) Series.dt was only introduced in 0.16 .

So if we want to use it here, we require pandas 0.16.

I've been tempted to just skip, this the braking tests. And wait for the release that we drop 2.7

massich pushed a commit to massich/mne-python that referenced this pull request Nov 12, 2018
@agramfort agramfort force-pushed the read_annotations_csv branch from a8257b6 to d6394ce Compare November 12, 2018 21:36
@agramfort
Copy link
Member

ok I revert the bump of deps versions



@requires_pandas
@requires_version('pandas', '0.16')
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now it can happen that someone has pandas <0.16 installed, the _check_pandas_installed(strict=True) would work and it would get a weird message.

I'm fine with letting it as it is, and wait for someone to complain in the issue tracker.

The other option is to add a parameter version and raise a mining full error.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

true but I prefer this than bumping the versions for a so far niche usecase. I can live with the way it is now. We'll drop python 2 after the release anyway.

@larsoner
Copy link
Member

@larsoner
Copy link
Member

(on the old dependencies)

@larsoner larsoner force-pushed the read_annotations_csv branch from ee812c8 to 760640e Compare November 13, 2018 02:22
@agramfort agramfort changed the title [WIP] Add CSV to read annotations [MRG] Add CSV to read annotations Nov 13, 2018
@massich
Copy link
Contributor Author

massich commented Nov 13, 2018

ALL GREEN !!! who does the honors?

@larsoner
Copy link
Member

@massich can you update whats_new.rst? Then we can merge

jasmainak pushed a commit that referenced this pull request Nov 13, 2018
* Add tutorial

* illustrate orig_time

* Add orig_time in annotations __repr__

* clean up

* clean up

* more clean up

* Add whatsnew

* fix broken link

* remove old link reference

* pep8

* a deeper explanation of the tripplets

* remove meas_date from the explanation, and point where to read about it

* more

* format tutorial

* more

* FIX: Minor fixes

* pass on text

* Add str support to _handle_meas_date

* allow passing orig_time in ISO8601 format

* Migrate whatever was missing in _handle_meas_date form #5699

* print 6 decimals
@agramfort
Copy link
Member

@massich you have merge conflicts to fix first + what's new

@larsoner
Copy link
Member

larsoner commented Nov 13, 2018

I can update whats_new.rst and rebase unless you are on it @massich

Then we can release, I think!

@larsoner
Copy link
Member

Actually not quite, we still need #5707

@larsoner larsoner force-pushed the read_annotations_csv branch from 760640e to 05b5c79 Compare November 13, 2018 16:27
@larsoner
Copy link
Member

Okay rebased and push forced

@massich can you make sure it looks okay still?

@massich
Copy link
Contributor Author

massich commented Nov 13, 2018

I was on it. But no worries. The result is the same.

@massich
Copy link
Contributor Author

massich commented Nov 13, 2018

It LGTM, coverage complains that _write_annotations_csv is not covered. But I checked locally and it is.

@massich massich changed the title [MRG] Add CSV to read annotations [MRG+1] Add CSV to read annotations Nov 13, 2018
@larsoner larsoner merged commit f2be343 into mne-tools:master Nov 13, 2018
@massich massich deleted the read_annotations_csv branch November 16, 2018 13:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants