[MRG] Add support for indexing/slicing Annotations objects #5800
mne/tests/test_annotations.py
Outdated
| """Test indexing Annotations.""" | ||
| NUM_ANNOT = 5 | ||
| EXPECTED_ONSETS = EXPECTED_DURATIONS = [_ for _ in range(NUM_ANNOT)] | ||
| EXPECTED_DESCS = [_.__repr__() for _ in range(NUM_ANNOT)] |
I would not use _ for a param you actually use
421f85a to
a41d96a
Compare
|
Based on #5795 (comment), maybe we should return a copy, but I'm not sure. I guess that @larsoner was trying to avoid this?
my_wrong_recorded_annotatoins = [d.startswith('foo') for d in raw.annotations.description]
onsets, _, _ = raw.annotations[my_wrong_recorded_annotatoins]
onsets += 10
If you want to do that you should do |
|
+1 to return a copy
|
Yes, in MNE we (should, at least) always return a copy with indexing operations on our objects. This makes us different from NumPy, which has its own in-place (view) and copy rules. |
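For context, here is a minimal NumPy-only sketch of the view/copy rules mentioned above. No MNE objects are involved and the array values are made up for illustration:

```python
import numpy as np

onsets = np.array([0.0, 1.0, 2.0, 3.0])

# Basic slicing returns a view: writing through it changes the original.
view = onsets[:2]
view += 10

# Boolean (or integer-array) indexing returns a copy: the original is untouched.
mask = np.array([False, False, True, True])
picked = onsets[mask]
picked += 10

print(onsets)  # the first two entries changed, the last two did not
```

Returning a copy from every indexing operation, as suggested for MNE objects, would remove the first case: an in-place edit of the result would never reach the source object.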
Codecov Report
@@ Coverage Diff @@
## master #5800 +/- ##
==========================================
+ Coverage 88.57% 88.58% +0.01%
==========================================
Files 369 369
Lines 68934 69004 +70
Branches 11614 11631 +17
==========================================
+ Hits 61055 61126 +71
+ Misses 5027 5025 -2
- Partials 2852 2853 +1 |
tutorials/plot_object_annotations.py
Outdated
| # with the sliced elements. | ||
| # | ||
| # See the following examples and usages: | ||
| plt.close('all') |
why matplotlib here?
This should go.
tutorials/plot_object_annotations.py
Outdated
| start, stop, step = (0, None, 2) | ||
| every_other_annotation = slice(start, stop, step) | ||
| for onset, duration, desc in zip(*annot[every_other_annotation]): | ||
| print('onset={0} duration={1} desc={2}'.format(onset, duration, desc)) |
I would try to dumb it down. Here you use advanced string formatting, for loops etc. I would just do:
annotations[:3] # will return a new Annotations formed by the first 3
annotations[2] # will return a new Annotations restricted to the 3rd annotation.
and I would point to the Python indexing docs, as it behaves like it does for str or lists, etc.
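As a reminder of the built-in behaviour being referred to, a small sketch using only a list and a str (the variable names are illustrative):

```python
letters = ["a", "b", "c", "d"]

first_three = letters[:3]   # slicing gives a new list with the first 3 items
third = letters[2]          # an int gives the single element at position 2

word = "annot"
prefix = word[:3]           # slicing a str gives a new str
```

Note that for a list, an integer index returns the element itself rather than a one-element list, which is the point debated later in this thread.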
That's exactly what does not match what I expected from the tests.
| ========================================================================= | ||
|
|
||
| Events and :class:`~mne.Annotations` are quite similar. | ||
| :term:`Events <events>` and :term:`annotations` are quite similar. |
term links work?
it should
|
I confused |
a8c7bdc to
6e29f94
Compare
6e29f94 to
da741a6
Compare
mne/annotations.py
Outdated
| out = Annotations(onset=[self.onset[key]], | ||
| duration=[self.duration[key]], | ||
| description=[self.description[key]], | ||
| orig_time=self.orig_time) |
I would return a tuple like
(self.onset[key], self.duration[key], self.description[key])
this would be consistent with the iterator.
or it could be a dict with keys onset, duration, description like when you index a pandas dataframe...
thoughts @jona-sassenhagen ?
After talking to @agramfort offline, he convinced me to return a dictionary, not an Annotations, when indexing with a single integer.
It has the advantage that if we ever want to extend the annotations with more fields, we won't break people's code.
What do you think?
Why don't we return an Annotations object for the iterator and for slicing/indexing operations? It has the same extensibility, because whatever we add will immediately be attributes of the Annotations class.
This is also basically what we do with the Epochs class, so it's more consistent with that, too.
@mne-tools/contributors ?
So the proposals are to return:
- annotations object
- tuple
- dict
?
I haven't worked with annotations enough to have an opinion ...
Yep, so concretely:
# annot
for a in raw.annotations:
    onset, duration, desc = a.onset[0], a.duration[0], a.desc[0]
    ...
# tuple
for onset, duration, desc in raw.annotations:
    ...
# dict
for a in raw.annotations:
    onset, duration, desc = a['onset'], a['duration'], a['desc']
    ...
The only advantage I see of a dict approach over iterating over Annotations objects is that it avoids the [0], but this does not seem worth introducing more API inconsistency to me.
returning dict is more consistent with dataframe behavior and will not break if you start adding a channel name for annotations
The annot will also not break if you start adding a channel name for annotations.
So the question is, do we value internal package consistency (Annotations iterating like Epochs) or consistency with what Pandas does (Annotations iterating like pandas) here?
If you have an iterable container c in Python and you do:
for k in range(len(c)):
    print(c[k])
it is equivalent to:
for k in c:
    print(k)
This works for lists, strs, arrays, etc.
My decision to have epochs[k] return epochs with nave=1 is, I think, a historical error, yet quite convenient.
To get this you should have needed to do epochs[k: k + 1]
So this is not a pandas thing.
What is a pandas thing is the fact that annot[k] would return a dict and not a tuple. As when you do
s = df.iloc[k]
you get s as a Series whose semantics match a dictionary, as the column names become the index.
Does that make sense?
I think the argument you are really trying to make has to do with how we conceptualize iteration over an N-dimensional object: is what you get in the loop (N-1)D (like arrays, lists, tuples, pandas, etc.) or ND (like epochs). I can see why this would be useful, even if it's less like what we usually do, so I can live with it.
(FWIW, I don't quite see why the iterable argument helps, since it actually holds already for the epochs object (epochs[k] == e for k, e in enumerate(epochs)) and would hold for the annotations API (annot[k] == a for k, a in enumerate(annotations)). It seems to actually favor "iter yields Annotations" if __getitem__ always returns Annotations; in order for the relationship to hold for "iter yields dict", we'd need Annotations | dict to be returned by __getitem__, depending on whether the key is a slice vs an int...)
|
It does not hold for the epochs class. If you iterate over epochs you get arrays, but when you do epochs[k] you get an epochs object.
|
|
Ahh true I thought it was the other way but did not check. FWIW the iter/index equivalence argument still seems to go against the proposal, though, right? |
|
> Ahh true I thought it was the other way but did not check.
> FWIW the iter/index equivalence argument still seems to go against the proposal, though, right?
I don't think so. I propose to get a dictionary when you iterate or access the k-th element.
|
|
So in getitem, int gives dict and slice gives Annotations object? |
|
Or I guess we could have slice give a dict with values that are 2D ndarray. Then if you want Annotations, reconstruction is just a double star away |
|
slice gets you Annotations and int gets you a dict
clear?
|
I can live with it |
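The behaviour agreed on above (an int gives a dict, a slice gives an Annotations, iteration yields the same dicts as integer indexing) can be sketched with a toy class. `ToyAnnotations` is a hypothetical stand-in written for this illustration; it is not the code from mne/annotations.py, and it uses plain lists instead of arrays:

```python
class ToyAnnotations:
    """Hypothetical stand-in for illustration; not the mne.Annotations code."""

    def __init__(self, onset, duration, description):
        # Copy the inputs so the new object never shares lists with the caller.
        self.onset = list(onset)
        self.duration = list(duration)
        self.description = list(description)

    def __len__(self):
        return len(self.onset)

    def __getitem__(self, key):
        if isinstance(key, int):
            # A single integer yields one annotation as a dict.
            return {"onset": self.onset[key],
                    "duration": self.duration[key],
                    "description": self.description[key]}
        if isinstance(key, slice):
            # A slice yields a new container of the same type.
            return ToyAnnotations(self.onset[key], self.duration[key],
                                  self.description[key])
        raise TypeError(
            f"indices must be int or slice, not {type(key).__name__}")

    def __iter__(self):
        # Iterating yields the same dicts as integer indexing.
        for k in range(len(self)):
            yield self[k]


annot = ToyAnnotations([0.0, 1.0, 2.0], [1.0, 1.0, 1.0], ["a", "b", "c"])
single = annot[1]    # dict describing the second annotation
subset = annot[:2]   # new ToyAnnotations with the first two annotations

# Going from a dict back to a container is "a double star away".
rebuilt = ToyAnnotations(**{name: [value] for name, value in single.items()})
```

With this layout the equivalence between `for a in annot` and `annot[k] for k in range(len(annot))` holds, while slices still keep the container type.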
| else: | ||
| return out | ||
| key = list(key) if isinstance(key, tuple) else key | ||
| return Annotations(onset=self.onset[key], |
if key is a slice you should force the copy of onset etc.
Doesn't it take a copy by calling the constructor?
I have to check that.
|
no copy when it's a slice
|
I guess I'm testing something wrong, because I don't see the effect of the copy.
NUM_ANNOT = 5
EXPECTED_ONSETS = EXPECTED_DURATIONS = [x for x in range(NUM_ANNOT)]
EXPECTED_DESCS = [x.__repr__() for x in range(NUM_ANNOT)]
annot = Annotations(onset=EXPECTED_ONSETS,
                    duration=EXPECTED_DURATIONS,
                    description=EXPECTED_DESCS,
                    orig_time=None)
print((id(annot.onset),
       id(annot[:1].onset),
       id(annot.onset) == id(annot[:1].onset)))
print((id(annot.onset[0]),
       id(annot[:1].onset[0]),
       id(annot.onset[0]) == id(annot[:1].onset[0])))
print(annot.onset[0])
print((id(annot.onset[0]),
       id(annot[:1].onset[0]),
       id(annot.onset[0]) == id(annot[:1].onset[0])))
annot[:1].onset[0] = 42
print(annot.onset[0])
print((id(annot.onset[0]),
       id(annot[:1].onset[0]),
       id(annot.onset[0]) == id(annot[:1].onset[0])))
The result is the same whether using copy or not. The returned list is a different one, but the elements inside are the same. But the change has no effect. I guess I'm doing something wrong. I just added the |
|
which is different than this:
xx = np.array(range(3))
xx[:1][0] = 42
print(xx) |
|
I guess that what I'm saying is that I could not figure out how to write a test that actually breaks if |
That's because there is an implicit copy in the https://github.com/mne-tools/mne-python/blob/master/mne/annotations.py#L157 So you shouldn't need to do any |
mne/annotations.py
Outdated
| key = list(key) if isinstance(key, tuple) else key | ||
| return Annotations(onset=self.onset[key].copy(), | ||
| duration=self.duration[key].copy(), | ||
| description=self.description[key].copy(), |
... so no need for any of these copy calls, because a copy is made in Annotations.__init__
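A minimal NumPy-only sketch of why the explicit copies are redundant, assuming (as quoted later in this thread) that the constructor stores its inputs via `np.array(..., dtype=float)`. The variable names are illustrative:

```python
import numpy as np

onset = np.array([0.0, 1.0, 2.0])

sliced = onset[:1]                       # basic slicing: a view on `onset`
rebuilt = np.array(sliced, dtype=float)  # np.array copies by default

rebuilt[0] = 42.0                        # does not reach `onset`

print(np.shares_memory(onset, sliced))   # the view shares the buffer
print(np.shares_memory(onset, rebuilt))  # the rebuilt array does not
```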
|
good catch !
|
|
I did not see this one.
self.onset = np.array(onset, dtype=float)
Great. |
|
This still remains, though. The vectors do have different ids (they are indeed a copy), but the first elements of both vectors share the same id. Anyway, I'll let it be. This outsmarts me. |
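A possible explanation for the matching element ids, sketched with plain NumPy arrays rather than Annotations: indexing a single element builds a temporary scalar object each time, and in CPython `id()` values can be reused once an object has been freed, so comparing those ids says nothing about whether the arrays share data. Checking the buffers directly is more reliable:

```python
import numpy as np

onset = np.array([0.0, 1.0, 2.0])
copied = np.array(onset[:1], dtype=float)

# Each element access builds a fresh, temporary NumPy scalar object. Once
# id() returns, that temporary can be freed and its address reused, so the
# two ids below may or may not match; the comparison says nothing about the
# underlying buffers.
ids_match = id(onset[0]) == id(copied[0])

# Asking NumPy about the buffers directly is the reliable check.
buffers_shared = np.shares_memory(onset, copied)

# Writing to the copy leaves the original untouched.
copied[0] = 42.0
```

A test for the copy behaviour could therefore assert on `np.shares_memory` or on the values after a write, rather than on `id()`.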
larsoner
left a comment
@cbrnr feel free to merge if you are happy
|
Thanks @massich! |
Based on the Python 3 docs, indexing with the wrong type should raise TypeError, not IndexError. Then the entire function gets much simpler:
About the current solution:
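As a point of reference for the TypeError/IndexError distinction, this is how a built-in list behaves. It is a sketch of the convention only, not of the Annotations implementation:

```python
values = [10, 20, 30]

try:
    values["1"]            # wrong index type
except TypeError as err:
    wrong_type_error = type(err).__name__

try:
    values[5]              # right type, but out of range
except IndexError as err:
    out_of_range_error = type(err).__name__

print(wrong_type_error, out_of_range_error)
```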