Skip to content

Conversation

@massich
Copy link
Contributor

@massich massich commented Dec 17, 2018

Based on python 3 doc indexing with
the wrong type should raise TypeError not IndexError. Then the entire function gets much more simpler:

def __getitem__(self, key):
    """Propagate indexing and slicing to the underlying numpy structure."""
    return (self.onset[key], self.duration[key], self.description[key])

About the current solution:

  • (pros) I like the idea that we are raising the right
  • (cons) the code has 5 branches

"""Test indexing Annotations."""
NUM_ANNOT = 5
EXPECTED_ONSETS = EXPECTED_DURATIONS = [_ for _ in range(NUM_ANNOT)]
EXPECTED_DESCS = [_.__repr__() for _ in range(NUM_ANNOT)]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would not use _ for a param you actually use

@massich
Copy link
Contributor Author

massich commented Dec 17, 2018

based on #5795 (comment) maybe we should return a copy. But I'm not sure.

But I guess that @larsoner was trying to avoid this?

my_wrong_recorded_annotatoins = [d.startswith('foo') for d in raw.annotations.description]
onsets, _, _ = raw.annotations[my_wrong_recorded_annotatoins]
onsets += 10

If you want to do that you should do raw.annotations.onset[my_wrong_recorded_annotations] += 10

@agramfort
Copy link
Member

agramfort commented Dec 17, 2018 via email

@larsoner
Copy link
Member

Yes in MNE we (should, at least) always return a copy with indexing operations on our objects. This makes us different from NumPy, which has inplace and copy rules.

@codecov
Copy link

codecov bot commented Dec 19, 2018

Codecov Report

Merging #5800 into master will increase coverage by 0.01%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master    #5800      +/-   ##
==========================================
+ Coverage   88.57%   88.58%   +0.01%     
==========================================
  Files         369      369              
  Lines       68934    69004      +70     
  Branches    11614    11631      +17     
==========================================
+ Hits        61055    61126      +71     
+ Misses       5027     5025       -2     
- Partials     2852     2853       +1

# with the sliced elements.
#
# See the following examples and usages:
plt.close('all')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why matplotlib here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should go.

start, stop, step = (0, None, 2)
every_other_annotation = slice(start, stop, step)
for onset, duration, desc in zip(*annot[every_other_annotation]):
print('onset={0} duration={1} desc={2}'.format(onset, duration, desc))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would try to dumb it down. Here you use advanced string formatting, for loops etc. I would just do:

annotations[:3]   # will return a new Annotations formed by the first 3
annotations[2]   # will return a new Annotations restricted to the 3rd annotation.

and I would point to python indexing doc as it behaves likes for str or lists etc.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thats exactly what does not wat I expected from the tests.

=========================================================================

Events and :class:`~mne.Annotations` are quite similar.
:term:`Events <events>` and :term:`annotations` are quite similar.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

term links work?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it should

@massich massich changed the title [MRG] Add support for indexing/slicing Annotations objects [WIP] Add support for indexing/slicing Annotations objects Dec 19, 2018
@massich
Copy link
Contributor Author

massich commented Dec 19, 2018

I confused __getitem__ and __iter__ behavior.

out = Annotations(onset=[self.onset[key]],
duration=[self.duration[key]],
description=[self.description[key]],
orig_time=self.orig_time)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would return a tuple like

(self.onset[key], self.duration[key], self.description[key])

this would be consistent with the iterator.

or it could be a dict with keys onset, duration, description like when you index a pandas dataframe...

thoughts @jona-sassenhagen ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After talking to @agramfort offline he convinced me to return a dictionary not an Annotations when indexing with a single integer.

It has the advantage that if we ever want to extend the annotations, with more fields we won't break people's code.

What do you think?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why don't we return an Annotations object for the iterator and for slicing/indexing operations? It has the same extend-ability because whatever we add will be attributes of the Annotations class immediately.

This is also basically what we do with the Epochs class, so it's more consistent with that, too.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mne-tools/contributors ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So the proposals are to return:

  • annotations object
  • tuple
  • dict

?

I haven't worked with annotations enough to have an opinion ...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, so concretely:

# annot
for a in raw.annotations:
    onset, duration, desc = a.onset[0], a.duration[0], a.desc[0]
    ...
# tuple
for onset, duration, desc in raw.annotations:
    ...
# dict
for a in raw.annotations:
    onset, duration, desc = a['onset'], a['duration'], a['desc']
    ...

The only advantage of an dict approach over iterating over annotations objects I see is that it avoids the [0], but this does not seem worth introducing more API inconsistency to me.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

returning dict is more consistent with dataframe behavior and will not break if you start adding a channel name for annotations

Copy link
Member

@larsoner larsoner Dec 21, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The annot will also not break if you start adding a channel name for annotations.

So the question is, do we value internal package consistency (Annotations iterating like Epochs) or consistency with what Pandas does (Annotations iterating like pandas) here?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if you have an iterable container c in Python if you do:

for k in range(len(c)):
    print(c[k])

it is equivalent to:

for k in c:
    print(k)

this works for lists, strs, arrays, etc.

My decision to have epochs[k] return epochs with nave=1 is I think an historical error yet quite convenient.
To get this you should have needed to do epochs[k: k + 1]
So this is not a pandas thing.

What is a pandas thing is the fact that annot[k] would return a dict and not a tuple. As when you do

s = df.iloc[k]

you get s as Series whose semantic matches a dictionary as the column names becomes the index.

does it make any sense?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the argument you are really trying to make has to do with how conceptualize iteration over an N-dimensional object: is what you get in the loop (N-1)D (like arrays, lists, tuples, pandas, etc.) or ND (like epochs). I can see why this would be useful, even if it's less like what we usually do, so I can live with it.

(FWIW, I don't quite see why the iterable argument helps, since it actually holds already for the epochs object (epochs[k] == e for k, e in enumerate(epochs)) and would hold for the annotations API (annot[k] == a for k, a in annotations). It seems to actually favor "iter yields Annotations" if __getitem__ always returns Annotations; in order for the relationship to hold for "iter yields dict", we'd need Annotations | dict to be returned by __getitem__, depending on whether the result is slice vs int...)

@agramfort
Copy link
Member

agramfort commented Dec 25, 2018 via email

@larsoner
Copy link
Member

larsoner commented Dec 25, 2018

Ahh true I thought it was the other way but did not check.

FWIW the iter/index equivalence argument still seems to go against the proposal, though, right?

@agramfort
Copy link
Member

agramfort commented Dec 27, 2018 via email

@larsoner
Copy link
Member

So in getitem, int gives dict and slice gives Annotations object?

@larsoner
Copy link
Member

Or I guess we could have slice give a dict with values that are 2D ndarray. Then if you want Annotations it's just a reconstruction with double star away

@agramfort
Copy link
Member

agramfort commented Dec 27, 2018 via email

@larsoner
Copy link
Member

I can live with it

else:
return out
key = list(key) if isinstance(key, tuple) else key
return Annotations(onset=self.onset[key],
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if key is a slice you should force the copy of onset etc.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't it take a copy by calling the constructor?
I've to check that.

@agramfort
Copy link
Member

agramfort commented Jan 9, 2019 via email

@massich massich changed the title [WIP] Add support for indexing/slicing Annotations objects [MRG] Add support for indexing/slicing Annotations objects Jan 10, 2019
@massich
Copy link
Contributor Author

massich commented Jan 10, 2019

I guess I'm testing something wrong, because I don't see the effect of the copy.

    NUM_ANNOT = 5
    EXPECTED_ONSETS = EXPECTED_DURATIONS = [x for x in range(NUM_ANNOT)]
    EXPECTED_DESCS = [x.__repr__() for x in range(NUM_ANNOT)]

    annot = Annotations(onset=EXPECTED_ONSETS,
                        duration=EXPECTED_DURATIONS,
                        description=EXPECTED_DESCS,
                        orig_time=None)

    print((id(annot.onset),
           id(annot[:1].onset),
           id(annot.onset) == id(annot[:1].onset)))
    print((id(annot.onset[0]),
           id(annot[:1].onset[0]),
           id(annot.onset[0]) == id(annot[:1].onset[0])))
    print(annot.onset[0])    NUM_ANNOT = 5
    EXPECTED_ONSETS = EXPECTED_DURATIONS = [x for x in range(NUM_ANNOT)]
    EXPECTED_DESCS = [x.__repr__() for x in range(NUM_ANNOT)]

    annot = Annotations(onset=EXPECTED_ONSETS,
                        duration=EXPECTED_DURATIONS,
                        description=EXPECTED_DESCS,
                        orig_time=None)

    print((id(annot.onset),
           id(annot[:1].onset),
           id(annot.onset) == id(annot[:1].onset)))
    print((id(annot.onset[0]),
           id(annot[:1].onset[0]),
           id(annot.onset[0]) == id(annot[:1].onset[0])))
    print(annot.onset[0])
    print((id(annot.onset[0]),
           id(annot[:1].onset[0]),
           id(annot.onset[0]) == id(annot[:1].onset[0])))
    annot[:1].onset[0] = 42
    print(annot.onset[0])
    print((id(annot.onset[0]),
           id(annot[:1].onset[0]),
           id(annot.onset[0]) == id(annot[:1].onset[0])))
    print((id(annot.onset[0]),
           id(annot[:1].onset[0]),
           id(annot.onset[0]) == id(annot[:1].onset[0])))
    annot[:1].onset[0] = 42
    print(annot.onset[0])
    print((id(annot.onset[0]),
           id(annot[:1].onset[0]),
           id(annot.onset[0]) == id(annot[:1].onset[0])))

the result is both the same using copy or not.

(139650719663168, 139650338620800, False)
(139650385747040, 139650385747040, True)
0.0
(139650385747040, 139650385747040, True)
0.0
(139650385747040, 139650385747040, True)

The returned list is a different one, but the elements inside are the same. But the change has no effect. I guess I'm doing something wrong. I just added the copy() to make sure.

@massich
Copy link
Contributor Author

massich commented Jan 10, 2019

which is different than this:

xx = np.array(range(3))
xx[:1][0]=42
print(xx)
[42  1  2]

@massich
Copy link
Contributor Author

massich commented Jan 10, 2019

I guess that what I'm saying is that I could not figure out how to write a test that actually breaks if .copy() is not there and passes otherwise.

@larsoner
Copy link
Member

I could not figure out how to write a test that actually breaks if .copy() is not there and passes otherwise.

That's because there is an implicit copy in the np.array(...) calls in the constructor:

https://github.com/mne-tools/mne-python/blob/master/mne/annotations.py#L157

So you shouldn't need to do any .copy() calls if the variables are passed to Annotations(...)

key = list(key) if isinstance(key, tuple) else key
return Annotations(onset=self.onset[key].copy(),
duration=self.duration[key].copy(),
description=self.description[key].copy(),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

... so no need for any of these copy calls, because a copy is made in Annotations.__init__

@agramfort
Copy link
Member

agramfort commented Jan 10, 2019 via email

@massich
Copy link
Contributor Author

massich commented Jan 10, 2019

I did not see this one.

self.onset = np.array(onset, dtype=float)

Great.

@massich
Copy link
Contributor Author

massich commented Jan 10, 2019

This still remains though

(139650719663168, 139650338620800, False)
(139650385747040, 139650385747040, True)
0.0
(139650385747040, 139650385747040, True)
0.0
(139650385747040, 139650385747040, True)

The vectors do have different id (they are indeed a copy) but the first element of both vectors share the id. Anyway.. I'll let it be. this outsmarts me.

Copy link
Member

@larsoner larsoner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cbrnr feel free to merge if you are happy

@cbrnr cbrnr merged commit e2c538b into mne-tools:master Jan 11, 2019
@cbrnr
Copy link
Contributor

cbrnr commented Jan 11, 2019

Thanks @massich!

@cbrnr cbrnr deleted the iter_annotations branch January 11, 2019 20:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants