Skip to content

Conversation

@pablomainar
Copy link
Contributor

Fix mentioned in issue #12114

When Epoch data is preloaded and get_data is called with units argument, now a copy is created and returned. This is to avoid modifying the underlying Epochs._data when multiplying by the units factor.
Any other call to get_data will return a view of _data, which is dangerous and can easily break the Epochs object. Therefore also a warning has been added.
The test_epochs.py has been modify to test that a copy is returned in the case of preloaded and passing units.

@welcome
Copy link

welcome bot commented Oct 18, 2023

Hello! 👋 Thanks for opening your first pull request here! ❤️ We will try to get back to you soon. 🚴

mne/epochs.py Outdated
A view on epochs data.
"""
warn(
"The result from get_data() can be a view on the Epochs data, which means that modifying it in-place will modify the data in the Epochs object, making it incorrect."
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As a user, this would very much confuse me. How would you know exactly when a copy and when a view is returned? Can we please either clearly document this or make the behavior consistent?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please see the discussion in #12114 (we decided to implement a copy=True parameter). @pablomainar can you please update this PR accordingly? If these changes are too much, just let us know and we can take over. In any case, the warning should definitely not be issued by default.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right. Maybe the best option is to do what has been recently discussed in the issue thread. Add a copy argument to the get_data() method of Raw, Epochs and Evoked and have it default to True and always return a copy. If the user selects to pass False, then wtry to return a view, and if not possible (because of numpy indexing), warn that a copy is being returned.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the user selects to pass False, then wtry to return a view, and if not possible (because of numpy indexing), warn that a copy is being returned.

I think the most common case here is probably for the user "I don't care if it's a copy or not just make it as efficient as possible" often for passing to a sklearn classifier or so, which this proposal does not really allow -- copy=False is almost there but the user will get unnecessary warnings. I guess we could add something like copy=None for this, which would mean "view if possible, copy otherwise". But I think copy=False is actually a good choice for this use case as it keeps us in line with how copy works in other places, e.g.:

  • https://numpy.org/devdocs/reference/generated/numpy.array.html

    If true (default), then the object is copied. Otherwise, a copy will only be made if array returns a copy, if obj is a nested sequence, or if a copy is needed to satisfy any of the other requirements

  • https://numpy.org/doc/stable/reference/generated/numpy.ndarray.astype.html

    By default, astype always returns a newly allocated array. If this is set to false, and the dtype, order, and subok requirements are satisfied, the input array is returned instead of a copy.

  • https://scikit-learn.org/stable/modules/generated/sklearn.decomposition.PCA.html

    It doesn't say this explicitly, but if you pass it non-float values then it does copy your data even if you do copy=False:

    Details
    >>> from sklearn.decomposition import PCA
    >>> import numpy as np
    >>> x = np.ones((3, 3), int)
    >>> PCA(copy=False).fit_transform(x)
    array([[0., 0., 0.],
           [0., 0., 0.],
           [0., 0., 0.]])
    >>> x
    array([[1, 1, 1],
           [1, 1, 1],
           [1, 1, 1]])
    >>> x = np.ones((3, 3), float)
    >>> PCA(copy=False).fit_transform(x)
    array([[0., 0., 0.],
           [0., 0., 0.],
           [0., 0., 0.]])
    >>> x
    array([[0., 0., 0.],
           [0., 0., 0.],
           [0., 0., 0.]])
    

So I think it's a pretty common practice to behave this way.

So my vote would be just write in the Notes something about how to get a guaranteed view on the data with get_data, since I think it's almost a corner case that they actually need it e.g. for inplace modification, and not bother emitting a warning when copy=False and it's not a view.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like it!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agreed. copy=False should mean what it means in NumPy, i.e., "avoid copying if possible". I also agree that we should not warn if using a view is impossible, though a logger.info message would be fine if people think it's important to emit something

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd appreciate an info message.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok agree! I will work on the PR during the weekend. I will submit a first version for Epochs and if we agree I'll also make the changes for Raw and Evoked for all of them to be consistent.

@pablomainar pablomainar closed this by deleting the head repository Oct 21, 2023
@drammock drammock mentioned this pull request Oct 23, 2023
3 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants