-
-
Notifications
You must be signed in to change notification settings - Fork 1.5k
Fix view/copy bug in epochs #12117
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix view/copy bug in epochs #12117
Conversation
|
Hello! 👋 Thanks for opening your first pull request here! ❤️ We will try to get back to you soon. 🚴 |
for more information, see https://pre-commit.ci
mne/epochs.py
Outdated
| A view on epochs data. | ||
| """ | ||
| warn( | ||
| "The result from get_data() can be a view on the Epochs data, which means that modifying it in-place will modify the data in the Epochs object, making it incorrect." |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As a user, this would very much confuse me. How would you know exactly when a copy and when a view is returned? Can we please either clearly document this or make the behavior consistent?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please see the discussion in #12114 (we decided to implement a copy=True parameter). @pablomainar can you please update this PR accordingly? If these changes are too much, just let us know and we can take over. In any case, the warning should definitely not be issued by default.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right. Maybe the best option is to do what has been recently discussed in the issue thread. Add a copy argument to the get_data() method of Raw, Epochs and Evoked and have it default to True and always return a copy. If the user selects to pass False, then wtry to return a view, and if not possible (because of numpy indexing), warn that a copy is being returned.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the user selects to pass False, then wtry to return a view, and if not possible (because of numpy indexing), warn that a copy is being returned.
I think the most common case here is probably for the user "I don't care if it's a copy or not just make it as efficient as possible" often for passing to a sklearn classifier or so, which this proposal does not really allow -- copy=False is almost there but the user will get unnecessary warnings. I guess we could add something like copy=None for this, which would mean "view if possible, copy otherwise". But I think copy=False is actually a good choice for this use case as it keeps us in line with how copy works in other places, e.g.:
-
https://numpy.org/devdocs/reference/generated/numpy.array.html
If true (default), then the object is copied. Otherwise, a copy will only be made if array returns a copy, if obj is a nested sequence, or if a copy is needed to satisfy any of the other requirements
-
https://numpy.org/doc/stable/reference/generated/numpy.ndarray.astype.html
By default, astype always returns a newly allocated array. If this is set to false, and the dtype, order, and subok requirements are satisfied, the input array is returned instead of a copy.
-
https://scikit-learn.org/stable/modules/generated/sklearn.decomposition.PCA.html
It doesn't say this explicitly, but if you pass it non-
floatvalues then it does copy your data even if you docopy=False:Details
>>> from sklearn.decomposition import PCA >>> import numpy as np >>> x = np.ones((3, 3), int) >>> PCA(copy=False).fit_transform(x) array([[0., 0., 0.], [0., 0., 0.], [0., 0., 0.]]) >>> x array([[1, 1, 1], [1, 1, 1], [1, 1, 1]]) >>> x = np.ones((3, 3), float) >>> PCA(copy=False).fit_transform(x) array([[0., 0., 0.], [0., 0., 0.], [0., 0., 0.]]) >>> x array([[0., 0., 0.], [0., 0., 0.], [0., 0., 0.]])
So I think it's a pretty common practice to behave this way.
So my vote would be just write in the Notes something about how to get a guaranteed view on the data with get_data, since I think it's almost a corner case that they actually need it e.g. for inplace modification, and not bother emitting a warning when copy=False and it's not a view.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like it!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
agreed. copy=False should mean what it means in NumPy, i.e., "avoid copying if possible". I also agree that we should not warn if using a view is impossible, though a logger.info message would be fine if people think it's important to emit something
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd appreciate an info message.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok agree! I will work on the PR during the weekend. I will submit a first version for Epochs and if we agree I'll also make the changes for Raw and Evoked for all of them to be consistent.
for more information, see https://pre-commit.ci
Fix mentioned in issue #12114
When Epoch data is preloaded and get_data is called with units argument, now a copy is created and returned. This is to avoid modifying the underlying Epochs._data when multiplying by the units factor.
Any other call to get_data will return a view of _data, which is dangerous and can easily break the Epochs object. Therefore also a warning has been added.
The test_epochs.py has been modify to test that a copy is returned in the case of preloaded and passing units.