-
-
Notifications
You must be signed in to change notification settings - Fork 1.5k
[MRG] Automatic info['nchan'] and info['ch_names'] #2765
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Sounds like a promising approach! I hadn't though about making a list-like object and overriding Looks like the CIs are angry with a few different errors. Let me know when it's ready for review. |
|
Also please add to the description or comment with the appropriate "Closes #XXX" so the related issue gets closed when we merge |
|
I am on my phone now. So can't look. But just wondering out loud. Can't we On Monday, January 4, 2016, Eric Larson notifications@github.com wrote:
|
I suspect that will lead to quite a bit of repeated code. We quite often want to know the number of channels Have you read through the related issue? This was covered a bit over there already, but there are a number of use cases that need to be supported, with potential performance implications for recalculating these all the time. And actually doing so will make our code less DRY, and potentially less readable at teh same time. Maintaining access to those fields (while under the hood making them be generated in a non-redundant way) is the idea of the PR. I actually wouldn't mind replacing too much replacing |
mne/io/meas_info.py
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nitpick, but why not just an empty string? This might be confused as a channel with the name 'empty' ;)
yes. But that's not the reason we have them there. It's historical, no? To be compatible with MNE-Matlab and so on. I actually prefer
you mean they are computed only once when the
I agree in principle that it should reduce errors. But I'm +0 for the solution at the moment. I'm curious what the tracelog would look like now if you have an error ... is it parseable for a new developer / user? |
I have to ask again -- did you read the related issue? It sounds like what you're suggesting -- creating
Perhaps, but that has the disadvantage of requiring a ton of changes with little gain (namely, a little bit more explicitness about what's happening under the hood). More importantly, though, keep in mind that
The idea is that users hopefully are not modifying these fields, just accessing them. For devs, now you only need to modify |
mne/io/meas_info.py
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
acces -> access
I was following the discussion ... but not every detail. Since you mentioned, I went ahead and read it again :)
no, you suggested that. I'm just saying whatever it is, should be more directly accessible rather than under-the-hood.
but you're making them read-only which is already an API change.
yes, I agree with this in principle. But I'd rather prefer something more explicit. I also don't like the fact that some fields are read-only and some aren't. This will lead to confusion.
I think the current solution will lead to more problems. What I'm suggesting is essentially what Alex suggested here: #2300 (comment) but wasn't discussed. Thinking more about it, I'm now -0.5 :) or maybe, I need to take some python lessons and I'll be +0.5 :) |
Ahh, we had a misunderstanding. I never meant to suggest it as a workable solution here. You originally expressed discontent with the solution, and so I tried to put into code what I thought you were originally suggesting we do. You didn't say (until now) that it wasn't what you were thinking, so I assumed it was...
It is an API change, yes, but hopefully you agree it is API change to a smaller degree than removing them entirely is. I think it's actually (far) more likely that users have read
I see what you're saying. To me the idea of changing to properties to buy this extra explicitness is not worth the code overhaul and potentially breaking people's (reading) code, though. I think we'll have to agree to disagree about this since I think we have different weighting functions for the tradeoffs we've discussed. I can live with changing it to |
|
yes, I think you got my point of view now @Eric89GXL :) I like the property idea. It tells the users that there is some change now instead of silently changing it to read-only. |
mne/io/meas_info.py
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can save a couple of lines by using numpy:
if len(np.unique(self['ch_names'])) < len(self[ch_names]):
raise ...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
but then I cannot list the duplicate channels in the error message.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So maybe you could do the same as in line 1411: duplicates = set([ch for ch in self['ch_names'] if self['ch_names'].count(ch) > 1])
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or maybe even refactor, so you have a checker function for duplicates.
|
I've spend some days now digging through the code, hunting down all instances where ch_names is used. We cannot easily drop info['ch_names']. It's super useful and used extremely frequently in the code. Any substantial change to it breaks things in nearly every file. Debugging the necessary refactoring is not going to be fun. Not to mention it will break many user scripts as well. Changing the field to read-only is a small change that is doable in terms of refactoring required and probably doesn't break that many user scripts. As I see it, some classes in MNE work as a dictionary. Namely the old C-structures lifted from MNE-C like Info, ForwardOperator and InverseOperator. Some classes work as an object. Namely the ones that were written from scratch like Evoked, SourceEstimate and ICA. In a perfect world, I would like to see everything using normal object syntax (info.ch_names, info.chs, info.nchan, info.sfreq) but that would be a huge change that breaks everything. In a less perfect world I would like to see a strict separation between the two 'styles' of classes. So for me a -1 on adding property methods to Info; having to remember whether a class wants dictionary style access or object style access is hard enough without having to remember this for each field. |
Why is it difficult to simply replace every instance of info['ch_names'] with info.ch_names? Or do I miss something here?
It's already an API change. Won't this also break user scripts -- to make it read only? Also imagine how many users would be trying to use this pattern: ch_names = info['ch_names']
ch_names[5] = 'abc' # or some manipulation for their own purposesNow this is broken. I don't mind the fact that this doesn't work -- but the error trace you get is cryptic. Making it a property will make it clear to the user that this is something they are not supposed to manipulate. Again, why is
So, your main complaint is that it's too much effort? I'm worried that what you're proposing is not much better than what we already have (except maybe some speed gains) in terms of avoiding errors. In fact, it will lead to cryptic error messages. Also, I tried running an example on your branch and it's broken. So, you'll have to check the examples too to make sure you don't break them. |
|
i'd also like to avoid mixing style between dicts keys and attributes.
i am horrified with the changes. The to_list and mapping methods are to me
maybe unecessary.
let's see if the doc builds with this branch and see how much code it
changes in the examples
|
That is not very difficult. What would be difficult is to remove the ch_names field altogether.
Yes, API changes in one of the core data structures are bad. The question is: can it achieve more good than bad? Admittedly, it's a very small API change. If the change leads to much good™ we should do it.
Yes. But when they are manipulating the Info structure itself, they are probably doing something awesome and not some run of the mill analysis. Now either they are doing it correctly and changing would very likely lead to the user not realizing that by modifying their local copy of
This is problematic. I'll see if this can be improved. The error message is our main way of communicating the change to our users.
Because
Since it clearly solves nasty errors of inconsistencies in the Info object, do you mean it will introduce new ways for us and users to make errors? We should not care about speed gains here, as long as speed is acceptable. Looping over channel names is probably never speed critical. |
My concern is that it will lead to frustration if the users are not able to figure out what the error trace mean and they get errors for things which they would expect to work in normal circumstances. We should be mindful of the fact that many beginners are often hesitant to go the mailing list with a simple assignment error which they don't understand. I'm not sure if it will introduce new ways to make errors -- I'll have to see once you have everything working. Right now, tests and examples don't work ... If everyone is happy with what you propose, I'll ask you to document this properly to avoid common pitfalls. |
|
let's make it work, then make it nice / user friendly.
|
I'm with you on that one. |
e82645c to
0c678a6
Compare
[MRG] Automatic info['nchan'] and info['ch_names']
|
thanks for the input @Eric89GXL and @agramfort! |
|
I think merging this one was not a very good idea. It broke lots of private code and I don't see why it was really necessary. |
|
It looks less worse than I first thought. But pleas let's not over-smart our objects in general. I see lots of time spent there for features of incremental advantage. This is not what makes our code easier to maintain. |
|
Maybe @wmvanvliet can try to document the changes. I remember that it was promised, but I don't see any documentation added. I feel any new feature should come with documentation and merging should be held off till the documentation is ready. |
|
It's not such a nontrival assumption that making an info field read-only will not have consequences, if so far the Info API was fully consistent with a dict. We would at least need a deprecation (... again waste of dev time). |
Yeah -- basically you need to do one thing now instead of three: construct
I agree. In this particular case, though, multiple people (@agramfort, @jasmainak, me, @wmvanvliet, @jona-sassenhagen, etc.) weighed in on the pros and cons starting at least six months ago (see #2300), with the bulk of the discussion occurring over the course of about a month and a half (mid December through end of Jan). I had a similar original feeling as you that it might be overkill, but @wmvanvliet convinced me otherwise. It seemed like others converged on this as well (see comments above and in #2300). The bulk of the discussion unfortunately coincided with when you weren't much able to participate in the discussion, which was unfortunate timing. I know that this PR has forced a bit of extra work at your end -- after reading through all the lengthy existing discussion of pros and cons of this particular case, do you still come down - instead of + on it? Do you propose rolling it back? If so, we could actually poll for +/- to see where we land I suppose.
Yes please. @wmvanvliet if you have time it would be very helpful in case others hit a similar problem.
Yes this is probably true, @jasmainak thanks for doing the (thankless) job of nagging people (myself included) about docs. |
|
Sorry but I feel it is an overkill to write-protect certain keys of a dict, On Sun, Feb 7, 2016 at 6:31 PM, Eric Larson notifications@github.com
|
|
Sorry but I feel it is an overkill
Fair enough. I assume you saw most (maybe all?) of these points were
brought up previously in the discussion. I suppose we'll have to disagree
on this point in terms of the relative costs and benefits. Do you propose
then to roll it back, or rather want to voice these opinions for future
changes?
I wish that for API dev we could be much slower and careful.
This one was arguably 6 months in the making (from when it was originally
brought up), but really involved ~1.5 months of extensive/continuous
discussion before being merged. This seems like a reasonable timeline to
me, perhaps even a little bit long. How long do you feel would be
reasonable for this sort of change?
|
|
On Sun, Feb 7, 2016 at 7:42 PM, Eric Larson notifications@github.com
Reply to this email directly or view it on GitHub |
|
This is a code change that certainly makes a few things more complicated, but hopefully has long-term benefits with regards to stability correct? Under the hood, it's more complicated, but when interacting with it, it should be less failure prone. Mandatory docs before merging sounds like a good idea. |
|
I'm not sure if it is clear that usually we have deprecation periods for On Mon, Feb 8, 2016 at 1:30 AM, jona-sassenhagen notifications@github.com
|
Well Info dicts were never really intended to be modified by users in |
|
I've merged this too soon. Can we revert this until we've settled the debate? I understand why it's not immediately obvious why we should add this complexity. However, I feel it is an important change that strengthens one of the core foundations on which we build other stuff. The advantages really outshine the downside of a bit of complexity here. Let me try to explain why: I hope we can agree that it's almost always better to have one authoritative source for a data field. When you start making copies, it becomes a pain to keep all the copies in sync. In all likelihood, if Lets take a look at one of the interesting side effects of having copies. Which field do we trust? Say we want to know the number of channels. The most obvious route would be through When we change the name of a channel, we must update two fields. When we add one or remove one, we must update three. Every. Single. Time. Of course, this has become second nature to us and it honestly is not so much of a hassle. But when accepting PR's from others that have less experience with the codebase, will we catch any mistakes, always? Eliminating a possible source of mistakes is a good thing. Let's talk about complexity from the user's point of view. I'm a proponent of regarding the We have the |
|
I think Denis is in fact more concerned with the process than the outcome right? |
|
Yes, you could say it like that. I think we should just be really careful The other thing is that we should really generally try to avoid complexity I would be personally very happy if we could agree on seeing this one On Mon, Feb 8, 2016 at 11:19 AM, jona-sassenhagen notifications@github.com
|
Rest assured I do think about these things. This was an attempt at deconvoluting things; making the |
|
But I agree with you @dengemann that this PR should be a rare case. As MNE matures, we must try our best to keep it stable. |
|
Yes I have no doubts about that you think about these things, I see that On Mon, Feb 8, 2016 at 1:14 PM, Marijn van Vliet notifications@github.com
|
This is something I agree with. It's not documented and people will stumble on it because they think that the |
|
Maybe the right way to do it would really be to think about a frozen dict On Mon, Feb 8, 2016 at 1:28 PM, Mainak Jas notifications@github.com wrote:
|
|
The |
|
See other issue. I think some version of frozen dict would work. Frozenkeys, updateable.
|
The info object has two redundant fields:
nchanandch_names. They are there as convenience fields. However, whenever thechslist is updated, these fields need to be manually updated as well.This PR makes these fields behave more like properties.
It does so by making
Infoa subclass ofcollections.MutableMapping, which allows it to redefine__setitem__and__getitem__while retaining full compatibility with the default Python dict.The
nchanfield just maps tolen(info['chs']).The
ch_namesfield is a bit more tricky. From the outside, it behaves as a mapping to[ch['ch_name'] for ch in info['chs']]. However, in order not to generate a new list every time the field is accessed, the field is an instance of_ChannelNameList.The
_ChannelNameListclass is a subclass ofcollections.Sequence, thus implementing a list that is read-only, but otherwise fully compatible with a normal Python list. It overwrites the__getitem(self, index)__method to map toinfo['chs'][index]['ch_name']on the fly. It also defines a neatmake_index_map()function that generates a dictionary that maps channel names to integer indices, which can be used to speed things up when many lookups are needed.The rest of the code is updated to no longer set the
nchanandch_namesfields of Info objects.Closes #2300