@wmvanvliet
Contributor

The info object has two redundant fields: nchan and ch_names. They are there as convenience fields. However, whenever the chs list is updated, these fields need to be manually updated as well.

This PR makes these fields behave more like properties.

It does so by making Info a subclass of collections.MutableMapping, which allows it to redefine __setitem__ and __getitem__ while retaining full compatibility with the default Python dict.

The nchan field just maps to len(info['chs']).

The ch_names field is a bit more tricky. From the outside, it behaves as a mapping to [ch['ch_name'] for ch in info['chs']]. However, in order not to generate a new list every time the field is accessed, the field is an instance of _ChannelNameList.

The _ChannelNameList class is a subclass of collections.Sequence, thus implementing a list that is read-only, but otherwise fully compatible with a normal Python list. It overrides the __getitem__(self, index) method to map to info['chs'][index]['ch_name'] on the fly. It also defines a neat make_index_map() function that generates a dictionary that maps channel names to integer indices, which can be used to speed things up when many lookups are needed.

The rest of the code is updated to no longer set the nchan and ch_names fields of Info objects.
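
The design described above can be illustrated with a minimal sketch. This is not the code from the PR: the class names SketchInfo and ChannelNameView, the error type, and the error wording are all assumptions made for illustration, and the abstract base classes are imported from collections.abc, where current Python versions keep them.

```python
from collections.abc import MutableMapping, Sequence


class ChannelNameView(Sequence):
    """Read-only, list-like view of the names stored in a ``chs`` list."""

    def __init__(self, chs):
        self._chs = chs  # keep a reference to the list, not a copy

    def __getitem__(self, index):
        if isinstance(index, slice):
            return [ch['ch_name'] for ch in self._chs[index]]
        return self._chs[index]['ch_name']

    def __len__(self):
        return len(self._chs)

    def make_index_map(self):
        """Build a dict mapping each channel name to its integer index."""
        return {name: idx for idx, name in enumerate(self)}


class SketchInfo(MutableMapping):
    """Dict-like container deriving 'nchan' and 'ch_names' from 'chs'."""

    _derived = ('nchan', 'ch_names')

    def __init__(self, **fields):
        self._data = {'chs': []}
        self.update(fields)  # routed through __setitem__

    def __getitem__(self, key):
        if key == 'nchan':
            return len(self._data['chs'])
        if key == 'ch_names':
            return ChannelNameView(self._data['chs'])
        return self._data[key]

    def __setitem__(self, key, value):
        if key in self._derived:
            # Hypothetical error type and wording
            raise RuntimeError("info['%s'] is derived from info['chs'] "
                               "and cannot be set directly" % key)
        self._data[key] = value

    def __delitem__(self, key):
        del self._data[key]

    def __iter__(self):
        yield from self._data
        yield from self._derived

    def __len__(self):
        return len(self._data) + len(self._derived)


info = SketchInfo(chs=[{'ch_name': 'MEG 001'}, {'ch_name': 'EEG 001'}])
info['chs'].append({'ch_name': 'STI 014'})  # only 'chs' is edited
print(info['nchan'])
print(list(info['ch_names']))
```

Because the view holds a reference to the chs list rather than a copy, appending to info['chs'] is immediately reflected in both derived fields.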

Closes #2300

@larsoner
Member

larsoner commented Jan 4, 2016

It overrides the __getitem__(self, index) method to map to info['chs'][index]['ch_name'] on the fly.

Sounds like a promising approach! I hadn't thought about making a list-like object and overriding __getitem__. That shouldn't have too much overhead unless we do hundreds of thousands of such lookups or something silly like that, and in that case your make_index_map would be more appropriate.
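
To make the lookup trade-off concrete, here is a small sketch on a plain list with made-up channel names. Each list.index() call scans the list from the start, while a name-to-index dict is built once and then answers each lookup in constant time on average.

```python
# Made-up channel names, for illustration only
ch_names = ['MEG %03d' % i for i in range(1, 307)]
wanted = ['MEG 010', 'MEG 200', 'MEG 306']

# Linear scan per lookup: fine for a handful of names
slow = [ch_names.index(name) for name in wanted]

# Build the map once, then every lookup is a dict access
index_map = {name: idx for idx, name in enumerate(ch_names)}
fast = [index_map[name] for name in wanted]

print(slow == fast)
```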

Looks like the CIs are angry with a few different errors. Let me know when it's ready for review.

@larsoner larsoner added this to the 0.12 milestone Jan 4, 2016
@larsoner
Member

larsoner commented Jan 4, 2016

Also please add to the description or comment with the appropriate "Closes #XXX" so the related issue gets closed when we merge

@jasmainak
Member

I am on my phone now. So can't look. But just wondering out loud. Can't we
simply deprecate these two fields somehow and support them only for file IO?


@larsoner
Member

larsoner commented Jan 4, 2016

Can't we simply deprecate these two fields somehow and support them only for file IO?

I suspect that will lead to quite a bit of repeated code. We quite often want to know the number of channels nchan, or have effectively a list of the channels ch_names to accomplish things we want to. So to deprecate them, I assume you're talking about replacing all instances of nchan with len(info['chs']), and instances of info['ch_names'] with [c['ch_name'] for c in info['chs']].

Have you read through the related issue? This was covered a bit over there already, but there are a number of use cases that need to be supported, with potential performance implications for recalculating these all the time. And actually doing so will make our code less DRY, and potentially less readable at the same time. Maintaining access to those fields (while under the hood making them be generated in a non-redundant way) is the idea of the PR.

I actually wouldn't mind too much replacing info['nchan'] with len(info['chs']) in our code in principle, but this PR actually seems like a somewhat more DRY solution to me, and leads to fewer changed lines / coding patterns, so I slightly prefer it. For info['ch_names'] it's more complicated, and @wmvanvliet seems to have found a suitable, fairly simple solution that keeps our code readable, makes it more DRY, and should reduce instances of error.

@wmvanvliet wmvanvliet changed the title Automatic info['nchan'] and info['ch_names'] [WIP] Automatic info['nchan'] and info['ch_names'] Jan 4, 2016
Member

nitpick, but why not just an empty string? This might be confused as a channel with the name 'empty' ;)

@jasmainak
Member

I suspect that will lead to quite a bit of repeated code. We quite often want to know the number of channels nchan, or have effectively a list of the channels ch_names to accomplish things we want to. So to deprecate them, I assume you're talking about replacing all instances of nchan with len(info['chs']), and instances of info['ch_names'] with [c['ch_name'] for c in info['chs']].

yes. But that's not the reason we have them there. It's historical, no? To be compatible with MNE-Matlab and so on. I actually prefer len(info['chs']) even if it's more redundant because it's explicit. As for the other one, I have no strong feelings.

Have you read through the related issue? This was covered a bit over there already, but there are a number of use cases that need to be supported, with potential performance implications for recalculating these all the time. And actually doing so will make our code less DRY, and potentially less readable at the same time. Maintaining access to those fields (while under the hood making them be generated in a non-redundant way) is the idea of the PR.

you mean they are computed only once when the info dict is created? I thought the idea was to compute them on the fly when you access them. In that case, I would actually prefer putting that in the code as opposed to under-the-hood magic. What's happening here sounds like it should be an explicit method of an info object.

I actually wouldn't mind too much replacing info['nchan'] with len(info['chs']) in our code in principle, but this PR actually seems like a somewhat more DRY solution to me, and leads to fewer changed lines / coding patterns, so I slightly prefer it. For info['ch_names'] it's more complicated, and @wmvanvliet seems to have found a suitable, fairly simple solution that keeps our code readable, makes it more DRY, and should reduce instances of error.

I agree in principle that it should reduce errors. But I'm +0 for the solution at the moment. I'm curious what the tracelog would look like now if you have an error ... is it parseable for a new developer / user?

@larsoner
Member

larsoner commented Jan 4, 2016

you mean they are computed only once when the info dict is created? I thought the idea was to compute them on the fly when you access them. In that case, I would actually prefer putting that in the code as opposed to under-the-hood magic.

I have to ask again -- did you read the related issue? It sounds like what you're suggesting -- creating ch_names = [ch['ch_name'] for ch in info['chs']] or so -- was IIRC the first idea discussed. The discussion that followed was extensive so better not to recreate it here. Hopefully these changes will make more sense once you've read that issue.

What's happening here sounds like it should be an explicit method of an info object.

Perhaps, but that has the disadvantage of requiring a ton of changes with little gain (namely, a little bit more explicitness about what's happening under the hood). More importantly, though, keep in mind that info['nchan'] and info['ch_names'] have been publicly accessible to users and are likely to be used by them, so we can't just remove them.

I'm curious what the tracelog would look like now if you have an error ... is it parseable for a new developer / user?

The idea is that users hopefully are not modifying these fields, just accessing them. For devs, you now only need to modify info['chs'] (and maybe call some update method?) instead of having to update three fields correctly/consistently. Currently, a dev might accidentally only modify info['ch_names'] or info['chs'] without updating all fields, leading to errors. This should cut down on instances of those errors.

Member

acces -> access

@jasmainak
Member

I have to ask again -- did you read the related issue?

I was following the discussion ... but not every detail. Since you mentioned, I went ahead and read it again :)

It sounds like what you're suggesting -- creating ch_names = [ch['ch_name'] for ch in info['chs']] or so -- was IIRC the first idea discussed.

no, you suggested that. I'm just saying whatever it is, should be more directly accessible rather than under-the-hood.

Perhaps, but that has the disadvantage of requiring a ton of changes with little gain (namely, a little bit more explicitness about what's happening under the hood). More importantly, though, keep in mind that info['nchan'] and info['ch_names'] have been publicly accessible to users and are likely to be used by them, so we can't just remove them.

but you're making them read-only which is already an API change.

The idea is that users hopefully are not modifying these fields, just accessing them. For devs, you now only need to modify info['chs'] (and maybe call some update method?) instead of having to update three fields correctly/consistently.

yes, I agree with this in principle. But I'd rather prefer something more explicit. I also don't like the fact that some fields are read-only and some aren't. This will lead to confusion.

Currently, a dev might accidentally only modify info['ch_names'] or info['chs'] without updating all fields, leading to error. This should cut down on instances of those errors.

I think the current solution will lead to more problems. What I'm suggesting is essentially what Alex suggested here: #2300 (comment) but wasn't discussed. Thinking more about it, I'm now -0.5 :) or maybe, I need to take some python lessons and I'll be +0.5 :)

@larsoner
Member

larsoner commented Jan 4, 2016

Can't we simply deprecate these two fields somehow and support them only for file IO?

I suspect that will lead to quite a bit of repeated code... I assume you're talking about replacing all instances of nchan with len(info['chs']), and instances of info['ch_names'] with [c['ch_name'] for c in info['chs']].

no, you suggested that. I'm just saying whatever it is, should be more directly accessible rather than under-the-hood.

Ahh, we had a misunderstanding. I never meant to suggest it as a workable solution here. You originally expressed discontent with the solution, and so I tried to put into code what I thought you were originally suggesting we do. You didn't say (until now) that it wasn't what you were thinking, so I assumed it was...

but you're making them read-only which is already an API change.

It is an API change, yes, but hopefully you agree it is an API change to a smaller degree than removing them entirely is. I think it's actually (far) more likely that users have read info['ch_names'] or info['nchan'] and used them in their code than it is that they have written these values (properly).

But I'd rather prefer something more explicit. I also don't like the fact that some fields are read-only and some aren't. This will lead to confusion.

I see what you're saying. To me the idea of changing to properties to buy this extra explicitness is not worth the code overhaul and potentially breaking people's (reading) code, though. I think we'll have to agree to disagree about this since I think we have different weighting functions for the tradeoffs we've discussed. I can live with changing it to info.ch_names (it can be a property instead of a method, since these can be expected to do some minimal things under the hood) and info.nchan if @agramfort and others agree with the property idea, though.
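
For comparison, the property idea mentioned here could look roughly like the sketch below. PropertyInfo is a hypothetical name and this is not code from the PR; it only shows that attribute-style access can be computed on the fly and that assigning to a property without a setter fails with an AttributeError.

```python
class PropertyInfo(dict):
    """Hypothetical dict subclass exposing derived values as properties."""

    @property
    def nchan(self):
        return len(self['chs'])

    @property
    def ch_names(self):
        return [ch['ch_name'] for ch in self['chs']]


info = PropertyInfo(chs=[{'ch_name': 'MEG 001'}, {'ch_name': 'MEG 002'}])
info['chs'].append({'ch_name': 'EEG 001'})
print(info.nchan, info.ch_names)

assignment_failed = False
try:
    info.nchan = 10  # no setter is defined, so this is rejected
except AttributeError:
    assignment_failed = True
```

Note that this variant builds a new list on every access to ch_names, which is the cost the list-like view in the PR is meant to avoid.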

@jasmainak
Member

yes, I think you got my point of view now @Eric89GXL :) I like the property idea. It tells the users that there is some change now instead of silently changing it to read-only.

Contributor

You can save a couple of lines by using numpy:

if len(np.unique(self['ch_names'])) < len(self['ch_names']):
    raise ...

Contributor Author

but then I cannot list the duplicate channels in the error message.

Contributor

So maybe you could do the same as in line 1411: duplicates = set([ch for ch in self['ch_names'] if self['ch_names'].count(ch) > 1])

Contributor

Or maybe even refactor, so you have a checker function for duplicates.
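
A duplicate checker along these lines might look like the following sketch. The function names and the error wording are hypothetical; collections.Counter is used so that the duplicated names can be listed in the message without calling count() once per channel.

```python
from collections import Counter


def find_duplicates(ch_names):
    """Return the sorted channel names that occur more than once."""
    counts = Counter(ch_names)
    return sorted(name for name, n in counts.items() if n > 1)


def check_unique_names(ch_names):
    """Raise a ValueError that lists any duplicated channel names."""
    duplicates = find_duplicates(ch_names)
    if duplicates:
        # Hypothetical wording for the error message
        raise ValueError('Channel names are not unique, found duplicates: '
                         + ', '.join(duplicates))


names = ['EEG 001', 'EEG 002', 'EEG 001', 'MEG 001', 'EEG 002']
print(find_duplicates(names))
```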

@wmvanvliet
Contributor Author

I've spent some days now digging through the code, hunting down all instances where ch_names is used. We cannot easily drop info['ch_names']. It's super useful and used extremely frequently in the code.

Any substantial change to it breaks things in nearly every file. Debugging the necessary refactoring is not going to be fun. Not to mention it will break many user scripts as well. Changing the field to read-only is a small change that is doable in terms of refactoring required and probably doesn't break that many user scripts.

As I see it, some classes in MNE work as a dictionary. Namely the old C-structures lifted from MNE-C like Info, ForwardOperator and InverseOperator. Some classes work as an object. Namely the ones that were written from scratch like Evoked, SourceEstimate and ICA. In a perfect world, I would like to see everything using normal object syntax (info.ch_names, info.chs, info.nchan, info.sfreq) but that would be a huge change that breaks everything. In a less perfect world I would like to see a strict separation between the two 'styles' of classes. So for me a -1 on adding property methods to Info; having to remember whether a class wants dictionary style access or object style access is hard enough without having to remember this for each field.

@jasmainak
Member

I've spent some days now digging through the code, hunting down all instances where ch_names is used. We cannot easily drop info['ch_names']. It's super useful and used extremely frequently in the code.

Why is it difficult to simply replace every instance of info['ch_names'] with info.ch_names? Or am I missing something here?

Any substantial change to it breaks things in nearly every file. Debugging the necessary refactoring is not going to be fun. Not to mention it will break many user scripts as well.

It's already an API change. Won't this also break user scripts -- to make it read only? Also imagine how many users would be trying to use this pattern:

ch_names = info['ch_names']
ch_names[5] = 'abc' # or some manipulation for their own purposes

Now this is broken. I don't mind the fact that this doesn't work -- but the error trace you get is cryptic. Making it a property will make it clear to the user that this is something they are not supposed to manipulate. Again, why is info['bads'] a list? It's not consistent.

As I see it, some classes in MNE work as a dictionary. Namely the old C-structures lifted from MNE-C like Info, ForwardOperator and InverseOperator. Some classes work as an object. Namely the ones that were written from scratch like Evoked, SourceEstimate and ICA. In a perfect world, I would like to see everything using normal object syntax (info.ch_names, info.chs, info.nchan, info.sfreq) but that would be a huge change that breaks everything. In a less perfect world I would like to see a strict separation between the two 'styles' of classes. So for me a -1 on adding property methods to Info; having to remember whether a class wants dictionary style access or object style access is hard enough without having to remember this for each field.

So, your main complaint is that it's too much effort? I'm worried that what you're proposing is not much better than what we already have (except maybe some speed gains) in terms of avoiding errors. In fact, it will lead to cryptic error messages. Also, I tried running an example on your branch and it's broken. So, you'll have to check the examples too to make sure you don't break them.

@agramfort
Member

agramfort commented Jan 5, 2016 via email

@wmvanvliet
Contributor Author

Why is it difficult to simply replace every instance of info['ch_names'] with info.ch_names? Or am I missing something here?

That is not very difficult. What would be difficult is to remove the ch_names field altogether.

It's already an API change.

Yes, API changes in one of the core data structures are bad. The question is: can it achieve more good than bad? Admittedly, it's a very small API change. If the change leads to much good™ we should do it.

Won't this also break user scripts -- to make it read only?

Yes. But when they are manipulating the Info structure itself, they are probably doing something awesome and not some run of the mill analysis. Now either they are doing it correctly and changing ch_names and chs in tandem, in which case the fix would be to simply delete the line manipulating ch_names, or they are not doing that in which case they probably have bad, hard to find, bugs in their scripts. The MNE code relies heavily on ch_names, nchan and chs being consistent. For example this:

ch_names = info['ch_names']
ch_names[5] = 'abc' # or some manipulation for their own purposes

would very likely lead to the user not realizing that by modifying their local copy of ch_names, they are also modifying the original info object and therefore should modify nchan and chs as well. This bookkeeping nightmare is exactly what we're trying to fix.
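
The aliasing problem can be reproduced with a plain dict standing in for the old Info layout (the channel names are made up). The local variable is the same list object as info['ch_names'], so the edit reaches the info dict while info['chs'] is left untouched:

```python
# Plain dict standing in for the old, redundant Info layout
info = {
    'chs': [{'ch_name': 'EEG 001'}, {'ch_name': 'EEG 002'}],
    'ch_names': ['EEG 001', 'EEG 002'],
    'nchan': 2,
}

ch_names = info['ch_names']  # an alias, not a copy
ch_names[1] = 'abc'

print(info['ch_names'][1])        # changed through the alias
print(info['chs'][1]['ch_name'])  # still the old name
```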

the error trace you get is cryptic

This is problematic. I'll see if this can be improved. The error message is our main way of communicating the change to our users.
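
One possible way to improve the message, sketched under assumptions: without a __setitem__, a Sequence subclass fails item assignment with Python's generic "object does not support item assignment" TypeError. Defining __setitem__ only to raise a descriptive error lets the message point at the field to edit instead. ReadOnlyNames and the wording below are hypothetical, not what the PR ended up doing.

```python
from collections.abc import Sequence


class ReadOnlyNames(Sequence):
    """Hypothetical read-only view with an explanatory assignment error."""

    def __init__(self, chs):
        self._chs = chs

    def __getitem__(self, index):
        if isinstance(index, slice):
            return [ch['ch_name'] for ch in self._chs[index]]
        return self._chs[index]['ch_name']

    def __len__(self):
        return len(self._chs)

    def __setitem__(self, index, value):
        # Hypothetical wording: tell the user where the name really lives
        raise TypeError(
            "Channel names are read-only here. To rename a channel, set "
            "info['chs'][%r]['ch_name'] instead." % (index,))


chs = [{'ch_name': 'EEG 001'}, {'ch_name': 'EEG 002'}]
names = ReadOnlyNames(chs)

message = None
try:
    names[1] = 'abc'
except TypeError as err:
    message = str(err)
print(message)

chs[1]['ch_name'] = 'abc'  # the supported way to rename
print(names[1])
```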

why is info['bads'] a list?

Because info['bads'] is meant to be modified by the user. Changing this field does not require changing any other fields. Although you should not put non-existing channels in there.

I'm worried that what you're proposing is not much better than what we already have (except maybe some speed gains) in terms of avoiding errors.

Since it clearly solves nasty errors of inconsistencies in the Info object, do you mean it will introduce new ways for us and users to make errors? We should not care about speed gains here, as long as speed is acceptable. Looping over channel names is probably never speed critical.

@jasmainak
Member

Since it clearly solves nasty errors of inconsistencies in the Info object, do you mean it will introduce new ways for us and users to make errors?

My concern is that it will lead to frustration if users are not able to figure out what the error trace means and they get errors for things which they would expect to work in normal circumstances. We should be mindful of the fact that many beginners are often hesitant to go to the mailing list with a simple assignment error which they don't understand. I'm not sure if it will introduce new ways to make errors -- I'll have to see once you have everything working. Right now, tests and examples don't work ...

If everyone is happy with what you propose, I'll ask you to document this properly to avoid common pitfalls.

@agramfort
Member

agramfort commented Jan 5, 2016 via email

@wmvanvliet
Contributor Author

My concern is that it will lead to frustration if users are not able to figure out what the error trace means

I'm with you on that one.

@wmvanvliet wmvanvliet force-pushed the info_read_only_fields branch from e82645c to 0c678a6 Compare January 25, 2016 09:25
wmvanvliet added a commit that referenced this pull request Feb 3, 2016
[MRG] Automatic info['nchan'] and info['ch_names']
@wmvanvliet wmvanvliet merged commit 896becb into mne-tools:master Feb 3, 2016
@wmvanvliet
Contributor Author

thanks for the input @Eric89GXL and @agramfort!

@dengemann
Member

I think merging this one was not a very good idea. It broke lots of private code and I don't see why it was really necessary.

@dengemann
Member

It looks less bad than I first thought. But please let's not over-smart our objects in general. I see lots of time spent there on features of incremental advantage. This is not what makes our code easier to maintain.

@jasmainak
Member

Maybe @wmvanvliet can try to document the changes. I remember that it was promised, but I don't see any documentation added. I feel any new feature should come with documentation and merging should be held off till the documentation is ready.

@dengemann
Member

It's not such a trivial assumption that making an info field read-only will not have consequences, given that so far the Info API was fully consistent with a dict. We would at least need a deprecation (... again a waste of dev time).

@larsoner
Member

larsoner commented Feb 7, 2016

It looks less bad than I first thought.

Yeah -- basically you need to do one thing now instead of three: construct info['chs'] properly and you get info['ch_names'] and info['nchan'] for free. If you can't get the same functionality you had before in your code while simultaneously simplifying it, I'd be surprised. We've had errors related to setting these fields improperly before, and @wmvanvliet must have continued hitting them to prompt putting forth this effort. _check_consistency() has worked as a bit of a safeguard, but this one is better IMO.

please let's not over-smart our objects in general

I agree. In this particular case, though, multiple people (@agramfort, @jasmainak, me, @wmvanvliet, @jona-sassenhagen, etc.) weighed in on the pros and cons starting at least six months ago (see #2300), with the bulk of the discussion occurring over the course of about a month and a half (mid December through end of Jan). I had a similar original feeling as you that it might be overkill, but @wmvanvliet convinced me otherwise. It seemed like others converged on this as well (see comments above and in #2300).

Unfortunately, the bulk of the discussion coincided with a period when you weren't able to participate much. I know that this PR has forced a bit of extra work at your end -- after reading through all the lengthy existing discussion of pros and cons of this particular case, do you still come down - instead of + on it? Do you propose rolling it back? If so, we could actually poll for +/- to see where we land, I suppose.

Maybe @wmvanvliet can try to document the changes.

Yes please. @wmvanvliet if you have time it would be very helpful in case others hit a similar problem.

I feel any new feature should come with documentation and merging should be held off till the documentation is ready.

Yes this is probably true, @jasmainak thanks for doing the (thankless) job of nagging people (myself included) about docs.

@dengemann
Member

Sorry but I feel it is an overkill to write-protect certain keys of a dict, it is also an API breach as so far our Info was simply a dict with better repr.

I see a similar thing for channel lists ... why the heck was this necessary? Why can't it be a Python list with all its known methods, why do I need to convert it back into a list?

Initially we had the idea to minimise subclassing and stick with standard Python containers where it is possible (exceptions, better print output, repr, or our data classes). My feeling is that this extends the list of extra cases users have to learn while it burns lots of dev time and discussion.

I wish that for API dev we could be much slower and careful.


@larsoner
Member

larsoner commented Feb 7, 2016 via email

@dengemann
Member

On Sun, Feb 7, 2016 at 7:42 PM, Eric Larson notifications@github.com
wrote:

Sorry but I feel it is an overkill

Fair enough. I assume you saw most (maybe all?) of these points were
brought up previously in the discussion. I suppose we'll have to disagree
on this point in terms of the relative costs and benefits. Do you propose
then to roll it back, or rather want to voice these opinions for future
changes?

we should really document this extensively, and if possible deprecate it
correctly.
It is a significant API change to all MNE data objects that have an info,
we're not talking about a little thing here.
Something is not right if such extensive (think about the number of objects
exposing .info) changes can get merged like that.

I wish that for API dev we could be much slower and careful.

This one was arguably 6 months in the making (from when it was originally
brought up), but really involved ~1.5 months of extensive/continuous
discussion before being merged. This seems like a reasonable timeline to
me, perhaps even a little bit long. How long do you feel would be
reasonable for this sort of change?

Maybe my perception was different, as initially I was almost sure that this
would go nowhere. Then I had lost track and it got merged while I was
absorbed with other things.
I don't want to blame anyone; this is rather a reminder of some of our
core values regarding simple and flat APIs, following general Python design
principles. I actually wish we had been stricter about this in the past
already. Maybe it's not so much a matter of time that you assign for an
issue. For core API dev I have rather the impression that +2 is not enough
for merge. We should also require docs before merging and essentially try
to fetch every core dev's approval, if possible.


@jona-sassenhagen
Contributor

This is a code change that certainly makes a few things more complicated, but hopefully has long-term benefits with regard to stability, correct? Under the hood, it's more complicated, but when interacting with it, it should be less failure-prone.

Mandatory docs before merging sounds like a good idea.

@dengemann
Member

I'm not sure if it is clear that usually we have deprecation periods for
non-trivial API changes that would allow you to still use the old API. This
seems to have been neglected here. And it was not a bug fix.


@larsoner
Member

larsoner commented Feb 8, 2016

I'm not sure if it is clear that usually we have deprecation periods for
non-trivial API changes that would allow you to still use the old API. This
seems to have been neglected here. And it was not a bug fix.

Well, Info dicts were never really intended to be modified by users in
practice (we provide many public functions to effectively modify them
instead), so I'm not convinced our API-related deprecation
standards apply here. This is meant to behave in (almost) all ways like the
dict used to for 99%+ of users. And the small fraction that are affected
are likely to be advanced users or developers who are also going to be the
best equipped to quickly refactor to compensate. So I'm -1 on a
deprecation cycle for this one.

@wmvanvliet
Contributor Author

I've merged this too soon. Can we revert this until we've settled the debate?

I understand why it's not immediately obvious why we should add this complexity. However, I feel it is an important change that strengthens one of the core foundations on which we build other stuff. The advantages really outshine the downside of a bit of complexity here. Let me try to explain why:

I hope we can agree that it's almost always better to have one authoritative source for a data field. When you start making copies, it becomes a pain to keep all the copies in sync. In all likelihood, if Info were a regular Python class, nchan and ch_names would have been implemented as properties in the same way as Epochs.ch_names and Evoked.ch_names are implemented as properties. Of course, if my grandmother had wheels she would be a bike. Info is a dict and modeled after the C code. Can't we just live with a bit of data redundancy?

Let's take a look at one of the interesting side effects of having copies. Which field do we trust? Say we want to know the number of channels. The most obvious route would be through info['nchan'], right?
I invite you to do an ag "len\(.*info\[\'ch_names\'\]\)" through the code. I count 25 instances where len(info['ch_names']) is used to obtain the number of channels. This is not a wrong way to do it, it should yield the same result, but you can ask yourself why we didn't use the more obvious info['nchan']. Either we don't trust the nchan field to be correct, or we like to be more explicit: the number of channels is the length of the array holding the channels. Why len(info['ch_names']) though? It would be technically more correct to use len(info['chs']). It seems that sometimes, we feel that ch_names is the field that is most likely to be correct (probably because it's used a lot) and have unconscious doubts about the nchan and chs fields. Meh, a bit of quirkiness in the code, no big deal.

When we change the name of a channel, we must update two fields. When we add one or remove one, we must update three. Every. Single. Time. Of course, this has become second nature to us and it honestly is not so much of a hassle. But when accepting PRs from others who have less experience with the codebase, will we catch any mistakes, always? Eliminating a possible source of mistakes is a good thing.
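
To make the bookkeeping concrete, here is a minimal sketch that uses a plain dict as a stand-in for Info. It is not MNE code: the helpers `rename_channel` and `drop_channel` are hypothetical and only illustrate how many fields each operation has to touch.

```python
# Plain-dict stand-in for Info; the field names follow the discussion above.
info = {
    'chs': [{'ch_name': 'EEG001'}, {'ch_name': 'EEG002'}],
    'ch_names': ['EEG001', 'EEG002'],
    'nchan': 2,
}


def rename_channel(info, index, new_name):
    """Hypothetical helper: a rename has to update two fields."""
    info['chs'][index]['ch_name'] = new_name
    info['ch_names'][index] = new_name  # forget this and the copies diverge


def drop_channel(info, index):
    """Hypothetical helper: a removal has to update three fields."""
    del info['chs'][index]
    del info['ch_names'][index]
    info['nchan'] = len(info['chs'])


rename_channel(info, 0, 'Fz')
drop_channel(info, 1)
```

Every code path that edits channels needs the equivalent of these extra lines, which is the source of mistakes I would like to eliminate.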

Let's talk about complexity from the user's point of view. I'm a proponent of regarding the Info structure as part of the API. I would like to see our users happily hacking away with MNE; leaving the marked path of the examples and tutorials and doing awesome things. Before this PR, the complexity of updating redundant Info fields was the responsibility of the user. When facing the tradeoff between complexity for the user versus complexity in the MNE code, I greatly favor complexity in the MNE code.

We have the _check_consistency function, but using it at every point where the user may have injected a custom Info object is messy. Instead, we've opted to call it every time an Info object is modified in the MNE code (and of course we never forget and never overlook this in a PR). So it's actually more a safeguard for us than for the user. This PR adds a real safeguard for the user. In my opinion, this is worth breaking some of the existing scripts to nag the user to simplify their code.
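
The kind of safeguard meant here can be sketched roughly as follows. This is a simplified illustration and not the code in this PR: the class names `ChannelNames` and `DerivedInfo` are made up, and it imports from `collections.abc`, where `MutableMapping` and `Sequence` live in Python 3.

```python
from collections.abc import MutableMapping, Sequence


class ChannelNames(Sequence):
    """Read-only view of the names stored in a list of channel dicts."""

    def __init__(self, chs):
        self._chs = chs

    def __getitem__(self, index):
        if isinstance(index, slice):
            return [ch['ch_name'] for ch in self._chs[index]]
        return self._chs[index]['ch_name']

    def __len__(self):
        return len(self._chs)


class DerivedInfo(MutableMapping):
    """Dict-like object whose 'nchan' and 'ch_names' are computed from 'chs'."""

    _derived = ('nchan', 'ch_names')

    def __init__(self, **fields):
        self._data = {'chs': []}
        for key, value in fields.items():
            self[key] = value

    def __getitem__(self, key):
        if key == 'nchan':
            return len(self._data['chs'])
        if key == 'ch_names':
            return ChannelNames(self._data['chs'])
        return self._data[key]

    def __setitem__(self, key, value):
        if key in self._derived:
            raise TypeError('%r is derived from chs and cannot be set' % key)
        self._data[key] = value

    def __delitem__(self, key):
        if key in self._derived or key == 'chs':
            raise TypeError('%r cannot be deleted' % key)
        del self._data[key]

    def __iter__(self):
        yield from self._data
        yield from self._derived

    def __len__(self):
        return len(self._data) + len(self._derived)


info = DerivedInfo(chs=[{'ch_name': 'EEG001'}, {'ch_name': 'EEG002'}], bads=[])
info['chs'].append({'ch_name': 'EEG003'})  # only one field to touch
names = list(info['ch_names'])
try:
    info['nchan'] = 5
    blocked = False
except TypeError:
    blocked = True
```

After the append, `info['nchan']` and `info['ch_names']` follow along without any extra bookkeeping, and an attempt to overwrite a derived field fails loudly instead of silently creating an inconsistency.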

@jona-sassenhagen
Contributor

I think Denis is in fact more concerned with the process than the outcome, right?

@dengemann
Member

Yes, you could say it like that. I think we should just be really careful
when touching public APIs such as Info. If you're using MNE in many programs,
you're happy about anything that does not change. MNE is not so young
anymore; we have many applications and users, and it would be a good idea to
assume that whatever is public is our responsibility and should be
maintained accordingly. People --including myself-- will use it, use cases
emerge from public APIs, whether primarily intended or not. If you then say
it's worth a deprecation cycle or not for whatever reasons, you will have
to assume that somewhere out there other people will be crying and / or
cursing for dealing with unexpected crashes and whatever results from
global public API changes instead of moving on with what they actually
wanted to spend their time on. I cannot really remember any issues caused
by the old API btw. But maybe I'm selectively oblivious.

The other thing is that we should really generally try to avoid complexity
in objects. We already ramped up a good amount of it in the past. The
reason why I'm ranting here is that I hope I can make you think about this
a bit. I'm afraid that if we have PRs like this, we generate precedents and
soon we will start implementing extra API rules and conditional behaviors
wherever a potential bug might be lurking ... to protect our "user" who
very often will be a good Python programmer anyways and has already written
lots of code using the existing API.
OK, in this case you put forward good reasons, and maybe there is a
"generation problem" to it. Indeed, as Marijn suggested, I never looked up
'nchan' and instead would always look up the len of the attribute. 'nchan'
to me was a historical relic that we need to write fiff files. Yes, this is
how it used to be. I appreciate attempts to simplify things and it is a
good argument that info should be consistent and used effectively. Maybe
another less complex solution might have been to consider a FrozenDict. It
is a standard lib object and it would have been less complex to
characterise ("no field can be changed"). Of course in that case we would
need an explicit update mechanism, maybe via subclassing or simply as a
function that re-constructs the info explicitly by constructing it new and
copying old values. Keep in mind that any object class you introduce
(because it does not exist in the Python world already) is yet another
thing to learn.
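
A rough sketch of what such a read-only info with an explicit update function could look like, assuming `types.MappingProxyType` (a read-only dict view from the standard library) as the building block. The functions `make_info` and `update_info` are hypothetical names, not part of MNE.

```python
from types import MappingProxyType


def make_info(chs, bads=()):
    """Hypothetical constructor: build a read-only info with derived fields."""
    chs = tuple(MappingProxyType(dict(ch)) for ch in chs)
    return MappingProxyType({
        'chs': chs,
        'ch_names': tuple(ch['ch_name'] for ch in chs),
        'nchan': len(chs),
        'bads': tuple(bads),
    })


def update_info(info, **changes):
    """Hypothetical update: construct a new info, copying the old values."""
    fields = {'chs': info['chs'], 'bads': info['bads']}
    fields.update(changes)
    return make_info(**fields)


info = make_info([{'ch_name': 'EEG001'}, {'ch_name': 'EEG002'}])
new_info = update_info(info, bads=['EEG002'])

try:
    info['nchan'] = 5  # direct writes are rejected
    write_blocked = False
except TypeError:
    write_blocked = True
```

The redundant fields are computed in exactly one place (the constructor), so they cannot drift apart, and the old info is left untouched by the update.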

I would be personally very happy if we could agree on seeing this one
rather as a rare event and if we could join our main efforts on
deconvoluting things where possible and focusing on API stability,
documented functionality and performance.


@wmvanvliet
Contributor Author

The other thing is that we should really generally try to avoid complexity
in objects. We already ramped up a good amount of it in the past. The
reason why I'm ranting here is that I hope I can make you think about this
a bit.

Rest assured I do think about these things. This was an attempt at deconvoluting things; making the Info object easier to use and removing redundancy all over the code where we modify the channel information in Info objects. This was an attempt at making the API more stable; adding sane safeguards.

@wmvanvliet
Contributor Author

But I agree with you @dengemann that this PR should be a rare case. As MNE matures, we must try our best to keep it stable.

@dengemann
Member

Yes, I have no doubt that you think about these things. I see that
you address a relevant problem and I really appreciate this. Maybe I should
note that I mean stability in the "does not change" sense, not only in the
"does not crash" sense. The other thing is I don't see how things become
simpler when introducing more conditional behavior and more exceptional
objects. Protecting single keys still seems really weird to me, I cannot
get used to this idea. I actually regret a bit the extent to which we have
been appreciating fancy APIs in the past. I see a clear maintenance burden
associated with this. We have also spent lots of time correcting APIs that
were not well designed in the past. I just think we should avoid fancy API
development on core objects or live with a slow pace and merge late. We
already have a very rich and sometimes complex API. An API is the thing that
you want to get as right as possible at the beginning, and that causes
excruciating pain if you have to correct it post-hoc and maintain it.


@jasmainak
Member

Protecting single keys still seems really weird to me, I cannot get used to this idea.

This is something I agree with. It's not documented and people will stumble on it because they think that the Info is a simple dict.

@dengemann
Member

Maybe the right way to do it would really be to think about a frozen dict
approach. Once created, the info is there and won't change. Then use
special functions to update it, blocking all syntax / operator
overloading write access.


@wmvanvliet
Contributor Author

The Info structure is a textbook example for the usage of an immutable data structure like a frozen dict. It would solve the data redundancy and consistency checking once and for all. I discarded the idea though, because it would break everything. For example, the current way of indicating bad channels as far as I know is info['bads'] += ['EEG034', 'EEG040'].
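
A small sketch of the breakage, using `types.MappingProxyType` from the standard library as a stand-in for a frozen Info (an assumption for illustration; a real frozen Info might behave differently):

```python
from types import MappingProxyType

# Stand-in for a frozen Info holding a mutable list of bad channels.
frozen_info = MappingProxyType({'bads': []})

error = None
try:
    frozen_info['bads'] += ['EEG034', 'EEG040']
except TypeError as err:
    error = err  # the final item assignment is rejected

# The in-place list extension already ran before the assignment failed.
leaked = frozen_info['bads']
```

So the existing idiom raises a TypeError, and as long as the value is a mutable list it still gets modified before the error occurs. A truly frozen Info would have to store immutable values such as tuples as well.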

@dengemann
Member

See other issue. I think some version of frozen dict would work. Frozenkeys, updateable.
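
One possible reading of "frozen keys, updateable", sketched under the assumption that it means the set of keys is fixed at creation while values remain assignable. `FrozenKeysDict` is a hypothetical name, not an existing class.

```python
from collections.abc import MutableMapping


class FrozenKeysDict(MutableMapping):
    """Mapping whose keys are fixed at creation; values can still be replaced."""

    def __init__(self, **fields):
        self._data = dict(fields)

    def __getitem__(self, key):
        return self._data[key]

    def __setitem__(self, key, value):
        if key not in self._data:
            raise KeyError('cannot add new key %r' % (key,))
        self._data[key] = value

    def __delitem__(self, key):
        raise TypeError('keys cannot be removed')

    def __iter__(self):
        return iter(self._data)

    def __len__(self):
        return len(self._data)


info = FrozenKeysDict(bads=[], sfreq=1000.0)
info['bads'] += ['EEG034', 'EEG040']  # existing key: still works

try:
    info['nchans'] = 3  # mistyped or unknown key: rejected
    added = True
except KeyError:
    added = False
```

With this variant the info['bads'] += [...] idiom keeps working, while new or mistyped keys are rejected.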
