(From #15425)
Currently, (non-Multi)Indexes can be indexed with Series indexers. This also applies to MultiIndexes, in which case the indexer selects from the first level. Hence, it seems a natural consequence that MultiIndexes could also be indexed with DataFrame indexers.
Moreover, once #15434 is fixed, we will have a two-dimensional object (MultiIndex) that can be indexed with np.arrays, but only with one-dimensional ones! This is also strange.
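A minimal sketch of the current behaviour described above, on made-up toy data (the labels and values are hypothetical). It reflects what recent pandas versions do and may differ in older ones: a 1-D Series of labels is matched against the first level only, while complete keys have to be passed as a 1-D sequence of tuples.

```python
import pandas as pd

# Hypothetical toy data: a Series keyed by a two-level (group, time) MultiIndex.
mi = pd.MultiIndex.from_tuples(
    [("ctrl", 1), ("ctrl", 2), ("treat", 1)], names=["group", "time"]
)
s = pd.Series([10, 20, 30], index=mi)

# A 1-D Series indexer is matched against the first level ("group") only,
# so every row whose group is "ctrl" is returned.
by_first_level = s.loc[pd.Series(["ctrl"])]
print(by_first_level)

# Selecting complete keys currently needs a 1-D sequence of tuples,
# one tuple per row of what is conceptually a 2-D indexer.
pairs = [("ctrl", 2), ("treat", 1)]
by_full_key = s.loc[pairs]
print(by_full_key)
```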
The feature per se is certainly useful. As a simple real-world example, I am currently working with a subjects DataFrame to which I must add two columns from design, another DataFrame, depending on the group and time columns of subjects, which are also the levels of the MultiIndex of design. I would like to just do:
subjects[design.columns] = design.loc[subjects[["group", "time"]]]
Now, I know this could be solved by joining the two DataFrames (.join), but that is conceptually more complicated (I don't even know whether I can join one DataFrame on its columns and the other on its index levels, but that is off-topic), to the point that I am instead doing:
to_mi = lambda df: df.set_index(list(df.columns)).index
subjects[design.columns] = design.loc[to_mi(subjects[["group", "time"]])]
@jorisvandenbossche suggests this feature would add complexity to indexing, "eg, should the column names align on the level names?". I'm personally fine with either answer:
- Yes: then we just use something like to_mi above (transforming a DataFrame into a MultiIndex, and then using that to actually index).
- No: then it's really simple (we just transform the DataFrame into tuples; I had actually already done this in Mi indexing #15425 before rolling it back).
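To make the difference between the two answers concrete, here is a sketch with two hypothetical helpers, `keys_aligned` ("Yes": columns matched to level names) and `keys_positional` ("No": row tuples taken in column order). Neither is a pandas API; `keys_aligned` uses `pd.MultiIndex.from_frame`, which plays the role of `to_mi` above. The toy data is made up, and the indexer frame deliberately lists `time` before `group`.

```python
import pandas as pd

# Hypothetical toy data: `design` is keyed by a (group, time) MultiIndex.
design = pd.DataFrame(
    {"dose": [0.0, 0.5, 1.0, 1.5], "arm": ["A", "A", "B", "B"]},
    index=pd.MultiIndex.from_tuples(
        [("ctrl", 1), ("ctrl", 2), ("treat", 1), ("treat", 2)],
        names=["group", "time"],
    ),
)
# Indexer frame whose column order differs from the level order.
keys = pd.DataFrame({"time": [2, 1, 2], "group": ["treat", "ctrl", "ctrl"]})


def keys_aligned(df, index):
    """Hypothetical 'Yes' semantics: reorder columns to match level names."""
    return pd.MultiIndex.from_frame(df[list(index.names)])


def keys_positional(df):
    """Hypothetical 'No' semantics: plain row tuples, in column order."""
    return list(df.itertuples(index=False, name=None))


# 'Yes': names are matched first, so the lookup finds (group, time) keys.
aligned = design.loc[keys_aligned(keys, design.index)]
print(aligned)

# 'No': tuples come out as (time, group), so none of them is a key of
# `design`; the caller would have to order the columns correctly first.
positional = keys_positional(keys)
print(positional)
print([t in design.index for t in positional])
```

Under the "No" answer the indexer's column names carry no meaning, which is simpler to implement but silently depends on column order.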
"Yes" is probably the cleanest answer (possibly together with allowing indexing with two-dimensional np.arrays, to obtain the equivalent of the "No" answer). In any case, once we decide, I can take care of this.
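On the side question of joining one DataFrame on its columns and the other on its index levels: the documented `on` parameter of `DataFrame.join` appears to cover exactly this, matching the listed caller columns, in order, against the levels of the other frame's MultiIndex. A sketch on made-up toy data (the `dose`/`arm` columns and all values are hypothetical):

```python
import pandas as pd

# Hypothetical toy data mirroring the example: `design` keyed by (group, time).
design = pd.DataFrame(
    {"dose": [0.0, 0.5, 1.0, 1.5], "arm": ["A", "A", "B", "B"]},
    index=pd.MultiIndex.from_tuples(
        [("ctrl", 1), ("ctrl", 2), ("treat", 1), ("treat", 2)],
        names=["group", "time"],
    ),
)
subjects = pd.DataFrame(
    {"group": ["treat", "ctrl", "ctrl"], "time": [2, 1, 2]},
    index=["s1", "s2", "s3"],
)

# `on` lists columns of `subjects` that are matched against the levels of
# `design`'s MultiIndex; with the default how="left", the result keeps the
# index of `subjects` and gains the columns of `design`.
joined = subjects.join(design, on=["group", "time"])
print(joined)
```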