
API: Please make ".loc" return type depend on index, not on specific labels #9519

@toobaz


I already mentioned this in #9466, but I think it deserves its own bug report:

In [2]: s = pd.Series([1, 2, 3], index=(1,1,2))

In [3]: s
Out[3]: 
1    1
1    2
2    3
dtype: int64

In [4]: s.loc[1]
Out[4]: 
1    1
1    2
dtype: int64

In [5]: type(s.loc[1])
Out[5]: pandas.core.series.Series

In [6]: s.loc[2]
Out[6]: 3

In [7]: type(s.loc[2])
Out[7]: numpy.int64

Quoting #5678: "You are selecting out of a duplicated index Series. You could argue that you should get back another Series".

I really think life would be easier if s.loc[2] returned a Series of length one (and DataFrames and Panels behaved similarly). One can be assumed to know (and can check in O(1)) whether an index is unique, but not necessarily whether a given label is.
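A minimal sketch of the distinction, assuming a recent pandas version: uniqueness of the whole index is a single property, while whether one label is duplicated only shows up when that label is looked up. `Index.get_loc` returns an integer position for a label that occurs once and a slice (or boolean mask) for a label that occurs more than once, which matches the scalar-vs-Series split above. Passing a one-element list to `.loc` is a possible workaround; `loc_as_series` is a hypothetical helper name, not a pandas API.

```python
import pandas as pd

s = pd.Series([1, 2, 3], index=(1, 1, 2))

# Index-level uniqueness: one answer for the whole index.
print(s.index.is_unique)  # False

# Label-level uniqueness: the answer depends on which label is looked up.
# get_loc gives an int position for a label that occurs once, and a
# slice (monotonic index) or boolean mask for a repeated label.
print(type(s.index.get_loc(2)).__name__)  # int
print(type(s.index.get_loc(1)).__name__)  # slice


def loc_as_series(series: pd.Series, label) -> pd.Series:
    """Hypothetical helper: always return a Series from a label lookup.

    Wrapping the label in a list makes .loc treat it as a list indexer,
    so the result keeps the index axis even when the label is unique.
    """
    return series.loc[[label]]


print(type(loc_as_series(s, 1)).__name__, len(loc_as_series(s, 1)))  # Series 2
print(type(loc_as_series(s, 2)).__name__, len(loc_as_series(s, 2)))  # Series 1
```

The list form sidesteps the label-dependent return type at the cost of always producing a Series, even where a scalar would be more convenient.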

With higher-dimensional structures it's even messier: if, e.g., .loc[lab_a, lab_b, lab_c] yields a lower-dimensional structure that is still a pandas structure, you have to find out which dimensions were lost or kept (i.e. which of the labels were duplicates).
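A small illustration with a DataFrame (Panels are no longer part of pandas, so this sketch assumes a current version and uses two dimensions only). The same call shape drops a different number of axes depending on whether the row label is duplicated; passing a list on every axis is one way to keep all dimensions, so the shape then reflects how often each label occurs.

```python
import pandas as pd

# Row label 1 is duplicated, row label 2 is unique; columns are unique.
df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]}, index=[1, 1, 2])

# Same call shape, different result dimensions depending on the row label.
r_dup = df.loc[1, "a"]   # label 1 occurs twice -> Series (row axis kept)
r_uniq = df.loc[2, "a"]  # label 2 occurs once -> scalar (both axes dropped)
print(type(r_dup).__name__, type(r_uniq).__name__)  # Series int64

# A single row label: DataFrame for a duplicated label, Series for a unique one.
print(type(df.loc[1]).__name__, type(df.loc[2]).__name__)  # DataFrame Series

# Lists on every axis keep both dimensions, so the result is always a
# DataFrame and only its shape depends on label multiplicity.
print(df.loc[[1], ["a"]].shape)  # (2, 1)
print(df.loc[[2], ["a"]].shape)  # (1, 1)
```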

I don't think I have the skills to propose a PR, but I would volunteer to fix the broken tests.
