API: Categorical.unique() should not drop unused categories

Currently, ``Categorical.unique`` and ``CategoricalIndex.unique`` drop unused categories:

```python
>>> import pandas as pd
>>> categories = ['very good', 'good', 'neutral', 'bad', 'very bad']
>>> cat = pd.Categorical(['good', 'good', 'bad', 'bad'], categories=categories, ordered=True)
>>> cat
[good, good, bad, bad]
Categories (5, object): [very good < good < neutral < bad < very bad]
>>> cat.unique()
[good, bad]
Categories (2, object): [good < bad]  # unused categories dropped
```

So ``.unique()`` does two things in one operation: it deduplicates the values and it drops unused categories.

Often, even when you want the unique values, you still want to control whether unused categories are dropped. So ``Categorical/CategoricalIndex.unique`` should IMO keep all categories, and dropping unused categories should be a separate action. This would be a better API:

```python
>>> cat.unique()
[good, bad]
Categories (5, object): [very good < good < neutral < bad < very bad]    # unused not dropped
```

If you want to drop unused categories, you should do it explicitly like so: ``cat.unique().remove_unused_categories()``.
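As a rough sketch of that explicit two-step form, using the same example data as above. The variable names `uniques` and `trimmed` are just illustrative; only `unique()` and `remove_unused_categories()` come from pandas. The result of the second step should be the same whether or not `unique()` already dropped the unused categories:

```python
import pandas as pd

categories = ['very good', 'good', 'neutral', 'bad', 'very bad']
cat = pd.Categorical(['good', 'good', 'bad', 'bad'],
                     categories=categories, ordered=True)

# Step 1: deduplicate the values only.
uniques = cat.unique()

# Step 2: explicitly drop the categories that no value uses.
trimmed = uniques.remove_unused_categories()

print(list(trimmed))             # the unique values
print(list(trimmed.categories))  # only the categories still in use
```

Written this way, the call site makes it obvious that the categories are being changed, rather than having that happen as a side effect of ``.unique()``.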

The proposed API is also faster, as dropping unused categories requires recoding the categories/codes, which is potentially expensive.
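To illustrate what "recoding" means here, a small sketch (not a benchmark) that compares the integer codes before and after dropping unused categories. The names `original_codes` and `trimmed_codes` are only for illustration:

```python
import pandas as pd

categories = ['very good', 'good', 'neutral', 'bad', 'very bad']
cat = pd.Categorical(['good', 'good', 'bad', 'bad'],
                     categories=categories, ordered=True)

# Codes are positions into the categories list.
original_codes = cat.codes.tolist()

# Dropping unused categories shrinks the categories list, so every
# code has to be remapped to its position in the new list.
trimmed = cat.remove_unused_categories()
trimmed_codes = trimmed.codes.tolist()

print(original_codes)  # positions of 'good' and 'bad' among 5 categories
print(trimmed_codes)   # positions of 'good' and 'bad' among 2 categories
```

Here 'good' moves from code 1 to code 0 and 'bad' from code 3 to code 1. That remapping is the extra work that ``.unique()`` could skip if it kept all categories.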


API: Categorical.unique() should not drop unused categories #21648
