Closed
Labels
Indexing: Related to indexing on series/frames, not to indexes themselves
Missing-data: np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate
Regression: Functionality that used to work in a prior pandas version
Description
(edit by @TomAugspurger)
Current output:
In [33]: pd.DataFrame(index=[0, 1]).nunique()
Out[33]:
Empty DataFrame
Columns: []
Index: [0, 1]
Expected output is an empty Series:
Out[34]: Series([], dtype: float64)
Not sure what the expected dtype of that Series should be... probably object.
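The expectation above can be written as a small check. This is only a sketch: the variable names are illustrative, and it deliberately does not assert a dtype, since the expected dtype is left open above. On a pandas version affected by this regression, `is_series` would be False.

```python
import pandas as pd

# A frame with two rows and no columns, as in the example above.
empty_cols = pd.DataFrame(index=[0, 1])
result = empty_cols.nunique()

# nunique() reduces over rows, so the result should be keyed by column
# labels. With no columns, that means an empty Series with an empty index.
is_series = isinstance(result, pd.Series)
labels = result.index.tolist()
print(type(result).__name__, labels)
```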
original post below:
Code Sample, a copy-pastable example if possible
With pandas 0.20.3
# create a DataFrame with 3 rows
df = pd.DataFrame({'a': ['A','B','C']})
# lookup unique values for each column, excluding 'a'
unique = df.loc[:, (df.columns != 'a')].nunique()
# this results in an empty Series, the index is also empty
unique.index.tolist()
>>> []
# and
unique[unique == 1].index.tolist()
>>> []
With pandas 0.23.3
# create a DataFrame with 3 rows
df = pd.DataFrame({'a': ['A','B','C']})
# lookup unique values for each column, excluding 'a'
unique = df.loc[:, (df.columns != 'a')].nunique()
# this results in an empty Series, but the index is not empty
unique.index.tolist()
>>> [1,2,3]
also:
unique[unique == 1].index.tolist()
>>> [1,2,3]
Note:
# if we don't have an empty df, the behavior of nunique() seems fine:
df = pd.DataFrame({'a': ['A','B','C'], 'b': [1,1,1]})
unique = df.loc[:, (df.columns != 'a')].nunique()
unique[unique == 1]
>>> b    1
>>> dtype: int64
# and
unique[unique == 1].index.tolist()
>>> ['b']
Problem description
This change in behavior is surprising and looks like a bug:
nunique() should return a Series indexed by the columns of the df, but that is not the case here. Instead, the result picks up the index of the df.
This is likely related to:
I am posting this because in my use case I use the resulting list to drop columns, but I end up with column names that do not exist in the df.
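Until the behavior is fixed, one way to guard the drop-columns use case is to count distinct values per column explicitly, so the result is always keyed by column label. This is a minimal sketch, not a pandas API: `constant_columns` and its `exclude` parameter are hypothetical names introduced here for illustration.

```python
import pandas as pd


def constant_columns(df, exclude=()):
    """Return labels of columns (outside `exclude`) with exactly one distinct value."""
    candidates = [c for c in df.columns if c not in exclude]
    # Build the counts per column by hand instead of calling
    # DataFrame.nunique() on a possibly column-less frame, so the index of
    # `counts` can only ever contain column labels.
    counts = pd.Series({c: df[c].nunique() for c in candidates}, dtype="int64")
    return counts[counts == 1].index.tolist()


# The empty-selection case from the report: nothing is left after excluding 'a'.
df_single = pd.DataFrame({'a': ['A', 'B', 'C']})
no_columns = constant_columns(df_single, exclude=['a'])

# The non-empty case: 'b' holds a single distinct value and can be dropped.
df_two = pd.DataFrame({'a': ['A', 'B', 'C'], 'b': [1, 1, 1]})
to_drop = constant_columns(df_two, exclude=['a'])
remaining = df_two.drop(columns=to_drop).columns.tolist()
```

Because the returned labels always come from `df.columns`, passing them to `drop(columns=...)` cannot raise on labels that only exist in the row index.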
Details
INSTALLED VERSIONS
------------------
commit: None
python: 2.7.10.final.0
python-bits: 64
OS: Darwin
OS-release: 17.6.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None
pandas: 0.23.3
pytest: None
pip: 10.0.1
setuptools: 39.2.0
Cython: None
numpy: 1.13.3
scipy: 0.19.1
pyarrow: None
xarray: None
IPython: None
sphinx: 1.5.5
patsy: 0.5.0
dateutil: 2.6.1
pytz: 2015.7
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: None
openpyxl: None
xlrd: 0.9.4
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
sqlalchemy: None
pymysql: None
psycopg2: 2.7.3.2 (dt dec pq3 ext lo64)
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None