
Unexpected results when filtering with .isin (some fields contain python datastructures) #20883

@atc0m


Code Sample, a copy-pastable example if possible

import pandas as pd

data = [
    {'id': 1, 'content': [{'values': 3}]},
    {'id': 2, 'content': u'whats going on'},
    {'id': 3, 'content': u'whaaaaaaaaat'},
    {'id': 4, 'content': [{'values': 4}]}
]

if __name__ == '__main__':
    df = pd.DataFrame.from_dict(data)
    v = [u'whats going on', u'whaaaaaat']
    print df[df.content.isin(v)]
    v = [u'whats going on', u'what']
    print df[df.content.isin(v)]

Problem description

The first print statement executes successfully and filters to the single row 'id': 2, 'content': u'whats going on'. The second filter, however, raises an error, even though the only difference is the length of one element in the list v (u'whaaaaaat' versus u'what').

Output for the code snippet above:

          content  id
1  whats going on   2
/home/attila/digital/env/local/lib/python2.7/site-packages/pandas/core/indexes/range.py:473: RuntimeWarning: tp_compare didn't return -1 or -2 for exception
  return max(0, -(-(self._stop - self._start) // self._step))
Traceback (most recent call last):
  File "test_pandas.py", line 15, in <module>
    print df[df.content.isin(v)]
  File "/home/attila/digital/env/local/lib/python2.7/site-packages/pandas/core/series.py", line 2804, in isin
    return self._constructor(result, index=self.index).__finalize__(self)
  File "/home/attila/digital/env/local/lib/python2.7/site-packages/pandas/core/series.py", line 264, in __init__
    raise_cast_failure=True)
  File "/home/attila/digital/env/local/lib/python2.7/site-packages/pandas/core/series.py", line 3269, in _sanitize_array
    if len(subarr) != len(index) and len(subarr) == 1:
  File "/home/attila/digital/env/local/lib/python2.7/site-packages/pandas/core/indexes/range.py", line 473, in __len__
    return max(0, -(-(self._stop - self._start) // self._step))
TypeError: unhashable type: 'list'
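One possible workaround, sketched under the assumption that only the string-valued rows should ever match: do the membership test element by element in plain Python, and treat cells that cannot be hashed (lists, dicts) as non-matches. That way the list-valued cells never reach a hash-based lookup. safe_isin is a hypothetical helper, not a pandas API, and the sketch assumes Python 3.

```python
def safe_isin(values, candidates):
    """Element-wise membership test that tolerates unhashable cells.

    Hypothetical helper (not part of pandas): returns one bool per element
    of `values`, True only when the element is hashable and in `candidates`.
    """
    lookup = set(candidates)  # the candidates here are plain strings, so hashable
    mask = []
    for x in values:
        try:
            mask.append(x in lookup)
        except TypeError:
            # lists/dicts raise "unhashable type"; a list can never equal
            # one of the string candidates anyway, so count it as no match
            mask.append(False)
    return mask


# Same cell values as the 'content' column in the report
content = [[{'values': 3}], u'whats going on', u'whaaaaaaaaat', [{'values': 4}]]
print(safe_isin(content, [u'whats going on', u'what']))
```

With the DataFrame from the report, the returned list of booleans can be used as a row mask, e.g. df[safe_isin(df.content, v)], which gives the same result for both versions of v.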
INSTALLED VERSIONS
------------------
commit: None
python: 2.7.12.final.0
python-bits: 64
OS: Linux
OS-release: 4.13.0-37-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_GB.UTF-8
LOCALE: None.None

pandas: 0.22.0
pytest: 2.9.2
pip: 9.0.1
setuptools: 36.4.0
Cython: None
numpy: 1.14.2
scipy: 0.18.1
pyarrow: None
xarray: None
IPython: None
sphinx: None
patsy: None
dateutil: 2.6.1
pytz: 2017.2
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: 2.1.2
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
sqlalchemy: 1.0.5
pymysql: None
psycopg2: 2.6.1 (dt dec pq3 ext lo64)
jinja2: 2.9.6
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None
