Description
1. Inconsistent comparisons versus string 'interval':
In [2]: IntervalDtype() == 'interval'
Out[2]: True
In [3]: IntervalDtype('interval') == 'interval'
Out[3]: False
In [4]: IntervalDtype('int64') == 'interval'
Out[4]: False
I'd expect all of these to return True, just as CategoricalDtype(*, *) == 'category' always returns True.
2. Inconsistent comparisons versus IntervalDtype(None):
In [5]: IntervalDtype(None) == IntervalDtype('interval')
Out[5]: False
In [6]: IntervalDtype(None) == IntervalDtype('int64')
Out[6]: False
I'd expect all of these to return True, just as CategoricalDtype(None, None) == CategoricalDtype(*, *) always returns True.
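As a rough sketch of the semantics expected in 2, and not the pandas implementation, an unspecified subtype could act as a wildcard in equality. `ToyIntervalDtype` is a hypothetical stand-in class invented for illustration:

```python
class ToyIntervalDtype:
    """Hypothetical stand-in for IntervalDtype; not pandas code."""

    def __init__(self, subtype=None):
        self.subtype = subtype

    def __eq__(self, other):
        if not isinstance(other, ToyIntervalDtype):
            return NotImplemented
        # An unspecified subtype on either side matches anything.
        if self.subtype is None or other.subtype is None:
            return True
        return self.subtype == other.subtype

    def __hash__(self):
        # Equal objects must hash equal, so the hash cannot depend on subtype.
        return hash(type(self).__name__)


generic = ToyIntervalDtype()
int_dtype = ToyIntervalDtype("int64")
float_dtype = ToyIntervalDtype("float64")

print(generic == int_dtype)      # True
print(int_dtype == float_dtype)  # False
```

One consequence of this design is that equality is no longer transitive: the generic dtype equals both specific dtypes, yet they do not equal each other.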
3. IntervalDtype.name attribute changes with the subtype:
In [7]: IntervalDtype().name
Out[7]: 'interval'
In [8]: IntervalDtype('interval').name
Out[8]: 'interval[]'
In [9]: IntervalDtype('int64').name
Out[9]: 'interval[int64]'
The CategoricalDtype.name attribute is always the same:
In [10]: CategoricalDtype(list('abc'), True).name
Out[10]: 'category'
In [11]: CategoricalDtype(list('wxyz'), False).name
Out[11]: 'category'
I'd expect IntervalDtype.name to always return 'interval', just as CategoricalDtype.name always returns 'category'. This would simplify the code for checking equality against strings (i.e. what I described in 1). I don't think the behavior of str(IntervalDtype) should change; it currently matches IntervalDtype.name, so I'd still have it return strings that specify the subtype.
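To illustrate how a constant name would simplify the string comparison from 1, here is a minimal sketch. `ToyIntervalDtype` is again a hypothetical class, not pandas code, and the exact string format is only assumed to mirror the outputs shown above:

```python
class ToyIntervalDtype:
    """Hypothetical stand-in for IntervalDtype; not pandas code."""

    name = "interval"  # constant, independent of the subtype

    def __init__(self, subtype=None):
        self.subtype = subtype

    def __str__(self):
        # The string form still spells out the subtype when there is one.
        if self.subtype is None:
            return "interval"
        return f"interval[{self.subtype}]"

    def __eq__(self, other):
        if isinstance(other, str):
            # With a constant name, the string check is a single comparison.
            return other == self.name
        if isinstance(other, ToyIntervalDtype):
            return self.subtype == other.subtype
        return NotImplemented

    def __hash__(self):
        # Consistent with __eq__: every instance equals the string 'interval'.
        return hash(self.name)


dtype = ToyIntervalDtype("int64")
print(dtype.name)           # interval
print(str(dtype))           # interval[int64]
print(dtype == "interval")  # True
```

Keeping `name` and `str()` separate lets the string comparison stay trivial while the printed form remains informative.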
4. (No longer an issue due to #19022) CategoricalDtype gets cached incorrectly:
In [12]: idt1 = IntervalDtype(CategoricalDtype(list('abc'), True))
In [13]: idt2 = IntervalDtype(CategoricalDtype(list('wxyz'), False))
In [14]: idt2.subtype
Out[14]: CategoricalDtype(categories=['a', 'b', 'c'], ordered=True)
This looks to be caused by the caching being done by string representation, and str(CategoricalDtype(*, *)) always returns 'category':
pandas/pandas/core/dtypes/dtypes.py
Lines 673 to 679 in e1d5a27
    try:
        return cls._cache[str(subtype)]
    except KeyError:
        u = object.__new__(cls)
        u.subtype = subtype
        cls._cache[str(subtype)] = u
        return u
Can caching be removed entirely for IntervalDtype, or is there some need/advantage that I'm not seeing? Looking at the other dtypes, CategoricalDtype appears to have had the caching code removed, but PeriodDtype and DatetimeTZDtype are using it.
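The collision described in 4 can be reproduced without pandas. The sketch below uses two hypothetical classes, `ToySubtype` and `CachedDtype`, and assumes only what the quoted snippet shows: instances are cached under `str(subtype)`.

```python
class ToySubtype:
    """Hypothetical subtype whose str() ignores its parameters."""

    def __init__(self, categories):
        self.categories = categories

    def __str__(self):
        return "category"


class CachedDtype:
    """Hypothetical dtype that caches instances by str(subtype)."""

    _cache = {}

    def __new__(cls, subtype):
        key = str(subtype)
        try:
            return cls._cache[key]
        except KeyError:
            u = object.__new__(cls)
            u.subtype = subtype
            cls._cache[key] = u
            return u


first = CachedDtype(ToySubtype(["a", "b", "c"]))
second = CachedDtype(ToySubtype(["w", "x", "y", "z"]))

print(second is first)            # True
print(second.subtype.categories)  # ['a', 'b', 'c']
```

Because both subtypes stringify to the same key, the second construction returns the first cached instance, which matches the stale subtype seen in Out[14].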