-
Notifications
You must be signed in to change notification settings - Fork 77
Description
The following piece of code will only print once, although foo is called with arguments that would not compare equal:
from devlib.utils.misc import memoized
@memoized
def foo(l):
print('hello')
l = []
l.append(3)
foo(l)
l.pop()
l.append(4)
foo(l)
That will also only print once, although C and D are wildly different classes:
class C:
def __init__(self):
self.val1 = 42
def foo(self):
pass
class D:
def __init__(self):
self.val2 = list(range(0, 100))
def bar(self):
pass
foo(C())
foo(D())
The root cause is __get_memo_id that only checks the first 32 bytes of the underlying representation of a PyObject, in addition to the id() of the object. IDs can be reallocated, and the same object will not always compare equal to a past state of itself (generally true for any immutable object). Similar issues will arise with classes implementing __eq__ without implementing __hash__, or with naive but nonetheless valid implementations like:
def __hash__(self): return 42
This is allowed by:
The only required property is that objects which compare equal have the same hash value
https://docs.python.org/3/reference/datamodel.html#object.__hash__
Generally speaking, that issue cannot generally be solved, apart from only using memoized with hashable types (which rules out lists). A potential workaround would be to use the repr() of the object, hoping that it will reflect the full state of the object.
Another workaround can be to keep a reference to the parameters, so they are never destructed and we ensure that if objects are not the same, they will get a different ID. This would make memoized relatively useless for types likes lists, where calls like foo([1,2]) would not be memoized anymore.
Note: Python 3 functools.lru_cache avoids that issue by not supporting non-hashable types.
https://docs.python.org/3/library/functools.html#functools.lru_cache