Skip to content

Surprising memoized behavior for mutable containers #341

@douglas-raillard-arm

Description

@douglas-raillard-arm

The following piece of code will only print once, although foo is called with arguments that would not compare equal:

from devlib.utils.misc import memoized

@memoized
def foo(l):
    print('hello')

l = []
l.append(3)

foo(l)

l.pop()
l.append(4)

foo(l)

That will also only print once, although C and D are wildly different classes:

class C:
    def __init__(self):
        self.val1 = 42

    def foo(self):
        pass

class D:
    def __init__(self):
        self.val2 = list(range(0, 100))

    def bar(self):
        pass

foo(C())
foo(D())

The root cause is __get_memo_id that only checks the first 32 bytes of the underlying representation of a PyObject, in addition to the id() of the object. IDs can be reallocated, and the same object will not always compare equal to a past state of itself (generally true for any immutable object). Similar issues will arise with classes implementing __eq__ without implementing __hash__, or with naive but nonetheless valid implementations like:

def __hash__(self): return 42

This is allowed by:

The only required property is that objects which compare equal have the same hash value
https://docs.python.org/3/reference/datamodel.html#object.__hash__

Generally speaking, that issue cannot generally be solved, apart from only using memoized with hashable types (which rules out lists). A potential workaround would be to use the repr() of the object, hoping that it will reflect the full state of the object.

Another workaround can be to keep a reference to the parameters, so they are never destructed and we ensure that if objects are not the same, they will get a different ID. This would make memoized relatively useless for types likes lists, where calls like foo([1,2]) would not be memoized anymore.

Note: Python 3 functools.lru_cache avoids that issue by not supporting non-hashable types.
https://docs.python.org/3/library/functools.html#functools.lru_cache

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions