
Dynamic class state is reset on every deserialization #439

@sklam

Reproducer:

```python
# Tested with cloudpickle 1.6.0
from cloudpickle import dumps, loads


class Klass:
    classvar = None


def mutator():
    Klass.classvar = 100


def check():
    print("checking....")
    print(f"   Klass.classvar [{hex(id(Klass))}] = {Klass.classvar}")


def failing_case():
    print("Klass", hex(id(Klass)))
    saved = dumps(Klass)  # classvar is None at pickling time
    mutator()             # classvar becomes 100
    check()
    loads(saved)          # return value unused; only the side effect matters
    check()
    loads(saved)
    check()


if __name__ == '__main__':
    failing_case()
```

Prints:

```
Klass 0x7fc698719980
checking....
   Klass.classvar [0x7fc698719980] = 100
checking....
   Klass.classvar [0x7fc698719980] = None
checking....
   Klass.classvar [0x7fc698719980] = None
```

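After each `loads(saved)`, `Klass.classvar` is unexpectedly reset to `None`, the value it had when `dumps(Klass)` was called, even though the class object itself is the same one (its `id` never changes).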
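To make the mechanism concrete, here is a minimal, stdlib-only sketch of a model that the output above is consistent with. It assumes (this is a simplified model, not cloudpickle's actual code) that a class pickled by value carries a tracker id plus a snapshot of its attributes, and that on unpickling the live class with that id is reused and the snapshot is then reapplied unconditionally. `dump_class` and `load_class` are hypothetical stand-ins for `dumps` and `loads`.

```python
import uuid

# Hypothetical, simplified model of by-value class pickling.
# None of these names are cloudpickle's API.
_id_to_class = {}   # tracker id -> live class object
_class_to_id = {}   # live class object -> tracker id


def _snapshot(cls):
    # Copy only plain (non-dunder) attributes to keep the model small.
    return {k: v for k, v in vars(cls).items()
            if not (k.startswith("__") and k.endswith("__"))}


def dump_class(cls):
    """'Pickle' a class as (tracker id, name, attribute snapshot)."""
    if cls not in _class_to_id:
        tracker_id = uuid.uuid4().hex
        _class_to_id[cls] = tracker_id
        _id_to_class[tracker_id] = cls
    return _class_to_id[cls], cls.__name__, _snapshot(cls)


def load_class(payload):
    """'Unpickle': reuse the live class for a known id, then reapply state."""
    tracker_id, name, state = payload
    cls = _id_to_class.get(tracker_id)
    if cls is None:
        # Unknown id (e.g. a fresh process): build a new, empty class.
        cls = type(name, (), {})
        _id_to_class[tracker_id] = cls
        _class_to_id[cls] = tracker_id
    # Reapplying the snapshot unconditionally overwrites later mutations.
    for key, value in state.items():
        setattr(cls, key, value)
    return cls


class Klass:
    classvar = None


saved = dump_class(Klass)
Klass.classvar = 100                 # mutate after "pickling"
assert load_class(saved) is Klass    # the same class object comes back...
print(Klass.classvar)                # ...but prints None: the mutation is gone
```

In this model, every load of the same payload rolls the class back to its pickling-time snapshot, which matches the repeated `None` in the reproducer output.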

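In a distributed, multi-threaded framework such as Dask, this can look like a tricky race condition: whether a class attribute still holds its updated value depends on whether some thread has just deserialized the class. See the example at https://gist.github.com/sklam/98e7c98ce909e76a3fa7904754db7bd9.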

I created a patch for this in Numba's vendored copy of cloudpickle: numba/numba#7388. Please let me know if there are any problems with this approach. If it is acceptable, I can submit a PR here.
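One possible direction for a fix, sketched on a hypothetical, simplified model of by-value class pickling (it is not necessarily what numba/numba#7388 does, and none of the names are cloudpickle's API): apply the attribute snapshot only when the class is newly created, and return an already-known class untouched.

```python
import uuid

# Hypothetical model of one possible fix; not cloudpickle's or Numba's code.
_id_to_class = {}   # tracker id -> live class object
_class_to_id = {}   # live class object -> tracker id


def _snapshot(cls):
    # Copy only plain (non-dunder) attributes to keep the model small.
    return {k: v for k, v in vars(cls).items()
            if not (k.startswith("__") and k.endswith("__"))}


def dump_class(cls):
    """'Pickle' a class as (tracker id, name, attribute snapshot)."""
    if cls not in _class_to_id:
        tracker_id = uuid.uuid4().hex
        _class_to_id[cls] = tracker_id
        _id_to_class[tracker_id] = cls
    return _class_to_id[cls], cls.__name__, _snapshot(cls)


def load_class(payload):
    """'Unpickle': apply the snapshot only to a newly created class."""
    tracker_id, name, state = payload
    cls = _id_to_class.get(tracker_id)
    if cls is not None:
        # Known class: return it unchanged so local mutations survive.
        return cls
    # Unknown id (e.g. a fresh process): rebuild the class from the snapshot.
    cls = type(name, (), dict(state))
    _id_to_class[tracker_id] = cls
    _class_to_id[cls] = tracker_id
    return cls


class Klass:
    classvar = None


saved = dump_class(Klass)
Klass.classvar = 100               # mutate after "pickling"
assert load_class(saved) is Klass  # the same class object comes back
print(Klass.classvar)              # prints 100: the mutation survives
```

The trade-off in this model is that if the sending side later changes a class the receiving side already knows, those changes are no longer propagated by unpickling.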
