
Cloudpickle does not properly register submodule dependencies of a pickled function if the function accesses the submodule via getattr (or equivalent means) #581

@27359794

As seen on master:

>>> import cloudpickle
>>> cloudpickle.__version__
'3.2.0.dev0'
>>> import concurrent.futures
>>> def func():
...     x = getattr(concurrent, 'futures').ThreadPoolExecutor
... 
>>> func()  # can be successfully called
>>> cloudpickle.dump(func, open('/tmp/dump', 'wb'))

Then in another session:

>>> import cloudpickle
>>> cloudpickle.load(open('/tmp/dump', 'rb'))()  # not callable upon load
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 2, in func
AttributeError: module 'concurrent' has no attribute 'futures'

The reason is that, at pickle time, the submodule detection logic registers a submodule x.y.z as a dependency of a function only if the strings y and z both appear in the set of names stored in the function's code object (co_names). If the pickled function accessed the submodule as concurrent.futures, both concurrent and futures would appear in that set. In the failing example above, however, 'futures' is a string constant, so it does not appear in the set of names and the submodule is never registered. On load, only the top-level concurrent package is imported, which leads to the AttributeError.
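
To make the difference concrete, here is a minimal sketch that compares the names recorded for the two access patterns. The helper submodule_tokens_covered is hypothetical: it is a simplified stand-in for the check described above, not cloudpickle's actual implementation.

```python
import concurrent.futures


def attribute_access():
    # 'futures' is an attribute lookup, so it is recorded in co_names.
    return concurrent.futures.ThreadPoolExecutor


def getattr_access():
    # 'futures' is a string constant, so it lands in co_consts instead.
    return getattr(concurrent, "futures").ThreadPoolExecutor


def submodule_tokens_covered(func, submodule_name):
    """Hypothetical, simplified version of the detection rule:
    every component after the top-level package must be in co_names."""
    _top_level, _, rest = submodule_name.partition(".")
    tokens = set(rest.split("."))
    return tokens <= set(func.__code__.co_names)


attr_detected = submodule_tokens_covered(attribute_access, "concurrent.futures")
getattr_detected = submodule_tokens_covered(getattr_access, "concurrent.futures")
print(attr_detected, getattr_detected)  # True False
```

Both functions return the same class when called in the original session; only the first one leaves a trace of 'futures' in the names that the detection rule inspects.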

The same failure can be triggered by replacing the getattr call with, say, vars(concurrent)['futures'] or concurrent.__dict__['futures'], for the same reason.
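
As a quick check of these two variants, the sketch below inspects the code object of each one. The function names via_vars and via_dict are made up for illustration.

```python
import concurrent.futures


def via_vars():
    # Submodule looked up through vars(); 'futures' is a string key.
    return vars(concurrent)["futures"].ThreadPoolExecutor


def via_dict():
    # Submodule looked up through __dict__; 'futures' is again a string key.
    return concurrent.__dict__["futures"].ThreadPoolExecutor


names_by_variant = {
    f.__name__: f.__code__.co_names for f in (via_vars, via_dict)
}
for name, co_names in names_by_variant.items():
    # The last column reports whether 'futures' is among the recorded names.
    print(name, co_names, "futures" in co_names)
```

In both cases 'futures' is absent from the names of the code object, so a check based only on those names has nothing to match against.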

Relates to this issue about slow performance when pickling functions that use packages.

One could argue that this access pattern is sufficiently abnormal that cloudpickle doesn't need to handle it properly. But in the related issue, a maintainer asked me to make a new issue for this problem.
