Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optionally use pickle5 (Redux) #370

Merged
merged 98 commits into from
Jul 1, 2020

Conversation

jakirkham
Copy link
Member

@jakirkham jakirkham commented May 20, 2020

Fixes #179

Thanks to @pierreglaser's work in PR ( #368 ), this is a rebased/simplified version of PR ( #364 ). Otherwise is the same in that it tries to use pickle5 on older Python versions to support out-of-band buffers.

@jakirkham jakirkham force-pushed the opt_use_pickle5_redux branch 2 times, most recently from f356dbd to 6635ce1 Compare May 20, 2020 05:35
@codecov
Copy link

codecov bot commented May 20, 2020

Codecov Report

Merging #370 into master will increase coverage by 0.15%.
The diff coverage is 100.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #370      +/-   ##
==========================================
+ Coverage   91.25%   91.40%   +0.15%     
==========================================
  Files           2        3       +1     
  Lines         629      640      +11     
  Branches      132      134       +2     
==========================================
+ Hits          574      585      +11     
  Misses         34       34              
  Partials       21       21              
Impacted Files Coverage Δ
cloudpickle/cloudpickle.py 86.44% <100.00%> (ø)
cloudpickle/cloudpickle_fast.py 96.64% <100.00%> (+0.01%) ⬆️
cloudpickle/compat.py 100.00% <100.00%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 938553f...f17b31a. Read the comment docs.

@jakirkham
Copy link
Member Author

jakirkham commented May 20, 2020

We probably need the fix in PR ( pitrou/pickle5-backport#16 ). Asking about a patch release with that change in issue ( pitrou/pickle5-backport#17 ).

Edit: This has been released as pickle5 version 0.0.10. Have updated the requirement here accordingly.

@jakirkham jakirkham force-pushed the opt_use_pickle5_redux branch 8 times, most recently from 8a6cb2f to 4e66e39 Compare May 21, 2020 21:14
Tries to use `pickle5` for `pickle` if available on older Python
versions.
@jakirkham
Copy link
Member Author

Alright I think I've cleared the out-of-band related errors and have managed to get Python 3.7 to pass. 🎉

There still seem to be some other errors for earlier Python versions. The errors seem to be things like pickling ABCs and Unions. Not sure how we address those. If anyone has thoughts, they would be most welcome 🙂

Has a fix for the module name `PickleBuffer` is in.
As NumPy 1.18.5 supports `pickle5` on Python 3.5 and we require that
NumPy version for testing, include this out-of-band buffers test on
Python 3.5 as well.
@jakirkham jakirkham added the ci downstream Signal the CI to run the test suite of all registered cloudpickle downstream projects. label Jun 30, 2020
@jakirkham jakirkham changed the title WIP: Optionally use pickle5 (Redux) Optionally use pickle5 (Redux) Jun 30, 2020
@jakirkham
Copy link
Member Author

@pierreglaser, do you know what we need to do to run downstream CI tests here? Tried adding the label and restarting, but maybe something else is needed?

@pierreglaser
Copy link
Member

pushing an empty commit should do the trick :)

@@ -521,7 +521,7 @@ def test_module_locals_behavior(self):
pickled_func_path = os.path.join(self.tmpdir, 'local_func_g.pkl')

child_process_script = '''
import pickle
from cloudpickle.compat import pickle
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

using cloudpickle.compat.pickle or pickle should be equivalent in the load case right?

Copy link
Member

@pierreglaser pierreglaser left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

A small benchmark with and without pickle5:

(cloudpickle_py37)  ~/repos/cloudpickle  (opt_use_pickle5_redux)❯_ pip show pickle5
WARNING: Package(s) not found: pickle5
(cloudpickle_py37)  ~/repos/cloudpickle  (opt_use_pickle5_redux)❯_ python -m timeit 'import cloudpickle; _ = cloudpickle.dumps(list(range(int(1e6))))'
1 loop, best of 5: 1.21 sec per loop
(cloudpickle_py37)  ~/repos/cloudpickle  (opt_use_pickle5_redux)❯_ pip install pickle5
Processing /home/pierreglaser/.cache/pip/wheels/7e/6a/00/67136a90d6aca437d806d1d3cedf98106e840c97a3e5188198/pickle5-0.0.11-cp37-cp37m-linux_x86_64.whl
Installing collected packages: pickle5
Successfully installed pickle5-0.0.11
WARNING: You are using pip version 20.0.2; however, version 20.1.1 is available.
You should consider upgrading via the '/home/pierreglaser/.virtualenvs/cloudpickle_py37/bin/python3.7 -m pip install --upgrade pip' command.
(cloudpickle_py37)  ~/repos/cloudpickle  (opt_use_pickle5_redux)❯_ python -m timeit 'import cloudpickle; _ = cloudpickle.dumps(list(range(int(1e6))))'
5 loops, best of 5: 46.7 msec per loop

cloudpickle/cloudpickle_fast.py Outdated Show resolved Hide resolved
# To be able to test tornado coroutines
tornado
# To be able to test numpy specific things
# but do not build numpy from source on Python nightly
numpy; python_version <= '3.8'
numpy >=1.18.5; python_version <= '3.8'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so we don't have to skip numpy + Python 3.5 after all? nice :)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed :) PR ( numpy/numpy#16439 ) added Python 3.5 support

@@ -1008,7 +1008,8 @@ def example():

# choose "subprocess" rather than "multiprocessing" because the latter
# library uses fork to preserve the parent environment.
command = ("import pickle, base64; "
command = ("import base64; "
"from cloudpickle.compat import pickle; "
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto: cloudpickle.compat.pickle and pickle are interchangeable in this situation right?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In both of these test cases, we are using the highest supported protocol with cloudpickle, which is protocol 5, to produce the pickled data. So we wound up needing to change these since we are trying to load the pickled data and then need to use pickle5 when pickle doesn't have protocol 5 support. We could alternatively skip the test, restrict the protocol based on pickle.HIGHEST_PROTOCOL, or something else. We could also leave as-is. Some options to consider 🙂

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this case I agree. Thanks.

Co-authored-by: Pierre Glaser <pierreglaser@msn.com>
Copy link
Contributor

@ogrisel ogrisel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

@ogrisel ogrisel merged commit 508cdf1 into cloudpipe:master Jul 1, 2020
@pierreglaser
Copy link
Member

@jakirkham I'm going to release now, feel free to voice any concern/remarks :)

@jakirkham
Copy link
Member Author

Awesome! Thank you both 😀

No concerns from me. Was thinking the same thing in fact 😉

@jakirkham jakirkham deleted the opt_use_pickle5_redux branch July 1, 2020 17:20
@jakirkham
Copy link
Member Author

Just as an FYI, @suquark this is in cloudpickle 1.5.0

@jorisvandenbossche
Copy link

I encountered a failing dask test which I suppose is related to this: dask/dask#6374. It happens when you have pickle5 installed with the latest cloudpickle. The pickling roundtrip of a dask dataframe seems to no longer work (but I am not very knowledgeable about pickling/cloudpickle/pickle5, so not directly sure what the cause is / whether it's a bug).

@pierreglaser
Copy link
Member

Duplicating my comment from dask/dask#6374:

The reason why this test is failing is that cp_dumps relies on pickle5 and thus uses pickle protocol 5 when dumping df. On the countrary, loads is the stdlib pickle.loads function, which, on Python 3.7 does not implement pickle protocol 5 yet, hence the error. So pickle files generated using cp_dumps when pickle5 is installed should/can only be read using pickle5.loads/cloudpickle.loads. This change was not avoidable to have pickle5 support on Python 3.6/7.

Although this is a breaking change:

But I'd be glad to hear your concerns.

@suquark
Copy link
Contributor

suquark commented Jul 8, 2020

@jakirkham thanks for notifying me about the release!

@jakirkham
Copy link
Member Author

Of course! 😄

Also the Dask test failure should be fixed with PR ( dask/dask#6379 ).

wip-sync pushed a commit to NetBSD/pkgsrc-wip that referenced this pull request Apr 4, 2022
2.0.0
=====

- Python 3.5 is no longer supported.

- Support for registering modules to be serialised by value. This allows code
  defined in local modules to be serialised and executed remotely without those
  local modules installed on the remote machine.
  ([PR #417](cloudpipe/cloudpickle#417))

- Fix a side effect altering dynamic modules at pickling time.
  ([PR #426](cloudpipe/cloudpickle#426))

- Support for pickling type annotations on Python 3.10 as per [PEP 563](
  https://www.python.org/dev/peps/pep-0563/)
  ([PR #400](cloudpipe/cloudpickle#400))

- Stricter parametrized type detection heuristics in
  _is_parametrized_type_hint to limit false positives.
  ([PR #409](cloudpipe/cloudpickle#409))

- Support pickling / depickling of OrderedDict KeysView, ValuesView, and
  ItemsView, following similar strategy for vanilla Python dictionaries.
  ([PR #423](cloudpipe/cloudpickle#423))

- Suppressed a source of non-determinism when pickling dynamically defined
  functions and handles the deprecation of co_lnotab in Python 3.10+.
  ([PR #428](cloudpipe/cloudpickle#428))

1.6.0
=====

- `cloudpickle`'s pickle.Pickler subclass (currently defined as
  `cloudpickle.cloudpickle_fast.CloudPickler`) can and should now be accessed
  as `cloudpickle.Pickler`. This is the only officially supported way of
  accessing it.
  ([issue #366](cloudpipe/cloudpickle#366))

- `cloudpickle` now supports pickling `dict_keys`, `dict_items` and
  `dict_values`.
  ([PR #384](cloudpipe/cloudpickle#384))

1.5.0
=====

- Fix a bug causing cloudpickle to crash when pickling dynamically created,
  importable modules.
  ([issue #360](cloudpipe/cloudpickle#354))

- Add optional dependency on `pickle5` to get improved performance on
  Python 3.6 and 3.7.
  ([PR #370](cloudpipe/cloudpickle#370))

- Internal refactoring to ease the use of `pickle5` in cloudpickle
  for Python 3.6 and 3.7.
  ([PR #368](cloudpipe/cloudpickle#368))

1.4.1
=====

- Fix incompatibilities between cloudpickle 1.4.0 and Python 3.5.0/1/2
  introduced by the new support of cloudpickle for pickling typing constructs.
  ([issue #360](cloudpipe/cloudpickle#360))

- Restore compat with loading dynamic classes pickled with cloudpickle
  version 1.2.1 that would reference the `types.ClassType` attribute.
  ([PR #359](cloudpipe/cloudpickle#359))

1.4.0
=====

**This version requires Python 3.5 or later**

- cloudpickle can now all pickle all constructs from the ``typing`` module
  and the ``typing_extensions`` library in Python 3.5+
  ([PR #318](cloudpipe/cloudpickle#318))

- Stop pickling the annotations of a dynamic class for Python < 3.6
  (follow up on #276)
  ([issue #347](cloudpipe/cloudpickle#347))

- Fix a bug affecting the pickling of dynamic `TypeVar` instances on Python 3.7+,
  and expand the support for pickling `TypeVar` instances (dynamic or non-dynamic)
  to Python 3.5-3.6 ([PR #350](cloudpipe/cloudpickle#350))

- Add support for pickling dynamic classes subclassing `typing.Generic`
  instances on Python 3.7+
  ([PR #351](cloudpipe/cloudpickle#351))

1.3.0
=====

- Fix a bug affecting dynamic modules occuring with modified builtins
  ([issue #316](cloudpipe/cloudpickle#316))

- Fix a bug affecting cloudpickle when non-modules objects are added into
  sys.modules
  ([PR #326](cloudpipe/cloudpickle#326)).

- Fix a regression in cloudpickle and python3.8 causing an error when trying to
  pickle property objects.
  ([PR #329](cloudpipe/cloudpickle#329)).

- Fix a bug when a thread imports a module while cloudpickle iterates
  over the module list
  ([PR #322](cloudpipe/cloudpickle#322)).

- Add support for out-of-band pickling (Python 3.8 and later).
  https://docs.python.org/3/library/pickle.html#example
  ([issue #308](cloudpipe/cloudpickle#308))

- Fix a side effect that would redefine `types.ClassTypes` as `type`
  when importing cloudpickle.
  ([issue #337](cloudpipe/cloudpickle#337))

- Fix a bug affecting subclasses of slotted classes.
  ([issue #311](cloudpipe/cloudpickle#311))

- Dont pickle the abc cache of dynamically defined classes for Python 3.6-
  (This was already the case for python3.7+)
  ([issue #302](cloudpipe/cloudpickle#302))

1.2.2
=====

- Revert the change introduced in
  ([issue #276](cloudpipe/cloudpickle#276))
  attempting to pickle functions annotations for Python 3.4 to 3.6. It is not
  possible to pickle complex typing constructs for those versions (see
  [issue #193]( cloudpipe/cloudpickle#193))

- Fix a bug affecting bound classmethod saving on Python 2.
  ([issue #288](cloudpipe/cloudpickle#288))

- Add support for pickling "getset" descriptors
  ([issue #290](cloudpipe/cloudpickle#290))

1.2.1
=====

- Restore (partial) support for Python 3.4 for downstream projects that have
  LTS versions that would benefit from cloudpickle bug fixes.

1.2.0
=====

- Leverage the C-accelerated Pickler new subclassing API (available in Python
  3.8) in cloudpickle. This allows cloudpickle to pickle Python objects up to
  30 times faster.
  ([issue #253](cloudpipe/cloudpickle#253))

- Support pickling of classmethod and staticmethod objects in python2.
  arguments. ([issue #262](cloudpipe/cloudpickle#262))

- Add support to pickle type annotations for Python 3.5 and 3.6 (pickling type
  annotations was already supported for Python 3.7, Python 3.4 might also work
  but is no longer officially supported by cloudpickle)
  ([issue #276](cloudpipe/cloudpickle#276))

- Internal refactoring to proactively detect dynamic functions and classes when
  pickling them.  This refactoring also yields small performance improvements
  when pickling dynamic classes (~10%)
  ([issue #273](cloudpipe/cloudpickle#273))

1.1.1
=====

- Minor release to fix a packaging issue (Markdown formatting of the long
  description rendered on pypi.org). The code itself is the same as 1.1.0.

1.1.0
=====

- Support the pickling of interactively-defined functions with positional-only
  arguments. ([issue #266](cloudpipe/cloudpickle#266))

- Track the provenance of dynamic classes and enums so as to preseve the
  usual `isinstance` relationship between pickled objects and their
  original class defintions.
  ([issue #246](cloudpipe/cloudpickle#246))

1.0.0
=====

- Fix a bug making functions with keyword-only arguments forget the default
  values of these arguments after being pickled.
  ([issue #264](cloudpipe/cloudpickle#264))

0.8.1
=====

- Fix a bug (already present before 0.5.3 and re-introduced in 0.8.0)
  affecting relative import instructions inside depickled functions
  ([issue #254](cloudpipe/cloudpickle#254))

0.8.0
=====

- Add support for pickling interactively defined dataclasses.
  ([issue #245](cloudpipe/cloudpickle#245))

- Global variables referenced by functions pickled by cloudpickle are now
  unpickled in a new and isolated namespace scoped by the CloudPickler
  instance. This restores the (previously untested) behavior of cloudpickle
  prior to changes done in 0.5.4 for functions defined in the `__main__`
  module, and 0.6.0/1 for other dynamic functions.

0.7.0
=====

- Correctly serialize dynamically defined classes that have a `__slots__`
  attribute.
  ([issue #225](cloudpipe/cloudpickle#225))

0.6.1
=====

- Fix regression in 0.6.0 which breaks the pickling of local function defined
  in a module, making it impossible to access builtins.
  ([issue #211](cloudpipe/cloudpickle#211))

0.6.0
=====

- Ensure that unpickling a function defined in a dynamic module several times
  sequentially does not reset the values of global variables.
  ([issue #187](cloudpipe/cloudpickle#205))

- Restrict the ability to pickle annotations to python3.7+ ([issue #193](
  cloudpipe/cloudpickle#193) and [issue #196](
  cloudpipe/cloudpickle#196))

- Stop using the deprecated `imp` module under Python 3.
  ([issue #207](cloudpipe/cloudpickle#207))

- Fixed pickling issue with singleton types `NoneType`, `type(...)` and
  `type(NotImplemented)` ([issue #209](cloudpipe/cloudpickle#209))

0.5.6
=====

- Ensure that unpickling a locally defined function that accesses the global
  variables of a module does not reset the values of the global variables if
  they are already initialized.
  ([issue #187](cloudpipe/cloudpickle#187))

0.5.5
=====

- Fixed inconsistent version in `cloudpickle.__version__`.

0.5.4
=====

- Fixed a pickling issue for ABC in python3.7+ ([issue #180](
  cloudpipe/cloudpickle#180)).

- Fixed a bug when pickling functions in `__main__` that access global
  variables ([issue #187](
  cloudpipe/cloudpickle#187)).

0.5.3
=====
- Fixed a crash in Python 2 when serializing non-hashable instancemethods of built-in
  types ([issue #144](cloudpipe/cloudpickle#144)).

- itertools objects can also pickled
  ([PR #156](cloudpipe/cloudpickle#156)).

- `logging.RootLogger` can be also pickled
  ([PR #160](cloudpipe/cloudpickle#160)).

0.5.2
=====

- Fixed a regression: `AttributeError` when loading pickles that hold a
  reference to a dynamically defined class from the `__main__` module.
  ([issue #131]( cloudpipe/cloudpickle#131)).

- Make it possible to pickle classes and functions defined in faulty
  modules that raise an exception when trying to look-up their attributes
  by name.

0.5.1
=====

- Fixed `cloudpickle.__version__`.

0.5.0
=====

- Use `pickle.HIGHEST_PROTOCOL` by default.

0.4.4
=====

- `logging.RootLogger` can be also pickled
  ([PR #160](cloudpipe/cloudpickle#160)).

0.4.3
=====

- Fixed a regression: `AttributeError` when loading pickles that hold a
  reference to a dynamically defined class from the `__main__` module.
  ([issue #131]( cloudpipe/cloudpickle#131)).

- Fixed a crash in Python 2 when serializing non-hashable instancemethods of built-in
  types. ([issue #144](cloudpipe/cloudpickle#144))

0.4.2
=====

- Restored compatibility with pickles from 0.4.0.
- Handle the `func.__qualname__` attribute.

0.4.1
=====

- Fixed a crash when pickling dynamic classes whose `__dict__` attribute was
  defined as a [`property`](https://docs.python.org/3/library/functions.html#property).
  Most notably, this affected dynamic [namedtuples](https://docs.python.org/2/library/collections.html#namedtuple-factory-function-for-tuples-with-named-fields)
  in Python 2. (cloudpipe/cloudpickle#113)
- Cloudpickle now preserves the `__module__` attribute of functions (cloudpipe/cloudpickle#118).
- Fixed a crash when pickling modules that don't have a `__package__` attribute (cloudpipe/cloudpickle#116).

0.4.0
=====

* Fix functions with empty cells
* Allow pickling Logger objects
* Fix crash when pickling dynamic class cycles
* Ignore "None" mdoules added to sys.modules
* Support WeakSets and ABCMeta instances
* Remove non-standard `__transient__` support
* Catch exception from `pickle.whichmodule()`

0.3.1
=====

* Fix version information and ship a changelog

 0.3.0
=====

* Import submodules accessed by pickled functions
* Support recursive functions inside closures
* Fix `ResourceWarnings` and `DeprecationWarnings`
* Assume modules with `__file__` attribute are not dynamic

0.2.2
=====

* Support Python 3.6
* Support Tornado Coroutines
* Support builtin methods
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ci downstream Signal the CI to run the test suite of all registered cloudpickle downstream projects.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

cloudpickle could extend pickle5
5 participants