Skip to content

Commit

Permalink
DOC more in-depth context description in README
Browse files Browse the repository at this point in the history
  • Loading branch information
pierreglaser committed Jul 31, 2021
1 parent e4ca37d commit 1e9a48d
Showing 1 changed file with 22 additions and 13 deletions.
35 changes: 22 additions & 13 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -72,19 +72,24 @@ Overriding pickle's serialization mechanism for importable constructs:

An important difference between `cloudpickle` and `pickle` is that
`cloudpickle` can serialize a function or class **by value**, whereas `pickle`
can only serialize it **by reference**, e.g. by serializing its *module
attribute path* (such as `my_module.my_function`).

By default, `cloudpickle` only uses serialization by value in cases where
serialization by reference is usually ineffective, for example when the
function/class to be pickled was constructed in an interactive Python session.

Since `cloudpickle 1.7.0`, it is possible to extend the use of serialization by
value to functions or classes coming from **any pure Python module**. This feature
is useful when the said module is unavailable in the unpickling environment
(making traditional serialization by reference ineffective). To this end,
`cloudpickle` exposes the
`register_pickle_by_value`/`unregister_pickle_by_value` functions:
can only serialize it **by reference**. Serialization by reference treats
functions and classes as attributes of modules, and pickles them through
instructions that trigger the import of their module at load time.
Serialization by reference is thus limited in that it assumes that the module
containing the function or class is available/importable in the unpickling
environment. This assumption breaks when pickling constructs defined in an
interactive session, a case that is automatically detected by `cloudpickle`,
that pickles such constructs **by value**.

Another case where the importability assumption is expected to break is when
developing a module in a distributed execution environment: the worker
processes may not have access to the said module, for example if they live on a
different machine than the process in which the module is being developed.
By itself, `cloudpickle` cannot detect such "locally importable" modules and
switch to serialization by value; instead, it relies on its default mode,
which is serialization by reference. However, since `cloudpickle 1.7.0`, one
can explicitly specify modules for which serialization by value should be used,
using the `register_pickle_by_value(module)`/`/unregister_pickle(module)` API:

```python
>>> import cloudpickle
Expand All @@ -95,6 +100,10 @@ is useful when the said module is unavailable in the unpickling environment
>>> cloudpickle.dumps(my_module.my_function) # my_function is pickled by reference
```

Using this API, there is no need to re-install the new version of the module on
all the worker nodes nor to restart the workers: restarting the client Python
process with the new source code is enough.

Note that this feature is still **experimental**, and may fail in the following
situations:

Expand Down

0 comments on commit 1e9a48d

Please sign in to comment.