From 1e9a48dc7ccf3885d4ae2be03069a3c67b611583 Mon Sep 17 00:00:00 2001 From: Pierre Glaser Date: Sat, 31 Jul 2021 18:18:22 +0100 Subject: [PATCH] DOC more in-depth context description in README --- README.md | 35 ++++++++++++++++++++++------------- 1 file changed, 22 insertions(+), 13 deletions(-) diff --git a/README.md b/README.md index cd35b04c..731fe0ab 100644 --- a/README.md +++ b/README.md @@ -72,19 +72,24 @@ Overriding pickle's serialization mechanism for importable constructs: An important difference between `cloudpickle` and `pickle` is that `cloudpickle` can serialize a function or class **by value**, whereas `pickle` -can only serialize it **by reference**, e.g. by serializing its *module -attribute path* (such as `my_module.my_function`). - -By default, `cloudpickle` only uses serialization by value in cases where -serialization by reference is usually ineffective, for example when the -function/class to be pickled was constructed in an interactive Python session. - -Since `cloudpickle 1.7.0`, it is possible to extend the use of serialization by -value to functions or classes coming from **any pure Python module**. This feature -is useful when the said module is unavailable in the unpickling environment -(making traditional serialization by reference ineffective). To this end, -`cloudpickle` exposes the -`register_pickle_by_value`/`unregister_pickle_by_value` functions: +can only serialize it **by reference**. Serialization by reference treats +functions and classes as attributes of modules, and pickles them through +instructions that trigger the import of their module at load time. +Serialization by reference is thus limited in that it assumes that the module +containing the function or class is available/importable in the unpickling +environment. This assumption breaks when pickling constructs defined in an +interactive session, a case that is automatically detected by `cloudpickle`, +that pickles such constructs **by value**. + +Another case where the importability assumption is expected to break is when +developing a module in a distributed execution environment: the worker +processes may not have access to the said module, for example if they live on a +different machine than the process in which the module is being developed. +By itself, `cloudpickle` cannot detect such "locally importable" modules and +switch to serialization by value; instead, it relies on its default mode, +which is serialization by reference. However, since `cloudpickle 1.7.0`, one +can explicitly specify modules for which serialization by value should be used, +using the `register_pickle_by_value(module)`/`/unregister_pickle(module)` API: ```python >>> import cloudpickle @@ -95,6 +100,10 @@ is useful when the said module is unavailable in the unpickling environment >>> cloudpickle.dumps(my_module.my_function) # my_function is pickled by reference ``` +Using this API, there is no need to re-install the new version of the module on +all the worker nodes nor to restart the workers: restarting the client Python +process with the new source code is enough. + Note that this feature is still **experimental**, and may fail in the following situations: