
NumPy arrays serialize more slowly with cloudpickle than pickle #58

Closed
mrocklin opened this issue Apr 12, 2016 · 19 comments

Comments

@mrocklin
Contributor

I would expect pickle and cloudpickle to behave pretty much identically here. Sadly cloudpickle serializes much more slowly.

In [1]: import numpy as np

In [2]: data = np.random.randint(0, 255, dtype='u1', size=100000000)

In [3]: import cloudpickle, pickle

In [4]: %time len(pickle.dumps(data, protocol=pickle.HIGHEST_PROTOCOL))
CPU times: user 50.9 ms, sys: 135 ms, total: 186 ms
Wall time: 185 ms
Out[4]: 100000161

In [5]: %time len(cloudpickle.dumps(data, protocol=pickle.HIGHEST_PROTOCOL))
CPU times: user 125 ms, sys: 280 ms, total: 404 ms
Wall time: 405 ms
Out[5]: 100000161
@mrocklin mrocklin changed the title Performance issue with numpy arrays NumPy arrays serialize more slowly with cloudpickle than pickle Apr 12, 2016
@mrocklin
Contributor Author

This is in Python 3. Anecdotally I find the following behavior:

  • Python 2 pickle: 250ms
  • Python 2 cPickle: 200ms
  • Python 2 cloudpickle: 250ms
  • Python 3 pickle: 100-200ms (woot)
  • Python 3 cloudpickle: 400ms

I'm actually a bit confused about why this isn't significantly faster. Presumably this is pickling the metadata (which is quite fast) and then copying over the data bytestring, which should happen at near memcpy speeds I think?
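The gap can be reproduced without NumPy at all — a minimal sketch, using a plain bytes object as a stand-in for the array's data buffer, comparing the compiled pickler (`pickle.Pickler`) against the pure-Python one (`pickle._Pickler`) that cloudpickle builds on:

```python
# Compare the C pickler with the pure-Python pickler on one large
# bytes payload (a stand-in for the array's buffer, so this runs
# without NumPy). Both emit identical output; only the framing code
# around the payload differs in speed.
import io
import pickle
import time

data = b"\x00" * 10_000_000  # ~10 MB stand-in for the array's data

def dump_with(pickler_cls):
    buf = io.BytesIO()
    start = time.perf_counter()
    pickler_cls(buf, protocol=pickle.HIGHEST_PROTOCOL).dump(data)
    return time.perf_counter() - start, buf.tell()

c_time, c_len = dump_with(pickle.Pickler)     # compiled pickler
py_time, py_len = dump_with(pickle._Pickler)  # pure-Python pickler

print(f"C pickler:  {c_time:.3f}s, {c_len} bytes")
print(f"_Pickler:   {py_time:.3f}s, {py_len} bytes")
```

Both picklers produce byte-identical streams here, so any timing difference is overhead in the pure-Python framing code rather than extra bytes written.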

@ogrisel
Contributor

ogrisel commented Apr 13, 2016

Yes, Python 3 pickle.Pickler is like Python 2 cPickle.Pickler and Python 3 pickle._Pickler is like Python 2 pickle.Pickler.

Unfortunately it's not possible to customize the compiled versions of the picklers.

Note that the latest versions of the pickle protocol can pass very large raw byte streams very efficiently, but taking advantage of this should probably be fixed in the C code of numpy (I have not checked in detail).

@pwaller

pwaller commented Aug 8, 2017

Cross reference to relevant conversation on python-dev:

On Mon, 10 Jul 2017 02:35:37 +0200
Victor Stinner <victor.stinner at gmail.com> wrote:

I already proposed to remove this benchmark:
https://mail.python.org/pipermail/speed/2017-April/000554.html

but Antoine Pitrou mentioned that the cloudpickle project uses it.

Maybe we should try to understand what's wrong with _pickle (C module)
for cloudpickle?

That's a good question, Victor. cloudpickle uses three hooks inside
pickle.py's Pickler:

  • the "dispatch" dictionary
  • overriding the "save_global" method to support saving more objects
    (such as closures, etc.)
  • overriding the "save_reduce" method; this one doesn't seem really
    necessary, perhaps some leftover from previous attempts

_pickle.c's Pickler does seem to allow a custom "dispatch_table", but
it doesn't allow overriding "save_global".

Of course, if _pickle.c were improved to allow such extensions, it
would suddenly allow cloudpickle to be much more performant!

@pwaller

pwaller commented Aug 8, 2017

Also cross reference #44.

I keep hitting speed issues with dask due to cloudpickling large-ish objects. These objects are fairly trivial for the C pickler but take minutes for CloudPickle, and it's making an otherwise interactive analysis a bit of a pain. Two key things I want to pickle:

  1. A TfidfVectorizer.transform function, which comes with a couple of large numpy arrays.
  2. A set with millions of strings in it (to use with a ddf.column.isin(s) operation in distributed dask)

I did my own digging. It seems that the fundamental problem is that the C pickler's dispatch table is only consulted after the fundamental types (including function, list, tuple, etc.) have already been written. This means you can't override the behaviour of the C pickler when it's writing a list, for example.

The "obvious" fix would be to move this lookup to before the fundamental types, assuming that it is not too expensive to do this lookup all the time. This would then allow inserting entries such as function into the C pickler's dispatch table. @pitrou have you considered such a thing? I assume that lookup is likely relatively cheap. Clearly such a move would be a trade-off, with a (possibly small?) negative impact on other kinds of pickling operations. But maybe it's worth it to enable much more efficient pickling of other types of things.

I also threw together a hack which enables cloudpickle to efficiently pickle large sets. For my 1M-element set, it gives a 30x speedup over the pure-Python pickler. I assume it works with numpy arrays too.

The basic reason hacks of this style can't be used in general is that once you go into the C side, you then can't come back to the python side. That's fine to do if you know your containers don't have any objects in them which will need cloudpickle.
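A hedged sketch of the kind of hack described above (the names and the exact guard are mine, not taken from the linked patch): when a set contains only plain strings, hand the whole container to the fast C pickler as one opaque payload; anything else falls back to the slow pure-Python path, so objects that genuinely need cloudpickle still get it.

```python
# Sketch of delegating "safe" containers to the C pickler. FastSetPickler
# is an illustrative name; the real hack lives in the linked patch.
import io
import pickle

class FastSetPickler(pickle._Pickler):
    dispatch = pickle._Pickler.dispatch.copy()

    def save_set(self, obj):
        if all(type(x) is str for x in obj):
            # No element can need cloudpickle, so it is safe to cross into
            # the C pickler and never come back for this subtree.
            payload = pickle.dumps(obj, protocol=self.proto)
            self.save_reduce(pickle.loads, (payload,), obj=obj)
        else:
            # Mixed contents: stay on the pure-Python path.
            pickle._Pickler.save_set(self, obj)

    dispatch[set] = save_set

big = {f"key-{i}" for i in range(100_000)}
buf = io.BytesIO()
FastSetPickler(buf, protocol=pickle.HIGHEST_PROTOCOL).dump(big)
restored = pickle.loads(buf.getvalue())
```

The inner `pickle.dumps` runs at C speed over the whole set, which is where the reported 30x comes from; the wrapping `save_reduce` keeps the result a valid pickle stream.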

@ogrisel
Contributor

ogrisel commented Aug 10, 2017

I think TfidfVectorizer is slow to pickle with the uncompiled pure Python pickler used by cloudpickle not because of the arrays, but because of the potentially very large dict with string keys vectorizer.vocabulary_: basically the pure-python pickler is much slower whenever there is a large number of Python sub-objects in the payload.
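That hypothesis is easy to check without scikit-learn — a quick sketch timing both picklers on a large dict of string keys standing in for `vectorizer.vocabulary_`:

```python
# Time the C pickler vs the pure-Python pickler on a payload with many
# small Python sub-objects (a dict of strings, standing in for a
# vectorizer vocabulary). Per-object overhead dominates here.
import io
import pickle
import time

vocab = {f"token{i}": i for i in range(200_000)}  # stand-in vocabulary_

def time_dump(pickler_cls):
    buf = io.BytesIO()
    start = time.perf_counter()
    pickler_cls(buf, protocol=pickle.HIGHEST_PROTOCOL).dump(vocab)
    return time.perf_counter() - start

c_time = time_dump(pickle.Pickler)
py_time = time_dump(pickle._Pickler)
print(f"C: {c_time:.3f}s  pure Python: {py_time:.3f}s  "
      f"ratio: {py_time / c_time:.1f}x")
```

Unlike the single-large-array case, the gap here grows with the number of sub-objects, not with the number of bytes.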

@ogrisel
Contributor

ogrisel commented Aug 10, 2017

Maybe we should split this into two issues: one about the speed of compound objects with many subobjects (typically a list, set or dict of strings), and keep this issue for the originally reported single large numpy array pickling perf, which I suspect is caused by some unnecessary memory copy when using the pure-Python pickler (either at pickling time or at unpickling time).

@pwaller

pwaller commented Aug 10, 2017

Apologies, I had intended to make the above comments on #68. I do think there is some overlap, though: a solution to the problem of using the C pickler where possible would potentially solve both issues.

@pitrou
Member

pitrou commented Aug 12, 2017

The "obvious" fix would be to move this lookup to before the fundamental types, assuming that it is not too expensive to do this lookup all the time. This would then allow inserting things into the C pickler's dispatch table such as function. @pitrou have you considered such a thing? I assume that lookup is likely relatively cheap.

Pickling of "fundamental" built-in types such as sets, tuples, etc. (but probably not functions) is performance-critical. Does cloudpickle only need to override the pickling of functions, or are other built-in types also special-cased?

@pitrou
Member

pitrou commented Oct 26, 2017

Commenting on my own quoted message in #58 (comment): I confused dispatch with dispatch_table. cloudpickle uses dispatch (not dispatch_table), which isn't exposed by the C pickler.

@pitrou
Member

pitrou commented Oct 26, 2017

Ok, so here is what I think the C Pickler needs for cloudpickle to make use of it:

  • a settable dispatch attribute for per-type save() callbacks; this takes priority over built-in code for types and functions, but not necessarily for other objects such as int or list
  • an overridable save_fallback(obj, exc) method called when the C Pickler fails to serialize obj: the fallback can either implement its own serialization for obj or re-raise exc
  • to allow calling the following methods (no need to make them overridable):
    • save(obj)
    • write(bytes)
    • memoize(obj)
    • save_reduce(*args, obj=None)

save_fallback isn't strictly necessary (we can override everything in dispatch as done currently), but it may allow cutting down on the amount of custom code in cloudpickle, and would also ensure faster serialization where possible.

@pitrou
Member

pitrou commented Oct 26, 2017

Note to self:

  • _Pickler.memoize() is memo_put() in _pickle.c
  • _Pickler.save() is save() in _pickle.c
  • _Pickler.save_reduce() is save_reduce() in _pickle.c
  • _Pickler.write() is _Pickler_Write in _pickle.c

@ogrisel
Contributor

ogrisel commented Mar 31, 2018

For reference, improvements to the C pickler to make it possible to support efficient local class and function definitions (and possibly other cloudpickle/dill features) directly in the standard library are being discussed in this python-dev mailing list thread: https://mail.python.org/pipermail/python-dev/2018-March/152509.html

@pierreglaser
Member

Should we close this now that #253 was merged and that the CloudPickler class now derives from the C-based Pickler class?

@jakirkham
Member

Went ahead and re-ran the benchmark on Python 3 (didn't try Python 2). Looks like there is a significant improvement.

In [1]: import numpy as np

In [2]: import cloudpickle, pickle

In [3]: data = np.random.randint(0, 255, dtype='u1', size=100000000)

In [4]: %timeit len(pickle.dumps(data, protocol=pickle.HIGHEST_PROTOCOL))
80.1 ms ± 722 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

In [5]: %timeit len(cloudpickle.dumps(data, protocol=pickle.HIGHEST_PROTOCOL))
46.5 ms ± 483 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

@pierreglaser
Member

Went ahead and re-ran the benchmark on Python 3

Which Python 3, if I may?

@jakirkham
Member

Sorry, Python 3.7.3.

@ogrisel
Contributor

ogrisel commented May 29, 2020

cloudpickle subclasses the C implementation of the Pickler class in Python 3.8 and later.
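The hook that makes this subclassing possible is `pickle.Pickler.reducer_override()`, added to the C pickler in Python 3.8. A minimal sketch (the subclass is mine, not cloudpickle's code): the override is consulted for functions before the built-in C machinery, and returning NotImplemented falls back to the default C-speed path.

```python
# Sketch of the Python 3.8+ reducer_override hook on the C pickler.
# TracingPickler is illustrative: it just records the functions it sees,
# then defers to the default (C-speed) behaviour.
import io
import os
import pickle
import types

calls = []

class TracingPickler(pickle.Pickler):
    def reducer_override(self, obj):
        if isinstance(obj, types.FunctionType):
            calls.append(obj.__qualname__)  # a real override would reduce obj
        return NotImplemented  # fall back to normal C-speed pickling

buf = io.BytesIO()
TracingPickler(buf, protocol=pickle.HIGHEST_PROTOCOL).dump(os.path.join)
restored = pickle.loads(buf.getvalue())
```

A real override returns a reduce tuple instead of NotImplemented for the objects it wants to serialize by value; everything else keeps the compiled fast path, which is what closed the performance gap reported in this issue.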

@ogrisel
Contributor

ogrisel commented May 29, 2020

But actually what is important here is not the C vs Python implementation of the pickler, but the fact that cloudpickle and pickle use pickle protocol 5 efficiently, which is already available for Python 3.7.
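A sketch of the protocol-5 mechanism being credited here (PEP 574): with a buffer_callback, large buffers travel out of band instead of being copied into the pickle stream, so a big contiguous payload can round-trip with essentially no extra copy. A plain bytearray stands in for the array's data buffer; this needs the protocol-5 pickle (Python 3.8+, or the pickle5 backport on 3.7).

```python
# Protocol-5 out-of-band buffers: the pickle stream stays tiny while the
# payload is handed to buffer_callback uncopied.
import pickle

data = bytearray(b"\x00" * 10_000_000)  # stand-in for the array's buffer

buffers = []
payload = pickle.dumps(pickle.PickleBuffer(data), protocol=5,
                       buffer_callback=buffers.append)

# The stream holds only opcodes; the 10 MB travels out of band.
print(len(payload), "bytes of pickle,", len(buffers), "out-of-band buffer")

restored = pickle.loads(payload, buffers=buffers)
```

NumPy's `__reduce_ex__` takes the same path for arrays under protocol 5, which is why both `pickle` and `cloudpickle` hit the fast numbers in the benchmark above.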

@jakirkham
Member

Indeed, the improvement with pickle protocol 5 is noticeable. Very nice work here already!

It would be good to support this on all Python versions we support. I think PR ( #370 ) should get us there. It's maybe 90% towards being completely green. Any help on getting that last 10% would be appreciated (kind of stuck atm).
