
NumPy arrays serialize more slowly with cloudpickle than pickle #58

Closed
mrocklin opened this issue Apr 12, 2016 · 19 comments

Comments

@mrocklin
Contributor

I would expect pickle and cloudpickle to behave pretty much identically here. Sadly cloudpickle serializes much more slowly.

In [1]: import numpy as np

In [2]: data = np.random.randint(0, 255, dtype='u1', size=100000000)

In [3]: import cloudpickle, pickle

In [4]: %time len(pickle.dumps(data, protocol=pickle.HIGHEST_PROTOCOL))
CPU times: user 50.9 ms, sys: 135 ms, total: 186 ms
Wall time: 185 ms
Out[4]: 100000161

In [5]: %time len(cloudpickle.dumps(data, protocol=pickle.HIGHEST_PROTOCOL))
CPU times: user 125 ms, sys: 280 ms, total: 404 ms
Wall time: 405 ms
Out[5]: 100000161
@mrocklin mrocklin changed the title Performance issue with numpy arrays NumPy arrays serialize more slowly with cloudpickle than pickle Apr 12, 2016
@mrocklin
Contributor Author

This is in Python 3. Anecdotally I find the following behavior:

  • Python 2 pickle: 250ms
  • Python 2 cPickle: 200ms
  • Python 2 cloudpickle: 250ms
  • Python 3 pickle: 100-200ms (woot)
  • Python 3 cloudpickle: 400ms

I'm actually a bit confused about why this isn't significantly faster. Presumably this is pickling the metadata (which is quite fast) and then copying over the data bytestring, which should happen at near memcpy speeds I think?
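The gap can be reproduced without NumPy at all — a minimal sketch, using a plain bytes object as a stand-in for the array's data buffer, comparing the compiled pickler (`pickle.Pickler`) against the pure-Python one (`pickle._Pickler`) that cloudpickle builds on:

```python
# Compare the C pickler with the pure-Python pickler on one large
# bytes payload (a stand-in for the array's buffer, so this runs
# without NumPy). Both emit identical output; only the framing code
# around the payload differs in speed.
import io
import pickle
import time

data = b"\x00" * 10_000_000  # ~10 MB stand-in for the array's data

def dump_with(pickler_cls):
    buf = io.BytesIO()
    start = time.perf_counter()
    pickler_cls(buf, protocol=pickle.HIGHEST_PROTOCOL).dump(data)
    return time.perf_counter() - start, buf.tell()

c_time, c_len = dump_with(pickle.Pickler)     # compiled pickler
py_time, py_len = dump_with(pickle._Pickler)  # pure-Python pickler

print(f"C pickler:  {c_time:.3f}s, {c_len} bytes")
print(f"_Pickler:   {py_time:.3f}s, {py_len} bytes")
```

Both picklers produce byte-identical streams here, so any timing difference is overhead in the pure-Python framing code rather than extra bytes written.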

@ogrisel
Contributor

ogrisel commented Apr 13, 2016

Yes, Python 3 pickle.Pickler is like Python 2 cPickle.Pickler and Python 3 pickle._Pickler is like Python 2 pickle.Pickler.

Unfortunately it's not possible to customize the compiled versions of the picklers.

Note that the latest versions of the pickle protocol can pass very large raw byte streams very efficiently, but taking advantage of this should probably be fixed in the C code of numpy (I have not checked in detail).

@pwaller

pwaller commented Aug 8, 2017

Cross reference to relevant conversation on python-dev:

On Mon, 10 Jul 2017 02:35:37 +0200
Victor Stinner <victor.stinner at gmail.com> wrote:

I already proposed to remove this benchmark:
https://mail.python.org/pipermail/speed/2017-April/000554.html

but Antoine Pitrou mentioned that the cloudpickle project uses it.

Maybe we should try to understand what's wrong with _pickle (C module)
for cloudpickle?

That's a good question, Victor. cloudpickle uses three hooks inside
pickle.py's Pickler:

  • the "dispatch" dictionary
  • overriding the "save_global" method to support saving more objects
    (such as closures, etc.)
  • overriding the "save_reduce" method; this one doesn't seem really
    necessary, perhaps some leftover from previous attempts

_pickle.c's Pickler does seem to allow a custom "dispatch_table", but
it doesn't allow overriding "save_global".

Of course, if _pickle.c were improved to allow such extensions, it
would suddenly allow cloudpickle to be much more performant!

@pwaller

pwaller commented Aug 8, 2017

Also cross reference #44.

I keep hitting speed issues with dask due to cloudpickling large-ish objects. These objects are fairly trivial for the C pickler but take minutes for CloudPickle, and it's making an otherwise interactive analysis a bit of a pain. Two key things I want to pickle:

  1. A TfidfVectorizer.transform function, which comes with a couple of large numpy arrays.
  2. A set with millions of strings in it (to use with a ddf.column.isin(s) operation in distributed dask)

I did my own digging. It seems that the fundamental problem is that the C pickler's dispatch table is only consulted after the fundamental types (including function, list, tuple, etc.) have already been written. This means you can't override the behaviour of the C pickler when it's writing a list, for example.

The "obvious" fix would be to move this lookup to before the fundamental types, assuming that it is not too expensive to do this lookup all the time. This would then allow inserting entries such as function into the C pickler's dispatch table. @pitrou have you considered such a thing? I assume that lookup is likely relatively cheap. Clearly such a move would be a trade-off, with a (possibly small?) negative impact on other kinds of pickling operations. But maybe it's worth it to enable much more efficient pickling of other types of things.

I also threw together a hack which enables cloudpickle to efficiently pickle large sets. For my 1M-element set, it gives a 30x speedup over the pure-Python pickler. I assume it works with numpy arrays too.

The basic reason hacks of this style can't be used in general is that once you go into the C side, you then can't come back to the python side. That's fine to do if you know your containers don't have any objects in them which will need cloudpickle.
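A hedged sketch of the kind of hack described above (the names and the exact guard are mine, not taken from the linked patch): when a set contains only plain strings, hand the whole container to the fast C pickler as one opaque payload; anything else falls back to the slow pure-Python path, so objects that genuinely need cloudpickle still get it.

```python
# Sketch of delegating "safe" containers to the C pickler. FastSetPickler
# is an illustrative name; the real hack lives in the linked patch.
import io
import pickle

class FastSetPickler(pickle._Pickler):
    dispatch = pickle._Pickler.dispatch.copy()

    def save_set(self, obj):
        if all(type(x) is str for x in obj):
            # No element can need cloudpickle, so it is safe to cross into
            # the C pickler and never come back for this subtree.
            payload = pickle.dumps(obj, protocol=self.proto)
            self.save_reduce(pickle.loads, (payload,), obj=obj)
        else:
            # Mixed contents: stay on the pure-Python path.
            pickle._Pickler.save_set(self, obj)

    dispatch[set] = save_set

big = {f"key-{i}" for i in range(100_000)}
buf = io.BytesIO()
FastSetPickler(buf, protocol=pickle.HIGHEST_PROTOCOL).dump(big)
restored = pickle.loads(buf.getvalue())
```

The inner `pickle.dumps` runs at C speed over the whole set, which is where the reported 30x comes from; the wrapping `save_reduce` keeps the result a valid pickle stream.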

@ogrisel
Contributor

ogrisel commented Aug 10, 2017

I think TfidfVectorizer is slow to pickle with the uncompiled pure Python pickler used by cloudpickle not because of the arrays, but because of the potentially very large dict with string keys vectorizer.vocabulary_: basically the pure-python pickler is much slower whenever there is a large number of Python sub-objects in the payload.
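That hypothesis is easy to check without scikit-learn — a quick sketch timing both picklers on a large dict of string keys standing in for `vectorizer.vocabulary_`:

```python
# Time the C pickler vs the pure-Python pickler on a payload with many
# small Python sub-objects (a dict of strings, standing in for a
# vectorizer vocabulary). Per-object overhead dominates here.
import io
import pickle
import time

vocab = {f"token{i}": i for i in range(200_000)}  # stand-in vocabulary_

def time_dump(pickler_cls):
    buf = io.BytesIO()
    start = time.perf_counter()
    pickler_cls(buf, protocol=pickle.HIGHEST_PROTOCOL).dump(vocab)
    return time.perf_counter() - start

c_time = time_dump(pickle.Pickler)
py_time = time_dump(pickle._Pickler)
print(f"C: {c_time:.3f}s  pure Python: {py_time:.3f}s  "
      f"ratio: {py_time / c_time:.1f}x")
```

Unlike the single-large-array case, the gap here grows with the number of sub-objects, not with the number of bytes.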

@ogrisel
Contributor

ogrisel commented Aug 10, 2017

Maybe we should split this into two issues: one about the speed of compound objects with many subobjects (typically a list, set or dict of strings), and keep this issue for the originally reported single large numpy array pickling perf, which I suspect is caused by some unnecessary memory copy when using the pure-Python pickler (either at pickling time or at unpickling time).

@pwaller

pwaller commented Aug 10, 2017

Apologies, I had intended to make the above comments on #68. I do think there is some overlap, though: a solution to the problem of using the C pickler where possible would potentially solve both issues.

@pitrou
Member

pitrou commented Aug 12, 2017

The "obvious" fix would be to move this lookup to before the fundamental types, assuming that it is not too expensive to do this lookup all the time. This would then allow inserting things into the C pickler's dispatch table such as function. @pitrou have you considered such a thing? I assume that lookup is likely relatively cheap.

Pickling of "fundamental" built-in types such as sets, tuples, etc. (but probably not functions) is performance-critical. Does cloudpickle only need to override the pickling of functions, or are other built-in types also special-cased?

@pitrou
Member

pitrou commented Oct 26, 2017

Commenting on my own quoted message in #58 (comment): I confused dispatch with dispatch_table. cloudpickle uses dispatch (not dispatch_table), which isn't exposed by the C pickler.

@pitrou
Member

pitrou commented Oct 26, 2017

Ok, so here is what I think the C Pickler needs for cloudpickle to make use of it:

  • a settable dispatch attribute for per-type save() callbacks; this takes priority over built-in code for types and functions, but not necessarily for other objects such as int or list
  • an overridable save_fallback(obj, exc) method called when the C Pickler fails to serialize obj: the fallback can either implement its own serialization for obj or re-raise exc
  • to allow calling the following methods (no need to make them overridable):
    • save(obj)
    • write(bytes)
    • memoize(obj)
    • save_reduce(*args, obj=None)

save_fallback isn't strictly necessary (we can override everything in dispatch as done currently), but it may allow cutting down on the amount of custom code in cloudpickle, and would also ensure faster serialization where possible.

@pitrou
Member

pitrou commented Oct 26, 2017

Note to self:

  • _Pickler.memoize() is memo_put() in _pickle.c
  • _Pickler.save() is save() in _pickle.c
  • _Pickler.save_reduce() is save_reduce() in _pickle.c
  • _Pickler.write() is _Pickler_Write in _pickle.c

@ogrisel
Contributor

ogrisel commented Mar 31, 2018

For reference, improvements to the C pickler to make it possible to support efficient local class and function definitions (and possibly other cloudpickle/dill features) directly in the standard library are being discussed in this python-dev mailing list thread: https://mail.python.org/pipermail/python-dev/2018-March/152509.html

@pierreglaser
Member

Should we close this now that #253 was merged and that the CloudPickler class now derives from the C-based Pickler class?

@jakirkham
Member

Went ahead and re-ran the benchmark on Python 3 (didn't try Python 2). Looks like there is a significant improvement.

In [1]: import numpy as np

In [2]: import cloudpickle, pickle

In [3]: data = np.random.randint(0, 255, dtype='u1', size=100000000)

In [4]: %timeit len(pickle.dumps(data, protocol=pickle.HIGHEST_PROTOCOL))
80.1 ms ± 722 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

In [5]: %timeit len(cloudpickle.dumps(data, protocol=pickle.HIGHEST_PROTOCOL))
46.5 ms ± 483 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

@pierreglaser
Member

Went ahead and re-ran the benchmark on Python 3

Which Python 3, if I may?

@jakirkham
Member

Sorry, Python 3.7.3.

@ogrisel
Contributor

ogrisel commented May 29, 2020

cloudpickle subclasses the C implementation of the Pickler class in Python 3.8 and later.
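The hook that makes this subclassing possible is `pickle.Pickler.reducer_override()`, added to the C pickler in Python 3.8. A minimal sketch (the subclass is mine, not cloudpickle's code): the override is consulted for functions before the built-in C machinery, and returning NotImplemented falls back to the default C-speed path.

```python
# Sketch of the Python 3.8+ reducer_override hook on the C pickler.
# TracingPickler is illustrative: it just records the functions it sees,
# then defers to the default (C-speed) behaviour.
import io
import os
import pickle
import types

calls = []

class TracingPickler(pickle.Pickler):
    def reducer_override(self, obj):
        if isinstance(obj, types.FunctionType):
            calls.append(obj.__qualname__)  # a real override would reduce obj
        return NotImplemented  # fall back to normal C-speed pickling

buf = io.BytesIO()
TracingPickler(buf, protocol=pickle.HIGHEST_PROTOCOL).dump(os.path.join)
restored = pickle.loads(buf.getvalue())
```

A real override returns a reduce tuple instead of NotImplemented for the objects it wants to serialize by value; everything else keeps the compiled fast path, which is what closed the performance gap reported in this issue.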

@ogrisel
Contributor

ogrisel commented May 29, 2020

But actually what is important here is not the C vs Python implementation of the pickler, but the fact that cloudpickle and pickle use pickle protocol 5 efficiently, which is already available for Python 3.7.
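A sketch of the protocol-5 mechanism being credited here (PEP 574): with a buffer_callback, large buffers travel out of band instead of being copied into the pickle stream, so a big contiguous payload can round-trip with essentially no extra copy. A plain bytearray stands in for the array's data buffer; this needs the protocol-5 pickle (Python 3.8+, or the pickle5 backport on 3.7).

```python
# Protocol-5 out-of-band buffers: the pickle stream stays tiny while the
# payload is handed to buffer_callback uncopied.
import pickle

data = bytearray(b"\x00" * 10_000_000)  # stand-in for the array's buffer

buffers = []
payload = pickle.dumps(pickle.PickleBuffer(data), protocol=5,
                       buffer_callback=buffers.append)

# The stream holds only opcodes; the 10 MB travels out of band.
print(len(payload), "bytes of pickle,", len(buffers), "out-of-band buffer")

restored = pickle.loads(payload, buffers=buffers)
```

NumPy's `__reduce_ex__` takes the same path for arrays under protocol 5, which is why both `pickle` and `cloudpickle` hit the fast numbers in the benchmark above.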

@jakirkham
Member

Indeed, the improvement with pickle protocol 5 is noticeable. Very nice work here already!

It would be good to support this on all Python versions we support. I think PR ( #370 ) should get us there. It's maybe 90% towards being completely green. Any help on getting that last 10% would be appreciated (kind of stuck atm).
