DeltaStorageHandler is not serializable #1015

Closed
chitralverma opened this issue Dec 13, 2022 · 4 comments · Fixed by #1016
Labels
bug Something isn't working

Comments

@chitralverma
Contributor

chitralverma commented Dec 13, 2022

Environment

Delta-rs version: latest

Binding:
Python

Environment:

  • Cloud provider: all
  • OS: macOS Ventura, M1
  • Other:

Bug

What happened:
I am trying to add a Delta connector to Polars, but while doing a lazy scan Polars tries to serialize functions and objects (see here). That line fails with the exception below, presumably because DeltaFileSystemHandler is not serializable:

TypeError: cannot pickle 'builtins.DeltaFileSystemHandler' object

It would be great if this could be serializable; otherwise, on the Polars side we will have to rely on fsspec or pyarrow.fs, bypassing the DeltaStorageHandler implementation altogether.

What you expected to happen:
The exception should not occur.

How to reproduce it:
You can use the following minimal code to reproduce this.

```python
import pickle

from deltalake.fs import DeltaStorageHandler

dsh = DeltaStorageHandler("./delta-table")
pickle.dumps(dsh)
```

```
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: cannot pickle 'builtins.DeltaFileSystemHandler' object
```
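A minimal sketch of why this kind of failure happens, using only the standard library: pickle has to turn every part of the object into plain data, and an attribute that wraps a native object with no pickle support makes the whole `pickle.dumps` call raise `TypeError`. `HypotheticalHandler` and `is_picklable` are illustrative names, and `threading.Lock` merely stands in for the pyo3-backed `DeltaFileSystemHandler`; this is not delta-rs code.

```python
import pickle
import threading


class HypotheticalHandler:
    """Illustrative stand-in for a handler that wraps a native object."""

    def __init__(self, root: str) -> None:
        self.root = root
        # threading.Lock plays the role of a native (e.g. pyo3) object
        # that gives pickle no way to serialize it.
        self._native = threading.Lock()


def is_picklable(obj: object) -> bool:
    """Return True if pickle.dumps(obj) succeeds, False otherwise."""
    try:
        pickle.dumps(obj)
    except (TypeError, pickle.PicklingError, AttributeError):
        return False
    return True


if __name__ == "__main__":
    # The plain string pickles fine; the instance fails on its _native
    # attribute with "TypeError: cannot pickle '_thread.lock' object".
    print(is_picklable("./delta-table"))
    print(is_picklable(HypotheticalHandler("./delta-table")))
```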

More details:

@chitralverma chitralverma added the bug Something isn't working label Dec 13, 2022
@roeap
Collaborator

roeap commented Dec 13, 2022

Hi @chitralverma - had a quick look into this, and it seems this is not natively supported by pyo3 modules. That being said, there are some pointers in #pyo3/100 that I can explore to make this work.

Really looking forward to a polars integration - thanks! 👍

@chitralverma
Contributor Author

Thanks a lot

@ritchie46

ritchie46 commented Dec 13, 2022

Hey @roeap. We also use pyo3 in Polars and implemented pickle support by adding `__getstate__` and `__setstate__` methods to the class.

See here for our implementation: https://github.com/pola-rs/polars/blob/4c79ef293975b48aaceefdf7d03611822affc7f5/py-polars/src/lazy/dsl.rs#L80

You can use whichever serde serialization format you prefer.
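The `__getstate__`/`__setstate__` pattern can be sketched in pure Python as follows. This only illustrates the pickle protocol, assuming a handler that can be rebuilt from a root URI plus an options dict; in Polars the methods are implemented on a pyo3 class in Rust, and `HypotheticalStorageHandler` and `_connect` are made-up names. `__getstate__` returns only plain configuration, and `__setstate__` rebuilds the native handle, because `__init__` is not called during unpickling.

```python
import pickle
import threading
from typing import Any, Dict, Optional


class HypotheticalStorageHandler:
    """Illustrative handler that keeps picklable config next to a native handle."""

    def __init__(self, root: str, options: Optional[Dict[str, str]] = None) -> None:
        self.root = root
        self.options = dict(options or {})
        self._native = self._connect()

    def _connect(self) -> threading.Lock:
        # Hypothetical: stands in for building a native object store
        # from self.root and self.options.
        return threading.Lock()

    def __getstate__(self) -> Dict[str, Any]:
        # Serialize only plain configuration, never the native handle.
        return {"root": self.root, "options": self.options}

    def __setstate__(self, state: Dict[str, Any]) -> None:
        # __init__ is not called when unpickling, so restore the config
        # and rebuild the native handle here.
        self.root = state["root"]
        self.options = state["options"]
        self._native = self._connect()


if __name__ == "__main__":
    handler = HypotheticalStorageHandler("./delta-table", {"timeout": "30s"})
    clone = pickle.loads(pickle.dumps(handler))
    print(clone.root, clone.options)
```

The state dict is itself just Python data here; a Rust implementation would instead encode the configuration with a serde format and decode it in `__setstate__`.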

@roeap
Collaborator

roeap commented Dec 13, 2022

@ritchie46 @chitralverma - Thanks so much for your input! With it, this was quite straightforward, and I have a working prototype.

By now I feel much more confident I can open a PR very soon 😆.

roeap added a commit that referenced this issue Dec 30, 2022
# Description

Integrating with Polars requires the `DeltaStorageHandler` to be
serializable with pickle. This PR implements the required dunder methods
to make it so.

Unfortunately, we lost the ability to instantiate the
`DeltaStorageHandler` with an existing object store, however I do
believe that this is not a critical loss.
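One way to see this trade-off: if unpickling amounts to "call the constructor again with these arguments" (for example via `__reduce__`, an alternative to the `__getstate__`/`__setstate__` pair), every instance must be reconstructible from plain values such as a root URI and an options dict. A handler wrapping an arbitrary, already-built object store has no such description. The sketch below is a hypothetical illustration of that constraint, not the code from the PR; `HypotheticalStorageHandler` is an invented name and `threading.Lock` stands in for the native store.

```python
import pickle
import threading
from typing import Dict, Optional, Tuple


class HypotheticalStorageHandler:
    """Illustrative handler that can only be built from picklable arguments."""

    def __init__(self, root: str, options: Optional[Dict[str, str]] = None) -> None:
        self.root = root
        self.options = dict(options or {})
        # Stand-in for a native store derived entirely from root/options.
        self._store = threading.Lock()

    def __reduce__(self) -> Tuple[type, Tuple[str, Dict[str, str]]]:
        # pickle records (callable, args); unpickling calls
        # HypotheticalStorageHandler(root, options). An instance wrapping
        # a caller-supplied store could not be described this way.
        return (type(self), (self.root, self.options))


if __name__ == "__main__":
    handler = HypotheticalStorageHandler("s3://bucket/table", {"region": "us-east-1"})
    clone = pickle.loads(pickle.dumps(handler))
    print(clone.root, clone.options, clone._store is not handler._store)
```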

cc @chitralverma @ritchie46

# Related Issue(s)

closes #1015

# Documentation

chitralverma pushed a commit to chitralverma/delta-rs that referenced this issue Mar 17, 2023