
Add pickling support for Python tokenizers #73

Merged: rth merged 8 commits into master from pickle on Jun 12, 2020
Conversation

rth (Owner) commented on Jun 12, 2020:

Partially addresses #25

This adds __getstate__ / __setstate__ methods to make pickling work, following the discussion in PyO3/pyo3#100 and adapting the example from https://gist.github.com/ethanhs/fd4123487974c91c7e5960acc9aa2a77.

There is probably some way to add those methods via macros to avoid code repetition but I haven't figured it out yet.

Pickling support for the stem and vectorize modules will be added in a follow-up PR.

This also removes the parameter attributes from the Python wrappers (e.g. RegexpTokenizer.pattern), since they were not synced with the Rust parameter struct (here RegexpTokenizer.inner.params.pattern), so changing them had no effect. If we want to make them work, we should first ensure that the set_params / get_params methods behave as expected, and then implement the attributes via __getattr__ / __setattr__.
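The attribute-syncing idea above could be sketched as follows. This is a hypothetical pure-Python illustration (class and parameter names are placeholders, not vtext API): attribute reads and writes delegate to get_params / set_params, so the attribute and the underlying parameter store can never drift apart.

```python
class ParamsProxy:
    """Hypothetical wrapper whose attributes delegate to params."""

    def __init__(self):
        # Bypass our own __setattr__ so _params itself is stored
        # as a normal instance attribute, not routed into params.
        object.__setattr__(self, "_params", {"pattern": r"\b\w\w+\b"})

    def get_params(self):
        # Return a copy of the single source of truth.
        return dict(self._params)

    def set_params(self, **kwargs):
        # In the real class this would update the Rust struct.
        self._params.update(kwargs)

    def __getattr__(self, name):
        # Called only when normal lookup fails: fall back to params.
        params = object.__getattribute__(self, "_params")
        if name in params:
            return params[name]
        raise AttributeError(name)

    def __setattr__(self, name, value):
        # Route every attribute write through set_params.
        self.set_params(**{name: value})

tok = ParamsProxy()
tok.pattern = r"\w+"           # goes through set_params
assert tok.get_params() == {"pattern": r"\w+"}
```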

use vtext::tokenize::*;

-#[pyclass]
+#[pyclass(module = "vtext.tokenize")]
rth (Owner, Author) commented:

Due to PyO3/pyo3#474 this should have been fixed, but I still see the same error.

rth merged commit 172838c into master on Jun 12, 2020
rth deleted the pickle branch on June 12, 2020 19:29