
Cannot pickle '_io.BufferedReader' object exception after DataArray transpose and copy operations when using fsspec filecache. #8443

Open
sharkinsspatial opened this issue Nov 11, 2023 · 3 comments


@sharkinsspatial

What is your issue?

We hit this issue while using rioxarray with a series of operations similar to those noted in corteva/rioxarray#711. After looking through the rioxarray codebase, I was able to reproduce the issue with pure xarray operations.

When opening a Dataset with fsspec local caching enabled, transposing a DataArray's coordinates and then copying the DataArray results in a cannot pickle '_io.BufferedReader' object exception.

The issue can be reproduced using this sample notebook.

Note that when using the filecache option with fsspec.filesystem the exception occurs immediately. When using the blockcache option the exception only occurs after a second copy call which is presumably using cached data.

This issue seems potentially related to the fsspec investigation in fsspec/filesystem_spec#579 (comment) but interestingly only seems reproducible with this incantation of transpose followed by copy.

@sharkinsspatial added the needs triage label Nov 11, 2023
@max-sixty
Collaborator

Do this & #8442 require using an S3 backend? If not, is it possible to produce examples which are copy-pastable?

(sorry this hasn't got much attention — some of these cross-library issues are difficult. I can't guarantee an MCVE will be sufficient, though it is necessary...)

@max-sixty added the needs mcve label (https://matthewrocklin.com/blog/work/2018/02/28/minimal-bug-reports) and removed the needs triage label Dec 5, 2023
@sharkinsspatial
Author

@max-sixty Apologies for not providing an MCVE. Initially I thought this was specific to the S3 backend, but I can reproduce it locally with fsspec target_protocol="file" as well.

import fsspec
import xarray as xr
import numpy as np

cache_type = "filecache"
protocol = "file"
file_path = "test.nc"

ds = xr.Dataset(
    {
        'latitude': np.arange(10),
        'longitude': np.arange(10),
        'precip': (['latitude', 'longitude'], np.arange(100).reshape(10,10))

    }
)
ds.to_netcdf(file_path, engine="h5netcdf")

fs = fsspec.filesystem(cache_type, target_protocol=protocol, cache_storage="./cache")
file = fs.open(file_path)

ds = xr.open_dataset(file, engine="h5netcdf", decode_coords=True, decode_times=True, lock=False)
da = ds["precip"]

da = da.transpose("longitude", "latitude", missing_dims="ignore")
da_copy = da.copy()
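
For the blockcache case mentioned in the issue description (a sketch based on the behaviour described there, not re-run here), the first copy succeeds and only a second copy, presumably reading from cached data, raises the same error:

fs = fsspec.filesystem("blockcache", target_protocol=protocol, cache_storage="./cache")
file = fs.open(file_path)

ds = xr.open_dataset(file, engine="h5netcdf", decode_coords=True, decode_times=True, lock=False)
da = ds["precip"].transpose("longitude", "latitude", missing_dims="ignore")

da_copy = da.copy()   # succeeds with blockcache
da_copy2 = da.copy()  # second copy raises "cannot pickle '_io.BufferedReader' object"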

No worries about the lack of attention; I realize that issues involving fsspec interactions are likely very difficult to debug. Thank you for investigating 👍

@max-sixty
Copy link
Collaborator

I can repro.

I'm not completely sure what's going on. I'm guessing the dataset holds a reference to the open file, and so attempts to copy it when the dataset is being copied. But why does it try to pickle it? I thought pickling was for serializing, which this isn't trying to do...
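
A minimal illustration of the pickling connection (a sketch, not xarray-specific): DataArray.copy() defaults to deep=True, and copy.deepcopy falls back to the pickle protocol (__reduce_ex__) for objects that don't define __deepcopy__, so deep-copying anything that holds an open buffered reader surfaces as a pickling error even though nothing is being written to disk:

import copy

class Holder:
    # Stands in for a backend object that keeps an open file handle around.
    def __init__(self, path):
        self.f = open(path, "rb")

h = Holder("test.nc")  # any existing file works here
copy.deepcopy(h)       # TypeError: cannot pickle '_io.BufferedReader' object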

@max-sixty added the topic-backends label and removed the needs mcve label Dec 6, 2023