pickle error in Dataset.sortby() when used with file-like #9330

Open
5 tasks done
kmuehlbauer opened this issue Aug 12, 2024 · 3 comments
@kmuehlbauer
Contributor

kmuehlbauer commented Aug 12, 2024

What happened?

When a dataset is opened from a file-like object (with open(filename, 'rb') as f) and ds.sortby() is then called, we get TypeError: cannot pickle 'BufferedReader' instances. ds.reindex() works.

What did you expect to happen?

ds.sortby should work without error.

Minimal Complete Verifiable Example

The issue is triggered by ds.sortby(); we can strip this down to pure xarray:

import numpy as np
import netCDF4
import xarray as xr
import fsspec
import io

# create file with netCDF4 to have all bits and pieces
filename = "test.h5"
with netCDF4.Dataset(filename,'w') as f:
    f.createDimension('x', 3)
    f.createDimension('y', 6)
    var = f.createVariable('var', 'int8', ('x', 'y'))
    xcoord = f.createVariable('x', 'int8', ('x'))
    ycoord = f.createVariable('y', 'int8', ('y'))
    xcoord[:] = [2, 1, 0]
    ycoord[:] = [0, 1, 2, 3, 4, 5]
    var[:] = np.arange(18).reshape((3, 6))

This works, maybe because CachingFileManager is used?

with xr.backends.H5NetCDFStore.open(filename) as store:
    store_entrypoint = xr.backends.StoreBackendEntrypoint()
    ds = store_entrypoint.open_dataset(store)
    display(ds["var"].values)
    ds = ds.sortby('x')
    display(ds["var"].values)
array([[ 0,  1,  2,  3,  4,  5],
       [ 6,  7,  8,  9, 10, 11],
       [12, 13, 14, 15, 16, 17]], dtype=int8)

array([[12, 13, 14, 15, 16, 17],
       [ 6,  7,  8,  9, 10, 11],
       [ 0,  1,  2,  3,  4,  5]], dtype=int8)
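
One plausible reason the path-based case works is that a manager which remembers how to open the file, instead of carrying the open handle in its copyable state, can be deep-copied. The following is a minimal standard-library sketch of that idea. ReopeningFile is a hypothetical class, not xarray's CachingFileManager, and whether this is the actual reason here is an assumption.

```python
import copy
import os
import tempfile


class ReopeningFile:
    """Hypothetical manager that stores the path and opens the file lazily."""

    def __init__(self, path, mode="rb"):
        self.path = path
        self.mode = mode
        self._handle = None  # opened on first use

    def acquire(self):
        if self._handle is None:
            self._handle = open(self.path, self.mode)
        return self._handle

    def close(self):
        if self._handle is not None:
            self._handle.close()
            self._handle = None

    def __getstate__(self):
        # Only the recipe for opening the file is copied or pickled,
        # never the open handle itself.
        return {"path": self.path, "mode": self.mode}

    def __setstate__(self, state):
        self.path = state["path"]
        self.mode = state["mode"]
        self._handle = None


# Create a small throwaway file to open.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"payload")
    path = tmp.name

manager = ReopeningFile(path)
first_bytes = manager.acquire().read(3)

# deepcopy goes through __reduce_ex__, which uses __getstate__ here,
# so the open BufferedReader is never touched.
clone = copy.deepcopy(manager)
clone_bytes = clone.acquire().read(3)  # the clone opens its own handle

manager.close()
clone.close()
os.remove(path)
```

The clone reads from the start of the file because it reopens the path instead of sharing the original handle's position.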

This works too, with reindex.

with open(filename, 'rb') as f:
    print(type(f))
    store = xr.backends.H5NetCDFStore.open(f)
    store_entrypoint = xr.backends.StoreBackendEntrypoint()
    ds = store_entrypoint.open_dataset(store)
    display(ds["var"].values)
    ds = ds.reindex({"x": np.array([0, 1, 2])})
    display(ds["var"].values)
<class '_io.BufferedReader'>
array([[ 0,  1,  2,  3,  4,  5],
       [ 6,  7,  8,  9, 10, 11],
       [12, 13, 14, 15, 16, 17]], dtype=int8)

array([[12, 13, 14, 15, 16, 17],
       [ 6,  7,  8,  9, 10, 11],
       [ 0,  1,  2,  3,  4,  5]], dtype=int8)

This breaks with sortby.

with open(filename, 'rb') as f:
    print(type(f))
    store = xr.backends.H5NetCDFStore.open(f)
    store_entrypoint = xr.backends.StoreBackendEntrypoint()
    ds = store_entrypoint.open_dataset(store)
    display(ds["var"].values)
    ds = ds.sortby("x")
    display(ds["var"].values)

MVCE confirmation

  • Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • Complete example — the example is self-contained, including all data and the text of any traceback.
  • Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • New issue — a search of GitHub Issues suggests this is not a duplicate.
  • Recent environment — the issue occurs with the latest version of xarray and its dependencies.

Relevant log output

<class '_io.BufferedReader'>
array([[ 0,  1,  2,  3,  4,  5],
       [ 6,  7,  8,  9, 10, 11],
       [12, 13, 14, 15, 16, 17]], dtype=int8)

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[69], line 6
      4 ds = store_entrypoint.open_dataset(store)
      5 display(ds["var"].values)
----> 6 ds = ds.sortby("x")
      7 display(ds["var"].values)

File /home/kai/data/mambaforge/envs/xr_312_np2/lib/python3.12/site-packages/xarray/core/dataset.py:8182, in Dataset.sortby(self, variables, ascending)
   8180     variables = variables
   8181 arrays = [v if isinstance(v, DataArray) else self[v] for v in variables]
-> 8182 aligned_vars = align(self, *arrays, join="left")
   8183 aligned_self = cast("Self", aligned_vars[0])
   8184 aligned_other_vars = cast(tuple[DataArray, ...], aligned_vars[1:])

File /home/kai/data/mambaforge/envs/xr_312_np2/lib/python3.12/site-packages/xarray/core/alignment.py:882, in align(join, copy, indexes, exclude, fill_value, *objects)
    686 """
    687 Given any number of Dataset and/or DataArray objects, returns new
    688 objects with aligned indexes and dimension sizes.
   (...)
    872 
    873 """
    874 aligner = Aligner(
    875     objects,
    876     join=join,
   (...)
    880     fill_value=fill_value,
    881 )
--> 882 aligner.align()
    883 return aligner.results

File /home/kai/data/mambaforge/envs/xr_312_np2/lib/python3.12/site-packages/xarray/core/alignment.py:582, in Aligner.align(self)
    580     self.results = self.objects
    581 else:
--> 582     self.reindex_all()

File /home/kai/data/mambaforge/envs/xr_312_np2/lib/python3.12/site-packages/xarray/core/alignment.py:557, in Aligner.reindex_all(self)
    556 def reindex_all(self) -> None:
--> 557     self.results = tuple(
    558         self._reindex_one(obj, matching_indexes)
    559         for obj, matching_indexes in zip(
    560             self.objects, self.objects_matching_indexes
    561         )
    562     )

File /home/kai/data/mambaforge/envs/xr_312_np2/lib/python3.12/site-packages/xarray/core/alignment.py:558, in <genexpr>(.0)
    556 def reindex_all(self) -> None:
    557     self.results = tuple(
--> 558         self._reindex_one(obj, matching_indexes)
    559         for obj, matching_indexes in zip(
    560             self.objects, self.objects_matching_indexes
    561         )
    562     )

File /home/kai/data/mambaforge/envs/xr_312_np2/lib/python3.12/site-packages/xarray/core/alignment.py:546, in Aligner._reindex_one(self, obj, matching_indexes)
    543 new_indexes, new_variables = self._get_indexes_and_vars(obj, matching_indexes)
    544 dim_pos_indexers = self._get_dim_pos_indexers(matching_indexes)
--> 546 return obj._reindex_callback(
    547     self,
    548     dim_pos_indexers,
    549     new_variables,
    550     new_indexes,
    551     self.fill_value,
    552     self.exclude_dims,
    553     self.exclude_vars,
    554 )

File /home/kai/data/mambaforge/envs/xr_312_np2/lib/python3.12/site-packages/xarray/core/dataset.py:3517, in Dataset._reindex_callback(self, aligner, dim_pos_indexers, variables, indexes, fill_value, exclude_dims, exclude_vars)
   3515         reindexed = self._overwrite_indexes(new_indexes, new_variables)
   3516     else:
-> 3517         reindexed = self.copy(deep=aligner.copy)
   3518 else:
   3519     to_reindex = {
   3520         k: v
   3521         for k, v in self.variables.items()
   3522         if k not in variables and k not in exclude_vars
   3523     }

File /home/kai/data/mambaforge/envs/xr_312_np2/lib/python3.12/site-packages/xarray/core/dataset.py:1366, in Dataset.copy(self, deep, data)
   1269 def copy(self, deep: bool = False, data: DataVars | None = None) -> Self:
   1270     """Returns a copy of this dataset.
   1271 
   1272     If `deep=True`, a deep copy is made of each of the component variables.
   (...)
   1364     pandas.DataFrame.copy
   1365     """
-> 1366     return self._copy(deep=deep, data=data)

File /home/kai/data/mambaforge/envs/xr_312_np2/lib/python3.12/site-packages/xarray/core/dataset.py:1402, in Dataset._copy(self, deep, data, memo)
   1400         variables[k] = index_vars[k]
   1401     else:
-> 1402         variables[k] = v._copy(deep=deep, data=data.get(k), memo=memo)
   1404 attrs = copy.deepcopy(self._attrs, memo) if deep else copy.copy(self._attrs)
   1405 encoding = (
   1406     copy.deepcopy(self._encoding, memo) if deep else copy.copy(self._encoding)
   1407 )

File /home/kai/data/mambaforge/envs/xr_312_np2/lib/python3.12/site-packages/xarray/core/variable.py:925, in Variable._copy(self, deep, data, memo)
    922         ndata = indexing.MemoryCachedArray(data_old.array)  # type: ignore[assignment]
    924     if deep:
--> 925         ndata = copy.deepcopy(ndata, memo)
    927 else:
    928     ndata = as_compatible_data(data)

File /home/kai/data/mambaforge/envs/xr_312_np2/lib/python3.12/copy.py:162, in deepcopy(x, memo, _nil)
    160                 y = x
    161             else:
--> 162                 y = _reconstruct(x, memo, *rv)
    164 # If is its own copy, don't memoize.
    165 if y is not x:

File /home/kai/data/mambaforge/envs/xr_312_np2/lib/python3.12/copy.py:259, in _reconstruct(x, memo, func, args, state, listiter, dictiter, deepcopy)
    257 if state is not None:
    258     if deep:
--> 259         state = deepcopy(state, memo)
    260     if hasattr(y, '__setstate__'):
    261         y.__setstate__(state)

File /home/kai/data/mambaforge/envs/xr_312_np2/lib/python3.12/copy.py:136, in deepcopy(x, memo, _nil)
    134 copier = _deepcopy_dispatch.get(cls)
    135 if copier is not None:
--> 136     y = copier(x, memo)
    137 else:
    138     if issubclass(cls, type):

File /home/kai/data/mambaforge/envs/xr_312_np2/lib/python3.12/copy.py:201, in _deepcopy_tuple(x, memo, deepcopy)
    200 def _deepcopy_tuple(x, memo, deepcopy=deepcopy):
--> 201     y = [deepcopy(a, memo) for a in x]
    202     # We're not going to put the tuple in the memo, but it's still important we
    203     # check for it, in case the tuple contains recursive mutable structures.
    204     try:

File /home/kai/data/mambaforge/envs/xr_312_np2/lib/python3.12/copy.py:136, in deepcopy(x, memo, _nil)
    134 copier = _deepcopy_dispatch.get(cls)
    135 if copier is not None:
--> 136     y = copier(x, memo)
    137 else:
    138     if issubclass(cls, type):

File /home/kai/data/mambaforge/envs/xr_312_np2/lib/python3.12/copy.py:221, in _deepcopy_dict(x, memo, deepcopy)
    219 memo[id(x)] = y
    220 for key, value in x.items():
--> 221     y[deepcopy(key, memo)] = deepcopy(value, memo)
    222 return y

File /home/kai/data/mambaforge/envs/xr_312_np2/lib/python3.12/copy.py:162, in deepcopy(x, memo, _nil)
    160                 y = x
    161             else:
--> 162                 y = _reconstruct(x, memo, *rv)
    164 # If is its own copy, don't memoize.
    165 if y is not x:

File /home/kai/data/mambaforge/envs/xr_312_np2/lib/python3.12/copy.py:259, in _reconstruct(x, memo, func, args, state, listiter, dictiter, deepcopy)
    257 if state is not None:
    258     if deep:
--> 259         state = deepcopy(state, memo)
    260     if hasattr(y, '__setstate__'):
    261         y.__setstate__(state)

File /home/kai/data/mambaforge/envs/xr_312_np2/lib/python3.12/copy.py:136, in deepcopy(x, memo, _nil)
    134 copier = _deepcopy_dispatch.get(cls)
    135 if copier is not None:
--> 136     y = copier(x, memo)
    137 else:
    138     if issubclass(cls, type):

File /home/kai/data/mambaforge/envs/xr_312_np2/lib/python3.12/copy.py:201, in _deepcopy_tuple(x, memo, deepcopy)
    200 def _deepcopy_tuple(x, memo, deepcopy=deepcopy):
--> 201     y = [deepcopy(a, memo) for a in x]
    202     # We're not going to put the tuple in the memo, but it's still important we
    203     # check for it, in case the tuple contains recursive mutable structures.
    204     try:

File /home/kai/data/mambaforge/envs/xr_312_np2/lib/python3.12/copy.py:136, in deepcopy(x, memo, _nil)
    134 copier = _deepcopy_dispatch.get(cls)
    135 if copier is not None:
--> 136     y = copier(x, memo)
    137 else:
    138     if issubclass(cls, type):

File /home/kai/data/mambaforge/envs/xr_312_np2/lib/python3.12/copy.py:221, in _deepcopy_dict(x, memo, deepcopy)
    219 memo[id(x)] = y
    220 for key, value in x.items():
--> 221     y[deepcopy(key, memo)] = deepcopy(value, memo)
    222 return y

File /home/kai/data/mambaforge/envs/xr_312_np2/lib/python3.12/copy.py:162, in deepcopy(x, memo, _nil)
    160                 y = x
    161             else:
--> 162                 y = _reconstruct(x, memo, *rv)
    164 # If is its own copy, don't memoize.
    165 if y is not x:

File /home/kai/data/mambaforge/envs/xr_312_np2/lib/python3.12/copy.py:259, in _reconstruct(x, memo, func, args, state, listiter, dictiter, deepcopy)
    257 if state is not None:
    258     if deep:
--> 259         state = deepcopy(state, memo)
    260     if hasattr(y, '__setstate__'):
    261         y.__setstate__(state)

File /home/kai/data/mambaforge/envs/xr_312_np2/lib/python3.12/copy.py:136, in deepcopy(x, memo, _nil)
    134 copier = _deepcopy_dispatch.get(cls)
    135 if copier is not None:
--> 136     y = copier(x, memo)
    137 else:
    138     if issubclass(cls, type):

File /home/kai/data/mambaforge/envs/xr_312_np2/lib/python3.12/copy.py:201, in _deepcopy_tuple(x, memo, deepcopy)
    200 def _deepcopy_tuple(x, memo, deepcopy=deepcopy):
--> 201     y = [deepcopy(a, memo) for a in x]
    202     # We're not going to put the tuple in the memo, but it's still important we
    203     # check for it, in case the tuple contains recursive mutable structures.
    204     try:

File /home/kai/data/mambaforge/envs/xr_312_np2/lib/python3.12/copy.py:136, in deepcopy(x, memo, _nil)
    134 copier = _deepcopy_dispatch.get(cls)
    135 if copier is not None:
--> 136     y = copier(x, memo)
    137 else:
    138     if issubclass(cls, type):

File /home/kai/data/mambaforge/envs/xr_312_np2/lib/python3.12/copy.py:221, in _deepcopy_dict(x, memo, deepcopy)
    219 memo[id(x)] = y
    220 for key, value in x.items():
--> 221     y[deepcopy(key, memo)] = deepcopy(value, memo)
    222 return y

File /home/kai/data/mambaforge/envs/xr_312_np2/lib/python3.12/copy.py:162, in deepcopy(x, memo, _nil)
    160                 y = x
    161             else:
--> 162                 y = _reconstruct(x, memo, *rv)
    164 # If is its own copy, don't memoize.
    165 if y is not x:

File /home/kai/data/mambaforge/envs/xr_312_np2/lib/python3.12/copy.py:259, in _reconstruct(x, memo, func, args, state, listiter, dictiter, deepcopy)
    257 if state is not None:
    258     if deep:
--> 259         state = deepcopy(state, memo)
    260     if hasattr(y, '__setstate__'):
    261         y.__setstate__(state)

File /home/kai/data/mambaforge/envs/xr_312_np2/lib/python3.12/copy.py:136, in deepcopy(x, memo, _nil)
    134 copier = _deepcopy_dispatch.get(cls)
    135 if copier is not None:
--> 136     y = copier(x, memo)
    137 else:
    138     if issubclass(cls, type):

File /home/kai/data/mambaforge/envs/xr_312_np2/lib/python3.12/copy.py:201, in _deepcopy_tuple(x, memo, deepcopy)
    200 def _deepcopy_tuple(x, memo, deepcopy=deepcopy):
--> 201     y = [deepcopy(a, memo) for a in x]
    202     # We're not going to put the tuple in the memo, but it's still important we
    203     # check for it, in case the tuple contains recursive mutable structures.
    204     try:

File /home/kai/data/mambaforge/envs/xr_312_np2/lib/python3.12/copy.py:136, in deepcopy(x, memo, _nil)
    134 copier = _deepcopy_dispatch.get(cls)
    135 if copier is not None:
--> 136     y = copier(x, memo)
    137 else:
    138     if issubclass(cls, type):

File /home/kai/data/mambaforge/envs/xr_312_np2/lib/python3.12/copy.py:201, in _deepcopy_tuple(x, memo, deepcopy)
    200 def _deepcopy_tuple(x, memo, deepcopy=deepcopy):
--> 201     y = [deepcopy(a, memo) for a in x]
    202     # We're not going to put the tuple in the memo, but it's still important we
    203     # check for it, in case the tuple contains recursive mutable structures.
    204     try:

File /home/kai/data/mambaforge/envs/xr_312_np2/lib/python3.12/copy.py:151, in deepcopy(x, memo, _nil)
    149 reductor = getattr(x, "__reduce_ex__", None)
    150 if reductor is not None:
--> 151     rv = reductor(4)
    152 else:
    153     reductor = getattr(x, "__reduce__", None)

TypeError: cannot pickle 'BufferedReader' instances

Anything else we need to know?

There are quite a few GitHub issues out there discussing this problem. It seems that io.BufferedReader can't be pickled.
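
The claim can be checked without xarray. The sketch below uses only the standard library and a throwaway temporary file (the file contents and variable names are illustrative assumptions). It shows that both pickle.dumps and copy.deepcopy reject an open binary file handle, because deepcopy falls back to the same __reduce_ex__ machinery that pickle uses:

```python
import copy
import os
import pickle
import tempfile

# Create a small throwaway file to open in binary mode.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"abc")
    path = tmp.name

errors = {}
with open(path, "rb") as f:  # f is an io.BufferedReader
    for name, func in (("pickle.dumps", pickle.dumps),
                       ("copy.deepcopy", copy.deepcopy)):
        try:
            func(f)
        except TypeError as err:
            # The exact wording of the message varies between Python versions.
            errors[name] = str(err)

os.remove(path)
print(errors)
```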

XRef:

What I'm wondering is why the reindex above works and the sortby fails?

Environment

INSTALLED VERSIONS
------------------
commit: None
python: 3.12.4 | packaged by conda-forge | (main, Jun 17 2024, 10:23:07) [GCC 12.3.0]
python-bits: 64
OS: Linux
OS-release: 5.14.21-150500.55.68-default
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: de_DE.UTF-8
LOCALE: ('de_DE', 'UTF-8')
libhdf5: 1.14.3
libnetcdf: 4.9.2

xarray: 2024.7.1.dev12+gb22c4296.d20240809
pandas: 2.2.2
numpy: 2.0.1
scipy: 1.14.0
netCDF4: 1.7.1
pydap: None
h5netcdf: 1.3.0
h5py: 3.11.0
zarr: None
cftime: 1.6.4
nc_time_axis: None
iris: None
bottleneck: 1.4.0
dask: 2024.8.0
distributed: 2024.8.0
matplotlib: 3.8.4
cartopy: 0.23.0
seaborn: None
numbagg: None
fsspec: 2024.6.1
cupy: None
pint: 0.24.3
sparse: None
flox: None
numpy_groupies: None
setuptools: 70.1.1
pip: 24.0
conda: None
pytest: 8.2.2
mypy: None
IPython: 8.25.0
sphinx: None

kmuehlbauer added the bug and needs triage labels Aug 12, 2024
@dcherian
Contributor

Why is it being pickled though?

@kmuehlbauer
Contributor Author

It must have to do with how the Aligner object is initialized.

One difference between sortby and reindex is that the latter uses the given indexes and join="inner", whereas sortby only uses join="left". That leads to a deepcopy inside _reindex_callback (line 3517), which ultimately tries to pickle the underlying BufferedReader instance:

xarray/xarray/core/dataset.py

Lines 3511 to 3517 in ce211b8

if not dim_pos_indexers:
    # fast path for no reindexing necessary
    if set(new_indexes) - set(self._indexes):
        # this only adds new indexes and their coordinate variables
        reindexed = self._overwrite_indexes(new_indexes, new_variables)
    else:
        reindexed = self.copy(deep=aligner.copy)

Obviously a deepcopy isn't triggered with reindex.
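
The difference between the two copy modes can be illustrated outside xarray. In this minimal sketch, LazyArray is a hypothetical stand-in for any lazily loaded array object that keeps a reference to its open file handle; it is not an xarray class. A shallow copy just shares the handle, while a deep copy recurses into the object's state and fails once it reaches the BufferedReader:

```python
import copy
import os
import tempfile


class LazyArray:
    """Hypothetical stand-in for a lazy array that keeps its file handle."""

    def __init__(self, handle):
        self.handle = handle


# Create a small throwaway file to open in binary mode.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"abc")
    path = tmp.name

with open(path, "rb") as f:
    lazy = LazyArray(f)

    # Shallow copy: the new object references the very same handle.
    shallow = copy.copy(lazy)
    shares_handle = shallow.handle is f

    # Deep copy: recursion reaches the handle and raises TypeError.
    try:
        copy.deepcopy(lazy)
        deep_failed = False
    except TypeError:
        deep_failed = True

os.remove(path)
print(shares_handle, deep_failed)
```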

@dcherian
Copy link
Contributor

I think we can definitely change sortby to use reindex. This seems to be an edge case in how BufferedReader is copied, which could also be fixed...
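
As an illustration of what changing how the handle is copied could look like, here is a minimal sketch of one possible direction: a wrapper whose __deepcopy__ shares the open handle and deep-copies only the remaining state. SharedHandleArray and its attributes are hypothetical names for this example; this is not xarray's implementation nor a proposed patch.

```python
import copy
import os
import tempfile


class SharedHandleArray:
    """Hypothetical wrapper that shares its file handle across deep copies."""

    def __init__(self, handle, attrs):
        self.handle = handle
        self.attrs = attrs

    def __deepcopy__(self, memo):
        # Reuse the open handle, but deep-copy everything else.
        clone = type(self)(self.handle, copy.deepcopy(self.attrs, memo))
        memo[id(self)] = clone
        return clone


# Create a small throwaway file to open in binary mode.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"abc")
    path = tmp.name

with open(path, "rb") as f:
    original = SharedHandleArray(f, {"units": ["m"]})
    clone = copy.deepcopy(original)  # no TypeError this time

    same_handle = clone.handle is f
    attrs_equal = clone.attrs == original.attrs
    attrs_independent = clone.attrs is not original.attrs

os.remove(path)
```

The trade-off is that both copies then depend on the same open file and its read position, so closing the file invalidates both.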

dcherian removed the needs triage label Aug 14, 2024