BUG: (regression? v2 vs v1.5) ValueError: Big-endian buffer not supported on little-endian compiler #53234

st-bender · 2023-05-15T11:24:23Z

Pandas version checks

I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of pandas.
I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

import numpy as np
import pandas as pd
import xarray as xr

ds = xr.Dataset(
    {
        "a": (("x", "y"), np.arange(24).reshape((6, 4)))
    },
    coords={"x": np.arange(6, dtype=">f4")}
)

# raises
# ValueError: Big-endian buffer not supported on little-endian compiler
# on pandas 2.0.1 but *not* on pandas 1.5.3
dsi = ds.interp(x=np.array([1.3, 2.5]))

Issue Description

Hi there,
I observed that one of my tests failed with
ValueError: Big-endian buffer not supported on little-endian compiler
which had no problem before.
I am not sure what changed internally and how, but observed that this is raised when using pandas version 2 and still succeeds with pandas 1.5.

I found some old reports and the FAQ, but since the behaviour is different between versions, this might be of interest anyway. Or maybe I need to file it with xarray.

The full traceback is:

_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
.tox/py39/lib/python3.9/site-packages/xarray/core/dataset.py:3366: in interp
    obj, newidx = missing._localize(obj, {k: v})
.tox/py39/lib/python3.9/site-packages/xarray/core/missing.py:565: in _localize
    imin = index.get_indexer([minval], method="nearest").item()
.tox/py39/lib/python3.9/site-packages/pandas/core/indexes/base.py:3730: in get_indexer
    if not self._index_as_unique:
.tox/py39/lib/python3.9/site-packages/pandas/core/indexes/base.py:6006: in _index_as_unique
    return self.is_unique
pandas/_libs/properties.pyx:36: in pandas._libs.properties.CachedProperty.__get__
    ???
.tox/py39/lib/python3.9/site-packages/pandas/core/indexes/base.py:2238: in is_unique
    return self._engine.is_unique
pandas/_libs/index.pyx:236: in pandas._libs.index.IndexEngine.is_unique.__get__
    ???
pandas/_libs/index.pyx:241: in pandas._libs.index.IndexEngine._do_unique_check
    ???
pandas/_libs/index.pyx:303: in pandas._libs.index.IndexEngine._ensure_mapping_populated
    ???
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

>   ???
E   ValueError: Big-endian buffer not supported on little-endian compiler

pandas/_libs/hashtable_class_helper.pxi:7104: ValueError

A slightly different test also provides the function name:

File "pandas/_libs/hashtable_class_helper.pxi", line 7104, in pandas._libs.hashtable.PyObjectHashTable.map_locations
ValueError: Big-endian buffer not supported on little-endian compiler

Expected Behavior

No exception is raised.

Installed Versions

INSTALLED VERSIONS

commit : 37ea63d
python : 3.9.16.final.0
python-bits : 64
OS : Linux
OS-release : 3.10.0-1160.71.1.el7.x86_64
Version : #1 SMP Tue Jun 28 15:37:28 UTC 2022
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : en_GB.UTF-8
LANG : en_GB.UTF-8
LOCALE : en_GB.UTF-8

pandas : 2.0.1
numpy : 1.24.3
pytz : 2023.3
dateutil : 2.8.2
setuptools : 67.4.0
pip : 23.0.1
pytest : 7.3.1
scipy : 1.10.1
xarray : 2023.4.2
tzdata : 2023.3

(all others are "none")


jbrockmendel · 2023-05-18T22:22:09Z

> Or maybe I need to file it with xarray.

It looks like the error is getting raised inside pandas, so this is a fine place to report. It would be helpful if you could narrow it down to a reproducible example that doesn't require xarray.

st-bender · 2023-05-20T16:18:55Z

@jbrockmendel Thanks for looking into it. It looks like index.get_indexer() fails; here is an updated test case without xarray:

import numpy as np
import pandas as pd

df = pd.DataFrame(
    data=np.arange(6),
    index=np.arange(6, dtype=">f4"),
    columns=["x"],
)
dfi = df.index.get_indexer([1.3], method="nearest")

Same effect, works with pandas 1.5 but raises an exception with pandas 2.0.

jbrockmendel · 2023-05-22T22:30:29Z

Thanks for updating the example, much easier to look into on our end!

Looks like we should probably disallow big-endian dtypes in the Index constructor.

st-bender · 2023-05-23T17:11:49Z

> Thanks for updating the example, much easier to look into on our end!
>
> Looks like we should probably disallow big-endian dtypes in the Index constructor.

I am not sure that is a good idea. It will probably break downstream packages such as xarray, e.g. when reading netCDF files with a different endianness than the system. Sometimes the user does not have control over the endianness because the files are produced on different systems. Those users would be left unable to read and process such files.

Also, if I understand correctly, this is not an issue with big-endian per se, but arises when the endianness of the index is opposite to the system's endianness. Unfortunately I cannot test the case with a little-endian index on a big-endian system.
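To make the distinction between "big-endian" and "non-native" concrete, here is a minimal sketch using only numpy. The helper name `is_native_byteorder` is made up for illustration and is not a numpy or pandas API; it simply wraps the existing `dtype.isnative` attribute.

```python
import sys

import numpy as np


def is_native_byteorder(dtype) -> bool:
    """Return True if `dtype` matches this machine's byte order.

    Hypothetical helper for illustration only. Single-byte dtypes such
    as "u1" have no byte order and count as native.
    """
    return np.dtype(dtype).isnative


big = np.arange(6, dtype=">f4")
little = np.arange(6, dtype="<f4")

# Exactly one of the two arrays is non-native, whichever machine runs this.
print("system byte order:", sys.byteorder)
print("'>f4' native:", is_native_byteorder(big.dtype))
print("'<f4' native:", is_native_byteorder(little.dtype))
```

On a little-endian machine the '>f4' array is the non-native one; on a big-endian machine the roles should be reversed, although that case could not be tested here either.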

jbrockmendel · 2023-05-23T18:03:46Z

My best guess (worth checking) is that in 1.5 we silently converted to little-endian, which would make a copy. If that guess is correct, then the choice is either to restore that behavior or to raise, telling users to convert themselves. I lean towards raising, but wouldn't mind either way.

st-bender · 2023-05-24T16:17:11Z

Indeed, it looks like pandas 1.5 converts to native byte order: on a little-endian machine the dtype changes from '>f4' to 'float', whereas in v2 it stays '>f4'. Note that native order can be either little- or big-endian, so converting only big-endian data might catch just half the cases.

I would prefer backwards compatibility; converting internally seems to have worked fine so far. One could raise a warning, though, so that the user can decide whether it matters.
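A conversion to native byte order that covers both directions could look roughly like the sketch below. It is numpy-only and makes no claim about what pandas 1.5 did internally; `to_native` is a hypothetical helper name. The key point is to request "=" (native) instead of hard-coding "<" or ">".

```python
import numpy as np


def to_native(arr: np.ndarray) -> np.ndarray:
    """Return an array with the same values as `arr` in native byte order.

    Hypothetical helper, not part of numpy or pandas.
    """
    if arr.dtype.isnative:
        return arr  # already native, nothing to do
    # "=" means native order; astype rewrites the bytes, so values are preserved.
    return arr.astype(arr.dtype.newbyteorder("="))


# "S" swaps the byte order, so this array is non-native on any machine.
non_native = np.arange(6, dtype=np.dtype("f4").newbyteorder("S"))
converted = to_native(non_native)

# By contrast, a view only relabels the existing bytes, which changes the values.
relabelled = non_native.view(non_native.dtype.newbyteorder("="))

print(converted.dtype.isnative, np.array_equal(converted, non_native))
print(np.array_equal(relabelled, non_native))
```

Using `astype` makes a copy, which matches the guess above that a silent conversion would copy the data.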

ejhyer · 2023-08-30T16:18:16Z

Just writing to bump this seeing no activity. This is obviously an edge case that won't affect many people, but it's still an egregious regression. Based on the consequences, namely, netCDF files generated on some systems becoming unreadable via xarray on other systems, I would say the only appropriate course is to restore this transform to Pandas.

jbrockmendel · 2023-08-30T16:31:12Z

A PR would be welcome.

ejhyer · 2023-08-30T17:13:44Z

Apparently, this is a documented issue in gotchas.rst in tagged releases at least as far back as v1.0.0: https://github.com/pandas-dev/pandas/blob/609c3b74b0da87e5c1f36bcf4f6b490ac94413a0/doc/source/user_guide/gotchas.rst#byte-ordering-issues

But it did work in v1.5.3, and in fact works in v2+ for many operations that aren't get_indexer(). The other methods implemented in https://github.com/pandas-dev/pandas/blame/main/pandas/core/indexes/base.py still do this conversion automatically. Here is an example that illustrates what works in old and new versions of pandas:

import numpy as np
import pandas as pd

df = pd.DataFrame(
    data=np.arange(6),
    index=np.arange(6, dtype=">f4"),
    columns=["x"],
)
df2 = pd.DataFrame(
    data=np.arange(6),
    index=np.arange(6, dtype="<f4")+3,
    columns=["x"],
)
print(df.index.union(df2.index))
print(df.index.intersection(df2.index))
print(df.index.get_indexer(df2.index[0:2], method="nearest"))

With pandas v1.5.3:

Float64Index([0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0], dtype='float64')
Float64Index([3.0, 4.0, 5.0], dtype='float64')
[3 4]

With pandas v2.0.3:

Index([0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0], dtype='float32')
Index([3.0, 4.0, 5.0], dtype='float32')
<...>
ValueError: Big-endian buffer not supported on little-endian compiler

It's a very blunt recasting clause in union() and the other routines: https://github.com/pandas-dev/pandas/blob/609c3b74b0da87e5c1f36bcf4f6b490ac94413a0/pandas/core/indexes/base.py#L3289C11-L3292C48
There is a recasting clause in get_indexer() but it's slightly different: https://github.com/pandas-dev/pandas/blob/609c3b74b0da87e5c1f36bcf4f6b490ac94413a0/pandas/core/indexes/base.py#L3901C2-L3910C14

It's not clear to me how (or if) that difference is causing this behavior:

st-bender · 2023-08-31T11:21:58Z

Hi there,
Thanks for still looking into it. I think I eventually programmed around it, converting the endianness myself after reading the file, before doing any indexing or selecting.
I got it to work for the big -> little endian case (using xarray) by turning ">" into "<" in the dtype string and using .astype(). It might be a bit trickier for the general case, or it might have some unwanted side effects, but it worked for me.

In @ejhyer's example, it looks like pandas 1.5.x converts all float types to float64, but pandas 2.0.x keeps the types as float32 and also keeps the endianness.
I don't know which one is better. I'd probably prefer the new behaviour, which seems to have fewer unintended type conversions (from a user's point of view), except for the indexing issue. The last example works when converting the first index to little-endian before indexing:

print(df.index.astype("<f4").get_indexer(df2.index[0:2], method="nearest"))

Note that it does not work the other way around, i.e. converting both to big-endian on a little-endian machine. I can't test the behaviour on a big-endian machine.
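For the general case, the workaround above can be written without hard-coding "<", by converting to whatever the native order is before the Index is built. This is only a user-side sketch: `native_index` is a made-up helper, not a pandas function, and it has only been reasoned through for plain numeric arrays.

```python
import numpy as np
import pandas as pd


def native_index(values) -> pd.Index:
    """Build a pandas Index after converting `values` to native byte order.

    Hypothetical helper for illustration; not a pandas API.
    """
    arr = np.asarray(values)
    if not arr.dtype.isnative:
        # "=" requests native order, so this works on either kind of machine.
        arr = arr.astype(arr.dtype.newbyteorder("="))
    return pd.Index(arr)


idx = native_index(np.arange(6, dtype=">f4"))

# These are the calls that fail on a non-native Index in the examples above.
print(idx.is_unique)
print(idx.get_indexer([1.3], method="nearest"))
```

The same conversion could be applied to coordinate arrays read from a file before handing them to xarray, though that path is not shown here.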

ejhyer · 2023-08-31T18:28:30Z

Here is an even shorter test case:

import numpy as np
import pandas as pd
idx = pd.Index(np.array([1, 5,  7]).astype('>f4'))
idx.is_unique

My argument for restoring automatic byteswap/recast basically boils down to "many other numpy/pandas/xarray operations do this (silently)." So endianness is transparent to the user in many cases, except when attempting certain pandas operations.
My understanding of the internals of pandas is insufficient to go much farther. @jbrockmendel said:

> we should probably disallow big-endian dtypes in the Index constructor.

I agree with this. Looking at the case above, I think my preference would be for an automatic byteswap/recast, and I think a Warning is appropriate if the function returns (or could return) something with a dtype different from what the user explicitly asked for. The Warning could possibly be something like RuntimeWarning: values for Index automatically recast to system endianness.
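As a rough illustration of the "recast and warn" idea, the sketch below converts non-native input and emits the RuntimeWarning text proposed above. It is a standalone function operating on numpy arrays, not a patch to the pandas Index constructor, and `coerce_to_native` is a hypothetical name.

```python
import warnings

import numpy as np


def coerce_to_native(values) -> np.ndarray:
    """Return `values` as a native-byte-order array, warning if a recast happens.

    Hypothetical helper; it only sketches the proposed behaviour.
    """
    arr = np.asarray(values)
    if arr.dtype.isnative:
        return arr
    warnings.warn(
        "values for Index automatically recast to system endianness",
        RuntimeWarning,
        stacklevel=2,
    )
    return arr.astype(arr.dtype.newbyteorder("="))


# "S" swaps the byte order, so the input is non-native on any machine.
non_native = np.array([1, 5, 7]).astype(np.dtype("f4").newbyteorder("S"))

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = coerce_to_native(non_native)

print(result.dtype.isnative, [str(w.message) for w in caught])
```

A warning keeps existing code working while still telling the user that the returned dtype differs from the one requested.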

st-bender · 2023-09-01T15:23:12Z

> > we should probably disallow big-endian dtypes in the Index constructor.
>
> I agree with this.

Wouldn't that render pandas unusable on big-endian machines?

ejhyer · 2023-09-01T16:35:24Z

> Wouldn't that render pandas unusable on big-endian machines?

Should have said "disallow construction of Indexes with non-native endianness."

st-bender added the Bug and Needs Triage (issue that has not been reviewed by a pandas team member) labels May 15, 2023

jbrockmendel added the Constructors (Series/DataFrame/Index/pd.array constructors) label May 22, 2023

lithomas1 removed the Needs Triage (issue that has not been reviewed by a pandas team member) label May 30, 2023

This was referenced Sep 6, 2023

Big-endian issue with pandas 2.1.0 mwaskom/seaborn#3464

Closed

Bump pandas from 2.0.0 to 2.1.0 cds-astro/tutorials#58

Closed

jorisvandenbossche added the Regression (functionality that used to work in a prior pandas version) label and removed the Bug label Sep 14, 2023

jorisvandenbossche added this to the 2.1.1 milestone Sep 14, 2023

lithomas1 modified the milestones: 2.1.1, 2.1.2 Sep 21, 2023

lithomas1 modified the milestones: 2.1.2, 2.1.3 Oct 26, 2023

jorisvandenbossche modified the milestones: 2.1.3, 2.1.4 Nov 13, 2023

lithomas1 modified the milestones: 2.1.4, 2.2 Dec 8, 2023

lithomas1 modified the milestones: 2.2, 2.2.1 Jan 20, 2024

lithomas1 modified the milestones: 2.2.1, 2.2.2 Feb 23, 2024

ocefpaf mentioned this issue Apr 10, 2024

Fix missing file in tests pyoceans/pocean-core#85

Merged

lithomas1 modified the milestones: 2.2.2, 2.2.3 Apr 11, 2024
