BUG: (regression? v2 vs v1.5) ValueError: Big-endian buffer not supported on little-endian compiler #53234
Comments
It looks like the error is getting raised inside pandas, so this is a fine place to report. It would be helpful if you can narrow it down to a reproducible example that doesn't require xarray |
@jbrockmendel Thanks for looking into it. It looks like
import numpy as np
import pandas as pd
df = pd.DataFrame(
    data=np.arange(6),
    index=np.arange(6, dtype=">f4"),
    columns=["x"],
)
dfi = df.index.get_indexer([1.3], method="nearest")
Same effect: this works with pandas 1.5 but raises an exception with pandas 2.0. |
Thanks for updating the example, much easier to look into on our end! Looks like we should probably disallow big-endian dtypes in the Index constructor. |
I am not sure that is a good idea. It will probably break downstream packages such as xarray, e.g. when reading netCDF files with a different endianness than the system. Sometimes the user has no control over the endianness because the files are produced on different systems. Those users would be left unable to read and process such files. Also, if I understand correctly, this is not an issue with big-endian per se, but arises when the endianness of the index is opposite to the system's endianness. Unfortunately I cannot test the case of a little-endian index on a big-endian system. |
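To illustrate the point that the problem is about byte order opposite to the system's rather than big-endian as such, here is a minimal sketch using only NumPy. The helper name `is_non_native` is hypothetical and not part of pandas or NumPy; it relies on the `isnative` attribute of NumPy dtypes.

```python
import sys

import numpy as np


def is_non_native(dtype) -> bool:
    """Return True if the dtype's byte order is opposite to this machine's.

    Hypothetical helper for illustration only.
    """
    return not np.dtype(dtype).isnative


# Pick the dtype strings relative to whichever machine runs this code.
opposite = ">f4" if sys.byteorder == "little" else "<f4"
native = "<f4" if sys.byteorder == "little" else ">f4"

print(sys.byteorder, is_non_native(opposite), is_non_native(native))
```

On a little-endian machine `">f4"` is the non-native case, and on a big-endian machine it would be `"<f4"`.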
My best guess (worth checking) is that in 1.5 we silently converted to little-endian, which would make a copy. If that guess is correct, then the choice is to either restore that behavior or to raise, telling users to convert themselves. I lean towards raising, but wouldn't mind either way |
Indeed, it looks like pandas 1.5 converts to native byte order: the dtype changes from '>f4' to 'float' on a little-endian machine, while in v2 it stays '>f4'. Note that native order can be either little- or big-endian, so converting only big-endian data might catch just half the cases. I would prefer backwards compatibility; converting internally seems to have worked fine so far. One could raise a warning though, so that the user can decide whether it is important or not. |
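As a user-side workaround in the spirit of the conversion described above, the index values can be converted to native byte order with NumPy before they are handed to pandas. This is a sketch under assumptions: `to_native` is a hypothetical helper name, and it is not a reproduction of what pandas 1.5 does internally.

```python
import numpy as np


def to_native(arr: np.ndarray) -> np.ndarray:
    """Return arr in the machine's native byte order (hypothetical helper).

    Arrays that are already native are returned unchanged; otherwise
    astype() makes a converted copy with the same values.
    """
    if arr.dtype.isnative:
        return arr
    return arr.astype(arr.dtype.newbyteorder("="))


idx = np.arange(6, dtype=">f4")  # big-endian float32, as in the example above
native_idx = to_native(idx)

print(idx.dtype.str, "->", native_idx.dtype.str)
```

The resulting `native_idx` could then be passed as `index=` when building the DataFrame instead of the original `">f4"` array.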
Just writing to bump this seeing no activity. This is obviously an edge case that won't affect many people, but it's still an egregious regression. Based on the consequences, namely, netCDF files generated on some systems becoming unreadable via xarray on other systems, I would say the only appropriate course is to restore this transform to Pandas. |
A PR would be welcome. |
Apparently, this is a documented issue in gotchas.rst in tagged releases at least as far back as v1.0.0: https://github.com/pandas-dev/pandas/blob/609c3b74b0da87e5c1f36bcf4f6b490ac94413a0/doc/source/user_guide/gotchas.rst#byte-ordering-issues But it did work in v1.5.3, and in fact works in v2+ for many operations that aren't
With pandas v1.5.3:
With pandas v2.0.3:
It's a very blunt recasting clause in It's not clear to me how (or if) that difference is causing this behavior: |
Hi there, In @ejhyer's example, it looks like pandas 1.5.x converts all float types to
Note that it does not work the other way around, converting both to big-endian on a little endian machine. Can't test the behaviour on a big-endian machine. |
I agree with this. Looking at the case above, I think my preference would be for an automatic byteswap/recast, and I think a Warning is appropriate if the function returns (or could return) something with a dtype different from what the user explicitly asked for. The Warning could possibly be something like |
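A rough sketch of what "recast and warn" could look like at the NumPy level follows. The helper name `to_native_with_warning` and the message text are invented here for illustration; they are not the wording proposed in this thread and not existing pandas behaviour.

```python
import warnings

import numpy as np


def to_native_with_warning(arr: np.ndarray) -> np.ndarray:
    """Convert arr to native byte order, warning when the dtype changes.

    Hypothetical helper; the warning text is a placeholder.
    """
    if arr.dtype.isnative:
        return arr
    native_dtype = arr.dtype.newbyteorder("=")
    warnings.warn(
        f"converting {arr.dtype.str!r} data to native byte order "
        f"({native_dtype.str!r}); the result is a copy",
        UserWarning,
        stacklevel=2,
    )
    return arr.astype(native_dtype)
```

With this shape the caller still gets usable data, and can escalate the warning to an error through the standard `warnings` filters if the dtype change matters to them.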
Wouldn't that render |
Should have said "disallow construction of Indexes with non-native endianness." |
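For comparison, the "disallow non-native endianness" option might look roughly like the check below. This is only a sketch: `require_native` is a hypothetical name, the error message is made up, and nothing here is taken from the pandas code base.

```python
import numpy as np


def require_native(arr: np.ndarray) -> np.ndarray:
    """Raise if arr is not in native byte order (hypothetical check)."""
    if not arr.dtype.isnative:
        native = arr.dtype.newbyteorder("=")
        raise ValueError(
            f"dtype {arr.dtype.str!r} is not in native byte order; "
            f"convert first, e.g. arr.astype({native.str!r})"
        )
    return arr
```

Compared with silent or warned conversion, this makes the copy explicit, at the cost of breaking code that currently passes non-native arrays.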
Pandas version checks
I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of pandas.
I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
Issue Description
Hi there,
I observed that one of my tests failed with
ValueError: Big-endian buffer not supported on little-endian compiler
which had not been a problem before.
I am not sure what changed internally and how, but observed that this is raised when using pandas version 2 and still succeeds with pandas 1.5.
I found some old reports and the FAQ, but since the behaviour is different between versions, this might be of interest anyway. Or maybe I need to file it with xarray.
The full traceback is:
A slightly different test provides also the function name:
Expected Behavior
No exception is raised.
Installed Versions
INSTALLED VERSIONS
commit : 37ea63d
python : 3.9.16.final.0
python-bits : 64
OS : Linux
OS-release : 3.10.0-1160.71.1.el7.x86_64
Version : #1 SMP Tue Jun 28 15:37:28 UTC 2022
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : en_GB.UTF-8
LANG : en_GB.UTF-8
LOCALE : en_GB.UTF-8
pandas : 2.0.1
numpy : 1.24.3
pytz : 2023.3
dateutil : 2.8.2
setuptools : 67.4.0
pip : 23.0.1
pytest : 7.3.1
scipy : 1.10.1
xarray : 2023.4.2
tzdata : 2023.3
(all others are "none")