-
Notifications
You must be signed in to change notification settings - Fork 85
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: add support for __arrow_array__
#2650
Changes from 2 commits
a5b6b6a
1380d23
6744cda
af1adca
7df8b9e
4d2659e
0f18323
7ebf8ff
dfa5d06
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -1325,6 +1325,14 @@ def __array__(self, *args, **kwargs): | |
with ak._errors.OperationErrorContext("numpy.asarray", (self, *args), kwargs): | ||
return ak._connect.numpy.convert_to_array(self._layout, args, kwargs) | ||
|
||
def __arrow_array__(self, *args, **kwargs): | ||
with ak._errors.OperationErrorContext( | ||
f"{type(self).__name__}.__arrow_array__", (self, *args), kwargs | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think we need to be clear that this is Awkward's code, not There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Without this, the error message would show up in |
||
): | ||
from awkward._connect.pyarrow import convert_to_array | ||
|
||
return convert_to_array(self._layout, args, kwargs) | ||
|
||
def __array_ufunc__(self, ufunc, method, *inputs, **kwargs): | ||
""" | ||
Intercepts attempts to pass this Array to a NumPy | ||
|
@@ -2417,6 +2425,14 @@ def __array__(self, *args, **kwargs): | |
with ak._errors.OperationErrorContext("numpy.asarray", (self, *args), kwargs): | ||
return ak._connect.numpy.convert_to_array(self.snapshot(), args, kwargs) | ||
|
||
def __arrow_array__(self, *args, **kwargs): | ||
with ak._errors.OperationErrorContext( | ||
f"{type(self).__name__}.__arrow_array__", (self, *args), kwargs | ||
): | ||
from awkward._connect.pyarrow import convert_to_array | ||
|
||
return convert_to_array(self.snapshot(), args, kwargs) | ||
|
||
@property | ||
def numba_type(self): | ||
""" | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,23 @@ | ||
# BSD 3-Clause License; see https://github.com/scikit-hep/awkward-1.0/blob/main/LICENSE | ||
|
||
import numpy as np | ||
import pytest | ||
|
||
import awkward as ak | ||
|
||
pyarrow = pytest.importorskip("pyarrow") | ||
|
||
|
||
def test(): | ||
array = ak.Array( | ||
ak.contents.RecordArray( | ||
[ | ||
ak.to_layout(np.arange(6, dtype=np.uint8)), | ||
ak.to_layout(np.arange(6, dtype=np.float64)), | ||
], | ||
["x", "y"], | ||
) | ||
) | ||
arrow_array = pyarrow.array(array) | ||
|
||
assert arrow_array.tolist() == array.to_list() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Your test passes, but I tried what ought to be the same thing and it didn't:
(I was going to do a test to find out if wrapping an Arrow array with
pa.array
is zero-copy. It ought to be—I was just checking.)It might be my pyarrow version (9.0.0): I checkout out this repo and tried to run the test and it failed (same error).
We currently support Arrow versions as low as 7.0.0. This is apparently a bug that they fixed in version$x$ such that $9 < x \le 12$ . We could increase that threshold, except that Arrow is an important requirement for basic things like saving and reading files, and other packages have a tendency of putting upper bounds on the Arrow version. (
condastats
wanted to lower it to Arrow 6, which made it tricky to build an environment in which conda stats can actually be analyzed.)We can't pass$x$ is only needed for the $x$ , raise an error message saying that passing arguments requires Arrow version $x$ ?
pa.array
arguments intoto_arrow
becauseto_arrow
doesn't usually callpa.array
directly; it usespa.Array.from_buffers
, which doesn't take the same arguments. Arrow versionelse
case of thisif
-else
: maybe check the Arrow version there? If it's less thanOr maybe that's overkill and we should just explain that it's an Arrow bug, fixed in version$x$ , when our users encounter this. It would at least require an Arrow version check in the test, so that we can run the tests with lower versions of Arrow. (And maybe that needs to be added to the CI matrix, not just NumPy minimum version, but Arrow, too.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By the way, my choice would be for the last one: just ensure that our tests work with the Arrow versions we say we support, even if the Arrow version we support has known bugs when some (not all) operations are attempted.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good spot; I assumed that this part of the Arrow API was more stable than this. It looks like only 12.0.0 supports the casting that is performed by the test case in question. I've made it
skipif
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some of the other tests weren't working for me, so I've added a minimum pyarrow to the build matrix. We don't depend on it quite as much as NumPy, but increasingly, and it has more radical changes from one version to the next.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a good idea. Thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Two of these failures are the new tests I was expecting, but some others are from pyarrow string functions. I'm going to test those locally now.
https://github.com/scikit-hep/awkward/actions/runs/5871689180/job/15921684610?pr=2650
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They're all fine in my pyarrow 9.0.0, but apparently not in the pyarrow 7.0.0 that we say we support. It was good to test!
They're encoding mistakes. I can fix the string functions (in this PR) while you fix the casting.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done!