-
Notifications
You must be signed in to change notification settings - Fork 85
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: add support for __arrow_array__
#2650
Conversation
src/awkward/highlevel.py
Outdated
@@ -1325,6 +1325,14 @@ def __array__(self, *args, **kwargs): | |||
with ak._errors.OperationErrorContext("numpy.asarray", (self, *args), kwargs): | |||
return ak._connect.numpy.convert_to_array(self._layout, args, kwargs) | |||
|
|||
def __arrow_array__(self, *args, **kwargs): | |||
with ak._errors.OperationErrorContext( | |||
f"{type(self).__name__}.__arrow_array__", (self, *args), kwargs |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we need to be clear that this is Awkward's code, not pyarrow.array(...)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Without this, the error message would show up in ak.to_arrow
, since you're calling the high-level version of that and not its _impl
. But users may be confused because they don't have any ak.to_arrow
in their own code, so this helps.
Codecov Report
Additional details and impacted files
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I approve of the PR and how it's implemented, but something needs to be done about the Arrow bug it triggers in old versions of Arrow (see below).
src/awkward/highlevel.py
Outdated
@@ -1325,6 +1325,14 @@ def __array__(self, *args, **kwargs): | |||
with ak._errors.OperationErrorContext("numpy.asarray", (self, *args), kwargs): | |||
return ak._connect.numpy.convert_to_array(self._layout, args, kwargs) | |||
|
|||
def __arrow_array__(self, *args, **kwargs): | |||
with ak._errors.OperationErrorContext( | |||
f"{type(self).__name__}.__arrow_array__", (self, *args), kwargs |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Without this, the error message would show up in ak.to_arrow
, since you're calling the high-level version of that and not its _impl
. But users may be confused because they don't have any ak.to_arrow
in their own code, so this helps.
src/awkward/_connect/pyarrow.py
Outdated
if args == () and kwargs == {}: | ||
return out | ||
else: | ||
return pyarrow.array(out, *args, **kwargs) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Your test passes, but I tried what ought to be the same thing and it didn't:
>>> import pyarrow as pa
>>> a = pa.array([1.1, 2.2, 3.3, 4.4, 5.5])
>>> b = pa.array(a, type=pa.float64())
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "pyarrow/array.pxi", line 317, in pyarrow.lib.array
File "pyarrow/array.pxi", line 39, in pyarrow.lib._sequence_to_array
File "pyarrow/error.pxi", line 144, in pyarrow.lib.pyarrow_internal_check_status
File "pyarrow/error.pxi", line 100, in pyarrow.lib.check_status
pyarrow.lib.ArrowInvalid: Could not convert <pyarrow.DoubleScalar: 1.1> with type pyarrow.lib.DoubleScalar: tried to convert to double
(I was going to do a test to find out if wrapping an Arrow array with pa.array
is zero-copy. It ought to be—I was just checking.)
It might be my pyarrow version (9.0.0): I checkout out this repo and tried to run the test and it failed (same error).
We currently support Arrow versions as low as 7.0.0. This is apparently a bug that they fixed in version condastats
wanted to lower it to Arrow 6, which made it tricky to build an environment in which conda stats can actually be analyzed.)
We can't pass pa.array
arguments into to_arrow
because to_arrow
doesn't usually call pa.array
directly; it uses pa.Array.from_buffers
, which doesn't take the same arguments. Arrow version else
case of this if
-else
: maybe check the Arrow version there? If it's less than
Or maybe that's overkill and we should just explain that it's an Arrow bug, fixed in version
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By the way, my choice would be for the last one: just ensure that our tests work with the Arrow versions we say we support, even if the Arrow version we support has known bugs when some (not all) operations are attempted.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good spot; I assumed that this part of the Arrow API was more stable than this. It looks like only 12.0.0 supports the casting that is performed by the test case in question. I've made it skipif
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some of the other tests weren't working for me, so I've added a minimum pyarrow to the build matrix. We don't depend on it quite as much as NumPy, but increasingly, and it has more radical changes from one version to the next.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a good idea. Thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Two of these failures are the new tests I was expecting, but some others are from pyarrow string functions. I'm going to test those locally now.
https://github.com/scikit-hep/awkward/actions/runs/5871689180/job/15921684610?pr=2650
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They're all fine in my pyarrow 9.0.0, but apparently not in the pyarrow 7.0.0 that we say we support. It was good to test!
They're encoding mistakes. I can fix the string functions (in this PR) while you fix the casting.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done!
…to agoose77/feat-arrow-protocol
I think this one will pass because the only error in your last commit was in the strings. Feel free to merge when you're ready! |
Fixes #2647