fix: don't let CuPy iterate over Index with Python for loops #3142
Arguably, this is a performance thing, but it's so catastrophic that I'd call it a bug-fix.
@lgray noticed that slicing an Awkward-CUDA array was significantly slower than slicing an Awkward Array on the CPU. It's because CuPy thought that an `ak.index.Index` is a generic Python iterable, not a CUDA array.

We assumed that adding an `Index.__cuda_array_interface__` property would make CuPy notice that `Index` is a CUDA-capable array. The CuPy documentation describes instances in which it produces (for Numba) and consumes (from PyTorch) the `__cuda_array_interface__` property. However, none of those examples illustrate this pattern (indexing with `cp.ndarray.__getitem__`). Since they don't promise to promote CUDA-capable objects in slices, I can't be sure that this is a CuPy bug. At least to support existing versions of CuPy, we can't make this assumption.

I temporarily added this to `Index`:
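The snippet itself isn't reproduced in this description, but footnote 1 describes how it worked. The following is a hedged sketch of what such a temporary diagnostic could look like; the class body, attribute names, and error message are all assumptions for illustration, not the actual Awkward code:

```python
import numpy as np


class Index:
    """Illustrative stand-in for ak.index.Index with a temporary diagnostic.

    The "cycler" watches for __array_struct__, __array_interface__, and
    __array__ being probed in succession: that pattern means a consumer
    (such as CuPy) is treating Index as a generic array-like instead of
    reading __cuda_array_interface__.
    """

    _probe_order = ("__array_struct__", "__array_interface__", "__array__")

    def __init__(self, data):
        self._data = np.asarray(data)
        self._cycle = 0  # position in the expected probe sequence

    def _cycler(self, name):
        if self._probe_order[self._cycle] == name:
            self._cycle += 1
        else:
            # Out-of-order probe: restart (counting this probe if it begins a cycle).
            self._cycle = 1 if name == self._probe_order[0] else 0
        if self._cycle == len(self._probe_order):
            self._cycle = 0
            raise AssertionError("Index is being consumed as a generic array-like")

    @property
    def __array_struct__(self):
        self._cycler("__array_struct__")
        raise AttributeError("__array_struct__")  # fall through to the next protocol

    @property
    def __array_interface__(self):
        self._cycler("__array_interface__")
        raise AttributeError("__array_interface__")

    def __array__(self, dtype=None):
        self._cycler("__array__")
        return np.asarray(self._data, dtype=dtype)
```

Detecting the access *pattern* rather than the caller is the point: as the footnote explains, CuPy leaves no Python stack frames to inspect.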
and ran `pytest tests-cuda` to catch all instances that are covered by the tests.[^1] There were two (fixed in this PR).

@ManasviGoyal, can you check this?
@agoose77, do you think there might be other cases in which you assumed that CuPy would auto-promote an Index as a CUDA array?
Meanwhile, I'm going to report this to CuPy, just in case it is unintended/a bug.
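For reference, a plausible shape of the fix pattern, sketched with a NumPy stand-in (the real code uses CuPy, and the `data` attribute and `take` helper here are assumptions for illustration): index with the underlying buffer rather than the `Index` wrapper, so `__getitem__` never falls back to Python iteration.

```python
import numpy as np


class Index:
    """Minimal stand-in for ak.index.Index (a `data` buffer is assumed)."""

    def __init__(self, data):
        self.data = np.asarray(data)

    def __iter__(self):
        # The generic-iterable path that made GPU slicing catastrophically
        # slow: one Python-level element access per item.
        return iter(self.data)


def take(array, index):
    # Fix pattern: pass the raw buffer, not the wrapper, so the backend
    # (NumPy here, CuPy in Awkward-CUDA) does fancy indexing on a real array.
    return array[index.data]


arr = np.arange(10) * 10
print(take(arr, Index([3, 1, 4])))  # → [30 10 40]
```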
[^1]: We have to do this "cycler" thing to identify requests for `__array_struct__`, `__array_interface__`, and `__array__` in succession because CuPy does not appear in the stack trace: it's implemented in Cython, which doesn't emit Python stack frames.