Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: don't let CuPy iterate over Index with Python for loops #3142

Conversation

jpivarski
Copy link
Member

Arguably, this is a performance thing, but it's so catastrophic that I'd call it a bug-fix.

@lgray noticed that slicing an Awkward-CUDA array was significantly slower than slicing an Awkward Array on the CPU. It's because CuPy thought that an ak.index.Index is a generic Python iterable, not a CUDA array.

We assumed that adding a Index.__cuda_array_interface__ property would make CuPy notice that Index is a CUDA capable array. The CuPy documentation describes instances in which it produces (for Numba) and consumes (from PyTorch) the __cuda_array_interface__ property. However, none of those examples illustrate the pattern

cupy_array[object_that_implements_cuda_interface]

(with a cp.ndarray.__getitem__). Since they don't promise to promote CUDA arrays on slices, I can't be sure that this is a CuPy bug.

At least to support existing versions of CuPy, we can't make this assumption. I temporarily added this to Index:

--- a/src/awkward/index.py
+++ b/src/awkward/index.py
@@ -161,6 +161,31 @@ class Index:
     def __len__(self) -> int:
         return int(self.length)
 
+    _cycler = 0
+    def __getattr__(self, name):
+        if name == "__array_struct__":
+            self._cycler = 1
+            raise AttributeError(name)
+        elif name == "__array_interface__":
+            if self._cycler == 1:
+                self._cycler = 2
+            else:
+                self._cycler = 0
+            return self._data.__array_interface__
+        elif name == "__array__":
+            if self._cycler == 2:
+                self._cycler = 3
+                raise Exception(
+                    "CuPy is coming to the conclusion that Index is a Python iterable"
+                )
+            else:
+                self._cycler = 0
+            raise AttributeError(name)
+        elif name in self.__dict__:
+            return self.__dict__[name]
+        else:
+            raise AttributeError(name)
+
     @property
     def __cuda_array_interface__(self):
         return self._data.__cuda_array_interface__  # type: ignore[attr-defined]

and ran pytest tests-cuda to catch all instances that are covered by the tests.1 There were two (fixed in this PR).

@ManasviGoyal, can you check this?

@agoose77, do you think there might be other cases in which you assumed that CuPy would auto-promote an Index as a CUDA array?

Meanwhile, I'm going to report this to CuPy, just in case it is unintended/a bug.

Footnotes

  1. We have to do this "cycler" thing to identify requests for __array_struct__, __array_interface__, __array__ in succession because CuPy does not appear in the stack trace. It's implemented in Cython, which doesn't emit Python stack frames.

@jpivarski
Copy link
Member Author

Reported at cupy/cupy#8352.

@lgray lgray mentioned this pull request Jun 6, 2024
13 tasks
@jpivarski
Copy link
Member Author

@ManasviGoyal, give this a quick check and let me know if something seems wrong (since you've been deep in the GPU code, you might spot something that I didn't). Otherwise, check this as "approved" and I'll merge it. Thanks!

Copy link
Collaborator

@ManasviGoyal ManasviGoyal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jpivarski I did check it earlier and everything looks good. I used the function you mentioned and didn't notice any other cases with the same issue in the new tests I added. I'll test it again when I add more integration tests. Go ahead and merge it!

@jpivarski
Copy link
Member Author

Thanks!

@jpivarski jpivarski merged commit 5de7b35 into main Jun 6, 2024
41 checks passed
@jpivarski jpivarski deleted the jpivarski/dont-let-cupy-iterate-over-Index-with-Python-for-loops branch June 6, 2024 22:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants