
CuSolver: Switch to 64-bit API to allow eigh on matrices larger than 26732x26732 #23413

Open
PhilipVinc opened this issue Sep 3, 2024 · 6 comments
Labels
enhancement New feature or request

@PhilipVinc
Contributor

PhilipVinc commented Sep 3, 2024

jaxlib links to the cuSolver 32-bit API, which has a hard limit on workspace size; as a result, it is not possible to diagonalize matrices larger than roughly 26732x26732 when using np.float64.
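For context, a rough back-of-the-envelope sketch of where that limit comes from: the 32-bit API sizes its workspace with a signed 32-bit `lwork`, so any routine whose scratch space exceeds `INT32_MAX` elements fails. Assuming a workspace on the order of ~3n² elements for a divide-and-conquer eigensolver (an approximation for illustration, not the exact cuSolver formula), the largest usable n is:

```python
import math

# Signed 32-bit workspace counter used by the legacy cuSolver API.
INT32_MAX = 2**31 - 1

# Assumed workspace of ~3 * n^2 elements (hypothetical approximation;
# the real per-routine formula also has lower-order terms in n).
max_n = math.isqrt(INT32_MAX // 3)
print(max_n)  # ≈ 26754, in line with the ~26732 limit reported above
```

The 64-bit generic API lifts this by taking workspace sizes as 64-bit quantities.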

@PhilipVinc PhilipVinc added the enhancement New feature or request label Sep 3, 2024
@dfm dfm self-assigned this Sep 4, 2024
@norabelrose

I'm running into a similar issue with `jax.numpy.linalg.svd`

@PhilipVinc
Contributor Author

Yeah, this affects all cuSolver APIs, so SVD and the various factorizations as well…

@dfm
Collaborator

dfm commented Sep 4, 2024

Great suggestion! I'm in the middle of updating all the cuSolver wrappers, so I'll plan on getting this in as part of that process. I probably won't be able to land it before the next JAX release, but I'll try!

For reference, it looks like there's an open issue suggesting this for the CPU backend too: #20904

@norabelrose

Thanks a lot for this @dfm. Even just a PR or branch that uses the 64-bit API would be very useful, since I'm trying to run SVD on some large matrices for a project right now.

@dfm
Collaborator

dfm commented Sep 4, 2024

Sure - I can prioritize SVD. Just to confirm, you're running on a GPU, @norabelrose?

@norabelrose

> Sure - I can prioritize SVD. Just to confirm, you're running on a GPU, @norabelrose?

Yep, that's right. Thanks!

copybara-service bot pushed a commit that referenced this issue Sep 20, 2024
Unlike the other GPU linear algebra kernels that I've ported so far, this one isn't straightforward to implement as a single kernel, and while it does support lowering without access to a GPU (no more descriptor!), it only supports dynamic shapes in the batch dimensions. There are two main technical challenges:

1. The main `gesvd` kernels in cuSolver/hipSolver only support matrices with shape `(m, n)` with `m >= n`. This means that we need to transpose the inputs and outputs as part of the lowering rule when `m < n`. (Note: we actually just use C layouts instead of Fortran layouts to implement this case.) While this could be handled in the kernel, this seemed like a lot of work for somewhat limited benefit, and it would probably have performance implications.

2. The `gesvd` and `gesvdj` kernels return `V^H` and `V` respectively, and the batched version of `gesvdj` doesn't support `full_matrices=False`. This means that we need logic in the lowering rule to handle transposition and slicing. This makes it hard to have the algorithm selection be a parameter to the kernel.

Another note: cuSolver has a 64-bit implementation of the SVD, and we always use that implementation on the CUDA backend. The 32-bit interface is included for ROCm support, and I have tested it manually. This was a feature request from #23413.

PiperOrigin-RevId: 676526543
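The transposition trick from point 1 can be sketched in plain NumPy. This is a hypothetical illustration of the algebra (`svd(A)` recovered from `svd(A.T)` when `m < n`), not the actual lowering rule:

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.standard_normal((3, 5))  # m < n: unsupported by gesvd directly

# Factor the transpose instead: a.T = U1 @ diag(s) @ V1h with n >= m,
# then a = V1h^H @ diag(s) @ U1^H, so the factors swap roles.
u_t, s, vh_t = np.linalg.svd(a.T, full_matrices=False)
u, vh = vh_t.conj().T, u_t.conj().T

# As in point 2: gesvd-style routines hand back V^H, while gesvdj-style
# routines hand back V, so the lowering rule must also reconcile
# v = vh.conj().T depending on the algorithm chosen.
assert np.allclose(u @ np.diag(s) @ vh, a)
```

Because only the factors are swapped and conjugate-transposed, the trick costs a couple of layout changes rather than a second factorization.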
copybara-service bot pushed a commit that referenced this issue Sep 20, 2024
copybara-service bot pushed a commit that referenced this issue Sep 20, 2024
copybara-service bot pushed a commit that referenced this issue Sep 20, 2024
copybara-service bot pushed a commit that referenced this issue Sep 20, 2024
PiperOrigin-RevId: 676839182
rajasekharporeddy pushed a commit to rajasekharporeddy/jax that referenced this issue Sep 20, 2024
copybara-service bot pushed a commit that referenced this issue Oct 9, 2024
copybara-service bot pushed a commit that referenced this issue Oct 9, 2024
copybara-service bot pushed a commit that referenced this issue Oct 9, 2024
copybara-service bot pushed a commit that referenced this issue Oct 10, 2024

3 participants