
Running CCA on GPU #138

Open
fipelle opened this issue Mar 4, 2022 · 11 comments

@fipelle

fipelle commented Mar 4, 2022

Hi, is it possible to run a subsection of these versions of CCA on GPU? If so, would you please write down a short example?

@jameschapman19
Owner

Yeah, the examples are set up to work with PyTorch Lightning, so it should be as simple as passing gpus=1 to the Trainer, as in here:

https://pytorch-lightning.readthedocs.io/en/stable/common/single_gpu.html
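For instance, a minimal sketch of that route (the `model`/`train_loader` names are placeholders for the objects built in the package's examples, and `gpus=` is the pre-2.0 Lightning argument; newer releases use `accelerator="gpu", devices=...` instead):

```python
def fit_on_gpu(model, train_loader, n_gpus=1):
    """Sketch: train a deep CCA model on GPU via PyTorch Lightning.

    Assumes `model` is a LightningModule, as the deep models in the
    examples are. `gpus=n_gpus` is the pre-2.0 Lightning Trainer argument;
    on Lightning >= 2.0 use accelerator="gpu", devices=n_gpus instead.
    """
    import pytorch_lightning as pl  # deferred import, GPU machines only

    trainer = pl.Trainer(gpus=n_gpus)
    trainer.fit(model, train_loader)
    return trainer
```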

@fipelle
Author

fipelle commented Mar 4, 2022

Does it only apply to the "Deep Models"?

@jameschapman19
Owner

Ah I see - yes only the deep models.

I'd be curious which model is running slow.

In the alternating optimisation methods, the bottleneck will be the scikit-learn regression solvers. In the CCA/regularised CCA/PLS models, the bottleneck will be the eigenvalue problem solver.
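For reference, the linear-algebra core of classical CCA can be sketched in a few lines of NumPy (a whitening-plus-SVD formulation equivalent to the eigenvalue problem; the small `reg` ridge term is an illustrative stabiliser, not the package's exact implementation):

```python
import numpy as np

def cca_correlations(X, Y, reg=1e-6):
    """Canonical correlations between views X and Y (rows = samples).

    Whitens each view with a Cholesky factor of its covariance; the
    singular values of the whitened cross-covariance are the canonical
    correlations. `reg` is a small illustrative ridge term for stability.
    """
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    n = X.shape[0]
    Cxx = X.T @ X / n + reg * np.eye(X.shape[1])
    Cyy = Y.T @ Y / n + reg * np.eye(Y.shape[1])
    Cxy = X.T @ Y / n
    Lx = np.linalg.cholesky(Cxx)
    Ly = np.linalg.cholesky(Cyy)
    # T = Lx^{-1} Cxy Ly^{-T}: its singular values are the correlations
    T = np.linalg.solve(Lx, Cxy) @ np.linalg.inv(Ly).T
    return np.linalg.svd(T, compute_uv=False)
```

This dense factor-and-SVD step is exactly the part that dominates once the views get wide, which is why a GPU-accelerated eigen/SVD solver would be the natural swap-in.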

If you're aware of a GPU-accelerated version of either of these bottlenecks, I'd be interested in either a pointer or a PR.

The other possible direction is the state-of-the-art fast CCA models, which generally use stochastic methods, e.g. https://proceedings.neurips.cc/paper/2017/file/c30fb4dc55d801fc7473840b5b161dfa-Paper.pdf

Or

https://proceedings.neurips.cc/paper/2014/file/54229abfcfa5649e7003b83dd4755294-Paper.pdf

@jameschapman19
Owner

Also worth saying: if you use the Deep CCA methods with single-layer linear encoders, they should converge to CCA, and then you could use the GPU via that route (if you use full-batch gradient descent, i.e. minibatch size = dataset size).

@fipelle
Author

fipelle commented Mar 4, 2022

> Also worth saying: if you use the Deep CCA methods with single-layer linear encoders, they should converge to CCA, and then you could use the GPU via that route (if you use full-batch gradient descent, i.e. minibatch size = dataset size).

Thanks, I will start with that then - trying to build from the examples in the documentation.

> I'd be curious which model is running slow.

I am trying to run NCCA on a very large dataset. I was hoping for a GPU implementation of the nearest-neighbours (NN) part, in the same spirit as the one in cuML - I am not an expert in NCCA and I do not know if it is feasible! :)

I thought about pre-processing the data with PCA first and then running NCCA, but that does not seem very elegant, given that I would have to employ two dimensionality-reduction techniques.

@jameschapman19
Owner

Ah, interesting! I must admit my implementation of NCCA is functional rather than optimised. If there's a faster nearest-neighbour algorithm that can be imported, I'd definitely drop it in.

Just spotted that the sklearn implementation I've been using can take n_jobs, which I don't fully utilise at the moment, so that's an easy win.
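That parameter does exist on scikit-learn's `NearestNeighbors`, so the parallel CPU path is a one-argument change (a minimal illustration, not the package's actual code):

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

# Illustration of the "easy win": n_jobs=-1 runs the neighbour queries
# in parallel across all available CPU cores.
X = np.random.default_rng(0).random((200, 4))
nn = NearestNeighbors(n_neighbors=3, n_jobs=-1).fit(X)
dist, idx = nn.kneighbors(X)  # each point's nearest neighbour is itself
```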


@fipelle
Author

fipelle commented Mar 4, 2022

Thanks! Take a look at the NN implementation in https://docs.rapids.ai/api/cuml/stable/api.html. I think you should be able to import it right away, given that it shares a lot of the scikit-learn syntax.

@fipelle
Author

fipelle commented Mar 5, 2022

I am trying to see if it is sufficient to change line 5 of `ncca.py` from `from sklearn.neighbors import NearestNeighbors` to `from cuml.neighbors import NearestNeighbors`. It would also be nice to have access to the NN options described in https://docs.rapids.ai/api/cuml/stable/api.html#nearest-neighbors when defining NCCA.

EDIT

It seems to work.
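That drop-in swap can be made robust on machines without a GPU by guarding the import (a sketch of the pattern, not the package's actual code):

```python
import numpy as np

# Prefer cuML's GPU NearestNeighbors when it is installed; otherwise fall
# back to scikit-learn's CPU implementation, which has the same interface.
try:
    from cuml.neighbors import NearestNeighbors  # GPU path
except ImportError:
    from sklearn.neighbors import NearestNeighbors  # CPU fallback

X = np.random.default_rng(1).random((100, 3)).astype(np.float32)
nn = NearestNeighbors(n_neighbors=5).fit(X)
dist, idx = nn.kneighbors(X)
```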

@jameschapman19
Owner

Just seen the edit! That's cool!

@beckernick

Hi! I just came across this issue due to the cuML / RAPIDS mention.

I wanted to note that we've implemented input-to-output data type consistency for all cuML estimators (not just NearestNeighbors). This means that while training and prediction will be on the GPU, if you pass CPU data into the estimator it will return CPU data from estimator methods and be compatible with existing code that relies on CPU data.

We've seen folks build wrapper classes, or just use a utility function like the following and wrap the instantiation of the estimators in branching statements:

def has_cuml():
    try:
        import cuml
        return True
    except ImportError:
        return False
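A hypothetical factory following that pattern (the `make_nearest_neighbors` name is illustrative; it repeats `has_cuml` so the sketch is self-contained):

```python
def has_cuml():
    # True when the cuML GPU library is importable on this machine.
    try:
        import cuml  # noqa: F401
        return True
    except ImportError:
        return False

def make_nearest_neighbors(**kwargs):
    # Branching instantiation: cuML's GPU estimator when available,
    # otherwise scikit-learn's CPU estimator with the same interface.
    if has_cuml():
        from cuml.neighbors import NearestNeighbors
    else:
        from sklearn.neighbors import NearestNeighbors
    return NearestNeighbors(**kwargs)
```

Thanks to the input-to-output type consistency described above, downstream code that feeds in NumPy arrays gets NumPy arrays back on either branch.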

Happy to help answer questions about cuML (or RAPIDS in general) if you have any.
