plot PCA of transpose of matrix #496

Closed
vivekbhr opened this issue Mar 24, 2017 · 9 comments

@vivekbhr
Member

vivekbhr commented Mar 24, 2017

The plotPCA issue (samples not being projected properly, #477) is fixed in R by running the PCA on the transpose of the matrix (scaling/centering is not required). But matplotlib.mlab.PCA doesn't accept a transposed matrix. We need to fix this.

@vivekbhr vivekbhr added this to the 2.5.0 milestone Mar 24, 2017
@dpryan79
Collaborator

@vivekbhr Do you still have the numpy file for this?

@vivekbhr
Member Author

@dpryan79 yes.. can share with you ..

@dpryan79
Collaborator

It looks like the following works (m is a numpy matrix with nrows > ncols):

U, s, V = np.linalg.svd(m.T, full_matrices=False)
return np.dot(m.T, V.T)

As best I can tell, that's part of what prcomp() does internally.

@dpryan79
Collaborator

I'm not sure SVD is really returning equivalent results to what R is doing in this case. I tried the above code on a play dataset and got reasonable results, but I only got nonsense on real data. This may well turn into a "can't implement without rewriting parts of numpy". I'll remove this from the 2.5 milestone, since I don't think it'll happen for that.

@dpryan79 dpryan79 removed this from the 2.5.0 milestone Mar 29, 2017
@fidelram
Collaborator

fidelram commented Mar 29, 2017 via email

@dpryan79
Collaborator

I tried sklearn briefly but didn't get much better results.

@dpryan79 dpryan79 added this to the 2.6.0 milestone May 4, 2017
@dpryan79
Collaborator

Here's the python code that seems to work correctly (m is a matrix with nrows > ncols):

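import numpy as np
# m: 2-D ndarray, rows = features, columns = samples (nrows > ncols)
m2 = m.T - np.mean(m, axis=1)  # samples as rows; center each feature
U, s, V = np.linalg.svd(m2, full_matrices=False, compute_uv=True)
V = V.T  # columns of V are now the principal axes
PCs = np.dot(m2, V)  # project the samples onto the principal axes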

Each column of PCs is a principal component, with rows as samples. This matches with what R is doing.
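
For anyone who wants to experiment with this, the snippet above can be wrapped into a self-contained function. This is only a minimal sketch and not the code that was committed: the name `pca_by_svd`, the returned variances, and the random test matrix are assumptions made for illustration, and `m` is assumed to be a plain 2-D `numpy.ndarray` with features in rows and samples in columns.

```python
import numpy as np


def pca_by_svd(m):
    """Project samples (columns of m) onto principal components.

    m is assumed to be a 2-D ndarray with rows = features and
    columns = samples. Returns (pcs, variances).
    """
    m = np.asarray(m, dtype=float)
    # Samples become rows; subtract each feature's mean across samples.
    m2 = m.T - m.mean(axis=1)
    # Rows of Vt are the principal axes (unit vectors in feature space).
    U, s, Vt = np.linalg.svd(m2, full_matrices=False)
    # Sample scores: one row per sample, one column per component.
    pcs = np.dot(m2, Vt.T)
    # Variance along each component, with n - 1 in the denominator.
    variances = s ** 2 / (m2.shape[0] - 1)
    return pcs, variances


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    m = rng.normal(size=(50, 6))  # hypothetical data: 50 features, 6 samples
    pcs, variances = pca_by_svd(m)
    print(pcs.shape)  # (6, 6)
```

When comparing against another PCA implementation, keep in mind that the sign of each singular vector is arbitrary, so individual columns of the scores may come out flipped.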

dpryan79 added a commit that referenced this issue Jul 21, 2017
@dpryan79
Collaborator

@vivekbhr There's now a betterPCA branch, which adds the --transpose, --ntop, and --PCs options. --transpose will produce the PCA on the transposed matrix. As in R, the projection of the samples on the PCs is then plotted rather than the weights/loadings. --ntop specifies how many of the top N most variable rows to use for the PCA (again, exactly as in R). --PCs specifies which components to plot. The default is 1 2, but you can specify whichever components you want. This should then produce exactly the same output as you'd get with prcomp(foo, scale=T, center=T) in R.
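
To make the three options described above concrete, here is a rough sketch of the same steps in NumPy: keep the most variable rows, transpose, center and scale each feature, then project the samples and pick the requested components. It is not the code in the betterPCA branch. The function names, the `ntop=1000` placeholder default, and the assumption that no selected row is constant are all mine.

```python
import numpy as np


def top_variable_rows(m, ntop):
    """Keep the ntop rows of m with the largest variance across columns."""
    m = np.asarray(m, dtype=float)
    row_var = m.var(axis=1, ddof=1)
    # argsort is ascending, so take the last ntop indices and reverse them.
    keep = np.argsort(row_var)[-ntop:][::-1]
    return m[keep, :]


def scaled_sample_pca(m, ntop=1000, pcs=(1, 2)):
    """Center and scale each feature, then project the samples."""
    sub = top_variable_rows(m, min(ntop, np.shape(m)[0]))
    x = sub.T  # rows = samples, columns = features
    x = x - x.mean(axis=0)
    x = x / x.std(axis=0, ddof=1)  # assumes no feature is constant
    U, s, Vt = np.linalg.svd(x, full_matrices=False)
    scores = np.dot(x, Vt.T)
    # pcs is 1-based, like the component numbers given on the command line.
    idx = [p - 1 for p in pcs]
    return scores[:, idx]
```

Calling `scaled_sample_pca(m, ntop=500, pcs=(1, 3))` on a features-by-samples array would return one row per sample with the first and third components as columns.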

@dpryan79 dpryan79 self-assigned this Jul 21, 2017
@dpryan79
Collaborator

I seem to now be getting the same results as prcomp, which makes me happy. Even the scaling that matplotlib was doing was suboptimal (it wasn't using Bessel's correction). This is all implemented in the develop branch now and will be included in the 2.6 release.
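
For readers unfamiliar with Bessel's correction: it means dividing by n - 1 instead of n when estimating a variance or standard deviation from a sample. R's `sd()` uses n - 1, while NumPy's `np.std` divides by n unless `ddof=1` is passed. The small example below uses made-up numbers purely to show the difference.

```python
import numpy as np

# Made-up values: mean is 5, squared deviations sum to 32.
x = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])

# Population standard deviation: divide by n (NumPy's default, ddof=0).
sd_population = np.std(x)
# Sample standard deviation with Bessel's correction: divide by n - 1.
sd_sample = np.std(x, ddof=1)

print(sd_population)  # 2.0
print(sd_sample)      # sqrt(32 / 7), slightly larger
```

With only a handful of samples, as is typical for a PCA of sequencing experiments, the two estimates differ noticeably, so scaled values only match R when `ddof=1` is used.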
