
[TOPI] sparse_dense Op sparse_data input added #6889

Merged: 9 commits into apache:main from the sparse-32 branch on Dec 15, 2020

Conversation

@ANSHUMAN87 (Contributor)

Change Summary:
The current sparse_dense Op in TOPI supports only the weight as a sparse input.
This PR adds support for the data to be the sparse input as well.
So either operand can be sparse, with the other one dense.

This is a follow up PR from #6685
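
For anyone skimming the thread, here is a minimal NumPy/SciPy stand-in for the two flavors being discussed (illustration only, not the TOPI kernels; the shapes, the 10% density, and the "multiply by the transposed second operand" convention are assumptions drawn from the discussion below):

```python
import numpy as np
import scipy.sparse as sp

M, K, N = 64, 32, 16
S = sp.random(M, K, density=0.1, format="csr", dtype="float32")  # the sparse operand
D = np.random.rand(N, K).astype("float32")                       # the dense operand

# Existing flavor: dense data x sparse weight^T
out_weight_sparse = D @ S.toarray().T   # shape (N, M)

# New flavor in this PR: sparse data x dense weight^T
out_data_sparse = S.toarray() @ D.T     # shape (M, N)

print(out_weight_sparse.shape, out_data_sparse.shape)
```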

cc @tkonolige, @siju-samuel!

@ANSHUMAN87 force-pushed the sparse-32 branch 2 times, most recently from c3d8874 to 06f9911 on November 9, 2020 18:17
@tkonolige (Contributor)

I'm not sure that the correct course of action is to add a flag to sparse_dense to support AB^T with B sparse. This makes all the implementations of sparse_dense confusing because they now have to check this flag and use a compute/schedule depending on if it is enabled or not. I'd instead prefer to make a new op called dense_sparse that does AB^T with B sparse.

Alternatively, I don't really see a reason for supporting AB^T with B sparse directly. Instead, when we convert from a frontend to tvm, we can just insert the correct transposes. In a lot of ways this is a better choice because we do not need to reimplement the same operators and the operators prefer the data to be in this format. I think this is the best choice.
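
For context on the transpose alternative: since (X·Yᵀ)ᵀ = Y·Xᵀ, either orientation of the product can be rewritten in terms of the other, and (as clarified later in the thread) the extra transpose ends up on the dense output. A tiny NumPy check of that identity, purely illustrative:

```python
import numpy as np

X = np.random.rand(8, 4).astype("float32")
Y = np.random.rand(6, 4).astype("float32")

direct = X @ Y.T              # one orientation of the op
rewritten = (Y @ X.T).T       # the other orientation plus an output transpose

assert np.allclose(direct, rewritten)
```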

@ANSHUMAN87 (Contributor, Author) commented Nov 9, 2020

> I'm not sure that the correct course of action is to add a flag to sparse_dense to support AB^T with B sparse. This makes all the implementations of sparse_dense confusing because they now have to check this flag and use a compute/schedule depending on if it is enabled or not. I'd instead prefer to make a new op called dense_sparse that does AB^T with B sparse.
>
> Alternatively, I don't really see a reason for supporting AB^T with B sparse directly. Instead, when we convert from a frontend to tvm, we can just insert the correct transposes. In a lot of ways this is a better choice because we do not need to reimplement the same operators and the operators prefer the data to be in this format. I think this is the best choice.

Thanks @tkonolige for the response. I had a similar dilemma at first, but later decided to reuse the existing Op in order to share a small portion of code and keep the concept under one Op. That said, I am open to a 'dense_sparse' Op as well (though that name would arguably fit the existing Op 🙂). I also don't think there is much impact on the schedule side between these two flavors of the Op.

As for reusing the existing Op with a transpose, I felt we would be adding extra overhead every time, although it may not be much for smaller matrices.
Please let me know your thoughts on this. Thanks!

@tkonolige (Contributor)

I think it is still possible to reuse code with a separate op, but maybe to a lesser extent.

I'm not sure how much overhead the transposes will add to the code. We don't have to pay the cost of transposing the sparse matrix because that can be done at compile time (the sparse matrix is usually a weight). Could you maybe do some benchmarking and see if the duplication of code is worth it from a performance standpoint?

@ANSHUMAN87 (Contributor, Author)

Sure, I will check how much overhead is added when using the existing Op with a transpose.

@ANSHUMAN87 (Contributor, Author)

Hi @tkonolige, below is the benchmark data I have obtained for 4 different input dimensions.

NOTE: Here "with Transpose" means using the existing sparse_dense Op with an additional Transpose layer,
and "without Transpose" means using the new flavor of the Op implemented in the current PR.
All the data is collected with iteration=3000 and repeat=3.

Case 1 (sparse_input = [1024, 512], dense_input = [512, 4096]): [benchmark results attached as image]

Case 2 (sparse_input = [1024, 512], dense_input = [512, 1024]): [benchmark results attached as image]

Case 3 (sparse_input = [2048, 512], dense_input = [512, 2048]): [benchmark results attached as image]

Case 4 (sparse_input = [4096, 512], dense_input = [512, 4096]): [benchmark results attached as image]

Clearly, as the dimensions increase, the improvement becomes significant. So I think we should keep the new flavor of the Op implemented in the current PR.
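
(For reference, here is a rough SciPy-level analogue of this comparison, in case anyone wants to reproduce the shape of the experiment. The 5% density, the timeit setup, and the fact that only the extra output-transpose copy is modelled are my assumptions; the numbers above were measured with TVM at iteration=3000, repeat=3.)

```python
import timeit

import numpy as np
import scipy.sparse as sp

# Case 4 shapes from above; 5% density is an assumed value (not stated in the thread).
M, K, N = 4096, 512, 4096
S = sp.random(M, K, density=0.05, format="csr", dtype="float32")  # sparse_input
D = np.random.rand(K, N).astype("float32")                        # dense_input

def without_transpose():
    # new flavor: the sparse operand stays on the left, no extra layer
    return S.dot(D)

def with_transpose():
    # reuse-existing-op path: same multiply plus a materialised transpose of the
    # dense output (only this extra copy is being modelled here)
    return np.ascontiguousarray(S.dot(D).T)

for name, fn in [("without transpose", without_transpose),
                 ("with transpose", with_transpose)]:
    t = timeit.timeit(fn, number=30)
    print(f"{name}: {t / 30 * 1e3:.2f} ms")
```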

@tkonolige (Contributor)

Just to check, you're only transposing the dense matrix? Also, what is the density of the sparse matrix?

I'm curious, could you do a benchmark with a smaller batch size (dense_input=[16, 1024])? And could you also test on cuda? And if you have time, could you give a breakdown of time spent in the matrix multiply vs time spent in the transpose?

From these numbers, it does look like it is worthwhile to do this. It just means a lot more sparse kernels to write.

@ANSHUMAN87 (Contributor, Author) commented Nov 13, 2020

> Just to check, you're only transposing the dense matrix? Also, what is the density of the sparse matrix?
>
> I'm curious, could you do a benchmark with a smaller batch size (dense_input=[16, 1024])? And could you also test on cuda? And if you have time, could you give a breakdown of time spent in the matrix multiply vs time spent in the transpose?
>
> From these numbers, it does look like it is worthwhile to do this. It just means a lot more sparse kernels to write.

Yes, the additional transpose is done on the final output from the sparse_dense Op, which is a dense tensor.
CUDA scheduling currently has issues which I am working on; maybe I will handle it in the next PR.
However, I think a CUDA benchmark is not needed to conclude the benefit of the new Op added in this PR.

A lower-dimension case (sparse_input = [4096, 1024], dense_input = [1024, 16]), without transpose: [benchmark results attached as image]

With transpose: [benchmark results attached as image]

With this, I think we can start the code review.

@ANSHUMAN87 (Contributor, Author)

Gentle ping @tkonolige!
I am not sure which other official TVM reviewers or committers are interested in sparse. If you are aware of anyone, please feel free to invite them. TIA!

@tkonolige (Contributor) left a review:

I'm still not sold on this approach. But maybe someone else can chime in. If we do go with the sparse_data flag, I think we should rename it to something like sparse_rhs or sparse_b. Or we could do the same thing as tf and have adjoint_a and adjoint_b.
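
(For reference, the TensorFlow convention mentioned here is a single op with adjoint flags rather than separate sparse×dense / dense×sparse ops. The sketch below is from memory of tf.sparse.sparse_dense_matmul, so treat the exact signature as an assumption and check the TF docs.)

```python
import tensorflow as tf

sp_a = tf.sparse.from_dense(tf.constant([[1.0, 0.0], [0.0, 2.0]]))
b = tf.constant([[3.0, 4.0], [5.0, 6.0]])

y1 = tf.sparse.sparse_dense_matmul(sp_a, b)                  # A @ B
y2 = tf.sparse.sparse_dense_matmul(sp_a, b, adjoint_b=True)  # A @ B^T
```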

Review threads (outdated, resolved): src/relay/op/nn/sparse.cc, python/tvm/topi/nn/sparse.py (2), python/tvm/relay/op/nn/nn.py.
@ANSHUMAN87 (Contributor, Author)

> But maybe someone else can chime in.

Thanks @tkonolige for your feedback. I believe the performance stats make a clear case for opting for a new Op here.
However, if you have any concerns, you can always let me know, either here or offline, and we can discuss in more detail to reach a more convincing conclusion.

@tqchen: Tristan is helping here with his valuable review efforts, but I think we need a third opinion (possibly from an official reviewer or committer) to help move the PR to the next level. As I am not sure who might be interested in sparse changes, could you help tag a few people here? TIA!

@tkonolige (Contributor)

Sorry, I meant that I'm not sure about the flag approach vs having a separate operator for sparse x dense vs dense x sparse. From your benchmarks, it does look like we need some way of doing dense x sparse directly.

@ANSHUMAN87 (Contributor, Author)

> @tqchen: Tristan is helping here with his valuable review efforts, but I think we need a third opinion (possibly from an official reviewer or committer) to help move the PR to the next level. As I am not sure who might be interested in sparse changes, could you help tag a few people here? TIA!

Gentle ping @tqchen !

Review threads (outdated, resolved): python/tvm/relay/op/nn/nn.py (2), python/tvm/topi/nn/sparse.py (4).
@ANSHUMAN87 (Contributor, Author)

@tkonolige: All your comments are addressed. Would you please check and approve, so that we can proceed with this PR?

@tkonolige (Contributor) left a review:

Looks good. I've left a couple comments. I'm approving, but I'm not a committer, so you'll need someone else to approve too.

Could you also add a test to tests/python/topi/python/test_topi_sparse.py with sparse_lhs=True?
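
(A sketch of the NumPy/SciPy reference side such a test could compare the TOPI output against; the shapes, density, and the assumed sparse_lhs=True semantics (sparse data times dense weight^T) are my reading of this PR, and the real test would follow the existing cases in tests/python/topi/python/test_topi_sparse.py.)

```python
import numpy as np
import scipy.sparse as sp

M, K, N = 128, 64, 32
sparse_data = sp.random(M, K, density=0.1, format="bsr", dtype="float32")
dense_weight = np.random.rand(N, K).astype("float32")

# Expected output for the sparse_lhs=True flavor: sparse data times dense weight^T.
expected = sparse_data.toarray() @ dense_weight.T   # shape (M, N)
# np.testing.assert_allclose(topi_result, expected, rtol=1e-5)  # TOPI side omitted here
```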

Review threads (outdated, resolved): python/tvm/relay/op/nn/nn.py, python/tvm/topi/cuda/sparse.py.
@ANSHUMAN87 (Contributor, Author)

> Could you also add a test to tests/python/topi/python/test_topi_sparse.py with sparse_lhs=True?

Sorry for such a delayed response. I have now addressed your remaining comments too!

@ANSHUMAN87 (Contributor, Author) commented Dec 8, 2020

@tqchen, @jroesch, @FrozenGene, @junrushao1994: Would you please help move the PR forward? TIA!

@FrozenGene (Member)

> @tqchen, @jroesch, @FrozenGene, @junrushao1994: Would you please help move the PR forward? TIA!

Just saw your mention; sorry for the late reply. I will do one round of review tomorrow.

@FrozenGene (Member) left a review:

Generally LGTM. @antinucleon, could you help do one round of review, as you have done some work on sparse?

Review thread (resolved): include/tvm/relay/attrs/nn.h.
@ANSHUMAN87 (Contributor, Author)

Gentle ping @antinucleon, @FrozenGene, @tqchen, @jroesch!
Unless there are any further comments, I think we can proceed with the merge, as we have approvals from @tkonolige and @FrozenGene!
I have a lot to improve upon after this merge, partially dependent on this PR!

@FrozenGene merged commit 862655b into apache:main on Dec 15, 2020
@FrozenGene (Member)

@ANSHUMAN87 please go ahead

@ANSHUMAN87 (Contributor, Author)

> @ANSHUMAN87 please go ahead

Thanks a lot @FrozenGene, @tkonolige!

TusharKanekiDey pushed a commit to TusharKanekiDey/tvm that referenced this pull request Jan 20, 2021
* [TOPI] sparse_dense op sparse_data input added

* [1] clang issue resolved

* [2] python format resolved

* [3] lint error resolved

* [4] Review comments handled

* [5] Lint error resolved

* [6] Review comments handled

* [7] Review comments handled

* [8] Review comments handled
trevor-m pushed a commit to neo-ai/tvm that referenced this pull request Jan 21, 2021
electriclilies pushed a commit to electriclilies/tvm that referenced this pull request Feb 18, 2021