[Frontend][TFLite] Densify Op added #7048

Merged: 4 commits merged into apache:main on Jan 13, 2021

Conversation

ANSHUMAN87 (Contributor)

The Densify op performs a sparse-to-dense transformation of sparse weights, based on the sparsity parameters provided.
This op is needed for sparse ConvNet models.
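For context, a minimal sketch of the kind of expansion Densify performs, assuming a simple 2-D CSR-style encoding. This is illustrative only, not the PR's prepare_dense_matrix_from_sparse utility, and TFLite's sparsity metadata is more general:

```python
import numpy as np

def csr_to_dense(values, row_segments, col_indices, shape):
    """Expand a 2-D CSR-encoded sparse weight into a dense matrix.

    row_segments[i]:row_segments[i+1] selects the slice of `values` and
    `col_indices` belonging to row i (TFLite calls these the array segments
    and array indices of the sparse dimension).
    """
    dense = np.zeros(shape, dtype=np.asarray(values).dtype)
    for row in range(shape[0]):
        for k in range(row_segments[row], row_segments[row + 1]):
            dense[row, col_indices[k]] = values[k]
    return dense

# Example: a 3x4 weight with four non-zero entries.
vals = np.array([1.0, 2.0, 3.0, 4.0], dtype="float32")
segs = [0, 2, 3, 4]   # rows 0, 1, 2 own 2, 1, 1 values respectively
cols = [0, 3, 1, 2]
print(csr_to_dense(vals, segs, cols, (3, 4)))
```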

ANSHUMAN87 (Contributor, Author)

cc @tkonolige, @u99127!

tkonolige (Contributor) left a comment

Thanks @ANSHUMAN87. Overall this looks pretty good, but I am a little confused about exactly what is happening in this pass. Is it converting matrices from sparse to dense at compile time?

ANSHUMAN87 (Contributor, Author)

> Thanks @ANSHUMAN87. Overall this looks pretty good, but I am a little confused about exactly what is happening in this pass. Is it converting matrices from sparse to dense at compile time?

Thanks @tkonolige for the review! The Densify op does not need to be present in the TVM runtime currently, because its purpose is to convert the sparse (constant) weights to dense form, so I have optimized it out. However, if it is required to be present in the TVM runtime in the future, we can always do that with the sparse_to_dense op, using the indices produced from the utility prepare_dense_matrix_from_sparse().

As for your other comments, I have tried to add comments in the necessary places. Please have a look, and let me know if anything more is required. TIA!
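For illustration only, a rough sketch (not the code in this PR) of what that runtime alternative could look like, assuming the non-zero coordinates and values have already been recovered from the sparsity parameters; the input names are made up for the example, only relay.sparse_to_dense is an existing API:

```python
import numpy as np
from tvm import relay

# Hypothetical inputs: coordinates and values recovered from the TFLite
# sparsity parameters (e.g. by a utility like prepare_dense_matrix_from_sparse).
nonzero_indices = relay.const(np.array([[0, 0], [0, 3], [1, 1], [2, 2]], dtype="int32"))
nonzero_values = relay.const(np.array([1.0, 2.0, 3.0, 4.0], dtype="float32"))

# Expand to a dense 3x4 weight inside the graph instead of at import time.
dense_weight = relay.sparse_to_dense(
    nonzero_indices,
    output_shape=(3, 4),
    sparse_values=nonzero_values,
    default_value=relay.const(0.0),
)
```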

tkonolige (Contributor) left a comment

Thanks for adding all the comments. It looks good to me.

ANSHUMAN87 (Contributor, Author)

cc @FrozenGene!

u99127 (Contributor) commented Dec 10, 2020

> Thanks @ANSHUMAN87. Overall this looks pretty good, but I am a little confused about exactly what is happening in this pass. Is it converting matrices from sparse to dense at compile time?

> Thanks @tkonolige for the review! The Densify op does not need to be present in the TVM runtime currently, because its purpose is to convert the sparse (constant) weights to dense form, so I have optimized it out. However, if it is required to be present in the TVM runtime in the future, we can always do that with the sparse_to_dense op, using the indices produced from the utility prepare_dense_matrix_from_sparse().

Sorry about the delay in reviewing.

I think there are a couple of design points here that we should consider so we can take a well-informed decision on this direction.

1. The advantage of keeping the sparse weight representation all the way through is that it keeps the weight size low in the final generated binary, which helps with static size / footprint at the expense of some runtime cost (obviously). For use cases that go through BYOC or other flows with weight compression techniques this should clear up reasonably well; however, where we fall back to traditional TVM compilation on CPUs and GPUs there are no weight compression techniques, so this contributes to rodata bloat or flash bloat.

2. Even if it doesn't make sense to keep this in the final representation to be expanded at runtime, either as an op for the Relay VM or a new op in the graph runtime: does this operator appear in other frameworks, and if so, does it make sense to keep this behaviour common across the stack so that we can reuse the logic for other frontends as well? I.e. the various frontends lower to a relay.op.densify(), and the result of that is something that represents a dense weight tensor; further legalization could then end up creating this dense weight tensor. One of the principles in compiler design is to bind things as late as possible, to give us a chance of optimizing and being able to recover easily without too much reverse engineering. So this comes back to that as well.

TBH I would probably prefer #2 as an approach, to try to reduce duplication in the frontends and give other compilers, and others who use Relay, a chance to get to the densify op. I don't think we can use it for any of our work instantly. If performance or code size becomes an issue, we'll need to reconsider a runtime or partial runtime representation of sparse weights, at which point #1 would be a stepping stone towards it.
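To make the "bind late, legalize later" idea concrete, here is a small sketch (not code from this PR): even if a frontend kept the expansion as an op over constant weights, the existing FoldConstant pass could still recover a dense constant at compile time for backends that want one. sparse_to_dense stands in for a hypothetical densify op here, and the weight values are made up:

```python
import numpy as np
import tvm
from tvm import relay

# Constant sparse weight description (illustrative values).
indices = relay.const(np.array([[0, 1], [2, 3]], dtype="int32"))
values = relay.const(np.array([5.0, 7.0], dtype="float32"))

# The frontend keeps the expansion as an op in the graph...
weight = relay.sparse_to_dense(indices, (3, 4), values, relay.const(0.0))
mod = tvm.IRModule.from_expr(relay.Function([], weight))

# ...and a later pass folds it into a dense constant when desired.
mod = relay.transform.InferType()(mod)
mod = relay.transform.FoldConstant()(mod)
print(mod)  # the function body is now a dense 3x4 constant
```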

These are my thoughts on the design.

regards
Ramana


ANSHUMAN87 (Contributor, Author) commented Dec 10, 2020

> Sorry about the delay in reviewing.
>
> I think there are a couple of design points here that we should consider so we can take a well-informed decision on this direction.
>
> 1. The advantage of keeping the sparse weight representation all the way through is that it keeps the weight size low in the final generated binary, which helps with static size / footprint at the expense of some runtime cost (obviously). For use cases that go through BYOC or other flows with weight compression techniques this should clear up reasonably well; however, where we fall back to traditional TVM compilation on CPUs and GPUs there are no weight compression techniques, so this contributes to rodata bloat or flash bloat.
>
> 2. Even if it doesn't make sense to keep this in the final representation to be expanded at runtime, either as an op for the Relay VM or a new op in the graph runtime: does this operator appear in other frameworks, and if so, does it make sense to keep this behaviour common across the stack so that we can reuse the logic for other frontends as well? I.e. the various frontends lower to a relay.op.densify(), and the result of that is something that represents a dense weight tensor; further legalization could then end up creating this dense weight tensor. One of the principles in compiler design is to bind things as late as possible, to give us a chance of optimizing and being able to recover easily without too much reverse engineering. So this comes back to that as well.
>
> TBH I would probably prefer #2 as an approach, to try to reduce duplication in the frontends and give other compilers, and others who use Relay, a chance to get to the densify op. I don't think we can use it for any of our work instantly. If performance or code size becomes an issue, we'll need to reconsider a runtime or partial runtime representation of sparse weights, at which point #1 would be a stepping stone towards it.
>
> These are my thoughts on the design.
>
> regards
> Ramana

Thanks @u99127 for your inputs! I concur with most of your arguments.
I think we have two possible approaches to bring sparse ConvNet models onto TVM:
1. Keep the sparse weights and convert them to dense at runtime, since sparse inference is not complete in TVM currently.
2. Optimize out the sparse weights by transforming them to dense while onboarding to TVM.

When I tried the first approach, I found that the performance was too poor, and the sparse_to_dense op resulted in stack corruption when fed higher-dimensional inputs, so I switched to the second approach, which also makes sense from the TFLite perspective, because the TFLite framework optimizes out the Densify op when it comes to sparse inference. As my future work will include sparse inference as well, I think at this point it will suffice to keep the densified weights in the runtime.

However, as I mentioned earlier, if we want to keep it in the TVM runtime, the current PR changes support that as well, using the steps I mentioned in my previous comment, provided we first fix the sparse_to_dense limitations (which I am looking into right now).
I think adding a dedicated Densify op is not a good option, as it would duplicate the sparse_to_dense op.

Maybe we can make this op conversion a configurable user option in the frontend.
By default we would optimize out this op and keep the dense weights in the runtime.
When I am ready with an appropriate solution for the sparse_to_dense op, we can make that the default and keep the first approach as a user choice, so that users can choose between performance and compression.
Let me know your thoughts on this. TIA!
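To make the proposed option concrete, a purely hypothetical sketch: the densify_sparse_weights flag does not exist in relay.frontend.from_tflite today and is shown only to illustrate the user choice being discussed.

```python
from tvm import relay

def import_tflite(tflite_model, shape_dict, dtype_dict, densify_sparse_weights=True):
    """Hypothetical wrapper illustrating the proposed frontend option.

    densify_sparse_weights=True  -> current behaviour: Densify is folded at
        import time and dense weights are stored (better performance).
    densify_sparse_weights=False -> proposed alternative: keep sparse weights
        and expand them at runtime, e.g. via sparse_to_dense (smaller weights).
    The flag is not wired into the real frontend; only the True path exists today.
    """
    mod, params = relay.frontend.from_tflite(
        tflite_model, shape_dict=shape_dict, dtype_dict=dtype_dict
    )
    return mod, params
```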

ANSHUMAN87 (Contributor, Author)

I have added a TODO to use the sparse_to_dense op as well, to enable users to keep sparse weights in store rather than dense weights when onboarding onto TVM.

I believe this PR is ready to be merged, unless there are any further comments.
@zhiics, @FrozenGene: would you please help proceed with the PR? TIA!

ANSHUMAN87 (Contributor, Author)

Gentle ping @zhiics, @FrozenGene!

FrozenGene merged commit 006b9b5 into apache:main on Jan 13, 2021
ANSHUMAN87 (Contributor, Author)

Thanks @FrozenGene, @tkonolige!

This pull request was later referenced by commits in other forks:

- masahi pushed a commit to masahi/tvm (Jan 14, 2021)
- masahi pushed a commit to masahi/tvm (Jan 18, 2021)
- TusharKanekiDey pushed a commit to TusharKanekiDey/tvm (Jan 20, 2021)
- trevor-m pushed a commit to neo-ai/tvm (Jan 21, 2021)
- electriclilies pushed a commit to electriclilies/tvm (Feb 18, 2021)

Each carries the same commit message:

* [Frontend][TFLite] Densify Op added
* [1] Review comments handled
* TODO added for sparse_to_dense Op usage
* stale comments removed
Labels: none yet
Projects: none yet
Participants: 4