
[Autoscheduler][Sparse] Add sparse dense end to end model tuning support for x86/arm cpu & Some bug fix #7635

Merged: 27 commits into apache:main from jcf94:sparse_model on Mar 30, 2021

Conversation

@jcf94 (Contributor) commented Mar 11, 2021

#7313 introduced tuning support for sparse kernels; this PR brings end-to-end model tuning support.

cc @merrymercy @comaniac @FrozenGene @yuchaoli

This PR also contains some bug fixes:

  • Add a separate op strategy for conv2d_nhwc, conv2d_nhwc.winograd and depthwise_conv2d_nhwc on ARM CPU so that AutoScheduler can perform better
  • Fix a dense bug in the TFLite frontend that would break the LayoutRewrite process of AutoScheduler
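
For readers who want to try this out, here is a minimal sketch of end-to-end tuning with the auto-scheduler. The task extraction, tuning, and compilation calls are standard tvm.auto_scheduler usage; it assumes mod/params have already been converted to sparse (e.g. with relay.data_dep_optimization.bsr_dense.convert, as in the tutorial discussed below), and the target string, trial budget, and log file name are placeholders.

import tvm
from tvm import relay, auto_scheduler

# mod, params: a Relay module whose dense layers were already rewritten to sparse
target = tvm.target.Target("llvm -mcpu=core-avx2")  # placeholder target
tasks, task_weights = auto_scheduler.extract_tasks(mod["main"], params, target)

tune_option = auto_scheduler.TuningOptions(
    num_measure_trials=200,  # placeholder trial budget
    measure_callbacks=[auto_scheduler.RecordToFile("sparse_tuning.json")],
)
auto_scheduler.TaskScheduler(tasks, task_weights).tune(tune_option)

# Compile with the best schedules found during tuning
with auto_scheduler.ApplyHistoryBest("sparse_tuning.json"):
    with tvm.transform.PassContext(
        opt_level=3, config={"relay.backend.use_auto_scheduler": True}
    ):
        lib = relay.build(mod, target=target, params=params)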

@jcf94 jcf94 changed the title [Autoscheduler] Add sparse dense end to end model tuning support [Autoscheduler][Sparse] Add sparse dense end to end model tuning support Mar 11, 2021
@comaniac (Contributor) left a comment:

LGTM

@merrymercy (Member) commented:

cc @antinucleon

@merrymercy (Member) commented:

Do you have any performance numbers or comparisons against existing manual schedules?

@jcf94 (Contributor, Author) commented Mar 12, 2021

Do you have any performance numbers or comparisons against existing manual schedules?

This PR currently just enables tuning of end-to-end sparse networks, taking it from 0 to 1.

My colleague @yuchaoli has some results on an ARM mobile phone, but there currently seems to be a problem with the TVM main branch that prevents ARM from reaching the performance we expected. I'll spend some time fixing that.
Later we can add the ARM results to TLCBench or write a blog about this.

@jcf94 jcf94 changed the title [Autoscheduler][Sparse] Add sparse dense end to end model tuning support [Autoscheduler][Sparse] Add sparse dense end to end model tuning support for x86 & arm cpu Mar 12, 2021
@@ -470,6 +470,38 @@ def _traverse(t):
return sparse_input_map


def random_bsr_matrix(m, n, bs_r, bs_c, density, dtype):
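
The hunk above shows only the signature of the new helper. As background, here is a sketch of what such a block-sparse generator can look like, built on scipy.sparse.bsr_matrix; this illustrates the idea and is not necessarily the exact implementation added by the PR.

import itertools
import numpy as np
import scipy.sparse as sp

def random_bsr_matrix(m, n, bs_r, bs_c, density, dtype):
    # Dense buffer that will be filled with randomly placed non-zero blocks
    y = np.zeros((m, n), dtype=dtype)
    # All possible block positions, and the number of bs_r x bs_c blocks
    # needed to reach (roughly) the target density
    candidates = np.array(list(itertools.product(range(0, m, bs_r), range(0, n, bs_c))))
    num_blocks = min(int(m * n * density / (bs_r * bs_c)) + 1, len(candidates))
    chosen = candidates[np.random.choice(len(candidates), size=num_blocks, replace=False)]
    for r, c in chosen:
        y[r : r + bs_r, c : c + bs_c] = np.random.randn(bs_r, bs_c)
    # Convert to block-sparse-row (BSR) storage with the given block size
    return sp.bsr_matrix(y, blocksize=(bs_r, bs_c))
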
@ANSHUMAN87 (Contributor) commented Mar 12, 2021:

IMO this should not be part of TOPI. Either put it where it is used, or in testing.

@jcf94 (Contributor, Author) replied Mar 12, 2021:

IMO this should not be part of TOPI. Either put it where it is used, or in testing.

Fine, it's just that I'm finding this is used in many different places. I'll try to find a better position.

@jcf94 (Contributor, Author) replied:

Moved to topi/sparse/utils.

@ANSHUMAN87 (Contributor) replied:

Sorry for the delayed response, I somehow missed your reply. My suggestion is that random_bsr_matrix() does not qualify to be in TOPI unless it is required by some ops. From what I can see, it is just a utility for the tutorial, so let's keep this utility function in the tutorial file itself. Alternatively, we could put it in tvm.testing, where it could help other tutorials and test cases as well.

@jcf94 jcf94 changed the title [Autoscheduler][Sparse] Add sparse dense end to end model tuning support for x86 & arm cpu [Autoscheduler][Sparse] Add sparse dense end to end model tuning support for x86/arm cpu & Some bug fix Mar 12, 2021
python/tvm/relay/op/strategy/arm_cpu.py (outdated review comments, resolved)
python/tvm/relay/op/strategy/x86.py (outdated review comments, resolved)
        1 - sparsity,
    )
    register_task_input_buffer(
        "default", prefix + "W_data", tvm.runtime.ndarray.array(sparse_weight.data)
@ANSHUMAN87 (Contributor) commented:

Let's not hard-code it; we can use {name + ".data", name + ".indices", name + ".indptr"}.

@jcf94 (Contributor, Author) replied:

The problem is that we cannot get the "name" during measurement.

@ANSHUMAN87 (Contributor) replied:

Okay, thanks for the clarification. But I just wonder: if the name is not available, then how does the prefix logic above work (I mean line number 98)? It's in the same flow, right? Please let me know in case I am mistaken.
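
To make the two naming schemes in this thread concrete, a small illustration (both identifiers below are hypothetical examples, not values from the PR):

# Reviewer's suggestion: key the buffers off the weight variable's name,
# which is known when the Relay graph is built but not at measure time:
name = "dense_1_weight"
by_name = [name + ".data", name + ".indices", name + ".indptr"]

# The PR's approach: key them off a prefix derived from the workload itself,
# which is still available when a measurement runs:
prefix = "sparse_dense_bsr_workload_"
by_prefix = [prefix + "W_data", prefix + "W_indices", prefix + "W_indptr"]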

@ANSHUMAN87 (Contributor) commented:

#7313 introduced tuning support for sparse kernels; this PR brings end-to-end model tuning support.

cc @merrymercy @comaniac @FrozenGene @yuchaoli

This PR also contains some bug fixes:

* Add a separate op strategy for conv2d_nhwc, conv2d_nhwc.winograd and depthwise_conv2d_nhwc on ARM CPU so that AutoScheduler can perform better

* Fix a dense bug in the TFLite frontend that would break the LayoutRewrite process of AutoScheduler

I was just wondering whether we can divide this PR into two: one with the bug fix for conv2d, and the other with the sparse_dense auto-scheduler support and the TFLite bug fix. These two are quite unrelated, so it may not be good for them to go in one PR.

@jcf94 (Contributor, Author) commented Mar 15, 2021

#7313 introduced tuning support for sparse kernels; this PR brings end-to-end model tuning support.
cc @merrymercy @comaniac @FrozenGene @yuchaoli
This PR also contains some bug fixes:

* Add a separate op strategy for conv2d_nhwc, conv2d_nhwc.winograd and depthwise_conv2d_nhwc on ARM CPU so that AutoScheduler can perform better

* Fix a dense bug in the TFLite frontend that would break the LayoutRewrite process of AutoScheduler

I was just wondering whether we can divide this PR into two: one with the bug fix for conv2d, and the other with the sparse_dense auto-scheduler support and the TFLite bug fix. These two are quite unrelated, so it may not be good for them to go in one PR.

Yeah, I agree.

@@ -1872,7 +1872,7 @@ def convert_fully_connected(self, op):
                 out_dtype="int32",
             )
         else:
-            out = _op.nn.dense(in_expr, weight_expr)
+            out = _op.nn.dense(in_expr, weight_expr, units=weight_shape[0])
@ANSHUMAN87 (Contributor) commented:

One small suggestion: if possible, can we add a test case demonstrating the issue fixed here? That will help guard against future breakage.
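
As a sketch of why the one-line fix above matters: with units left unset, the output shape of nn.dense must be inferred from the weight tensor, which the auto-scheduler's LayoutRewrite pass may alter; passing units pins the output shape down. The shapes below are illustrative, and this is not a test case from the PR.

import tvm
from tvm import relay

x = relay.var("x", shape=(1, 128), dtype="float32")
w = relay.var("w", shape=(64, 128), dtype="float32")

# Explicit `units` fixes the number of output channels to 64, so shape
# inference no longer depends on the (possibly rewritten) weight layout.
y = relay.nn.dense(x, w, units=64)
mod = tvm.IRModule.from_expr(relay.Function([x, w], y))
print(mod)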

Comment on lines 139 to 171
if use_sparse:
    # This is a test workload that manually transforms a dense model to sparse.
    # Check `tutorials/frontend/deploy_sparse.py` for more examples on how to import a
    # pretrained model.

    def random_sparse_dense_params(func, params, density, BS_R, BS_C):
        def deepcopy(param_dic):
            ret = {}
            for k, v in param_dic.items():
                ret[k] = tvm.nd.array(v.asnumpy())
            return ret

        new_params = deepcopy(params)
        dense_weight_names = relay.analysis.sparse_dense._search_dense_op_weight(func)
        for item in dense_weight_names:
            name = str(item)
            shape = new_params[name].shape
            if shape[0] % BS_R == 0 and shape[1] % BS_C == 0:
                new_w = random_bsr_matrix(
                    shape[0], shape[1], BS_R, BS_C, density, "float32"
                ).todense()
                new_params[name] = tvm.nd.array(new_w)
        return new_params

    bs_r = 1
    sparsity = 0.85

    # Currently we only support converting dense matmul to sparse dense matmul
    mod, params = ddo.simplify_fc_transpose.convert(mod["main"], params)
    params = random_sparse_dense_params(mod, params, BS_R=bs_r, BS_C=1, density=1 - sparsity)
    mod, params = ddo.bsr_dense.convert(mod, params, (bs_r, 1), sparsity_threshold=0.8)

    mod = tvm.IRModule.from_expr(mod)
@merrymercy (Member) commented:

Wrap this as a function. Do not let this huge block of code confuse readers who only want to know how to use auto-scheduler for regular networks.

@jcf94 (Contributor, Author) replied:

Wrap this as a function. Do not let this huge block of code confuse readers who only want to know how to use auto-scheduler for regular networks.

Simplified this part, and moved the big block to sparse.utils.

@jcf94 (Contributor, Author) commented Mar 26, 2021

@merrymercy @FrozenGene I have now removed the ARM CPU modifications from this PR.
Will add them in a separate PR.

@merrymercy (Member) commented Mar 26, 2021

I only have one comment on the tutorial. We should make the tutorials more readable and modifiable for new users. Other parts look good to me.

@jcf94 jcf94 dismissed FrozenGene’s stale review March 30, 2021 09:53

Dismissed because the fix for ARM will be in another PR.

@jcf94 jcf94 merged commit 612f6ce into apache:main Mar 30, 2021
@jcf94 jcf94 deleted the sparse_model branch March 30, 2021 09:54
trevor-m pushed a commit to trevor-m/tvm that referenced this pull request May 6, 2021
[Autoscheduler][Sparse] Add sparse dense end to end model tuning support for x86/arm cpu & Some bug fix (apache#7635)

* Add sparse dense end to end model tuning support

* Add sparse tuning for arm network

* Bug fix for tflite frontend dense with layout rewrite

* Move the random_bsr_matrix to sparse.utils
trevor-m pushed a commit to neo-ai/tvm that referenced this pull request May 11, 2021
[Autoscheduler][Sparse] Add sparse dense end to end model tuning support for x86/arm cpu & Some bug fix (apache#7635)

* Add sparse dense end to end model tuning support

* Add sparse tuning for arm network

* Bug fix for tflite frontend dense with layout rewrite

* Move the random_bsr_matrix to sparse.utils
Successfully merging this pull request may close these issues.

[arm cpu][auto scheduler] Auto scheduler @ arm cpu can not reproduce paper's performance
5 participants