conv1d_transpose speedup #6840

Merged
merged 1 commit into from
Nov 7, 2020

alexgl-github commented Nov 3, 2020

Speed up transposed 1D convolution by eliminating unnecessary multiplications by the zero values that dilation inserts into the data when the stride is greater than 1.

Below are current vs proposed latency numbers for various transposed conv1d parameters:
| channels_out | input_shape | kernel_size | strides | padding | current latency (sec) | new latency (sec) |
|---|---|---|---|---|---|---|
| 1 | (1, 257, 128) | 512 | 128 | 256 | 0.06837 | 0.00049 |
| 257 | (1, 257, 128) | 512 | 4 | 256 | 0.53093 | 0.10341 |
| 1 | (1, 257, 128) | 512 | 1 | 256 | 0.00292 | 0.00307 |
| 1 | (1, 257, 128) | 512 | 2 | 256 | 0.00474 | 0.00171 |
| 1 | (1, 257, 128) | 512 | 16 | 256 | 0.00955 | 0.00056 |
| 1 | (1, 1, 16384) | 512 | 2 | 256 | 0.00054 | 0.00023 |
| 4 | (1, 1, 16384) | 512 | 4 | 256 | 0.00385 | 0.00089 |
| 1 | (1, 1, 1024) | 512 | 5 | 256 | 0.00013 | 0.00004 |
| 32 | (1, 3, 224) | 5 | 1 | 0 | 0.00002 | 0.00002 |
| 32 | (1, 3, 224) | 5 | 2 | 0 | 0.00004 | 0.00003 |
| 128 | (1, 32, 32) | 5 | 2 | 0 | 0.00006 | 0.00003 |
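For readers who want to see the idea in isolation, here is a minimal NumPy sketch. It is not the TOPI/CUDA code from this PR. The function names are hypothetical, and it assumes NCW data, a kernel laid out as (in_channels, out_channels, kernel_size), no output padding, and padding <= kernel_size - 1. The first function dilates the input with zeros and runs an ordinary convolution. The second visits only the kernel taps that line up with real input samples, so it does roughly 1/stride of the multiplications.

```python
import numpy as np


def conv1d_transpose_dilated(data, kernel, stride, padding):
    """Transposed conv1d via explicit zero dilation (illustrative baseline)."""
    n, c_in, w_in = data.shape
    _, c_out, k = kernel.shape
    w_out = (w_in - 1) * stride - 2 * padding + k

    # Insert (stride - 1) zeros between neighbouring input samples.
    dilated = np.zeros((n, c_in, (w_in - 1) * stride + 1), dtype=data.dtype)
    dilated[:, :, ::stride] = data

    # Pad both ends so a stride-1 convolution yields w_out outputs.
    pad = k - 1 - padding
    padded = np.pad(dilated, ((0, 0), (0, 0), (pad, pad)))

    flipped = kernel[:, :, ::-1]
    out = np.zeros((n, c_out, w_out), dtype=data.dtype)
    for w in range(w_out):
        window = padded[:, :, w:w + k]  # mostly zeros when stride > 1
        out[:, :, w] = np.einsum("nck,cok->no", window, flipped)
    return out


def conv1d_transpose_skip_zeros(data, kernel, stride, padding):
    """Same result, but only real input samples are multiplied."""
    n, c_in, w_in = data.shape
    _, c_out, k = kernel.shape
    w_out = (w_in - 1) * stride - 2 * padding + k

    out = np.zeros((n, c_out, w_out), dtype=data.dtype)
    for w in range(w_out):
        # Input index i contributes through kernel tap kk = w + padding - i * stride,
        # which must satisfy 0 <= kk < k.
        i_lo = max(0, -(-(w + padding - k + 1) // stride))  # ceiling division
        i_hi = min(w_in - 1, (w + padding) // stride)
        for i in range(i_lo, i_hi + 1):
            kk = w + padding - i * stride
            out[:, :, w] += data[:, :, i] @ kernel[:, :, kk]
    return out
```

With stride 1 the two functions do the same amount of work, which matches the unchanged latency of the strides=1 rows above. The gap grows with the stride because a larger share of the dilated window is zeros.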

@alexgl-github alexgl-github force-pushed the conv1d_transpose branch 2 times, most recently from f77c5e4 to 0777e36 on November 3, 2020 22:51

@anijain2305 anijain2305 left a comment

Overall LGTM.

anijain2305 commented Nov 4, 2020

@vinx13 Can you PTAL for CUDA stuff?

tests/python/topi/python/test_topi_conv1d_transpose_ncw.py (outdated, resolved)
python/tvm/topi/cuda/conv1d_transpose_ncw.py (outdated, resolved)
python/tvm/topi/cuda/conv1d_transpose_ncw.py (outdated, resolved)
python/tvm/topi/cuda/conv1d_transpose_ncw.py (outdated, resolved)
@alexgl-github alexgl-github force-pushed the conv1d_transpose branch 3 times, most recently from d93ece9 to 86bbc54 on November 4, 2020 19:41
Improve performance of transposed convolution by avoiding
redundant multiplication by zero values from dilated data.
@giuseros giuseros left a comment

Thanks for addressing my comments, @alexgl-github! LGTM

@vinx13 vinx13 merged commit f0979e4 into apache:main Nov 7, 2020
vinx13 commented Nov 7, 2020

Thanks @alexgl-github @anijain2305 @giuseros

trevor-m pushed a commit to trevor-m/tvm that referenced this pull request Dec 2, 2020
Improve performance of transposed convolution by avoiding
redundant multiplication by zero values from dilated data.

Co-authored-by: Ubuntu <ubuntu@ip-172-31-74-104.ec2.internal>
trevor-m pushed a commit to trevor-m/tvm that referenced this pull request Dec 4, 2020
Improve performance of transposed convolution by avoiding
redundant multiplication by zero values from dilated data.

Co-authored-by: Ubuntu <ubuntu@ip-172-31-74-104.ec2.internal>
trevor-m pushed a commit to neo-ai/tvm that referenced this pull request Dec 4, 2020
Improve performance of transposed convolution by avoiding
redundant multiplication by zero values from dilated data.

Co-authored-by: Ubuntu <ubuntu@ip-172-31-74-104.ec2.internal>
4 participants