[Topi][Cuda]Optimizations of global_ave_pool for NHWC layout #5450

Merged (2 commits), Apr 28, 2020

@SXM-inspur commented Apr 27, 2020

With Tensor Core enabled on a Tesla T4 GPU, global_ave_pool accounted for about 14.8% of the runtime of Resnet50_v2 at a batch size of 32. After the NHWC-layout optimizations in this PR, its share dropped to 0.134%. The unit-test results are listed below, with latency reported in ms. As the table shows, the speedup ranges from about 3x at batch 256 to about 39x at batch 16.

| Batch | Original (ms) | After optimization (ms) | Speedup |
| --- | --- | --- | --- |
| 16 | 1.16 | 0.03 | 38.67 |
| 32 | 1.17 | 0.06 | 19.5 |
| 256 | 1.65 | 0.52 | 3.17 |

Table 1. Shape of input feature maps is batch x 7 x 7 x 2048.
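
For readers unfamiliar with the operator, here is a minimal reference sketch of what global average pooling computes on an NHWC tensor. It is plain Python over nested lists, written only to illustrate the semantics; it is not the CUDA schedule from this PR, and the function name `global_avg_pool_nhwc` is hypothetical.

```python
def global_avg_pool_nhwc(x):
    """Reference semantics only (hypothetical helper, not TVM code).

    x is a nested list indexed as x[n][h][w][c].
    Returns a nested list of shape [N][1][1][C] holding, for each image
    and channel, the mean over all H*W spatial positions.
    """
    out = []
    for image in x:
        height = len(image)
        width = len(image[0])
        channels = len(image[0][0])
        sums = [0.0] * channels
        for row in image:
            for pixel in row:
                # In NHWC the channel values of one pixel are adjacent.
                for c, value in enumerate(pixel):
                    sums[c] += value
        out.append([[[s / (height * width) for s in sums]]])
    return out


# Tiny example: batch 1, 2x2 spatial extent, 2 channels.
x = [[[[1.0, 10.0], [2.0, 20.0]],
      [[3.0, 30.0], [4.0, 40.0]]]]
result = global_avg_pool_nhwc(x)
print(result)  # [[[[2.5, 25.0]]]]
```

For the shapes in Table 1, each image reduces 7 x 7 = 49 spatial positions for each of 2048 channels, so the output per image is 1 x 1 x 2048.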

@Hzfengsy @Laurawly @vinx13 @jwfromm Please help review.

@SXM-inspur changed the title from "Optimizations of global_ave_pool for NHWC layout" to "[Topi][Cuda]Optimizations of global_ave_pool for NHWC layout" Apr 27, 2020

@jwfromm left a comment

LGTM

@vinx13 merged commit 0a1e160 into apache:master Apr 28, 2020
trevor-m pushed a commit to trevor-m/tvm that referenced this pull request Jun 9, 2020
…5450)

* Optimizations of global_ave_pool for NHWC layout

* Optimize the code format to pass inspection of pylint

Co-authored-by: Shawn-Inspur <wushaohua@inspur.com>
trevor-m pushed a commit to trevor-m/tvm that referenced this pull request Jun 18, 2020
trevor-m pushed a commit to neo-ai/tvm that referenced this pull request Jun 18, 2020