[topi] add ARM v8.2 udot (uint8) support #3978

Merged (8 commits), Oct 1, 2019


yzhliu
Member

@yzhliu yzhliu commented Sep 20, 2019

Add a uint8 intrinsic for ARM. Currently it is udot.v2i32.v8i8, which may have too few lanes. I will add more later.
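
For readers unfamiliar with the instruction, here is a minimal pure-Python sketch of the arithmetic that a udot-style intrinsic performs: each 32-bit accumulator lane adds the dot product of four adjacent uint8 pairs. The helper name `udot_reference` is hypothetical and is not part of this PR or of TVM; it only illustrates how the lane counts in names such as udot.v2i32.v8i8 (2 accumulator lanes, 8 uint8 inputs) relate to each other.

```python
def udot_reference(acc, a, b):
    """Emulate udot arithmetic: one 32-bit lane per group of four uint8 pairs.

    Hypothetical reference helper for illustration only (not TVM code).
    acc: list of unsigned 32-bit accumulators, one per lane.
    a, b: lists of uint8 values, four per accumulator lane.
    """
    lanes = len(acc)
    if len(a) != 4 * lanes or len(b) != 4 * lanes:
        raise ValueError("a and b must each hold 4 uint8 values per lane")
    if any(not 0 <= v <= 255 for v in list(a) + list(b)):
        raise ValueError("a and b must contain uint8 values (0..255)")

    out = []
    for i in range(lanes):
        # Dot product of the four uint8 pairs that feed lane i.
        dot = sum(a[4 * i + k] * b[4 * i + k] for k in range(4))
        # Accumulate with 32-bit wraparound.
        out.append((acc[i] + dot) & 0xFFFFFFFF)
    return out


# Shape of udot.v2i32.v8i8: 2 accumulator lanes, 8 uint8 inputs per operand.
print(udot_reference([0, 0], list(range(8)), [1] * 8))  # [6, 22]
```

With four accumulator lanes and sixteen uint8 inputs per operand, the same helper models the wider udot.v4i32.v16i8 shape.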

@anijain2305 @zhiics @vinx13 @ZihengJiang

@yzhliu yzhliu changed the title Armint8 [topi] add ARM v8.2 udot (uint8) support Sep 20, 2019
Member

@zhiics zhiics left a comment

Left a few minor comments; otherwise LGTM.

topi/python/topi/arm_cpu/tensor_intrin.py (resolved)
topi/python/topi/generic/conv2d.py (resolved)
topi/python/topi/generic/conv2d.py (resolved)
@anijain2305
Contributor

Before merging, it would be good to try two more optimizations:

  • Currently, udot seems a little slow (~1x speedup). The reason may be that we are not fully utilizing the fused accumulation. We should look at the assembly to double-check.
  • Please try udot.v4i32.v16i8; that should quadruple the throughput compared to FP32.
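
The "quadruple" figure can be checked with a rough count of multiply-accumulates (MACs) per vector instruction. The sketch below assumes the FP32 baseline is a 128-bit vector FMA over four float32 lanes, and it counts only peak per-instruction work; instruction issue rates, memory traffic, and layout overhead are ignored, so measured speedups can be lower. The function names are hypothetical.

```python
VECTOR_BITS = 128  # assumed width of the FP32 baseline vector register


def fp32_macs_per_instruction(vector_bits=VECTOR_BITS):
    # One fused multiply-add per 32-bit float lane.
    return vector_bits // 32


def udot_macs_per_instruction(int32_lanes):
    # Each int32 accumulator lane consumes four uint8 products.
    return 4 * int32_lanes


for name, lanes in (("udot.v2i32.v8i8", 2), ("udot.v4i32.v16i8", 4)):
    ratio = udot_macs_per_instruction(lanes) / fp32_macs_per_instruction()
    print(f"{name}: {ratio:.0f}x the MACs of a 128-bit FP32 FMA")
```

Under these assumptions the two-lane form does twice the per-instruction work of the FP32 baseline, and the four-lane form does four times.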

@yzhliu
Member Author

yzhliu commented Sep 28, 2019

@anijain2305 @zhiics please review again.

Contributor

@anijain2305 anijain2305 left a comment

LGTM. Minor comments.

@anijain2305
Contributor

It would be good if we could share the performance speedup results.

@yzhliu
Member Author

yzhliu commented Sep 30, 2019

@tqchen could you check the CI instance? It shows "no space left".

@yzhliu
Member Author

yzhliu commented Sep 30, 2019

@anijain2305 The average speedup is ~2.1x compared to fp32.

@tqchen
Member

tqchen commented Oct 1, 2019

CI issue fixed.

@yzhliu
Member Author

yzhliu commented Oct 1, 2019

Thanks @anijain2305 @zhiics @tqchen

@yzhliu yzhliu merged commit 5cc1764 into apache:master Oct 1, 2019
anijain2305 pushed a commit to anijain2305/tvm that referenced this pull request Oct 17, 2019
* [topi] add ARM v8.2 udot (uint8) support

* fix test case

* fix common conv2d schedule

* add back fp32_time in test

* fix lint

* fix doc, add support for int32_lanes=4, signed int

* fix lint

* add ic_bn % 4 checker in schedule
wweic pushed a commit to neo-ai/tvm that referenced this pull request Oct 18, 2019
petrex added a commit to petrex/tvm that referenced this pull request Oct 29, 2019
* master:
  Fix split's last factor issue (apache#4044)
  [COMMUNITY] ajtulloch -> committer (apache#4043)
  [TOPI]Add op argwhere (apache#3994)
  [topi] add ARM v8.2 udot (uint8) support (apache#3978)
  [COMMUNITY] anijain2305 -> reviewer (apache#4036)
  [QNN] Renaming dense operator. (apache#4033)
  [Relay][Compile_engine] Int64 shape handling for outputs. (apache#4031)
  Add dmlc-core to the list of installed header directories. (apache#4035)
  [ARITH] migrate indexdiv/mod to floordiv/mod (apache#4008)
  [Relay] Move prelude to text format (apache#3939)
  make tvm compilable by gcc 4.9.2 (apache#4032)
  [AUTOTVM][DOCS] Add a link to the defining network description of auto-tuning tutorial (apache#4023)
  [ARITH] cleanup the indexmod/div on python side (apache#4028)
  [Fix] Add more pad_mode support for onnx converter (apache#4029)
  Add parser support for ReLU tflite operator (apache#4022)
  Additional MXNet Convolution and Deconvolution tests (apache#4026)
  docs: minor spelling tweaks (apache#4027)
4 participants