
AArch64 base algorithm refactoring in LLVM #6907

Merged (6 commits) Nov 23, 2020

Conversation

giuseros
Contributor

  • Refactored the assembly in arm_cpu/tensor_intrin.py to use LLVM+TIR
  • Removed the interleave boolean parameter that switched the intrinsic
    between two different interleaving modes. LLVM will now take care of
    interleaving the instructions
  • Applied the corresponding changes to conv2d_gemm.py so that it calls
    the right intrinsic

Note: I found LLVM very sensitive to the choice of -mcpu.
So, in order to preserve performance, it is important to specify the
right -mcpu when creating the LLVM target.
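To illustrate the note above, the target string can be pinned to a specific CPU when it is created. This is only a sketch: `make_aarch64_target` is a hypothetical helper, and the exact `-mtriple`/`-mattr` flags depend on the board being targeted.

```python
# Hypothetical helper (not part of TVM): builds an LLVM target string with an
# explicit -mcpu, since codegen quality here depends heavily on that flag.
def make_aarch64_target(mcpu="cortex-a55"):
    return f"llvm -mtriple=aarch64-linux-gnu -mcpu={mcpu}"

# The resulting string would typically be passed to tvm.target.Target(...)
# before compilation, e.g.:
target_str = make_aarch64_target("cortex-a75")
```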

@giuseros
Contributor Author

cc @mbaret @u99127 @anijain2305

@mbaret mbaret self-assigned this Nov 16, 2020
@mbaret (Contributor) left a comment

Perhaps slightly more explanation on the add-pair part of the intrinsic, otherwise looks like a significant improvement.
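The "add-pair" step referred to here maps onto the AArch64 ADDP instruction, which concatenates two vectors and sums adjacent pairs of elements; it is commonly used to reduce partial dot products. A minimal Python model of that semantics (a sketch, not TVM code):

```python
# Model of AArch64 ADDP (add pairwise): concatenate the two input vectors,
# then add each adjacent pair of elements.
def addp(vn, vm):
    concat = list(vn) + list(vm)
    return [concat[2 * i] + concat[2 * i + 1] for i in range(len(concat) // 2)]
```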

Review threads (resolved):
python/tvm/tir/ir_builder.py
python/tvm/topi/arm_cpu/tensor_intrin.py

mbaret commented Nov 17, 2020

cc @FrozenGene @yzhliu if you're interested

giuseros pushed a commit to giuseros/incubator-tvm that referenced this pull request Nov 19, 2020
This PR stemmed from apache#6907 and fixes a small error in the getter
and setter of a buffer for the case where `t.lanes > 1`. I also added a
test to stress the issue.
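The `t.lanes > 1` case can be pictured with a small Python model of a multi-lane buffer (purely conceptual, not TVM's actual implementation): each logical element is a short vector of `lanes` scalars, so the getter and setter must scale the element index by the lane count.

```python
# Conceptual model: a flat buffer whose logical elements are vectors of
# `lanes` scalars. Index i covers flat positions [i*lanes, (i+1)*lanes).
class LanedBuffer:
    def __init__(self, num_elems, lanes):
        self.lanes = lanes
        self.data = [0] * (num_elems * lanes)

    def __getitem__(self, i):
        base = i * self.lanes  # scaling by lanes is the crux of the fix
        return self.data[base:base + self.lanes]

    def __setitem__(self, i, vec):
        assert len(vec) == self.lanes
        base = i * self.lanes
        self.data[base:base + self.lanes] = vec
```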
tqchen pushed a commit that referenced this pull request Nov 20, 2020
* [Bug-fix] Fix tir allocation with multiple lanes

This PR stemmed from #6907 and fixes a small error in the getter and
setter of a buffer for the case where `t.lanes > 1`. I also added a
test to stress the issue.

* Address dtyped vs non-dtyped constant cases
Giuseppe Rossini added 4 commits November 20, 2020 10:55
- Refactored the assembly in arm_cpu/tensor_intrin.py to use LLVM+TIR
- Removed the `interleave` boolean parameter that switched the intrinsic
between two different interleaving modes. LLVM will now take care of
interleaving the instructions
- Applied the corresponding changes to conv2d_gemm.py so that it calls
the right intrinsic

Note: I found LLVM very sensitive to the choice of `-mcpu`.
So, in order to preserve performance, it is important to specify the
right `-mcpu` when creating the LLVM target.
@mbaret (Contributor) left a comment

Just some final comments on the docstrings.

Review threads (resolved):
python/tvm/topi/arm_cpu/tensor_intrin.py
@mbaret (Contributor) left a comment

lgtm

@mbaret mbaret merged commit 5423ffe into apache:main Nov 23, 2020

mbaret commented Nov 23, 2020

This is now merged, thanks @giuseros !

trevor-m pushed a commit to trevor-m/tvm that referenced this pull request Dec 2, 2020
* [Bug-fix] Fix tir allocation with multiple lanes

This PR stemmed from apache#6907 and fixes a small error in the getter
and setter of a buffer for the case where `t.lanes > 1`. I also added a
test to stress the issue.

* Address dtyped vs non-dtyped constant cases
trevor-m pushed a commit to trevor-m/tvm that referenced this pull request Dec 2, 2020
* AArch64 base algorithm refactoring in LLVM

- Refactored the assembly in arm_cpu/tensor_intrin.py to use LLVM+TIR
- Removed the `interleave` boolean parameter that switched the intrinsic
between two different interleaving modes. LLVM will now take care of
interleaving the instructions
- Applied the corresponding changes to conv2d_gemm.py so that it calls
the right intrinsic

Note: I found LLVM very sensitive to the choice of `-mcpu`.
So, in order to preserve performance, it is important to specify the
right `-mcpu` when creating the LLVM target.

* Fix linting

* Fix linting -2

* Fixing comments

* Address review comments

* Fix spaces around ':' in docstrings
trevor-m pushed a commit to trevor-m/tvm that referenced this pull request Dec 4, 2020
trevor-m pushed a commit to trevor-m/tvm that referenced this pull request Dec 4, 2020
trevor-m pushed a commit to neo-ai/tvm that referenced this pull request Dec 4, 2020
trevor-m pushed a commit to neo-ai/tvm that referenced this pull request Dec 4, 2020