[microNPU] Tweak a layout transform matrix #10763

Merged 3 commits into apache:main on Mar 30, 2022

Conversation

@ekalda (Contributor) commented Mar 24, 2022

One of the layout transform matrices currently causes the cascader to stripe
across the B16 axis (which is not allowed), so change that and deal with
the implications for get_valid_block_configs.
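
For context, a rough sketch of the before/after arithmetic for the channel row of the nhcwb16_to_nhwc matrix, reconstructed from the diff fragments below (illustration only; the function names here are made up and are not cascader API):

def old_channel_extent(c_outer, b16):
    # old row [0, 0, 16, 0, 1, -16]: the recovered NHWC channel extent
    # depends on the B16 extent, so shrinking B16 looks like a legal way
    # to shrink the channels and the cascader stripes across the B16 axis
    return 16 * c_outer + b16 - 16

def new_channel_extent(ofm_channels):
    # new row [0, 0, 0, 0, 0, ofm_channels]: the channel extent is pinned
    # to the original channel count, independent of the B16 axis
    return ofm_channels

print(old_channel_extent(2, 16))  # 32
print(new_channel_extent(32))     # 32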

@lhutton1 (Contributor) left a comment

Looks good @ekalda! Just some small suggestions

@@ -187,7 +187,7 @@ def conv2d_compute(
 [1, 0, 0, 0, 0, 0],
 [0, 1, 0, 0, 0, 0],
 [0, 0, 0, 1, 0, 0],
-[0, 0, 16, 0, 1, -16],
+[0, 0, 0, 0, 0, ofm_channels],

Contributor:

Just curious: any reason for not making this change in binary_elementwise and unary_elementwise? Additionally, it looks like the nhcwb16_to_nhwc and nhwc_to_nhcwb16 matrices are the same across the operators; should we move them to a common place like util to reduce duplication?

@ekalda (Author):

Initially I didn't make that change since the old transformation matrix wasn't "harmful" for these operators (it was only a problem for operators that had weights or some kind of kernel). But I think it makes sense to unify the transform across all the NPU ops, since then we can define the layout transform matrices in one place and easily reuse them across all the TEs and in the tests.
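
For reference, a minimal sketch of what such a shared helper could look like, reconstructed from the function signature and matrix rows quoted later in this thread (the module it would live in is an assumption here, not stated in the PR):

from typing import List, Tuple


def get_layout_transform_matrices(
    ofm_channels: int,
) -> Tuple[List[List[float]], List[List[float]]]:
    """Return the NHWC <-> NHCWB16 layout transform matrices.

    The B16 axis always has a fixed size of 16, so nhwc_to_nhcwb16 is lossy;
    nhcwb16_to_nhwc therefore recovers the channel count from ofm_channels.
    """
    # Maps [n, h, w, c, 1] -> [n, h, c // 16, w, 16, 1]
    nhwc_to_nhcwb16 = [
        [1, 0, 0, 0, 0],
        [0, 1, 0, 0, 0],
        [0, 0, 0, 1 / 16, 0],
        [0, 0, 1, 0, 0],
        [0, 0, 0, 0, 16],
        [0, 0, 0, 0, 1],
    ]
    # Maps [n, h, c // 16, w, b16, 1] -> [n, h, w, ofm_channels, 1]
    nhcwb16_to_nhwc = [
        [1, 0, 0, 0, 0, 0],
        [0, 1, 0, 0, 0, 0],
        [0, 0, 0, 1, 0, 0],
        [0, 0, 0, 0, 0, ofm_channels],
        [0, 0, 0, 0, 0, 1],
    ]
    return nhwc_to_nhcwb16, nhcwb16_to_nhwc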

@@ -169,7 +169,7 @@ def pooling_compute(
 [1, 0, 0, 0, 0, 0],
 [0, 1, 0, 0, 0, 0],
 [0, 0, 0, 1, 0, 0],
-[0, 0, 16, 0, 1, -16],
+[0, 0, 0, 0, 0, int(ofm_channels)],

Contributor:

Is int(...) necessary here?

@ekalda (Author):

Yes, the type of ofm_channels here is IntImm, so without the cast it would end up causing an error in propagator.py.
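
A minimal illustration of that cast (assuming a TVM install; the value 32 is arbitrary and just stands in for the real ofm_channels):

import tvm

# ofm_channels arrives as a tvm.tir.IntImm from the TE expression, but the
# transform matrix needs a plain Python number for the cascader's propagator.
ofm_channels = tvm.tir.IntImm("int32", 32)
channel_row = [0, 0, 0, 0, 0, int(ofm_channels)]  # int(...) unwraps the IntImm
print(channel_row)  # [0, 0, 0, 0, 0, 32]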

@ekalda (Contributor, Author) commented Mar 28, 2022

Thanks for the reviews @lhutton1 and @NicolaLancellotti! I've addressed the comments...

@manupak (Contributor) left a comment

Thanks @ekalda (and other reviewers)! This looks much cleaner. I've added some suggestions to improve the docs around the matrices (so it is clear why those numbers look like that).

from typing import Tuple, List


def get_layout_transform_matrices(ofm_channels: int) -> Tuple[List[List[float]], List[List[float]]]:

Contributor:

@ekalda (Author):

Done

[0, 1, 0, 0, 0],
[0, 0, 0, 1 / 16, 0],
[0, 0, 1, 0, 0],
[0, 0, 0, 0, 16],

Contributor:

docs: might be worth putting a comment to indicate that the b16 axis is always going to be of fixed size 16.

@ekalda (Author):

Done

[1, 0, 0, 0, 0, 0],
[0, 1, 0, 0, 0, 0],
[0, 0, 0, 1, 0, 0],
[0, 0, 0, 0, 0, ofm_channels],

Contributor:

docs: let's state that the nhwc_to_nhcwb16 conversion is lossy (because the b16 axis is fixed to 16). Therefore, to recover the original "c" of "nhwc", we need to use the original channels in the transform matrix.

@ekalda (Author):

Done
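
As a rough numeric check of the lossiness point above (illustration only, not part of the PR; it uses the helper sketched earlier in this thread and arbitrary shape values):

import numpy as np

ofm_channels = 32
nhwc_to_nhcwb16, nhcwb16_to_nhwc = get_layout_transform_matrices(ofm_channels)

# NHWC shape written as an affine vector [n, h, w, c, 1].
nhwc = np.array([1, 8, 8, ofm_channels, 1])
nhcwb16 = np.array(nhwc_to_nhcwb16) @ nhwc       # [1, 8, 2, 8, 16, 1]
recovered = np.array(nhcwb16_to_nhwc) @ nhcwb16  # [1, 8, 8, 32, 1]
# The 16 in the B16 slot is a constant of the layout, so the original "c"
# can only be recovered from the ofm_channels baked into the matrix.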

@ekalda (Contributor, Author) left a comment

Thanks for the suggestions @manupa-arm! I added documentation for the transform ops; does it make sense?

One of the layout transforms currently causes the cascader to stripe
across the B16 axis (which is not allowed), so change that and deal with
the implications for get_valid_block_configs.

Change-Id: I04199f9f35fcc31618581567483cfb80d3b5aad2
* Change the nhcwb16_to_nhwc matrix for binary and unary elementwise
  such that it matches the other NPU ops
* Reduce the number of places where the same layout transform matrices are
  defined

@manupak (Contributor) left a comment

LGTM!

@manupak merged commit d0c7c78 into apache:main on Mar 30, 2022

@manupak (Contributor) commented Mar 30, 2022

Thanks @ekalda @NicolaLancellotti @lhutton1

@ekalda deleted the layout-transform-fix branch on March 30, 2022 13:47
pfk-beta pushed a commit to pfk-beta/tvm that referenced this pull request Apr 11, 2022
* [microNPU] Fix layout transform matrix

One of the layout transforms currently causes the cascader to stripe
across the B16 axis (which is not allowed), so change that and deal with
the implications for get_valid_block_configs.

Change-Id: I04199f9f35fcc31618581567483cfb80d3b5aad2

* Reduce the duplication of layout transform matrices

* Change the nhcwb16_to_nhwc matrix for binary and unary elementwise
  such that it matches the other NPU ops
* Reduce the number of places where the same layout transform matrices are
  defined

* Add documentation to the layout transform matrices