Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[microNPU] Fix incorrectly calculated stride when converting NHWC to NHCWB16 #9560

Merged
merged 1 commit into from
Nov 25, 2021

Conversation

lhutton1
Copy link
Contributor

Fixes an issue that causes strides to be incorrectly calculated when the number of channels in the input is less than 16 and involves a conversion from NHWC to NHCWB16. This is due to TVM being 'too smart' when analyzing generated TE and removing compute that is deemed unnecessary. Consequently, strides over data are incorrectly calculated leading to an output mismatch.

The PR uses a reduce sum operation to trick TE's data dependency analyzer into looping over a whole block (16), rather than the number of channels actually used (< 16). This causes the calculated strides to be a multiple of 16 which is required for NHCWB16 format.

cc @mbaret @ekalda @manupa-arm @NicolaLancellotti @dchauhan-arm

…NHCWB16

Fixes an issue that causes strides to be incorrectly calculated when the
number of channels in the input is less than 16 and involves a conversion
from NHWC to NHCWB16. This is due to TVM being 'too smart' when analyzing
generated TE and removing compute that is deemed unnecessary. Consequently,
strides over data are incorrectly calculated leading to an output
mismatch.

The PR uses a reduce sum operation to trick TE's data dependency
analyzer into looping over a whole block (16), rather than the number
of channels actually used (< 16). This causes the calculated strides to
be a multiple of 16 which is required for NHCWB16 format.

Change-Id: Ibf76a94a12cebf51fa716fcac1de932a271c4a6d
Copy link
Contributor

@mbaret mbaret left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM - this was a really awkward bug and the fix isn't the most elegant, but I don't think there's a better way to do it.

Copy link
Contributor

@manupak manupak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@manupak manupak merged commit 238958f into apache:main Nov 25, 2021
@manupak
Copy link
Contributor

manupak commented Nov 25, 2021

Thanks! @lhutton1 @mbaret .

@lhutton1 lhutton1 deleted the fix-layout-conversion-bug branch November 25, 2021 14:40
dchauhan-arm pushed a commit to dchauhan-arm/tvm that referenced this pull request Nov 29, 2021
…NHCWB16 (apache#9560)

Fixes an issue that causes strides to be incorrectly calculated when the
number of channels in the input is less than 16 and involves a conversion
from NHWC to NHCWB16. This is due to TVM being 'too smart' when analyzing
generated TE and removing compute that is deemed unnecessary. Consequently,
strides over data are incorrectly calculated leading to an output
mismatch.

The PR uses a reduce sum operation to trick TE's data dependency
analyzer into looping over a whole block (16), rather than the number
of channels actually used (< 16). This causes the calculated strides to
be a multiple of 16 which is required for NHCWB16 format.

Change-Id: Ibf76a94a12cebf51fa716fcac1de932a271c4a6d
mehrdadh pushed a commit to mehrdadh/tvm that referenced this pull request Dec 1, 2021
…NHCWB16 (apache#9560)

Fixes an issue that causes strides to be incorrectly calculated when the
number of channels in the input is less than 16 and involves a conversion
from NHWC to NHCWB16. This is due to TVM being 'too smart' when analyzing
generated TE and removing compute that is deemed unnecessary. Consequently,
strides over data are incorrectly calculated leading to an output
mismatch.

The PR uses a reduce sum operation to trick TE's data dependency
analyzer into looping over a whole block (16), rather than the number
of channels actually used (< 16). This causes the calculated strides to
be a multiple of 16 which is required for NHCWB16 format.

Change-Id: Ibf76a94a12cebf51fa716fcac1de932a271c4a6d
mehrdadh pushed a commit to mehrdadh/tvm that referenced this pull request Dec 1, 2021
…NHCWB16 (apache#9560)

Fixes an issue that causes strides to be incorrectly calculated when the
number of channels in the input is less than 16 and involves a conversion
from NHWC to NHCWB16. This is due to TVM being 'too smart' when analyzing
generated TE and removing compute that is deemed unnecessary. Consequently,
strides over data are incorrectly calculated leading to an output
mismatch.

The PR uses a reduce sum operation to trick TE's data dependency
analyzer into looping over a whole block (16), rather than the number
of channels actually used (< 16). This causes the calculated strides to
be a multiple of 16 which is required for NHCWB16 format.

Change-Id: Ibf76a94a12cebf51fa716fcac1de932a271c4a6d
ylc pushed a commit to ylc/tvm that referenced this pull request Jan 7, 2022
…NHCWB16 (apache#9560)

Fixes an issue that causes strides to be incorrectly calculated when the
number of channels in the input is less than 16 and involves a conversion
from NHWC to NHCWB16. This is due to TVM being 'too smart' when analyzing
generated TE and removing compute that is deemed unnecessary. Consequently,
strides over data are incorrectly calculated leading to an output
mismatch.

The PR uses a reduce sum operation to trick TE's data dependency
analyzer into looping over a whole block (16), rather than the number
of channels actually used (< 16). This causes the calculated strides to
be a multiple of 16 which is required for NHCWB16 format.

Change-Id: Ibf76a94a12cebf51fa716fcac1de932a271c4a6d
yangulei pushed a commit to yangulei/tvm that referenced this pull request Jan 11, 2022
…NHCWB16 (apache#9560)

Fixes an issue that causes strides to be incorrectly calculated when the
number of channels in the input is less than 16 and involves a conversion
from NHWC to NHCWB16. This is due to TVM being 'too smart' when analyzing
generated TE and removing compute that is deemed unnecessary. Consequently,
strides over data are incorrectly calculated leading to an output
mismatch.

The PR uses a reduce sum operation to trick TE's data dependency
analyzer into looping over a whole block (16), rather than the number
of channels actually used (< 16). This causes the calculated strides to
be a multiple of 16 which is required for NHCWB16 format.

Change-Id: Ibf76a94a12cebf51fa716fcac1de932a271c4a6d
yangulei pushed a commit to yangulei/tvm that referenced this pull request Jan 12, 2022
…NHCWB16 (apache#9560)

Fixes an issue that causes strides to be incorrectly calculated when the
number of channels in the input is less than 16 and involves a conversion
from NHWC to NHCWB16. This is due to TVM being 'too smart' when analyzing
generated TE and removing compute that is deemed unnecessary. Consequently,
strides over data are incorrectly calculated leading to an output
mismatch.

The PR uses a reduce sum operation to trick TE's data dependency
analyzer into looping over a whole block (16), rather than the number
of channels actually used (< 16). This causes the calculated strides to
be a multiple of 16 which is required for NHCWB16 format.

Change-Id: Ibf76a94a12cebf51fa716fcac1de932a271c4a6d
ylc pushed a commit to ylc/tvm that referenced this pull request Jan 13, 2022
…NHCWB16 (apache#9560)

Fixes an issue that causes strides to be incorrectly calculated when the
number of channels in the input is less than 16 and involves a conversion
from NHWC to NHCWB16. This is due to TVM being 'too smart' when analyzing
generated TE and removing compute that is deemed unnecessary. Consequently,
strides over data are incorrectly calculated leading to an output
mismatch.

The PR uses a reduce sum operation to trick TE's data dependency
analyzer into looping over a whole block (16), rather than the number
of channels actually used (< 16). This causes the calculated strides to
be a multiple of 16 which is required for NHCWB16 format.

Change-Id: Ibf76a94a12cebf51fa716fcac1de932a271c4a6d
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants