-
Notifications
You must be signed in to change notification settings - Fork 3.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[microNPU] Fix incorrectly calculated stride when converting NHWC to NHCWB16 #9560
Merged
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
…NHCWB16 Fixes an issue that causes strides to be incorrectly calculated when the number of channels in the input is less than 16 and involves a conversion from NHWC to NHCWB16. This is due to TVM being 'too smart' when analyzing generated TE and removing compute that is deemed unnecessary. Consequently, strides over data are incorrectly calculated leading to an output mismatch. The PR uses a reduce sum operation to trick TE's data dependency analyzer into looping over a whole block (16), rather than the number of channels actually used (< 16). This causes the calculated strides to be a multiple of 16 which is required for NHCWB16 format. Change-Id: Ibf76a94a12cebf51fa716fcac1de932a271c4a6d
lhutton1
requested review from
anijain2305,
areusch,
comaniac,
icemelon,
jroesch,
junrushao,
jwfromm,
MarisaKirisame,
mbrookhart,
merrymercy,
slyubomirsky,
tqchen,
vinx13,
wweic,
yzhliu,
zhiics and
ZihengJiang
as code owners
November 23, 2021 10:15
mbaret
approved these changes
Nov 23, 2021
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM - this was a really awkward bug and the fix isn't the most elegant, but I don't think there's a better way to do it.
manupak
approved these changes
Nov 25, 2021
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
dchauhan-arm
pushed a commit
to dchauhan-arm/tvm
that referenced
this pull request
Nov 29, 2021
…NHCWB16 (apache#9560) Fixes an issue that causes strides to be incorrectly calculated when the number of channels in the input is less than 16 and involves a conversion from NHWC to NHCWB16. This is due to TVM being 'too smart' when analyzing generated TE and removing compute that is deemed unnecessary. Consequently, strides over data are incorrectly calculated leading to an output mismatch. The PR uses a reduce sum operation to trick TE's data dependency analyzer into looping over a whole block (16), rather than the number of channels actually used (< 16). This causes the calculated strides to be a multiple of 16 which is required for NHCWB16 format. Change-Id: Ibf76a94a12cebf51fa716fcac1de932a271c4a6d
mehrdadh
pushed a commit
to mehrdadh/tvm
that referenced
this pull request
Dec 1, 2021
…NHCWB16 (apache#9560) Fixes an issue that causes strides to be incorrectly calculated when the number of channels in the input is less than 16 and involves a conversion from NHWC to NHCWB16. This is due to TVM being 'too smart' when analyzing generated TE and removing compute that is deemed unnecessary. Consequently, strides over data are incorrectly calculated leading to an output mismatch. The PR uses a reduce sum operation to trick TE's data dependency analyzer into looping over a whole block (16), rather than the number of channels actually used (< 16). This causes the calculated strides to be a multiple of 16 which is required for NHCWB16 format. Change-Id: Ibf76a94a12cebf51fa716fcac1de932a271c4a6d
mehrdadh
pushed a commit
to mehrdadh/tvm
that referenced
this pull request
Dec 1, 2021
…NHCWB16 (apache#9560) Fixes an issue that causes strides to be incorrectly calculated when the number of channels in the input is less than 16 and involves a conversion from NHWC to NHCWB16. This is due to TVM being 'too smart' when analyzing generated TE and removing compute that is deemed unnecessary. Consequently, strides over data are incorrectly calculated leading to an output mismatch. The PR uses a reduce sum operation to trick TE's data dependency analyzer into looping over a whole block (16), rather than the number of channels actually used (< 16). This causes the calculated strides to be a multiple of 16 which is required for NHCWB16 format. Change-Id: Ibf76a94a12cebf51fa716fcac1de932a271c4a6d
ylc
pushed a commit
to ylc/tvm
that referenced
this pull request
Jan 7, 2022
…NHCWB16 (apache#9560) Fixes an issue that causes strides to be incorrectly calculated when the number of channels in the input is less than 16 and involves a conversion from NHWC to NHCWB16. This is due to TVM being 'too smart' when analyzing generated TE and removing compute that is deemed unnecessary. Consequently, strides over data are incorrectly calculated leading to an output mismatch. The PR uses a reduce sum operation to trick TE's data dependency analyzer into looping over a whole block (16), rather than the number of channels actually used (< 16). This causes the calculated strides to be a multiple of 16 which is required for NHCWB16 format. Change-Id: Ibf76a94a12cebf51fa716fcac1de932a271c4a6d
yangulei
pushed a commit
to yangulei/tvm
that referenced
this pull request
Jan 11, 2022
…NHCWB16 (apache#9560) Fixes an issue that causes strides to be incorrectly calculated when the number of channels in the input is less than 16 and involves a conversion from NHWC to NHCWB16. This is due to TVM being 'too smart' when analyzing generated TE and removing compute that is deemed unnecessary. Consequently, strides over data are incorrectly calculated leading to an output mismatch. The PR uses a reduce sum operation to trick TE's data dependency analyzer into looping over a whole block (16), rather than the number of channels actually used (< 16). This causes the calculated strides to be a multiple of 16 which is required for NHCWB16 format. Change-Id: Ibf76a94a12cebf51fa716fcac1de932a271c4a6d
yangulei
pushed a commit
to yangulei/tvm
that referenced
this pull request
Jan 12, 2022
…NHCWB16 (apache#9560) Fixes an issue that causes strides to be incorrectly calculated when the number of channels in the input is less than 16 and involves a conversion from NHWC to NHCWB16. This is due to TVM being 'too smart' when analyzing generated TE and removing compute that is deemed unnecessary. Consequently, strides over data are incorrectly calculated leading to an output mismatch. The PR uses a reduce sum operation to trick TE's data dependency analyzer into looping over a whole block (16), rather than the number of channels actually used (< 16). This causes the calculated strides to be a multiple of 16 which is required for NHCWB16 format. Change-Id: Ibf76a94a12cebf51fa716fcac1de932a271c4a6d
ylc
pushed a commit
to ylc/tvm
that referenced
this pull request
Jan 13, 2022
…NHCWB16 (apache#9560) Fixes an issue that causes strides to be incorrectly calculated when the number of channels in the input is less than 16 and involves a conversion from NHWC to NHCWB16. This is due to TVM being 'too smart' when analyzing generated TE and removing compute that is deemed unnecessary. Consequently, strides over data are incorrectly calculated leading to an output mismatch. The PR uses a reduce sum operation to trick TE's data dependency analyzer into looping over a whole block (16), rather than the number of channels actually used (< 16). This causes the calculated strides to be a multiple of 16 which is required for NHCWB16 format. Change-Id: Ibf76a94a12cebf51fa716fcac1de932a271c4a6d
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Fixes an issue that causes strides to be incorrectly calculated when the number of channels in the input is less than 16 and involves a conversion from NHWC to NHCWB16. This is due to TVM being 'too smart' when analyzing generated TE and removing compute that is deemed unnecessary. Consequently, strides over data are incorrectly calculated leading to an output mismatch.
The PR uses a reduce sum operation to trick TE's data dependency analyzer into looping over a whole block (16), rather than the number of channels actually used (< 16). This causes the calculated strides to be a multiple of 16 which is required for NHCWB16 format.
cc @mbaret @ekalda @manupa-arm @NicolaLancellotti @dchauhan-arm