
[Relay][Quantization] Per-Channel FQ2I #8883

Merged

12 commits merged into apache:main on Sep 8, 2021
Conversation

mbrookhart
Contributor

This extends the FQ2I pass to support the per-channel quantization needed for a number of TF models I've run into recently.

Many thanks to @jwfromm for help finding and fixing issues in the QNN ops with per-channel quantization.

cc @masahi @anijain2305 @elvin-n

@@ -478,7 +478,7 @@ def _impl_v1(cls, inputs, attr, params):
        attr["dilations"] = [1] + list(attr["dilations"])
        if "pads" in attr:
            attr["pads"] = [0, attr["pads"][0], 0, attr["pads"][1]]

        attr["channels"] = kernel_shapes[0][0]
Contributor
Does this change relate to the subject of the PR, or is it a side fix? I don't see where the "channels" attribute is used in the other changes of this PR.

Contributor Author

This is a side fix; I used it for an intermediate solution for conv, but I guess I don't technically need it. It doesn't seem to be required in many of the relay passes.

@elvin-n
Contributor

elvin-n commented Aug 31, 2021

Just to double-check the assumptions for this PR: we allow per-channel scales on the weight inputs only, right? Other stuff like the binary ops is just a helper for this main purpose?

@mbrookhart
Contributor Author

mbrookhart commented Aug 31, 2021

Yeah, the way the math works out, we can only do per-channel quantization on the weights and still end up with an integer conv/dense op; if we have per-channel quantization on the data going into one of these ops, the lowering breaks.

All the unary/binary stuff is to help with fusing ops that show up in the graph after the contraction but before the requantize, i.e., things we want to fuse into the contraction.
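To make the "math works out" point concrete, here is a small numpy sketch (illustrative only, not code from this PR; zero points are taken as 0 for brevity) showing why a per-channel weight scale factors out of the integer accumulation, while a per-channel input scale along the reduction axis would not:

    import numpy as np

    # Dense: y[c] = sum_k x[k] * w[c, k], with K input features, C output channels.
    K, C = 4, 3
    rng = np.random.default_rng(0)
    x_q = rng.integers(-128, 128, size=K).astype(np.int32)       # quantized input
    w_q = rng.integers(-128, 128, size=(C, K)).astype(np.int32)  # quantized weights
    s_x = 0.05                                       # scalar input scale
    s_w = np.array([0.1, 0.2, 0.3])                  # per-output-channel weight scale

    # The integer accumulation is scale-free; s_x * s_w[c] is applied
    # afterwards, once per output channel, in the requantize step.
    acc = w_q @ x_q                  # pure int32 matmul
    y = s_x * s_w * acc              # per-channel multiplier factors out

    # Reference: dequantize first, then compute in float.
    y_ref = (s_w[:, None] * w_q) @ (s_x * x_q)
    assert np.allclose(y, y_ref)

    # If s_x were instead a vector over k (the reduction axis), s_x[k] * s_w[c]
    # would stay inside the sum, and no single integer matmul followed by a
    # per-channel requantize could reproduce it.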

@@ -18,13 +18,22 @@
import tvm
from tvm import relay
from tvm.ir import TensorAffineType, TupleAffineType
from tvm.tir import bijective_layout
from ..op import register_fake_quantization_to_integer


def fold_constant(expr):
Contributor

Hmm, can you use the functions in python/tvm/relay/frontend/common.py, or modify them slightly there to support your need?

Contributor Author

Unfortunately, since this is in transforms and the frontends call the transforms namespace extensively, importing that will cause an import loop.

from ..op import register_fake_quantization_to_integer


def fold_constant(expr):
    return relay.transform.FoldConstantExpr(expr, tvm.IRModule())


def get_zeros(scale):
    return fold_constant(relay.op.cast(relay.op.zeros_like(scale), "int32"))
Contributor
Should this be a cast_like, or do we always want things to be int32?

Contributor Author

zero_points in QNN are always int32.
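For illustration, the get_zeros helper above folds down to an int32 zero tensor shaped like the scale. A minimal standalone sketch (helper definitions copied from the diff; the usage below is invented here):

    import numpy as np
    import tvm
    from tvm import relay

    def fold_constant(expr):
        return relay.transform.FoldConstantExpr(expr, tvm.IRModule())

    def get_zeros(scale):
        return fold_constant(relay.op.cast(relay.op.zeros_like(scale), "int32"))

    # A per-channel float scale yields a matching vector of int32 zero points.
    scale = relay.const(np.array([0.1, 0.2, 0.3], dtype="float32"))
    print(get_zeros(scale))  # a relay Constant: [0, 0, 0], dtype int32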

Contributor

@elvin-n elvin-n left a comment

We need to change the descriptions of qnn.conv2d and qnn.dense to describe which zero points and scales are expected. python/tvm/relay/qnn/op/qnn.py is not included in this PR, so I'm adding a separate note.

For example, for conv2d we currently have:

    input_zero_point: tvm.relay.Expr
           The zero point of the data distribution.

    kernel_zero_point: tvm.relay.Expr
           The zero point of the quantized_kernel distribution.

    input_scale: tvm.relay.Expr
           The scale for the input tensor. The scale for the input tensor is
           stored purely for convenience here. See more commentary below.

    kernel_scale: tvm.relay.Expr
           The scale for the weight tensor. The scale for the weight tensor is
           stored for access to this during relay. This information is not
           needed in the pass pipeline after qnn.conv2d is lowered to the
           sequence of steps as in nn.conv2d. See also input_scale in Requantize.

while we now allow only scalars for the input zero point and input scale, and arbitrary(?) zero points and scales for the weights.
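For instance, a call matching those expectations might look like this (a sketch of the qnn.conv2d Python API with invented shapes and values, not code from the PR):

    import numpy as np
    from tvm import relay

    data = relay.var("data", shape=(1, 3, 8, 8), dtype="int8")
    weight = relay.var("weight", shape=(16, 3, 3, 3), dtype="int8")

    out = relay.qnn.op.conv2d(
        data,
        weight,
        input_zero_point=relay.const(0, "int32"),      # scalar only
        kernel_zero_point=relay.const(0, "int32"),
        input_scale=relay.const(0.05, "float32"),      # scalar only
        kernel_scale=relay.const(np.full(16, 0.1, dtype="float32")),  # per output channel
        kernel_size=(3, 3),
        channels=16,
    )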

arg = expr.args[0]
t = type_map[arg]
scale_shape = infer_shape(t.scale)
z_p = t.zero_point
Contributor

We might have four situations here:

  1. scale is scalar, zp is scalar
  2. scale is scalar, zp is not scalar
  3. scale is not scalar, zp is scalar
  4. scale is not scalar, zp is not scalar

Cases 3 and 4 are covered by the next if: we broadcast zp to the scale's shape along the axis from the AffineType structure.
Q: will zp be updated in place in the TensorAffineType map after the broadcast?

Case 1 is fine.

Case 2 - Q: don't we need to handle this explicitly and broadcast scale?

Contributor Author

On the first Q: no, the broadcast z_p will not be updated in place; it's only used in the computation.

On the second Q: that's an interesting point. I guess I haven't ever seen it, but it's feasible. QNN currently supports scalar scale and scalar zp, OR vector scale and scalar zp, OR vector scale and vector zp, which matches all of the combinations I've ever seen in the wild. What do you think, should we try to support that in QNN?
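For context, the broadcast for cases 3 and 4 boils down to something like the following sketch (paraphrased, not the PR's exact code; it assumes the fold_constant helper above, and the result is only used locally, so the type_map is never mutated):

    from tvm import relay

    # scale_shape comes from infer_shape(t.scale); z_p from t.zero_point.
    if len(scale_shape) > 0 and scale_shape[0] > 1:
        # Per-channel scale with a scalar zero point (cases 3/4):
        # broadcast z_p to the scale's shape for the local computation.
        z_p = fold_constant(relay.op.broadcast_to(z_p, scale_shape))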

Matthew and others added 10 commits September 7, 2021 13:53
* Fixed fake quantize issues.

* Formatting.

* Cleanup unused imports

* Fix real int8 tests.
* Fixed fake quantize issues.

* Formatting.

* Cleanup unused imports

* Fix real int8 tests.

* Fix requantize shape bug.
* Fix legalization for non spatial operators.

* Fix axis checks for end2end functionality.
fix lint

fix lint again
* WIP support per-channel quantization

* more WIP

* More WIP

* fix issue with per-channel bias_add

* Fix fake quantize tests (#4)

* Fixed fake quantize issues.

* Formatting.

* Cleanup unused imports

* Fix real int8 tests.

* Add Relu

* One more little one (#5)

* Fixed fake quantize issues.

* Formatting.

* Cleanup unused imports

* Fix real int8 tests.

* Fix requantize shape bug.

* Non-working Per-channel Dense

* Fix legalization for non spatial operators. (#6)

* Fix legalization for non spatial operators.

* Fix axis checks for end2end functionality.

* fix axis normalization

fix lint

fix lint again

* Fix bug in requantize dimension expansion.

* Format.

Co-authored-by: Josh Fromm <jwfromm@octoml.ai>
respond to review comments
@masahi masahi merged commit 4ffbdcd into apache:main Sep 8, 2021
@mbrookhart mbrookhart deleted the per_channel_fq2i branch September 8, 2021 15:08
ylc pushed a commit to ylc/tvm that referenced this pull request Sep 29, 2021
ylc pushed a commit to ylc/tvm that referenced this pull request Jan 13, 2022