
[Relay] Convert a fake quantized or QAT graph into QNN ops #8126

Merged (6 commits) Jun 8, 2021

Conversation

mbrookhart (Contributor)

Recently, we discovered that tf2onnx exports some int8 graphs as fake-quantized/QAT models in ONNX, i.e., int8 ops are exported as dequantize->op->quantize.

This PR introduces a pass to convert those graphs into direct int8 ops inside relay. I've tested correctness of the resulting models on Inceptionv1 and ssd-mobilenet-v1 from the tensorflow lite model zoo imported via ONNX. Follow up work will analyze further models for more operations to include in this pass.
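To make the rewrite concrete, here is a minimal, self-contained sketch (plain Python, not the actual TVM pass or its API) of why the dequantize->op->quantize pattern can be collapsed for an op like ReLU: the affine map q -> scale * (q - zero_point) is monotonic for scale > 0, so the float computation can be replaced by the equivalent computation directly on the int8 values. The helper names below are illustrative, not from the PR.

```python
# Illustrative sketch only: fold dequantize -> relu -> quantize into an
# integer-only relu. scale/zero_point follow the usual affine scheme
# real = scale * (q - zero_point).

def dequantize(q, scale, zero_point):
    return scale * (q - zero_point)

def quantize(r, scale, zero_point):
    q = round(r / scale) + zero_point
    return max(-128, min(127, q))  # clamp to the int8 range

def fake_quantized_relu(q, scale, zero_point):
    # The pattern tf2onnx emits: dequantize -> float relu -> quantize
    return quantize(max(dequantize(q, scale, zero_point), 0.0),
                    scale, zero_point)

def integer_relu(q, zero_point):
    # What the pass rewrites it to: operate on int8 directly
    return max(q, zero_point)

# The two agree for every int8 input:
for q in range(-128, 128):
    assert fake_quantized_relu(q, 0.05, -10) == integer_relu(q, -10)
```

Ops like max-pool commute with the affine map in the same way; ops that mix scales (e.g. add, conv) need the extra requantization logic the pass provides per registered operator.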

cc @AndrewZhaoLuo @masahi @jwfromm

@masahi (Member) commented May 25, 2021

Very nice!! cc @anijain2305 @electriclilies

I wonder if "quantize" is the best verb for saying "rewrite fake quantized graphs into real integer-quantized ones". But I don't have a better alternative either.

@mbrookhart (Contributor, Author)

@masahi @anijain2305 Any thoughts on naming? I still don't love what I have, but I agree with Masa that I haven't been able to come up with anything better...

@electriclilies (Contributor)

I think the repetition of the word quantize in quantize_fake_quantization is confusing -- maybe you could avoid using quantize as a verb and name it something like fake_quantize_to_int8? That doesn't have an active verb in it at all, though, which is also sub-optimal. And if this gets expanded to other dtypes, you'd need to change the name...

@mbrookhart force-pushed the fake_quantized_to_quantized branch from ad608d4 to 2d52eed on June 3, 2021 17:46
@mbrookhart (Contributor, Author)

Hmm, that's an interesting idea. To throw other random thoughts out:
fake_quantize_to_integer
fake_quantization_to_affine_space
propagate_integer_ops

Any thoughts? I rebased to get around a weird threading bug in another CI test; if people have a naming preference, I can refactor while the CI runs to make sure the pass is working.

@electriclilies (Contributor)

@mbrookhart I think fake_quantization_to_affine_space and fake_quantization_to_integer are the best options. I slightly prefer fake_quantization_to_integer because it's a bit more concise.

Also, I did a quick google search and I think that the term affine space is used when talking about quantization in physics, but I didn't see any references to it in computer science literature. The only thing that comes up if you search "affine space" is stuff about vector spaces, and if you search "quantization affine space" you get physics papers.

So I think if we do use the term affine space, we should be careful to explain what we mean by it in code comments and documentation since it's not a term that is commonly used.

@mbrookhart (Contributor, Author)

I'm a physicist, that must be why that term makes so much more sense to me :D

@mbrookhart (Contributor, Author)

But I'm happy to use fake_quantization_to_integer

@masahi (Member) commented Jun 3, 2021

I also prefer fake_quantization_to_integer. I usually don't associate the word "affine" with integers, I think it is more commonly used when talking about affine transform.

@anijain2305 (Contributor)

Thanks, this is a nice addition and improves framework coverage very nicely. I agree that fake_quantization_to_integer is more natural. I have typically used affine for loop transformations.

@mbrookhart (Contributor, Author)

Awesome, thanks everyone, I'll refactor to that name.

@mbrookhart (Contributor, Author)

Refactor done. Thanks!

@mbrookhart force-pushed the fake_quantized_to_quantized branch from 1451a1b to 92952bb on June 4, 2021 15:40
@anijain2305 (Contributor)

@masahi @electriclilies Please approve explicitly when you get a chance. And we can land this.

@electriclilies (Contributor) left a review comment

Overall looks good to me, one nitpick is that you use the word "affine" in a bunch of the documentation and some of the internal names without defining it, which I do think is important since it is a term that isn't used in computer science often. I think at least defining it in the documentation of AffineType and in the documentation of register_fake_quantization_to_integer would be good.
I don't think it's super important though so you could add the definitions in a later PR.

Review thread on python/tvm/relay/op/op.py (outdated, resolved)
@anijain2305 (Contributor)

@mbrookhart Feel free to merge the PR as you decide whether you want to address Lily's comments in this or the next PR. All good from my side.

@mbrookhart (Contributor, Author) commented Jun 7, 2021

@electriclilies Thanks for the suggestions. I added a definition for completeness; I don't want to confuse users. I do think it is a fairly common term in the quantization literature, though; see, for instance:
https://arxiv.org/pdf/1712.05877.pdf
https://arxiv.org/pdf/2004.09602.pdf
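For readers following the naming debate, the "affine" scheme those papers describe can be sketched briefly (a minimal illustration, not TVM code; the constant names are assumptions for the example): a real value r is represented by an integer q through r = S * (q - Z), where S is a float scale and Z is an integer zero point. The zero-point offset is what makes the map affine rather than purely linear.

```python
# Minimal sketch of affine quantization (as in Jacob et al.,
# arXiv:1712.05877): r = S * (q - Z), with S a float scale and Z an
# integer zero point. S and Z below are arbitrary example values.

def quantize(r, S, Z, qmin=-128, qmax=127):
    q = round(r / S) + Z
    return max(qmin, min(qmax, q))  # clamp to the int8 range

def dequantize(q, S, Z):
    return S * (q - Z)

S, Z = 0.1, 3
assert quantize(0.0, S, Z) == 3      # real zero maps exactly onto the zero point
assert dequantize(3, S, Z) == 0.0
# Round-tripping loses at most half a quantization step:
assert abs(dequantize(quantize(1.23, S, Z), S, Z) - 1.23) <= S / 2
```

Having real zero representable exactly (by the zero point) is the property that lets passes like this one handle zero-padding and ReLU purely in the integer domain.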

@mbrookhart force-pushed the fake_quantized_to_quantized branch from 4617b8c to c150e4f on June 8, 2021 14:38
@mbrookhart mbrookhart merged commit 9be0f4f into apache:main Jun 8, 2021
@mbrookhart (Contributor, Author)

Thanks @masahi @anijain2305 @electriclilies

@mbrookhart mbrookhart deleted the fake_quantized_to_quantized branch June 8, 2021 19:32
trevor-m pushed a commit to trevor-m/tvm that referenced this pull request Jun 17, 2021
* Convert a fake quantized or QAT graph into qnn ops

* fix pylint

* fix typos

* use an identity function for some ops

* rename the pass from quantize_fake_quantization to fake_quantization_to_integer

* add definition for affine
trevor-m pushed a commit to neo-ai/tvm that referenced this pull request Jun 17, 2021