
[Relay] A set of utilities that allows a model to be run efficiently on tensorcores. #6748

Merged · 10 commits · Oct 26, 2020

Conversation

@jwfromm (Contributor) commented Oct 23, 2020

This collection of new utility functions enables a starting floating-point model to be converted to a datatype and format that can be run using the efficient HWNC tensorcore schedules introduced in #6121. Although these schedules are the fastest available in TVM, they have a few very specific requirements that make them difficult to apply to models in general: compatible operators must have int4 or int8 inputs, all compatible layers must be in the HWNC layout, and incompatible layers should be left in their original layout and datatype. There are currently no tools to make such changes to an existing model. To address this, I've written the following utilities:

count_layers: A pass that counts the layers of a specified operator in a graph. Although generally useful on its own, for tensorcores we use it to enable the skip_layers feature, as sketched below.
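
For illustration, a minimal sketch of calling count_layers directly (assuming it is exposed under tvm.relay.analysis with the count_layers(expr, valid_ops) signature this PR introduces):

from tvm import relay
import tvm.relay.testing
from tvm.relay.analysis import count_layers

mod, params = relay.testing.resnet.get_workload()
# Count how many conv2d layers the network contains.
num_convs = count_layers(mod["main"], ["nn.conv2d"])
print(num_convs)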

recast: A pass that changes the input and output datatype of all specified operators in a graph, with the option to skip a set of layers. Although this pass is mainly useful for benchmarking, since it does not apply any intelligent quantization, this type of utility is a common request on the Discuss forums and can serve as a good example for users interested in similar functionality.
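
As a rough usage sketch (assuming the recast(expr, dtype, out_dtype, ops, skip_layers) signature added in this PR, importable from tvm.relay.transform):

from tvm import relay
import tvm.relay.testing
from tvm.relay.transform import recast

mod, params = relay.testing.resnet.get_workload()
# Recast every conv2d except the first to int8 inputs with int32 outputs.
mod = recast(mod, "int8", "int32", ops=["nn.conv2d"], skip_layers=[0])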

LayoutConfig: An optional scope that can be applied around the ConvertLayout pass. In this PR I use it to enable skipping the conversion of specified conv2d layers, but it could be extended for other customization down the line.

HWNC support for ConvertLayout.

The combination of these utilities allows us to target HWNC tensorcores using a workflow such as this:

import tvm
from tvm import relay
import tvm.relay.testing  # registers relay.testing.resnet
from tvm.relay.transform import recast

mod, params = relay.testing.resnet.get_workload()
# Skip the first conv2d layer when converting layouts.
layout_config = relay.transform.LayoutConfig(skip_layers=[0])
desired_layouts = {'nn.conv2d': ['HWNC', 'default']}
with layout_config:
    seq = tvm.transform.Sequential([relay.transform.ConvertLayout(desired_layouts)])
    with tvm.transform.PassContext(opt_level=3):
        mod = seq(mod)
# Recast the remaining conv2d layers to int4 with int32 accumulation.
mod = recast(mod, 'int4', 'int32', skip_layers=[0])

When autotuned, the resulting mod will qualify for the HWNC tensorcore strategy.
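
One way to check that tuning will pick these schedules up (a sketch using the standard AutoTVM task-extraction API, not anything added in this PR) is to extract tasks from the converted module and inspect their names:

from tvm import autotvm

# Tensorcore conv2d templates should appear among the extracted tasks
# when targeting CUDA.
tasks = autotvm.task.extract_from_program(mod["main"], target="cuda", params=params)
for task in tasks:
    print(task.name)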

@jwfromm (Contributor, Author) commented Oct 23, 2020

@Laurawly @masahi @csullivan @jroesch Can you guys take a look at this PR?

@jroesch (Member) left a comment:

LGTM

@csullivan (Contributor) left a comment:

Thanks for the layercount and recastmutator passes @jwfromm! They are quite useful additions to have.


def __init__(self, skip_layers=None):
    self.skip_counter = 0
    self.skip_layers = skip_layers if skip_layers is not None else []

A Contributor commented on this snippet:

When have you found it useful to skip a specific layer of a given operator type / how do you envision it being used? Mainly for debugging and performance tests?

@jwfromm (Contributor, Author) replied:

In this case, the first layer of most networks does not have enough channels for our tensorcore schedules to apply. In theory that wouldn't be a problem, but there are no other HWNC schedules for GPU, so if you blindly apply ConvertLayout to all layers, you end up with a first layer that can't be executed. Skipping it during conversion is an elegant way to avoid this issue. I imagine a similar pathology could apply in other situations.

@tqchen tqchen merged commit 8d56164 into apache:main Oct 26, 2020
zhiics pushed a commit to zhiics/tvm that referenced this pull request Oct 28, 2020
trevor-m pushed a commit to trevor-m/tvm that referenced this pull request Oct 29, 2020
trevor-m pushed a commit to trevor-m/tvm that referenced this pull request Dec 2, 2020
trevor-m pushed a commit to trevor-m/tvm that referenced this pull request Dec 4, 2020
trevor-m pushed a commit to neo-ai/tvm that referenced this pull request Dec 4, 2020
@jwfromm jwfromm deleted the hwnc_tensorcore branch April 12, 2023 15:55