[RELAY][VM] Enable heterogeneous execution for Relay VM #6337

Merged
merged 21 commits into from
Sep 3, 2020

Conversation

Member

@zhiics zhiics commented Aug 25, 2020

Currently, dynamic models can only be executed on CPU. GPU execution is not allowed for these models because they use shape functions to do runtime type inference. These functions may contain various control logic to derive the shape of a tensor at runtime, and they are not compute intensive, so they are designed to be executed on CPU. As a result, we must use the CPU to execute these functions even when trying to run the rest of the model on other devices. This PR enables heterogeneous execution for the Relay VM to support dynamic models on devices other than CPU.

More specifically, it includes the following changes:

  • makes the memory_alloc and memory plan passes context aware when inserting vm/memory dialects.
  • designs a union-find based context analysis pass to analyze the device context of each IR node in a Relay program [Thanks @jroesch and @icemelon9 for help]
  • implements a DeviceCopy instruction in the VM to copy data directly across different devices.
  • enables GPU tests for various unit tests involving dynamic inputs/shape functions, namely those in test_any.py, test_adt.py, and test_vm.py, and dynamic namespace tests.
  • tests heterogeneous execution for the static cases used for graph runtime (test_pass_annotation.py)
  • fixes several bugs in the VM that are exposed by heterogeneous execution
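
For readers unfamiliar with the union-find idea behind the context analysis pass, here is a minimal Python sketch of how device domains could be unified. It is not the code in src/relay/analysis/context_analysis.cc: the names `DeviceDomain`, `find`, `unify`, `analyze`, and the tuple-based toy program are all hypothetical, and the rule that every non-copy op shares one device with its inputs is a simplifying assumption. Only the `device_copy` boundary and the path compression step mirror what the PR describes.

```python
class DeviceDomain:
    """A union-find node (hypothetical). `device` is None until it is known."""

    def __init__(self, device=None):
        self.device = device
        self.parent = self


def find(domain):
    """Return the representative of `domain`, compressing the path to it."""
    root = domain
    while root.parent is not root:
        root = root.parent
    # Path compression: point every node on the walked path at the root.
    while domain.parent is not root:
        domain.parent, domain = root, domain.parent
    return root


def unify(a, b):
    """Merge two domains; fail if both are bound to different devices."""
    ra, rb = find(a), find(b)
    if ra is rb:
        return ra
    if ra.device is not None and rb.device is not None and ra.device != rb.device:
        raise ValueError(f"device conflict: {ra.device} vs {rb.device}")
    # Keep the root that already knows its device (if any) as representative.
    if ra.device is None:
        ra, rb = rb, ra
    rb.parent = ra
    return ra


def analyze(ops):
    """Assign a device to every value in a toy program.

    Each op is (name, inputs, output, attrs). Assumed rules for this sketch:
    a device_copy pins its input to attrs["src"] and its output to
    attrs["dst"]; every other op runs on one device with all of its inputs.
    """
    domains = {}

    def dom(value):
        if value not in domains:
            domains[value] = DeviceDomain()
        return domains[value]

    for name, inputs, output, attrs in ops:
        out = dom(output)
        if name == "device_copy":
            for value in inputs:
                unify(dom(value), DeviceDomain(attrs["src"]))
            unify(out, DeviceDomain(attrs["dst"]))
        else:
            for value in inputs:
                unify(dom(value), out)
    return {value: find(d).device for value, d in domains.items()}


# Toy program: compute on the GPU, copy to the CPU, then run a shape function.
PROGRAM = [
    ("add", ["x", "w"], "y", {}),
    ("device_copy", ["y"], "y_cpu", {"src": "cuda", "dst": "cpu"}),
    ("shape_func", ["y_cpu"], "s", {}),
]
```

Calling `analyze(PROGRAM)` resolves `x`, `w`, and `y` to `cuda` and `y_cpu` and `s` to `cpu`, even though no device is written on `add` or `shape_func`: the information flows outward from the single `device_copy` through the unified domains.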

Follow-up PRs will fix/add schedules for some ops to enable GPU execution for BERT and TF object detection models.

cc @icemelon9 @jroesch @mbrookhart @wweic

@zhiics zhiics changed the title [RELAY][VM] Enable heterogeneous execution to VM [RELAY][VM] Enable heterogeneous execution for Relay VM Aug 25, 2020
Contributor

@leandron leandron left a comment

I had a look, mostly in the python sources.

python/tvm/relay/analysis/analysis.py Outdated
python/tvm/relay/transform/memory_alloc.py Outdated
@mbrookhart
Contributor

Yay! I'm so excited for this! I'll do a deep dive today

There are a number of tests in tests/python/relay/dyn that skip running on GPU while waiting for this feature, e.g. https://github.com/apache/incubator-tvm/blob/942c90ba7a7b9bccf6d9bce43808aba2bd6c9787/tests/python/relay/dyn/test_dynamic_op_level3.py#L30-L31

Do you want to enable those as part of this PR? Or I can do it as a second PR.

@zhiics
Member Author

zhiics commented Aug 26, 2020

@mbrookhart Thanks for the reminder. I just enabled all the dynamic op tests except for level6, because topk has a problem on GPU, for which I already have a TODO in test_any. We need to look into it later.

Contributor

@mbrookhart mbrookhart left a comment

A few nitpicks. I'd like to see a little more documentation on the passes; I'm not sure I fully understand what you're doing just from looking at the code. Overall it looks really good, I'm excited!

python/tvm/relay/backend/vm.py
python/tvm/relay/transform/memory_alloc.py Outdated
python/tvm/relay/transform/memory_alloc.py
src/relay/analysis/context_analysis.cc
src/relay/analysis/context_analysis.cc
Member

@icemelon icemelon left a comment

Halfway through the PR. Will come back and review the rest.

python/tvm/runtime/vm.py Outdated
python/tvm/runtime/vm.py Outdated
src/runtime/vm/executable.cc
src/runtime/vm/vm.cc Outdated
src/runtime/vm/vm.cc Outdated
src/runtime/vm/vm.cc Outdated
python/tvm/runtime/vm.py Outdated
src/relay/analysis/context_analysis.cc Outdated
src/relay/analysis/context_analysis.cc
src/relay/analysis/context_analysis.cc Outdated
src/runtime/vm/vm.cc Outdated
Member

@icemelon icemelon left a comment

lgtm

@icemelon icemelon merged commit 1224d56 into apache:master Sep 3, 2020
@icemelon
Member

icemelon commented Sep 3, 2020

Thanks @zhiics @mbrookhart @leandron @jwfromm

@mbrookhart
Contributor

Sorry for my delay! I've been out the last few days moving. Anyway, looking over what was changed since my last review, I'm happy to give it a post-merge approval. Looks great! Thanks @icemelon9 and @zhiics. I'll start enabling the dynamic tests on GPU and I'll work on fixing anything that fails (including topk).

@mbrookhart mbrookhart mentioned this pull request Sep 3, 2020
kevinthesun pushed a commit to kevinthesun/tvm that referenced this pull request Sep 17, 2020
* vm heterogeneous execution

* context analysis on module

* fix profiler

* fix memory plan

* add more unification

* add serialization

* add gpu tests for test_adt

* cache visited functions

* path compression

* C++ context analysis

* remove python context analysis

* add tests

* clean

* lint

* fix

* enable gpu test for dynamic namespace

* remove GetParamsContext

* fix comments and add doc for context analysis

* cache context

* cache allocator

* rebase and fix comments
kevinthesun pushed a commit to kevinthesun/tvm that referenced this pull request Sep 18, 2020
trevor-m pushed a commit to neo-ai/tvm that referenced this pull request Sep 18, 2020
@jwfromm jwfromm mentioned this pull request Sep 24, 2020
@zhiics zhiics deleted the hetero_vm branch October 8, 2020 15:56