
Remove MemoryPlan from VM passes #7361

Merged 1 commit · Jan 29, 2021

Conversation

mbrookhart (Contributor)

@jroesch @masahi

As discussed, this disables MemoryPlan in the VM until we can rewrite it to do full reuse planning. The current pass slows down compilation a lot without providing a strong benefit for runtime performance.

This PR also removes a debug print that snuck into another part of the VM.

// incomplete to provide memory reuse optimizations. Disable it until we can
// rewrite it in C++ and complete it.
// // Perform memory planning in order to coalesce/reduce allocations.
// pass_seqs.push_back(transform::MemoryPlan());
Contributor

Can we have some benchmark data from dynamic models, such as tf ssd/rcnn, to show the performance impact of disabling this pass?

Member

Based on my experience, the performance difference with and without this pass is not evident, at least for the BERT case.

Contributor

@icemelon9 Great to see you working on BERT. I've recently been trying to optimize BERT with TVM, especially the dynamic-batch case. I found that the Relay VM solution introduces a lot of small PackedFuncs for alloc_storage's size calculation, which makes the VM slower. Do you have any relevant experience or ideas to share? Thanks. More detailed discussion: https://discuss.tvm.apache.org/t/guideline-relay-aot/5977/17?u=monklof

@kevinthesun (Contributor)

cc @zhiics @icemelon9

@masahi (Member)

masahi commented Jan 28, 2021

Can we simply use the pass infra to disable it, like

with tvm.transform.PassContext(opt_level=3, disabled_pass=["MemoryPlan"]):
    ...

@mbrookhart (Contributor, Author)

mbrookhart commented Jan 28, 2021

We could, but we've tested a few ONNX and PyTorch models and don't see any performance differences, and @jroesch tells me the pass was the first half of a plan to do graph-runtime-like memory reuse, but the second half was never implemented. Unless we can find a use case where it helps, I think it makes more sense to disable it entirely until we can get the full feature working.

@kevinthesun (Contributor)

@mbrookhart Thanks. It would be great if you could try TF SSD and Faster R-CNN so that we can ensure there is no regression for TF models as well.

@icemelon (Member) left a comment

LGTM. I'll leave the merge decision to @kevinthesun.

@mbrookhart (Contributor, Author)

@kevinthesun I'd be happy to do some testing. Do you have scripts for running those models? I'm not finding them in the tutorials.

@kevinthesun (Contributor)

@mbrookhart Sure. You can refer to the TF SSD integration test.

@mbrookhart (Contributor, Author)

Okay, for TF-SSD I timed 10 calls to vm.invoke after all of the compilation happened and averaged them, running on a Ryzen 5950X.

main:
34.41 ms/iteration
439 s to execute the whole test (import/compile/execution) as measured by pytest

this PR:
34.57 ms/iteration
266 s to execute the whole test
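For reference, the averaging described above can be sketched as a small timing harness. This is illustrative only: it uses a stand-in workload (`sum(range(...))`) rather than the actual TF-SSD `vm.invoke` call, and the helper name is made up for the sketch.

```python
import time

def average_runtime_ms(fn, warmup=3, runs=10):
    """Average wall-clock time of fn() in milliseconds over `runs` calls."""
    # Warm up first so one-time costs (caching, lazy init) don't skew the mean.
    for _ in range(warmup):
        fn()
    start = time.perf_counter()
    for _ in range(runs):
        fn()
    return (time.perf_counter() - start) / runs * 1e3

# Stand-in workload; in the PR's measurement this would be the compiled
# module's vm.invoke(...) call on the TF-SSD inputs, after compilation.
ms = average_runtime_ms(lambda: sum(range(100_000)))
print(f"{ms:.3f} ms/iteration")
```

Measuring only post-compilation invocations, as done here, separates the runtime question (unchanged by this PR) from the compile-time question (where the win is).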

@kevinthesun (Contributor)

LGTM

@kevinthesun kevinthesun merged commit ef032b3 into apache:main Jan 29, 2021
@altanh (Contributor)

altanh commented Jan 29, 2021

I'm not sure we have the infra for this, but going forward it would also be interesting to see the memory usage difference with and without this pass, since it affects allocation behavior. I imagine our peak usage would actually decrease without the pass at the moment, since the liveness analysis phase hasn't been implemented, so we keep memory around longer than needed.

@kevinthesun (Contributor)

Thanks @mbrookhart @masahi @icemelon9 @zhiics

@mbrookhart mbrookhart deleted the remove_memoryplan branch January 29, 2021 01:28
alexwong pushed a commit to alexwong/tvm that referenced this pull request Feb 11, 2021
electriclilies pushed a commit to electriclilies/tvm that referenced this pull request Feb 18, 2021
Lokiiiiii pushed a commit to Lokiiiiii/tvm that referenced this pull request Mar 2, 2021
trevor-m pushed a commit to neo-ai/tvm that referenced this pull request Mar 2, 2021
7 participants