
bump PyTorch version to 1.11 #10794

Merged: 6 commits merged into apache:main on Mar 30, 2022

Conversation

t-vi (Contributor) commented Mar 26, 2022

This bumps PyTorch to 1.11 and fixes 3 test failures. The bump is required to enable the libtorch_ops fallback due to DLPack version incompatibilities.

QAT training has its own fuse_modules version (fuse_modules_qat) in PyTorch, so I changed the test.
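
For older PyTorch versions only the generic helper exists, so the test now picks whichever is available. A minimal sketch of that fallback, assuming fuse_modules_qat is importable from torch.ao.quantization in 1.11; the actual test change in this PR may differ in detail:

# Prefer the QAT-specific fuser introduced in PyTorch 1.11; fall back to the
# generic fuse_modules on older releases.
try:
    from torch.ao.quantization import fuse_modules_qat as fuse_for_qat
except ImportError:  # PyTorch < 1.11
    from torch.quantization import fuse_modules as fuse_for_qat

# Hypothetical usage on a model with a conv/bn/relu block:
# fused_model = fuse_for_qat(model, [["conv", "bn", "relu"]], inplace=False)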

Two amendments to the frontend:

  • searchsorted gains more (optional) parameters in its signature,
  • there is a sub variant with alpha (a - alpha * b). PyTorch rewrites rsub with alpha into this form, but we previously ignored the alpha. Now we handle sub with alpha (see the sketch below).
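
To make the second amendment concrete: aten::sub carries an optional alpha that scales the subtrahend, and PyTorch rewrites rsub with alpha into it, so silently dropping alpha gives wrong results. A small runnable illustration of the semantics the converter now has to honour; the commented handler below is only a sketch with hypothetical names, not the exact TVM code:

import torch

a = torch.tensor([10.0, 20.0])
b = torch.tensor([1.0, 2.0])

# aten::sub(a, b, alpha) computes a - alpha * b; alpha defaults to 1.
assert torch.equal(torch.sub(a, b, alpha=3.0), a - 3.0 * b)

# Sketch of what a frontend handler has to do with the third input
# (hypothetical signature; the real TVM handler may differ):
#
# def sub(self, inputs, input_types):
#     lhs, rhs = inputs[0], inputs[1]
#     alpha = inputs[2] if len(inputs) > 2 else 1
#     if alpha != 1:
#         rhs = rhs * alpha   # scale the subtrahend first
#     return lhs - rhs        # a - alpha * b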

Thank you, @masahi, for getting me started with the bump and for pointing out the test failures. Any errors are my own.

t-vi (Contributor, Author) commented Mar 26, 2022

Caffe2 has been dropped from PyTorch, which is what we are running into here: pytorch/pytorch#67151

t-vi (Contributor, Author) commented Mar 26, 2022

I could use a hint on how to proceed, given that this is likely a major issue for all caffe2 use in TVM.

masahi (Member) commented Mar 26, 2022

Unless there is a standalone way to install caffe2, I think removing caffe2 support entirely is the only way forward. Our caffe2 frontend hasn't been updated in years, so I don't think people would object...

masahi (Member) commented Mar 26, 2022

We can propose dropping caffe2 support to the community next week. In the meantime, we can remove the caffe2 bits from CI to unblock this PR.

masahi (Member) commented Mar 26, 2022

Also, ci-qemu failed to install PT 1.11 due to some version conflict: https://ci.tlcpack.ai/blue/organizations/jenkins/tvm/detail/PR-10794/1/pipeline/67. But I don't know why it needs to install PyTorch at all; it probably only needs ONNX. So I suggest decoupling the PyTorch install from ubuntu_install_onnx.sh and creating ubuntu_install_pytorch.sh.

@@ -2938,7 +2948,7 @@ def create_convert_map(self):
             "aten::pixel_shuffle": self.pixel_shuffle,
             "aten::device": self.none,
             "prim::device": self.none,
-            "aten::sub": self.make_elemwise("subtract"),
+            "aten::sub": self.sub,

masahi (Member) commented on this diff:

It seems this change breaks test_lstm.py:

FAILED test_lstm.py::test_custom_lstm - AttributeError: 'function' object has no attribute 'dtype'

t-vi (Contributor, Author) replied:

I'll see to fixing it. Thank you.

t-vi (Contributor, Author) commented Mar 27, 2022

> Also, ci-qemu failed to install PT 1.11 due to some version conflict

Ohoh. I think this is because Python 3.6 is EOL upstream and PyTorch doesn't support it anymore...
This is a larger can of worms than I had hoped for. 🙂

masahi (Member) commented Mar 27, 2022

I thought we were now running Python 3.7, but maybe the qemu image hasn't been updated yet.

leandron (Contributor) commented:

> I thought we were now running Python 3.7, but maybe the qemu image hasn't been updated yet.

On this, it looks like ci_qemu (https://github.com/apache/tvm/blob/main/docker/Dockerfile.ci_qemu) is not installing Python using the common script: https://github.com/apache/tvm/blob/main/docker/install/ubuntu1804_install_python.sh.

This probably needs to be fixed in a separate PR. Do you want to send that fix? (Asking just because I won’t be able to take this for the next ~two weeks).

masahi (Member) commented Mar 28, 2022

Note that we probably need to update the ci-qemu image version in https://github.com/apache/tvm/blob/main/Jenkinsfile#L54 as well. For that, we need to wait until a new nightly image containing the docker script update from your previous PR is pushed to https://hub.docker.com/r/tlcpackstaging/ci_qemu/tags.

So please wait another day before resuming the PT 1.11 update work, or remove the pytorch install from ubuntu_install_onnx.sh.

t-vi (Contributor, Author) commented Mar 28, 2022

@masahi Thank you for merging the qemu Python bump and the advice. I think waiting for it to show up might be the cleanest option, especially given that I still need to learn so much about how the CI works. :)

t-vi (Contributor, Author) commented Mar 29, 2022

@masahi OK, so now we're part of the nightly with the qemu update. How would I get a version tag that is useful for bumping the version in the Jenkinsfile?

masahi (Member) commented Mar 29, 2022

Yeah, so the way it works is as follows:

  • We update the ci-docker-staging branch https://github.com/apache/tvm/tree/ci-docker-staging to point the Jenkinsfile to the new image in tlcpackstaging.
  • After the CI passes, we retag the nightly image as v0.83 etc. and push it to the tlcpack dockerhub.
  • We update the Jenkinsfile in main to point to the new image on tlcpack, and send a PR to merge this change.

Note that the above process needs to happen for every CI image update. Right now we are in the middle of the ci-qemu update, but after we merge this PR to update PT, we need to go through the same exercise to update ci-gpu.

I've been trying to run a CI job on ci-docker-staging; I keep getting various errors, but hopefully I can get it to pass in a few hours.

t-vi (Contributor, Author) commented Mar 29, 2022

Thank you, @masahi. So if I understand this right, the next step is something you need to do? I'd appreciate a shout if I can proceed here or help other bits along.

masahi (Member) commented Mar 29, 2022

Yes, pushing changes to ci-docker-staging or pushing a new image to tlcpack dockerhub needs to be done by a committer. No worries, I've done this many times. But since we need to wait for at least two CI runs (one for ci-docker-staging to test the new image, another for main to actually update the image), things won't be done by today.

leandron (Contributor) commented:

> Yes, pushing changes to ci-docker-staging or pushing a new image to tlcpack dockerhub needs to be done by a committer. No worries, I've done this many times. But since we need to wait for at least two CI runs (one for ci-docker-staging to test the new image, another for main to actually update the image), things won't be done by today.

Just a heads-up that you're likely to see the issue with updated containers: #10696.

masahi (Member) commented Mar 29, 2022

Yeah, I hit that error once a couple of hours ago; fortunately the ongoing run https://ci.tlcpack.ai/blue/organizations/jenkins/tvm/detail/ci-docker-staging/238/pipeline/ didn't hit it.

masahi (Member) commented Mar 29, 2022

@t-vi the PR #10814 should unblock this work.

masahi (Member) commented Mar 29, 2022

#10815 was merged, so we can resume work on this PR. I restarted the CI job by closing/reopening (there should be a better way than this...).

Also, I posted the caffe2 deprecation announcement at https://discuss.tvm.apache.org/t/caffe2-frontend-support-is-being-dropped-to-unblock-pytorch-update/12442

t-vi (Contributor, Author) commented Mar 30, 2022

The docker build gives

[2022-03-29T23:00:39.045Z] unknown parent image ID sha256:f75815a47f249990da41ca0e349ede16f8710bd2d573de74a9afbd1a9b528055

and I would not even know what to look at here... 😕

masahi (Member) commented Mar 30, 2022

Hopefully it is just a flaky issue, since the error came from the unrelated ci-hexagon image.

t-vi (Contributor, Author) commented Mar 30, 2022

I'm not sure whether it was flakiness or merging in main, but it seems to be past that bit now. Hopefully, if there are more failures, they'll be something I can look into and fix. Thank you for all your help, @masahi!

masahi (Member) commented Mar 30, 2022

You got an error in one of the tests because the CI environment is still using PT 1.10. We only get to use 1.11 at the last step of #10794 (comment).

So can you revert that change, or use different code paths depending on the version?
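
For reference, one way to gate test code on the installed version until CI moves to 1.11 is a small helper like the sketch below (illustrative only; the packaging dependency and the helper name are assumptions, and simply reverting the premature change works too):

import torch
from packaging import version  # assumed to be available in the test environment


def torch_at_least(minimum: str) -> bool:
    """Return True when the installed torch is at least `minimum`, e.g. "1.11.0"."""
    # Strip local build suffixes such as "+cu102" before comparing.
    return version.parse(torch.__version__.split("+")[0]) >= version.parse(minimum)


if torch_at_least("1.11.0"):
    pass  # exercise the 1.11-only code path
else:
    pass  # keep the PT 1.10-compatible path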

t-vi (Contributor, Author) commented Mar 30, 2022

I think the error comes from a premature 1.12 compat change. I'm fixing it right now, but I want to test locally first.

masahi (Member) commented Mar 30, 2022

t-vi (Contributor, Author) commented Mar 30, 2022

All green. 🙂

masahi merged commit 6d42264 into apache:main on Mar 30, 2022

t-vi (Contributor, Author) commented Mar 30, 2022

Thank you, @masahi, for merging and helping me. So next we would need to update the GPU docker image used in CI before I can return to enabling the test that needs PyTorch 1.11?

masahi (Member) commented Mar 30, 2022

Yes, the next step is to wait until the nightly image appears at https://hub.docker.com/r/tlcpackstaging/ci_gpu/tags. That should happen in about 12 hours.

leandron (Contributor) commented Apr 5, 2022

To confirm here, since I did the update this time: the latest version of the images now contains PyTorch 1.11.

$ docker run -it --rm tlcpack/ci-gpu:v0.84 bash
root@6ba934f13b82:/# python3
Python 3.7.5 (default, Dec  9 2021, 17:04:37) 
[GCC 8.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> torch.__version__
'1.11.0+cu102'
>>> 

masahi (Member) commented Apr 5, 2022

The update to PT 1.11 was already done in #10849

pfk-beta (pfk-beta/tvm) and mehrdadh (mehrdadh/tvm) each pushed a commit referencing this pull request on Apr 11, 2022:

* bump PyTorch version to 1.11
* disable some caffe2 ci
* Fix sub conversion in PyTorch frontend
* use fuse_modules_qat if available, fallback to fuse_modules for older PyTorch
* Re-Run CI