correct placement of +PTX flag + correcting cuda arch list #114
It seems we are only missing 3.7, which is useful for the K80. Anyway, pytorch wants to drop it :/
I am 100% fine with dropping it! 😜
OK, let's not change the build script until we get a user who specifically asks for it.
I think PTX allows you to use an
Okay, sounds good to me. Let's close this then for the time being!
Thanks for engaging productively/constructively and patiently btw, not just with me but with others as well. 🌟
That's a seriously short architecture list, and in fact both of those have been deprecated as of CUDA 11. I've had something like
in the faiss-feedstock for a long time already (actually dropping 3.5/5.0 for CUDA 11+). PTX is not really great because the runtime compilation can look like a mysterious hang, and it can take a while. IMO we should pre-compile for the most common GPU architectures, even though that will unfortunately further increase compile times.
Not sure if the links are all up-to-date, but I tried to document (mostly for myself) what I was doing in the build script: https://github.com/conda-forge/faiss-split-feedstock/blob/main/recipe/build-lib.sh
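The approach in the linked faiss build script can be sketched roughly like this (a hypothetical fragment; the version checks and exact arch lists here are assumptions for illustration, not copied from the feedstock):

```shell
# Hypothetical sketch: precompile SASS for the common architectures and
# append +PTX only to the newest entry, so a JIT fallback exists for
# future GPUs without shipping PTX for every old architecture.
if [[ ${cuda_compiler_version} == 11.* ]]; then
    # 3.5/5.0 are dropped for CUDA 11+ in this sketch
    export CUDA_ARCH_LIST="6.0;6.1;7.0;7.5;8.0;8.6+PTX"
else
    export CUDA_ARCH_LIST="3.5;5.0;6.0;6.1;7.0;7.5+PTX"
fi
```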
The list is appended to based on the CUDA version: https://github.com/conda-forge/pytorch-cpu-feedstock/blob/main/recipe/build_pytorch.sh#L86 I don't think anybody is falling back on PTX. I'm not sure we even ship it by default.
Yes, that was my point of this issue, so to move from:
to:
i.e. dropping "+PTX" and changing 3.5 to 3.7. As a diff, starting at L84 in build.sh:

```diff
 if [[ ${cuda_compiler_version} != "None" ]]; then
     export USE_CUDA=1
-    export TORCH_CUDA_ARCH_LIST="3.5;5.0+PTX"
+    export TORCH_CUDA_ARCH_LIST="3.7;5.0"
     if [[ ${cuda_compiler_version} == 9.0* ]]; then
         export TORCH_CUDA_ARCH_LIST="$TORCH_CUDA_ARCH_LIST;6.0;7.0"
```
Sorry for the miscommunication. I concur with the suggestion that we don't change things for the time being unless someone asks; I am okay with that too. It is not an urgent issue.
I think we should keep the PTX. It ensures that those who buy the newest GPUs still get a working package. It should not affect those with existing GPUs.
My concern with PTX is the speed of compilation. I understand what you're saying about newer GPUs, but I'll be blunt and say those fancy new GPUs should probably get the newer pytorch releases. My vague worry is something along the lines of:
It's on my list of items to test next time I run/build with singularity on an HPC (I try to keep a low profile...). If I find the compilation time differs, I will come back and reopen this issue...
Agreed.
For compute capability greater than 8.6, it will generate PTX code and SASS will be generated at runtime.
Why? Generating PTX is only one more target, and removing it won't magically speed things up. For CUDA 11.2, you'd be reducing the total time by only 1/9th.
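The 1/9th figure follows from counting compilation targets. A small sketch (the CUDA 11.2 arch list used here is an assumption for illustration, not copied from the feedstock):

```python
# Hypothetical TORCH_CUDA_ARCH_LIST for a CUDA 11.2 build (assumption):
archs = "3.5;5.0;6.0;6.1;7.0;7.5;8.0;8.6+PTX"

sass_targets = len(archs.split(";"))  # one SASS cubin per entry -> 8
ptx_targets = archs.count("+PTX")     # "+PTX" adds a single PTX target -> 1
total = sass_targets + ptx_targets    # 9 compilation targets in total

print(f"dropping PTX removes {ptx_targets} of {total} targets")
```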
Oh, so adding PTX is only like adding one additional arch? I obviously do not know enough. My worry/concern was that, e.g.,
was being compiled as
and so, in other words, each entry above the +PTX flag was being repeated twice... I agree with 1/9th if we move it to the end; that makes sense to me.
Exactly. There's no magic reshuffling of that list going on; every specifier separated by
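The semantics can be illustrated with a small sketch of how an arch-list string maps to nvcc `-gencode` flags (a simplified stand-in for what `torch/utils/cpp_extension.py` does, not its actual code):

```python
def gencode_flags(arch_list: str) -> list[str]:
    """Map a TORCH_CUDA_ARCH_LIST-style string to nvcc -gencode flags.

    Simplified sketch: each ';'-separated entry yields exactly one SASS
    target, and a '+PTX' suffix adds one extra PTX target for that entry
    only -- nothing earlier in the list is compiled twice.
    """
    flags = []
    for spec in arch_list.split(";"):
        has_ptx = spec.endswith("+PTX")
        num = spec.removesuffix("+PTX").replace(".", "")
        flags.append(f"-gencode arch=compute_{num},code=sm_{num}")
        if has_ptx:
            flags.append(f"-gencode arch=compute_{num},code=compute_{num}")
    return flags

# "3.5;5.0+PTX" -> three targets: two SASS, one PTX
print(gencode_flags("3.5;5.0+PTX"))
```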
That really doesn't change anything, and seems like a gratuitous deviation from upstream, so I suggest just not touching it.
Removal has been discussed, in particular because one issue is JIT startup time the first time you do

One real-world example: say you are a visual effects studio and have a large software stack, e.g. a rendering pipeline for 3D animation, of which some PyTorch models are one small piece. Then you work on a game or a movie; at the start you do quality assurance (QA) for your whole software stack. Now you are two years in, and new GPU types have become the standard. What you want is to plug those in and have them "just work" (you are relying on there being no bugs in CUDA's PTX support, but nothing else, and that turns out to be acceptable). If you instead pull in a new release from the
It does change something. The PTX can be optimized for compute capability 8.6+ devices when 8.6+PTX is given, whereas 5.0+PTX generates PTX valid for compute capability 5.0+ devices and may miss optimizations.
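In nvcc terms, the two cases look roughly like this (a sketch; the `-gencode` spellings are standard nvcc, but `kernel.cu` is a placeholder file name):

```shell
# "5.0+PTX": SASS for sm_50 plus PTX at the compute_50 ISA level.
# That PTX will JIT-compile on newer GPUs but cannot use any
# newer-architecture features.
nvcc kernel.cu -gencode arch=compute_50,code=sm_50 \
               -gencode arch=compute_50,code=compute_50

# "8.6+PTX": PTX at the compute_86 ISA level, so the JIT-compiled SASS
# on 8.6+ devices can use the full 8.6 feature set.
nvcc kernel.cu -gencode arch=compute_86,code=sm_86 \
               -gencode arch=compute_86,code=compute_86
```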
Thanks for the correction - I did some digging through the CUDA docs, and it indeed seems to be recommended to include PTX only for the newest architecture. Since this hasn't been tested upstream, a one-off test with
Do you want to run this test at runtime on a machine with a real GPU?
Yes, even if only once - and it can even be post-release, because no user will run into it straight away. But it seems healthy to do. I have a GPU, so I can volunteer to run the tests against a new conda-forge package once it's built.
Can you review the changes in #116?
The one last thing I would inquire about here is: 3.5 vs 3.7. Should we match upstream by replacing 3.5 with 3.7, or should we just let it go for the time being? Do people have strong opinions?
Wow, I never thought this would be an application. Makes sense. I guess it would also apply to people using pytorch in a deployment setting (e.g. an app or an inference model or something). My point of view is often limited to research settings, so discussing things with people like you helps a lot! Thanks!
pytorch upstream just tacks PTX onto the end of the arch list: https://github.com/pytorch/pytorch/blob/88f4a1240275205f630dbaca97b1f5d7977f25f8/torch/utils/cpp_extension.py#L1751
That's for dynamic compilation, IIRC. For package builds it's quite hard to figure out what the up-to-date scripts are, but I believe it's still these: https://github.com/pytorch/builder/search?q=TORCH_CUDA_ARCH_LIST&type=code
Solution to issue cannot be found in the documentation.
Issue
Following (now deleted) comments in #108:
maybe +PTX should be removed or placed at the end of the arch list we have here.
https://pytorch.org/docs/stable/cpp_extension.html
Also, currently, this feedstock uses:
as the base cuda_arch list, but it should actually be
"3.7+PTX;5.0"
to match upstream. Debatable if the PTX is useful here --- see above.

Installed packages
Environment info