-
Notifications
You must be signed in to change notification settings - Fork 505
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
simple, but inelegant support for non-x86 using SIMDe #1163
Conversation
Hi Michael, thanks for the PR! Should we pass this flag explicitly, i.e.
Cheers |
Hey Alex. For Debian we set The reason those are separate is that we also do a compilation with an additional compiler flag but with We then ship all the versions and install a wrapper script (that picks the best version based upon the user's CPU) in place of the original executable |
Sorry for not explaining this very well the first time! |
Hi Michael, thanks a lot for the explanation! I will make sure that if SFX is defined, then it will just use SFX logic. Cheers |
With https://github.com/alexdobin/STAR/pull/1163/files#diff-e3c6cef1e1cddf53d9c53744c9c5baaf1159edc926f45db748ea4eb02f2b1fc6R10 opal can use sub-SSE 4.1, like SSE2; or without SSE all together.
You are very welcome! |
For more context on how Debian uses this patch to build for x86 systems see https://salsa.debian.org/med-team/rna-star/-/blob/master/debian/rules#L26 & https://salsa.debian.org/med-team/rna-star/-/blob/master/debian/bin/simd-dispatch |
Hi Michael, I am not sure I understand how to make SIMDe work properly. I have tried this on a machine with max sse4.2, and it threw compile errors with -mavx2 , -msse4.2 , -msse4.1 Many thanks again! |
Hey Alex, SIMDe enables your code to work the same way for the capabilities it was compiled for. For example, in Debian we compile STAR multiple times, once each for AVX2, AVX, SSE4.2, etc.. If you want one binary that will run using the highest SIMD level available, then you use https://gcc.gnu.org/onlinedocs/gcc/Function-Multiversioning.html or compile the relevant compilation unit (library) multiple times and add your own dispatching code. This is a bit more advanced :-) |
Hi Michael, sorry, I am still missing something. Opal supports sse4.1, sse4.2, avx and avx2. Thanks! |
SIMDe lets Opal use SSE4.1 intrinsics in its source and the result can be compiled for SSE4.1 ( non-SSE4.1 on X86 systems is required for Debian, as per the "amd64" Debian Architecture baseline For ARM & ARM64, SIMDe includes optimized implementation of the Intel SSE* intrinsics using the ARM NEON instructions SIMDe is insufficient on its own to achieve a single binary that can run on any X86 processor at the highest level of SIMD support; for that one needs a CPU dispatcher in addition. |
Thanks for the clarification! The original Opal version compiles fine with -mavx and -msse4.2 The SIMDe Opal version compiles with -mavx, throwing a few warning like those:
However,it seems the object still contains AVX2 commands:
I also see these commands when I compile the original Opal with -mavx Another problem is that the compilation with -msse4.2 fails:
I wonder if this may indicate some sort of incompatibility between Opal and SIMDe. Cheers |
Using AVX for example (just so I don't have to type both a bunch of times): The -mavx flags tell the compiler that it is allowed to generate AVX code. This doesn't just mean that you can explicitly call AVX functions; the compiler is actually allowed to use those features for all code. Since AVX code is allowed, the compiler also defines the The On most compilers you're actually not allowed to call native AVX functions if you didn't pass the flag to enable AVX; you'll usually get an error about a "target mismatch". It is possible to get around this for a specific function (using the target attribute), but honestly that's not used all that often since not all compilers support it. Defining So basically what SIMDe allows you to do is call functions like As for the redefinition warnings, usually that happens because you're including both the native header (e.g., immintrin.h) and SIMDe, but a bug in SIMDe is definitely feasible. If you can post some info on how to reproduce the warnings (commands used and whether you have made any modifications to Michael's patch) I'd be happy to take a look and see if I can figure out what is going on. FWIW, SIMDe should work without warnings with -mavx (in fact, we have a CI job to try to make sure it does). |
This may be due to me having made an amalgamated header for |
I doubt it; the amalgamated header shouldn't really be any different… the scrcipt just scans the files and replaces I managed to reproduce the error @alexdobin is seeing, it looks like it only happens on very old versions of GCC. It looks like libc's opt_random.h (included indirectly from opal.cpp via algorithm.h) includes x86intrin.h, which includes immintrin.h (which is where the AVX types are defined). I'm not sure when exactly GCC fixed that issue, but we can work around it with a pretty straightforward patch in SIMDe: e8b7a2ec175ceb3725ce0827ef9a6725b6309cc9. |
Hi Michael and Evan, thanks a lot for the explanations, and for looking into the compilation issues. The warning might be indeed related to the old GCC. I do not get any warnings with 9.2.0 . It would be great if it could work with older versions as many people still have them. As a stopgap, I could make SIMDe optional for such cases. However, there is a more serious issue. Not sure if it's feasible, but I wonder if SIMDe can (optionally) use its own architecture target, independent of the compiler's -m... option. This would simplify retrofitting old codes for SIMDe. Cheers |
Sorry, I wasn't really clear in my previous comment: the patch I linked to is already merged into SIMDe. As soon as the copy of SIMDe in the patch is updated to the newer version it should work on older compilers. I don't have a copy of RHEL 6.9 sitting around, but it works on Ubuntu 16.04 with GCC 5 (where I was able to reproduce the problem before applying that patch to SIMDe). I don't think you ever answered Michael's question about the amalgamated header vs. a submodule. Basically, there is the main SIMDe repository, which is pretty massive because of the test suite. There is also a simde-no-tests repo which is just the part of SIMDe designed to be used from other projects and works well as a submodule or subtree. Finally, there is a script we can use to generate a single large amalgamated header (basically in just concatenates a bunch of the SIMDe headers into a single file), which is what this patch currently uses. If you're comortable using submodules (or subtrees) then the simde-no-tests repo is a great way to go since it makes updating your copy of SIMDe really straightforward. If you don't like submodules an amalgamated header works well, but updating it is slightly more involved.
There are a couple of options here. Do you have any code you could use for benchmarking a few different options? If so, you could first try always using the AVX2 code path. SIMDe will automatically use whatever ISA extension you target really supports, even if it's only SSE 4.1 or SSE2 (or NEON for that matter). There is often a bit of overhead when converting between 128- and 256-bit vectors, but depending on the code it might not be noticable. You could just change your If there is a significant difference my suggestion would be to change those macros so that instead of switching based on Basically, instead of #if defined(__AVX2__)
use_avx2_implementation();
#elif defined(__SSE4_1__)
use_sse4_1_implementation();
#else
error();
#endif You could just do: #if SIMDE_NATURAL_VECTOR_SIZE >= 256
use_avx2_implementation();
#else
use_sse4_1_implementation();
#endif
It should be possible to do that, but I don't think it would really simplify the process. Rather the opposite, most likely :( The logic for choosing which features to enable is in simde/simde-features.h. For example, if you're compiling with If you're talking about going the other way (having SIMDe call native AVX2 implementations when compiled without #define SIMDE_X86_AVX2_NATIVE
#pragma GCC push_options
#pragma GCC target("avx2")
#include <simde/x86/avx2.h>
#pragma GCC pop_options But I don't think that would have the effect you want anyways. You still don't have dynamic dispatch, and the generated code depends on AVX2, so why not just use I'd really suggest one of the options I mentioned above; most likely the Is opal the only place you're using SIMD code? If so, it sounds like you have a pretty straightforward way to leverage the upcoming glibc-hwcaps support. Just compile a few different variations of opal as a library and let glibc choose the best implementation at runtime. |
@alexdobin I updated the bundled SIMDe version to include @nemequ 's patch |
Hi Evan, Michael, thanks for the suggestions and update! I am leaning towards taking the easiest path (for the users and myself), as you suggested: forcing opal to always choose AVX and SIMDe to do the translation for all the lower architectures. Even if it might be somewhat less efficient for SSE4.1, it's not going to affect the overall performance too much, since this operation is only ~5% of the overall run time. For the same reason, I would stay with the amalgamated header for now. Submodules are nice but require an extra step of pulling submodules, which may be cumbersome for some users. I am thinking of making the submodule optional in the future, for the advanced users who want the most up-to-date SIMDe. It will take me a couple of days to finalize the changes and test them, and then I will get back to you. Cheers |
Hi Evan, Michael, I have finally tested the changes described above, modifying opal to always use avx2. Thanks a lot for your help! |
You are very welcome @alexdobin ! |
Could address #1160
Do you prefer a git submodule instead of the amalgamated header? Happy to switch that out.
The
$(SFX)
is useful for compiling different executables with-mavx
forSTAR-avx
and-mav2
forSTAR-avx2
, etc..Or you could add in a dispatcher around opal and compile that object multiple times. That would make the resulting binary much more portable and would be appreciated by docker container users, bioconda, etc..