-
Notifications
You must be signed in to change notification settings - Fork 3.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support AVX2 in windows better #381
Conversation
Thanks Mike! I had tried AVX2 and it worked, but was extremely slow. The JFK.wav sample took 17 seconds, where with AVX it takes 4.5 seconds. Using Visual Studio with AVX2 set under C/C++ Code Generation -> Enable Enhanced Instruction Set -> Advanced Vector Extensions 2 (/arch:AVX2) time went from 4.5 to about 3.6 seconds. Nice improvement. |
4faae25
to
8f4f7d9
Compare
@RndyP https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html For example, the case that you refer to I think actually depends on wether FMA is supported or not: Lines 483 to 487 in 87dd4a3
So it is strange that AVX2 flags make any difference at all on Windows. My understanding is that AVX2 should not make a difference, but I might be missing something. |
I think what's happening is the Microsoft compiler is smart enough to convert the _mm256_add_ps(_mm256_mul_ps(b, c), a) to the fused assembly version, when the compiler option is set. The code above goes from 16 down to 12 vector instructions. 4.5 to 3.6 seconds is a nice improvement. |
I spent quite a while trying to figure out why AVX2 wasn't enabled when I built this on Windows in my build system. Adding
/arch:AVX2
to the C-compiler seems to fix it. I'm not building through Visual Studio but via a more customized build system (leveraging MSBuild), so that's likely why this isn't required for other people building on Windows, but since ggml.c is a C file (not C++), it seems like this should be correct and necessary in some cicrumstances.