Support AVX2 in windows better #381

Mike-Bell · 2023-01-06T16:21:53Z

I spent quite a while trying to figure out why AVX2 wasn't enabled when I built this on Windows in my build system. Adding /arch:AVX2 to the C-compiler seems to fix it. I'm not building through Visual Studio but via a more customized build system (leveraging MSBuild), so that's likely why this isn't required for other people building on Windows, but since ggml.c is a C file (not C++), it seems like this should be correct and necessary in some cicrumstances.

RndyP · 2023-01-06T16:44:28Z

Thanks Mike! I had tried AVX2 and it worked, but was extremely slow. The JFK.wav sample took 17 seconds, where with AVX it takes 4.5 seconds. Using Visual Studio with AVX2 set under C/C++ Code Generation -> Enable Enhanced Instruction Set -> Advanced Vector Extensions 2 (/arch:AVX2) time went from 4.5 to about 3.6 seconds. Nice improvement.

RndyP · 2023-01-06T19:12:12Z

I did some research into what's going on here. One of the most critical sections of code is in ggml_vec_dot_f16(). AVX2 has fused multiply add instructions and AVX does not. So with AVX2 we have:

Without (just AVX) we have:

Note in the AVX2 case the disassembly shows the use of the fused intrinsics vfmadd231ps

ggerganov · 2023-01-06T19:21:38Z

@RndyP
So recently, I was revisiting the SIMD code in ggml.c and I came to the realisation that we don't use AVX2 at all because all x86 intrinsics that we currently use are either AVX, FMA or F16C. This is based on lookup in the following site:

https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html

For example, the case that you refer to I think actually depends on wether FMA is supported or not:

whisper.cpp/ggml.c

Lines 483 to 487 in 87dd4a3

    
           #if defined(__FMA__) 
        
               #define GGML_F32x8_FMA(a, b, c) _mm256_fmadd_ps(b, c, a) 
        
           #else 
        
               #define GGML_F32x8_FMA(a, b, c) _mm256_add_ps(_mm256_mul_ps(b, c), a) 
        
           #endif

So it is strange that AVX2 flags make any difference at all on Windows. My understanding is that AVX2 should not make a difference, but I might be missing something.

RndyP · 2023-01-06T20:42:25Z

I think what's happening is the Microsoft compiler is smart enough to convert the _mm256_add_ps(_mm256_mul_ps(b, c), a) to the fused assembly version, when the compiler option is set. The code above goes from 16 down to 12 vector instructions. 4.5 to 3.6 seconds is a nice improvement.

Support AVX2 in windows better

8f4f7d9

Mike-Bell force-pushed the SupportWindowsAvx2Better branch from 4faae25 to 8f4f7d9 Compare January 6, 2023 17:05

ggerganov merged commit 41e05c6 into ggerganov:master Jan 6, 2023

Mike-Bell deleted the SupportWindowsAvx2Better branch January 6, 2023 17:46

rock3125 pushed a commit to rock3125/whisper.cpp that referenced this pull request Feb 21, 2023

cmake : support AVX2 in Windows better (ggerganov#381)

7b8fc7d

anandijain pushed a commit to anandijain/whisper.cpp that referenced this pull request Apr 28, 2023

cmake : support AVX2 in Windows better (ggerganov#381)

499563f

jacobwu-b pushed a commit to jacobwu-b/Transcriptify-by-whisper.cpp that referenced this pull request Oct 24, 2023

cmake : support AVX2 in Windows better (ggerganov#381)

d40c2be

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Support AVX2 in windows better #381

Support AVX2 in windows better #381

Mike-Bell commented Jan 6, 2023 •

edited

Loading

RndyP commented Jan 6, 2023 •

edited

Loading

RndyP commented Jan 6, 2023 •

edited

Loading

ggerganov commented Jan 6, 2023 •

edited

Loading

RndyP commented Jan 6, 2023

Support AVX2 in windows better #381

Support AVX2 in windows better #381

Conversation

Mike-Bell commented Jan 6, 2023 • edited Loading

RndyP commented Jan 6, 2023 • edited Loading

RndyP commented Jan 6, 2023 • edited Loading

ggerganov commented Jan 6, 2023 • edited Loading

RndyP commented Jan 6, 2023

Mike-Bell commented Jan 6, 2023 •

edited

Loading

RndyP commented Jan 6, 2023 •

edited

Loading

RndyP commented Jan 6, 2023 •

edited

Loading

ggerganov commented Jan 6, 2023 •

edited

Loading