
bench fails on Apple M3 Pro #2322

Closed
jtrmal opened this issue Jul 25, 2024 · 2 comments · Fixed by #2324

Comments

@jtrmal

jtrmal commented Jul 25, 2024

Hi,
for the current master branch, the bench command (./bench -m ./models/ggml-small.en.bin -t 4) fails with the following error:

system_info: n_threads = 4 / 14 | AVX = 0 | AVX2 = 0 | AVX512 = 0 | FMA = 0 | NEON = 1 | ARM_FMA = 1 | METAL = 1 | F16C = 0 | FP16_VA = 1 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 0 | SSSE3 = 0 | VSX = 0 | CUDA = 0 | COREML = 0 | OPENVINO = 0
GGML_ASSERT: ggml.c:3147: view_src == NULL || data_size == 0 || data_size + view_offs <= ggml_nbytes(view_src)
Abort trap: 6

Revision 858452d, the one referenced in the benchmark results in #89 (comment), works fine.

I bisected the failure to commit f842d31.

Let me know if you need more information.
Full log:

whisper_init_from_file_with_params_no_state: loading model from '/Users/jtrmal/projects/whisper.cpp/models/ggml-small.en.bin'
whisper_init_with_params_no_state: use gpu    = 1
whisper_init_with_params_no_state: flash attn = 0
whisper_init_with_params_no_state: gpu_device = 0
whisper_init_with_params_no_state: dtw        = 0
whisper_model_load: loading model
whisper_model_load: n_vocab       = 51864
whisper_model_load: n_audio_ctx   = 1500
whisper_model_load: n_audio_state = 768
whisper_model_load: n_audio_head  = 12
whisper_model_load: n_audio_layer = 12
whisper_model_load: n_text_ctx    = 448
whisper_model_load: n_text_state  = 768
whisper_model_load: n_text_head   = 12
whisper_model_load: n_text_layer  = 12
whisper_model_load: n_mels        = 80
whisper_model_load: ftype         = 1
whisper_model_load: qntvr         = 0
whisper_model_load: type          = 3 (small)
whisper_model_load: adding 1607 extra tokens
whisper_model_load: n_langs       = 99
whisper_model_load:    Metal total size =   487.00 MB
whisper_model_load: model size    =  487.00 MB
whisper_backend_init_gpu: using Metal backend
ggml_metal_init: allocating
ggml_metal_init: found device: Apple M3 Max
ggml_metal_init: picking default device: Apple M3 Max
ggml_metal_init: using embedded metal library
ggml_metal_init: GPU name:   Apple M3 Max
ggml_metal_init: GPU family: MTLGPUFamilyApple9  (1009)
ggml_metal_init: GPU family: MTLGPUFamilyCommon3 (3003)
ggml_metal_init: GPU family: MTLGPUFamilyMetal3  (5001)
ggml_metal_init: simdgroup reduction support   = true
ggml_metal_init: simdgroup matrix mul. support = true
ggml_metal_init: hasUnifiedMemory              = true
ggml_metal_init: recommendedMaxWorkingSetSize  = 77309.41 MB
whisper_backend_init: using BLAS backend
whisper_init_state: kv self size  =   56.62 MB
whisper_init_state: kv cross size =   56.62 MB
whisper_init_state: kv pad  size  =    4.72 MB
whisper_init_state: compute buffer (conv)   =   22.41 MB
whisper_init_state: compute buffer (encode) =  284.68 MB
whisper_init_state: compute buffer (cross)  =    6.18 MB
whisper_init_state: compute buffer (decode) =   98.65 MB

system_info: n_threads = 4 / 14 | AVX = 0 | AVX2 = 0 | AVX512 = 0 | FMA = 0 | NEON = 1 | ARM_FMA = 1 | METAL = 1 | F16C = 0 | FP16_VA = 1 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 0 | SSSE3 = 0 | VSX = 0 | CUDA = 0 | COREML = 0 | OPENVINO = 0
GGML_ASSERT: ggml/src/ggml.c:3593: view_src == NULL || data_size == 0 || data_size + view_offs <= ggml_nbytes(view_src)
Abort trap: 6

@jtrmal
Author

jtrmal commented Jul 25, 2024

Oh, and I should mention that ./main -m /Users/jtrmal/projects/whisper.cpp/models/ggml-small.en.bin -f samples/jfk.wav -t 4 works fine even on master; only the bench command fails.
Full log of main:

$ ./main -m /Users/jtrmal/projects/whisper.cpp/models/ggml-small.en.bin -f samples/jfk.wav -t 4
whisper_init_from_file_with_params_no_state: loading model from '/Users/jtrmal/projects/whisper.cpp/models/ggml-small.en.bin'
whisper_init_with_params_no_state: use gpu    = 1
whisper_init_with_params_no_state: flash attn = 0
whisper_init_with_params_no_state: gpu_device = 0
whisper_init_with_params_no_state: dtw        = 0
whisper_model_load: loading model
whisper_model_load: n_vocab       = 51864
whisper_model_load: n_audio_ctx   = 1500
whisper_model_load: n_audio_state = 768
whisper_model_load: n_audio_head  = 12
whisper_model_load: n_audio_layer = 12
whisper_model_load: n_text_ctx    = 448
whisper_model_load: n_text_state  = 768
whisper_model_load: n_text_head   = 12
whisper_model_load: n_text_layer  = 12
whisper_model_load: n_mels        = 80
whisper_model_load: ftype         = 1
whisper_model_load: qntvr         = 0
whisper_model_load: type          = 3 (small)
whisper_model_load: adding 1607 extra tokens
whisper_model_load: n_langs       = 99
whisper_model_load:    Metal total size =   487.00 MB
whisper_model_load: model size    =  487.00 MB
whisper_backend_init_gpu: using Metal backend
ggml_metal_init: allocating
ggml_metal_init: found device: Apple M3 Max
ggml_metal_init: picking default device: Apple M3 Max
ggml_metal_init: using embedded metal library
ggml_metal_init: GPU name:   Apple M3 Max
ggml_metal_init: GPU family: MTLGPUFamilyApple9  (1009)
ggml_metal_init: GPU family: MTLGPUFamilyCommon3 (3003)
ggml_metal_init: GPU family: MTLGPUFamilyMetal3  (5001)
ggml_metal_init: simdgroup reduction support   = true
ggml_metal_init: simdgroup matrix mul. support = true
ggml_metal_init: hasUnifiedMemory              = true
ggml_metal_init: recommendedMaxWorkingSetSize  = 77309.41 MB
whisper_backend_init: using BLAS backend
whisper_init_state: kv self size  =   56.62 MB
whisper_init_state: kv cross size =   56.62 MB
whisper_init_state: kv pad  size  =    4.72 MB
whisper_init_state: compute buffer (conv)   =   22.41 MB
whisper_init_state: compute buffer (encode) =  284.68 MB
whisper_init_state: compute buffer (cross)  =    6.18 MB
whisper_init_state: compute buffer (decode) =   98.65 MB

system_info: n_threads = 4 / 14 | AVX = 0 | AVX2 = 0 | AVX512 = 0 | FMA = 0 | NEON = 1 | ARM_FMA = 1 | METAL = 1 | F16C = 0 | FP16_VA = 1 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 0 | SSSE3 = 0 | VSX = 0 | CUDA = 0 | COREML = 0 | OPENVINO = 0

main: processing 'samples/jfk.wav' (176000 samples, 11.0 sec), 4 threads, 1 processors, 5 beams + best of 5, lang = en, task = transcribe, timestamps = 1 ...


[00:00:00.000 --> 00:00:11.000]   And so my fellow Americans, ask not what your country can do for you, ask what you can do for your country.


whisper_print_timings:     load time =   226.11 ms
whisper_print_timings:     fallbacks =   0 p /   0 h
whisper_print_timings:      mel time =     4.28 ms
whisper_print_timings:   sample time =    23.14 ms /   139 runs (    0.17 ms per run)
whisper_print_timings:   encode time =   144.78 ms /     1 runs (  144.78 ms per run)
whisper_print_timings:   decode time =    15.86 ms /     3 runs (    5.29 ms per run)
whisper_print_timings:   batchd time =   222.12 ms /   132 runs (    1.68 ms per run)
whisper_print_timings:   prompt time =     0.00 ms /     1 runs (    0.00 ms per run)
whisper_print_timings:    total time =   652.64 ms
ggml_metal_free: deallocating

@ggerganov
Owner

Thank you for investigating. #2324 should resolve the issue.
