
Unable to load on iOS device after update - Compiler encountered an internal error #7085

Closed
Animaxx opened this issue May 5, 2024 · 4 comments · Fixed by #7169

Comments

Animaxx commented May 5, 2024

Edited: Starting from the b2771 build, loading always fails on iOS with the error "Compiler encountered an internal error". It worked before.

llama_model_loader: loaded meta data with 19 key-value pairs and 201 tensors from /var/mobile/Containers/Data/Application/35D0FF27-8033-45D8-AE25-97B59FBB9AF9/Documents/tinyllama-1.1b-1t-openorca.Q4_0.gguf (version GGUF V2)
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = llama
llama_model_loader: - kv   1:                               general.name str              = jeff31415_tinyllama-1.1b-1t-openorca
llama_model_loader: - kv   2:                       llama.context_length u32              = 2048
llama_model_loader: - kv   3:                     llama.embedding_length u32              = 2048
llama_model_loader: - kv   4:                          llama.block_count u32              = 22
llama_model_loader: - kv   5:                  llama.feed_forward_length u32              = 5632
llama_model_loader: - kv   6:                 llama.rope.dimension_count u32              = 64
llama_model_loader: - kv   7:                 llama.attention.head_count u32              = 32
llama_model_loader: - kv   8:              llama.attention.head_count_kv u32              = 4
llama_model_loader: - kv   9:     llama.attention.layer_norm_rms_epsilon f32              = 0.000010
llama_model_loader: - kv  10:                          general.file_type u32              = 2
llama_model_loader: - kv  11:                       tokenizer.ggml.model str              = llama
llama_model_loader: - kv  12:                      tokenizer.ggml.tokens arr[str,32000]   = ["<unk>", "<s>", "</s>", "<0x00>", "<...
llama_model_loader: - kv  13:                      tokenizer.ggml.scores arr[f32,32000]   = [0.000000, 0.000000, 0.000000, 0.0000...
llama_model_loader: - kv  14:                  tokenizer.ggml.token_type arr[i32,32000]   = [2, 3, 3, 6, 6, 6, 6, 6, 6, 6, 6, 6, ...
llama_model_loader: - kv  15:                tokenizer.ggml.bos_token_id u32              = 1
llama_model_loader: - kv  16:                tokenizer.ggml.eos_token_id u32              = 2
llama_model_loader: - kv  17:            tokenizer.ggml.unknown_token_id u32              = 0
llama_model_loader: - kv  18:               general.quantization_version u32              = 2
llama_model_loader: - type  f32:   45 tensors
llama_model_loader: - type q4_0:  155 tensors
llama_model_loader: - type q6_K:    1 tensors
llm_load_vocab: special tokens definition check successful ( 259/32000 ).
llm_load_print_meta: format           = GGUF V2
llm_load_print_meta: arch             = llama
llm_load_print_meta: vocab type       = SPM
llm_load_print_meta: n_vocab          = 32000
llm_load_print_meta: n_merges         = 0
llm_load_print_meta: n_ctx_train      = 2048
llm_load_print_meta: n_embd           = 2048
llm_load_print_meta: n_head           = 32
llm_load_print_meta: n_head_kv        = 4
llm_load_print_meta: n_layer          = 22
llm_load_print_meta: n_rot            = 64
llm_load_print_meta: n_embd_head_k    = 64
llm_load_print_meta: n_embd_head_v    = 64
llm_load_print_meta: n_gqa            = 8
llm_load_print_meta: n_embd_k_gqa     = 256
llm_load_print_meta: n_embd_v_gqa     = 256
llm_load_print_meta: f_norm_eps       = 0.0e+00
llm_load_print_meta: f_norm_rms_eps   = 1.0e-05
llm_load_print_meta: f_clamp_kqv      = 0.0e+00
llm_load_print_meta: f_max_alibi_bias = 0.0e+00
llm_load_print_meta: f_logit_scale    = 0.0e+00
llm_load_print_meta: n_ff             = 5632
llm_load_print_meta: n_expert         = 0
llm_load_print_meta: n_expert_used    = 0
llm_load_print_meta: causal attn      = 1
llm_load_print_meta: pooling type     = 0
llm_load_print_meta: rope type        = 0
llm_load_print_meta: rope scaling     = linear
llm_load_print_meta: freq_base_train  = 10000.0
llm_load_print_meta: freq_scale_train = 1
llm_load_print_meta: n_yarn_orig_ctx  = 2048
llm_load_print_meta: rope_finetuned   = unknown
llm_load_print_meta: ssm_d_conv       = 0
llm_load_print_meta: ssm_d_inner      = 0
llm_load_print_meta: ssm_d_state      = 0
llm_load_print_meta: ssm_dt_rank      = 0
llm_load_print_meta: model type       = 1B
llm_load_print_meta: model ftype      = Q4_0
llm_load_print_meta: model params     = 1.10 B
llm_load_print_meta: model size       = 606.53 MiB (4.63 BPW) 
llm_load_print_meta: general.name     = jeff31415_tinyllama-1.1b-1t-openorca
llm_load_print_meta: BOS token        = 1 '<s>'
llm_load_print_meta: EOS token        = 2 '</s>'
llm_load_print_meta: UNK token        = 0 '<unk>'
llm_load_print_meta: LF token         = 13 '<0x0A>'
llm_load_tensors: ggml ctx size =    0.20 MiB
ggml_backend_metal_log_allocated_size: allocated buffer, size =   571.39 MiB, (  571.39 /  2730.67)
llm_load_tensors: offloading 22 repeating layers to GPU
llm_load_tensors: offloading non-repeating layers to GPU
llm_load_tensors: offloaded 23/23 layers to GPU
llm_load_tensors:        CPU buffer size =    35.16 MiB
llm_load_tensors:      Metal buffer size =   571.39 MiB
.....................................................................................
Using 4 threads
llama_new_context_with_model: n_ctx      = 2048
llama_new_context_with_model: n_batch    = 2048
llama_new_context_with_model: n_ubatch   = 512
llama_new_context_with_model: flash_attn = 0
llama_new_context_with_model: freq_base  = 10000.0
llama_new_context_with_model: freq_scale = 1
ggml_metal_init: allocating
ggml_metal_init: picking default device: Apple A13 GPU
ggml_metal_init: loading '/var/containers/Bundle/Application/9FE37B94-DFB9-4D5A-BEE7-A49A6BE4CFD0/llama.swiftui.app/llama_llama.bundle/default.metallib'
ggml_metal_init: GPU name:   Apple A13 GPU
ggml_metal_init: GPU family: MTLGPUFamilyApple6  (1006)
ggml_metal_init: GPU family: MTLGPUFamilyCommon3 (3003)
ggml_metal_init: GPU family: MTLGPUFamilyMetal3  (5001)
ggml_metal_init: simdgroup reduction support   = true
ggml_metal_init: simdgroup matrix mul. support = false
ggml_metal_init: hasUnifiedMemory              = true
ggml_metal_init: recommendedMaxWorkingSetSize  =  2863.32 MB
fopen failed for data file: errno = 2 (No such file or directory)
Errors found! Invalidating cache...
ggml_metal_init: skipping kernel_mul_mm_f32_f32                    (not supported)
ggml_metal_init: skipping kernel_mul_mm_f16_f32                    (not supported)
ggml_metal_init: skipping kernel_mul_mm_q4_0_f32                   (not supported)
ggml_metal_init: skipping kernel_mul_mm_q4_1_f32                   (not supported)
ggml_metal_init: skipping kernel_mul_mm_q5_0_f32                   (not supported)
ggml_metal_init: skipping kernel_mul_mm_q5_1_f32                   (not supported)
ggml_metal_init: skipping kernel_mul_mm_q8_0_f32                   (not supported)
ggml_metal_init: skipping kernel_mul_mm_q2_K_f32                   (not supported)
ggml_metal_init: skipping kernel_mul_mm_q3_K_f32                   (not supported)
ggml_metal_init: skipping kernel_mul_mm_q4_K_f32                   (not supported)
ggml_metal_init: skipping kernel_mul_mm_q5_K_f32                   (not supported)
ggml_metal_init: skipping kernel_mul_mm_q6_K_f32                   (not supported)
ggml_metal_init: skipping kernel_mul_mm_iq2_xxs_f32                (not supported)
ggml_metal_init: skipping kernel_mul_mm_iq2_xs_f32                 (not supported)
ggml_metal_init: skipping kernel_mul_mm_iq3_xxs_f32                (not supported)
ggml_metal_init: skipping kernel_mul_mm_iq3_s_f32                  (not supported)
ggml_metal_init: skipping kernel_mul_mm_iq2_s_f32                  (not supported)
ggml_metal_init: skipping kernel_mul_mm_iq1_s_f32                  (not supported)
ggml_metal_init: skipping kernel_mul_mm_iq1_m_f32                  (not supported)
ggml_metal_init: skipping kernel_mul_mm_iq4_nl_f32                 (not supported)
ggml_metal_init: skipping kernel_mul_mm_iq4_xs_f32                 (not supported)
ggml_metal_init: skipping kernel_mul_mm_id_f32_f32                 (not supported)
ggml_metal_init: skipping kernel_mul_mm_id_f16_f32                 (not supported)
ggml_metal_init: skipping kernel_mul_mm_id_q4_0_f32                (not supported)
ggml_metal_init: skipping kernel_mul_mm_id_q4_1_f32                (not supported)
ggml_metal_init: skipping kernel_mul_mm_id_q5_0_f32                (not supported)
ggml_metal_init: skipping kernel_mul_mm_id_q5_1_f32                (not supported)
ggml_metal_init: skipping kernel_mul_mm_id_q8_0_f32                (not supported)
ggml_metal_init: skipping kernel_mul_mm_id_q2_K_f32                (not supported)
ggml_metal_init: skipping kernel_mul_mm_id_q3_K_f32                (not supported)
ggml_metal_init: skipping kernel_mul_mm_id_q4_K_f32                (not supported)
ggml_metal_init: skipping kernel_mul_mm_id_q5_K_f32                (not supported)
ggml_metal_init: skipping kernel_mul_mm_id_q6_K_f32                (not supported)
ggml_metal_init: skipping kernel_mul_mm_id_iq2_xxs_f32             (not supported)
ggml_metal_init: skipping kernel_mul_mm_id_iq2_xs_f32              (not supported)
ggml_metal_init: skipping kernel_mul_mm_id_iq3_xxs_f32             (not supported)
ggml_metal_init: skipping kernel_mul_mm_id_iq3_s_f32               (not supported)
ggml_metal_init: skipping kernel_mul_mm_id_iq2_s_f32               (not supported)
ggml_metal_init: skipping kernel_mul_mm_id_iq1_s_f32               (not supported)
ggml_metal_init: skipping kernel_mul_mm_id_iq1_m_f32               (not supported)
ggml_metal_init: skipping kernel_mul_mm_id_iq4_nl_f32              (not supported)
ggml_metal_init: skipping kernel_mul_mm_id_iq4_xs_f32              (not supported)
Compiler failed with XPC_ERROR_CONNECTION_INTERRUPTED
Compiler failed with XPC_ERROR_CONNECTION_INTERRUPTED
Compiler failed with XPC_ERROR_CONNECTION_INTERRUPTED
MTLCompiler: Compilation failed with XPC_ERROR_CONNECTION_INTERRUPTED on 3 try
Compiler failed with XPC_ERROR_CONNECTION_INTERRUPTED
Compiler failed with XPC_ERROR_CONNECTION_INTERRUPTED
Compiler failed with XPC_ERROR_CONNECTION_INTERRUPTED
MTLCompiler: Compilation failed with XPC_ERROR_CONNECTION_INTERRUPTED on 3 try
ggml_metal_init: error: load pipeline error: Error Domain=AGXMetalA13 Code=3 "Compiler encountered an internal error" UserInfo={NSLocalizedDescription=Compiler encountered an internal error}
llama_new_context_with_model: failed to initialize Metal backend
Could not load context!
Error: The operation couldn’t be completed. (llama_swiftui.LlamaError error 0.)
Snapshotting a view (0x103b42e10, _UIButtonBarStackView) that is not in a visible window requires afterScreenUpdates:YES.
Requesting visual style in an implementation that has disabled it, returning nil. Behavior of caller is undefined.
Requesting visual style in an implementation that has disabled it, returning nil. Behavior of caller is undefined.
Requesting visual style in an implementation that has disabled it, returning nil. Behavior of caller is undefined.
Requesting visual style in an implementation that has disabled it, returning nil. Behavior of caller is undefined.
Requesting visual style in an implementation that has disabled it, returning nil. Behavior of caller is undefined.
Snapshotting a view (0x103b42e10, _UIButtonBarStackView) that is not in a visible window requires afterScreenUpdates:YES.
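For context on the log above: the A13 reports MTLGPUFamilyApple6 (plus Metal3), and the "simdgroup matrix mul. support = false" line comes from a Metal device-family capability query; the build then still tries to compile kernels the device cannot run, which is what surfaces as the MTLCompiler XPC failures. A minimal Swift sketch of that kind of capability check follows — the family-to-feature mapping here is an assumption inferred from the log output, not a verbatim copy of ggml-metal's logic:

import Metal

// Sketch: query the same capabilities that ggml_metal_init logs above.
// The family thresholds are assumptions based on the log, not the
// exact ggml-metal implementation.
guard let device = MTLCreateSystemDefaultDevice() else {
    fatalError("No Metal device available")
}

// Apple7 (A14 and newer) is the first Apple GPU family with simdgroup
// matrix multiply; the A13 above is only Apple6, hence
// "simdgroup matrix mul. support = false" in the log. Simdgroup
// reductions are available on Metal3 devices, which the A13 reports.
let simdgroupMM = device.supportsFamily(.apple7)
let simdgroupReduction = device.supportsFamily(.apple7)
    || device.supportsFamily(.metal3)

print("GPU name: \(device.name)")
print("simdgroup reduction support   = \(simdgroupReduction)")
print("simdgroup matrix mul. support = \(simdgroupMM)")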
Animaxx changed the title from "Unable to load on iOS device after update" to "Unable to load on iOS device after update - Compiler encountered an internal error" on May 5, 2024.
Animaxx (Author) commented May 9, 2024

I feel it might be related to the Flash Attention implementation (#5021).
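For reference, the log above already shows flash_attn = 0 (it is off by default), so the flag alone does not avoid the failure; the FA Metal kernels appear to be compiled at init time regardless. A minimal Swift sketch of setting the flag through the C API, in the style of the llama.swiftui example — the flash_attn field exists in llama_context_params of this era, but the surrounding function is a hypothetical sketch:

import llama

// Hypothetical sketch: create a context with flash attention explicitly
// disabled. Per the log above, on b2771-era builds this alone does not
// prevent the Metal backend from compiling the FA kernels at init.
func makeContext(model: OpaquePointer) -> OpaquePointer? {
    var cparams = llama_context_default_params()
    cparams.n_ctx = 2048
    cparams.flash_attn = false   // the default; shown explicitly for clarity
    return llama_new_context_with_model(model, cparams)
}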

ggerganov (Owner) commented

Please test #7169

Animaxx (Author) commented May 10, 2024

> Please test #7169

It works on your branch! Thank you for the update!

Animaxx (Author) commented May 10, 2024

Verified: it works on branch gg/metal-fattn-reqs.

Animaxx closed this as completed on May 10, 2024.