LLaVA does not offload layers to GPU #3616

Closed
ruslanmustafin opened this issue Oct 13, 2023 · 0 comments · Fixed by #3621

ruslanmustafin commented Oct 13, 2023

The issue was already mentioned in #3436. Creating a separate issue so that it does not get lost.

I run LLaVA with the following command (commit id: 1e0e873):

```
./llava -m ggml-model-q5_k.gguf \
        --mmproj mmproj-model-f16.gguf \
        --temp 0.1 -ngl 64 -mg 0 \
        --image n008-2018-09-18-14-54-39-0400__CAM_FRONT__1537297366762404.jpg
```

These are the relevant parts of the output:

```
ggml_init_cublas: found 2 CUDA devices:
  Device 0: NVIDIA GeForce RTX 3090, compute capability 8.6
  Device 1: NVIDIA GeForce RTX 3090, compute capability 8.6

...

llm_load_tensors: using CUDA for GPU acceleration
ggml_cuda_set_main_device: using device 0 (NVIDIA GeForce RTX 3090) as main device
llm_load_tensors: mem required  = 4560.96 MB
llm_load_tensors: offloading 0 repeating layers to GPU
llm_load_tensors: offloaded 0/35 layers to GPU
llm_load_tensors: VRAM used: 0.00 MB
..................................................................................................
llama_new_context_with_model: n_ctx      = 2048
llama_new_context_with_model: freq_base  = 10000.0
llama_new_context_with_model: freq_scale = 1
llama_new_context_with_model: kv self size  = 1024.00 MB
llama_new_context_with_model: compute buffer total size = 162.13 MB
llama_new_context_with_model: VRAM scratch buffer: 156.00 MB
llama_new_context_with_model: total VRAM used: 156.00 MB (model: 0.00 MB, context: 156.00 MB)

...

main: image encoded in  1561.49 ms by CLIP (    2.71 ms per image patch)

llama_print_timings:        load time =    3042.21 ms
llama_print_timings:      sample time =      11.65 ms /   136 runs   (    0.09 ms per token, 11671.82 tokens per second)
llama_print_timings: prompt eval time =    9440.69 ms /   626 tokens (   15.08 ms per token,    66.31 tokens per second)
llama_print_timings:        eval time =   47661.78 ms /   136 runs   (  350.45 ms per token,     2.85 tokens per second)
llama_print_timings:       total time =   58800.36 ms
```
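
For context on the symptom: "offloaded 0/35 layers" despite `-ngl 64` suggests the parsed `-ngl` value never reaches the model loader. As a rough sketch only (this is not the actual `llava` code, and `load_with_offload` is a hypothetical helper), the expected wiring against llama.cpp's C API of that period would look something like:

```cpp
// Hypothetical sketch, not the actual llava.cpp source: the -ngl and -mg CLI
// values have to be copied into llama_model_params before loading the model;
// otherwise the default of 0 GPU layers applies and nothing is offloaded.
#include "llama.h"

static llama_model * load_with_offload(const char * model_path, int n_gpu_layers, int main_gpu) {
    llama_model_params mparams = llama_model_default_params();
    mparams.n_gpu_layers = n_gpu_layers; // from -ngl; stays 0 if never forwarded
    mparams.main_gpu     = main_gpu;     // from -mg
    return llama_load_model_from_file(model_path, mparams);
}
```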
