The issue was already mentioned in #3436. Creating a separate issue so that it does not get lost.
I ran LLaVA with the following command (commit id: 1e0e873):
```
./llava -m ggml-model-q5_k.gguf \
    --mmproj mmproj-model-f16.gguf \
    --temp 0.1 -ngl 64 -mg 0 \
    --image n008-2018-09-18-14-54-39-0400__CAM_FRONT__1537297366762404.jpg
```
These are the relevant parts of the output. Note that despite `-ngl 64`, no layers are offloaded to the GPU (`offloaded 0/35 layers to GPU`, `VRAM used: 0.00 MB`):
```
ggml_init_cublas: found 2 CUDA devices:
  Device 0: NVIDIA GeForce RTX 3090, compute capability 8.6
  Device 1: NVIDIA GeForce RTX 3090, compute capability 8.6
...
llm_load_tensors: using CUDA for GPU acceleration
ggml_cuda_set_main_device: using device 0 (NVIDIA GeForce RTX 3090) as main device
llm_load_tensors: mem required  = 4560.96 MB
llm_load_tensors: offloading 0 repeating layers to GPU
llm_load_tensors: offloaded 0/35 layers to GPU
llm_load_tensors: VRAM used: 0.00 MB
..................................................................................................
llama_new_context_with_model: n_ctx      = 2048
llama_new_context_with_model: freq_base  = 10000.0
llama_new_context_with_model: freq_scale = 1
llama_new_context_with_model: kv self size  = 1024.00 MB
llama_new_context_with_model: compute buffer total size = 162.13 MB
llama_new_context_with_model: VRAM scratch buffer: 156.00 MB
llama_new_context_with_model: total VRAM used: 156.00 MB (model: 0.00 MB, context: 156.00 MB)
...
main: image encoded in 1561.49 ms by CLIP ( 2.71 ms per image patch)

llama_print_timings:        load time =  3042.21 ms
llama_print_timings:      sample time =    11.65 ms /   136 runs   (   0.09 ms per token, 11671.82 tokens per second)
llama_print_timings: prompt eval time =  9440.69 ms /   626 tokens (  15.08 ms per token,    66.31 tokens per second)
llama_print_timings:        eval time = 47661.78 ms /   136 runs   ( 350.45 ms per token,     2.85 tokens per second)
llama_print_timings:       total time = 58800.36 ms
```
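For context, the `-ngl` value normally has to be copied into the model params before the model is loaded. Below is a minimal sketch of that wiring, assuming the llama.cpp C API at this commit (`llama_model_default_params` / `llama_load_model_from_file`); the wrapper function `load_with_offload` is purely illustrative and not actual llava code. If the llava example skips this step, a log like the one above (0/35 layers offloaded, 0.00 MB VRAM for the model) is exactly what you'd expect.

```cpp
#include "llama.h"

// Illustrative sketch only: how -ngl / -mg are typically propagated into
// model loading. If n_gpu_layers is left at its default, all layers stay
// on the CPU regardless of the command-line flags.
llama_model * load_with_offload(const char * path, int n_gpu_layers, int main_gpu) {
    llama_model_params mparams = llama_model_default_params();
    mparams.n_gpu_layers = n_gpu_layers;  // e.g. 64 from -ngl 64
    mparams.main_gpu     = main_gpu;      // e.g. 0 from -mg 0
    return llama_load_model_from_file(path, mparams);
}
```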