
Fix Cuda offloading in llava #3621

Merged: 1 commit into master on Oct 14, 2023
Conversation

@monatis (Collaborator) commented on Oct 14, 2023

closes #3616

I simply forgot to set n_gpu_layers when loading the model. This should fix it.
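
For reference, here is a minimal sketch of the kind of one-line fix described, using llama.cpp's public model-loading API of the time; the `load_model` helper is hypothetical, and the exact spot in the llava example may differ from the PR diff:

```cpp
// Sketch (not the PR diff verbatim): forward the value parsed from -ngl
// into the model parameters before loading, so layers actually get offloaded.
#include "common.h"   // gpt_params, which carries n_gpu_layers from the -ngl flag
#include "llama.h"

// Hypothetical helper illustrating the fix.
static llama_model * load_model(const gpt_params & params) {
    llama_model_params model_params = llama_model_default_params();

    // The bug: this assignment was missing, so n_gpu_layers kept its
    // default of 0 and the whole model stayed on the CPU.
    model_params.n_gpu_layers = params.n_gpu_layers;

    return llama_load_model_from_file(params.model.c_str(), model_params);
}
```

With that in place, invoking the example as usual (e.g. `./llava -m model.gguf --mmproj mmproj.gguf --image img.jpg -ngl 35`) should offload layers just like the other examples do.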

@KerfuffleV2 (Collaborator) left a comment

This looks pretty straightforward. Tested, and it seems to work (even with ROCm). The text generates much faster when offloading, as expected.

"The image features a white fox sitting on the ground, with its mouth wide open, possibly yawning or growling. The fox appears to be in a forest setting, surrounded by grass and trees. The scene is depicted in a black and white style, giving it a classic and timeless feel." About my profile picture. It's supposed to be a wolf cub, but still pretty impressive!

@KerfuffleV2 merged commit 11dc109 into master on Oct 14, 2023
35 of 40 checks passed
joelkuiper added a commit to vortext/llama.cpp that referenced this pull request on Oct 19, 2023:
* 'master' of github.com:ggerganov/llama.cpp:
  fix embeddings when using CUDA (ggerganov#3657)
  llama : avoid fprintf in favor of LLAMA_LOG (ggerganov#3538)
  readme : update hot-topics & models, detail windows release in usage (ggerganov#3615)
  CLBlast: Fix temporary buffer size for f16 conversion (wsize)
  train-text-from-scratch : fix assert failure in ggml-alloc (ggerganov#3618)
  editorconfig : remove trailing spaces
  server : documentation of JSON return value of /completion endpoint (ggerganov#3632)
  save-load-state : fix example + add ci test (ggerganov#3655)
  readme : add Aquila2 links (ggerganov#3610)
  tokenizer : special token handling (ggerganov#3538)
  k-quants : fix quantization ranges (ggerganov#3646)
  llava : fix tokenization to not add bos between image embeddings and user prompt (ggerganov#3645)
  MPT : support GQA for replit-code-v1.5 (ggerganov#3627)
  Honor -ngl option for Cuda offloading in llava (ggerganov#3621)
Successfully merging this pull request may close the following issue:

LLaVA does not offload layers to GPU (#3616)