
Add support for gguf #8

Closed
phronmophobic opened this issue Sep 24, 2023 · 3 comments

Comments

@phronmophobic (Owner)

The latest llama.cpp development has deprecated the ggml format in favor of a new gguf format.

llama.cpp has chosen to break their API and make ggml models useless. The goal for llama.clj is to upgrade without breaking backwards compatibility. More research is required, but the initial plan is something like:

  • treat the new raw API for llama.cpp as a separate library
  • create a protocol for any shared functionality and implement it for both the ggml version and the latest version
  • create independent builds that can be included independently or together
  • add better support for including your own llama.cpp build
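The protocol step above could be sketched roughly as follows. This is a minimal illustration of the idea, not code from llama.clj — the protocol name, method names, and record names are all hypothetical:

```clojure
;; Hypothetical sketch: a protocol for functionality shared by the
;; ggml and gguf backends, so either build can be loaded independently.
(defprotocol LlamaModel
  (create-context [this model-path opts]
    "Load a model file and return an inference context.")
  (generate [this ctx prompt]
    "Run inference on a prompt, returning generated text."))

;; Each backend would live in its own build and extend the protocol,
;; delegating to the matching raw llama.cpp API.
(defrecord GgmlBackend []
  LlamaModel
  (create-context [_ model-path opts]
    ;; would call the old ggml-era raw API here
    nil)
  (generate [_ ctx prompt]
    nil))

(defrecord GgufBackend []
  LlamaModel
  (create-context [_ model-path opts]
    ;; would call the new gguf-era raw API here
    nil)
  (generate [_ ctx prompt]
    nil))
```

Callers would then program against `LlamaModel` and pick a backend at load time, which is what lets the two builds be included independently or together.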
@phronmophobic (Owner, Author)

I updated to the latest version of llama.cpp locally and was able to get a gguf model to run without too many changes. However, there are still a few updates in progress for llama.cpp that I'll probably wait on before making a new release:

@phronmophobic (Owner, Author)

Fixed in v0.8.
