
Add Support for the new GGUF format which replaces GGML #3676

Closed
apcameron opened this issue Aug 24, 2023 · 31 comments
Labels
enhancement (New feature or request), stale

Comments

@apcameron

Llama.cpp has dropped support for the GGML format and now only supports GGUF

apcameron added the enhancement label on Aug 24, 2023
@berkut1
Contributor

berkut1 commented Aug 24, 2023

abetlen/llama-cpp-python#628
We can only wait.

@Ph0rk0z
Contributor

Ph0rk0z commented Aug 25, 2023

You can probably use: abetlen/llama-cpp-python#633

If you merge it yourself, I would back up the old llama-cpp-python first. llama.cpp does not care about breaking changes.

@sirus20x6

it looks like this is now done

@Yzord

Yzord commented Aug 25, 2023

Still can't download them. Hope it will be supported soon :)

@sammcj
Contributor

sammcj commented Aug 26, 2023

It looks like we're now waiting on https://github.com/jllllll/llama-cpp-python-cuBLAS-wheels to be updated. I've added a request issue: jllllll/llama-cpp-python-cuBLAS-wheels#3

@jllllll
Contributor

jllllll commented Aug 26, 2023

Wheels have been uploaded:

https://github.com/jllllll/llama-cpp-python-cuBLAS-wheels/releases/download/textgen-webui/llama_cpp_python_cuda-0.1.79+cu117-cp310-cp310-win_amd64.whl; platform_system == "Windows"
https://github.com/jllllll/llama-cpp-python-cuBLAS-wheels/releases/download/textgen-webui/llama_cpp_python_cuda-0.1.79+cu117-cp310-cp310-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64"

Keep in mind that this version of llama-cpp-python does not support GGML models. Only GGUF models.
The reason for this is that llama.cpp has dropped GGML support.

ctransformers has already been updated in the webui to support GGUF, if all you want is to try it out.

Personally, I would prefer to wait until more GGML models are converted to GGUF before updating llama-cpp-python.
This is a significant, breaking change at this point with so few GGUF models available and so many GGML models in use.
While there is a conversion script that people can use, I don't expect many people to be okay with that.

If ooba wants, I can implement the previous version of llama-cpp-python as separate packages in order to maintain GGML support, but that is a pretty messy solution even if it is temporary.

Alternatively, ctransformers can be used for GGML support as it supports both formats. This isn't a great solution either, as ctransformers is noticeably slower than llama-cpp-python, for whatever reason.
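For anyone who just wants to try the new wheels, here is a minimal sketch of loading a GGUF model with llama-cpp-python 0.1.79+; the model path and parameter values are placeholders, not recommendations:

```python
# Minimal sketch: load a GGUF model with llama-cpp-python 0.1.79+.
# The model path and parameter values below are placeholders.
from llama_cpp import Llama

llm = Llama(
    model_path="models/llama-2-13b.Q4_K_M.gguf",  # must be GGUF; GGML files are rejected now
    n_ctx=4096,
    n_gpu_layers=35,  # needs a cuBLAS build such as the wheels above
)
out = llm("Q: What format replaces GGML? A:", max_tokens=32)
print(out["choices"][0]["text"])
```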

@sirus20x6

Personally, I would prefer to wait until more GGML models are converted to GGUF before updating llama-cpp-python.

isn't there a conversion script? if so why wait?

@jllllll
Contributor

jllllll commented Aug 26, 2023

isn't there a conversion script? if so why wait?

The conversion script is not guaranteed to work, and its usage can be somewhat involved in order to perform a proper conversion.
My concern with this is that there are many people using the webui that do not have the technical ability to run the conversion script. Updating now will force them to use ctransformers to load their GGML models, which also means they will run slower.

As I mentioned, there aren't many GGUF models available right now and the ctransformers loader already supports them.
There is very little benefit to updating llama-cpp-python until more models are available.
Remember, the latest llama-cpp-python can not load GGML models at all. Only GGUF.
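
For reference, here is a rough sketch of what a "proper" conversion looks like when driven from Python. The script name and flags are assumptions based on llama.cpp around this time (convert-llama-ggmlv3-to-gguf.py), so check your checkout before relying on them:

```python
# Rough sketch of driving llama.cpp's GGML -> GGUF conversion script.
# Script name and flags are assumptions; verify against your llama.cpp checkout.
import subprocess
from pathlib import Path
from typing import Optional

def convert_ggml_to_gguf(ggml_path: Path, gguf_path: Path, hf_model_dir: Optional[Path] = None) -> None:
    cmd = [
        "python", "convert-llama-ggmlv3-to-gguf.py",
        "--input", str(ggml_path),
        "--output", str(gguf_path),
    ]
    if hf_model_dir is not None:
        # Pointing at the original HF model directory lets the script copy real
        # vocab/metadata instead of guessing -- this is the "involved" part.
        cmd += ["--model-metadata-dir", str(hf_model_dir)]
    subprocess.run(cmd, check=True)
```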

@sirus20x6

Yeah, I just deleted all my GGML files because GGUF came out. Guess I'll stick with ctransformers for now, but I think people want the ease GGUF brings of not having to set any parameters.

@Dampfinchen

If ooba wants, I can implement the previous version of llama-cpp-python as separate packages in order to maintain GGML support, but that is a pretty messy solution even if it is temporary.

Yes, I do think this would be the ideal approach. Otherwise, many people are going to ask why it suddenly stopped working. If a GGML file is detected, it would just use the older commit, while GGUF would use the new one.

@sirus20x6

What about just having the script convert them for people, so no duplicate llama-cpp-python is needed? Maybe a check and a "do you want to convert this to GGUF?" UI element.

@jllllll
Contributor

jllllll commented Aug 26, 2023

If ooba wants, I can implement the previous version of llama-cpp-python as separate packages in order to maintain GGML support, but that is a pretty messy solution even if it is temporary.

Yes, I do think this would be the ideal approach. Otherwise, many people are going to ask why it suddenly stopped working. If a GGML file is detected, it would just use the older commit, while GGUF would use the new one.

I have written the code needed to support this here: jllllll@4a999e3
I will make a PR for it if ooba wants to use it: #3695
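
Roughly, the dispatch only has to look at the file's magic bytes. A minimal sketch (the magic values are assumptions based on the usual little-endian GGML/GGUF headers):

```python
# Minimal sketch: route a model file to the right loader by its magic bytes.
# Magic values are assumptions based on the usual little-endian GGML/GGUF headers.
def detect_model_format(path: str) -> str:
    with open(path, "rb") as f:
        magic = f.read(4)
    if magic == b"GGUF":
        return "gguf"   # load with the new llama-cpp-python
    if magic in (b"lmgg", b"fmgg", b"tjgg"):
        return "ggml"   # legacy GGML/GGMF/GGJT -> load with the pinned older build
    return "unknown"
```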

@jllllll
Contributor

jllllll commented Aug 26, 2023

What about just having the script convert them for people, so no duplicate llama-cpp-python is needed? Maybe a check and a "do you want to convert this to GGUF?" UI element.

The conversion script is not guaranteed to work with every model.
One such model was just discovered in the Discord server: MythoMax-L2-13B
The script is not intended to be the main method of creating GGUF models.
It is intended to be a backup for those who don't have the hardware to create the GGUF model from scratch.
The intended method of creating GGUF models is to convert HF models directly to GGUF, which requires loading the full HF model.
This just isn't feasible for most people.


Edit: The issue with converting MythoMax-L2-13B has been fixed.

@Ph0rk0z
Contributor

Ph0rk0z commented Aug 26, 2023

Hope it works for those 70Bs I downloaded over the last week. I think this is the 3rd or 4th time they've deprecated a format, and it's always all or nothing.

@jllllll
Contributor

jllllll commented Aug 26, 2023

Fortunately, GGUF is designed to be expandable. So, this should be the last format deprecation.

@Patronics
Contributor

Patronics commented Aug 27, 2023

I was just starting to experiment with installing local LLMs (I've wanted to try them for ages but been too busy), but it seems I've picked a tumultuous time to start, so I'm eagerly waiting for this migration to be complete so I can download and start playing with models that won't be obsolete in a week!

Edit: It seems to now be supported! I am successfully running TheBloke's GGUF CodeLlama release!

@FartyPants
Contributor

FartyPants commented Aug 27, 2023

Works, but currently GGUF speaks like Yoda:
Greetings, I am here to provide. Is there you need help with?
Fix:
abetlen/llama-cpp-python#644

@Ph0rk0z
Contributor

Ph0rk0z commented Aug 27, 2023

What is this I hear about GGUFv2 and header compatibility removed by October?

@berkut1
Contributor

berkut1 commented Aug 27, 2023

@Ph0rk0z yes, GGUFv2 was released, which is a breaking change :) They decided it was better to do it now, while there aren't many GGUFv1 models yet.
So again, we need to wait for llama-cpp-python.

@Ph0rk0z
Contributor

Ph0rk0z commented Aug 27, 2023

Right.. but they're deprecating the format again? Why not deprecate it now, or just keep it? Why wait until October so that people upload already-obsolete models?

@jllllll
Contributor

jllllll commented Aug 27, 2023

The GGUFv2 implementation is still compatible with v1. I don't think it will cause any issues.
v2 is pretty much just for 64-bit models.
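
Concretely, the v1 -> v2 change is just the header counts widening from 32-bit to 64-bit integers. A small sketch of reading them; the field layout is taken from the GGUF spec, so treat it as an assumption rather than gospel:

```python
# Sketch: read a GGUF header and its tensor/metadata counts.
# Per the GGUF spec, v1 stores the counts as uint32 and v2 widens them to uint64.
import struct

def read_gguf_header(path: str) -> tuple[int, int, int]:
    with open(path, "rb") as f:
        if f.read(4) != b"GGUF":
            raise ValueError("not a GGUF file")
        (version,) = struct.unpack("<I", f.read(4))
        if version == 1:
            n_tensors, n_kv = struct.unpack("<II", f.read(8))
        else:
            n_tensors, n_kv = struct.unpack("<QQ", f.read(16))
    return version, n_tensors, n_kv
```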

@berkut1
Contributor

berkut1 commented Aug 27, 2023

@Ph0rk0z well, someone says that backward compatibility seems to work fine: ggerganov/llama.cpp#2821

@Ph0rk0z
Contributor

Ph0rk0z commented Aug 27, 2023

Right.. but I read this in the code


// NOTE: temporary handling of GGUFv1 >> remove after Oct 2023
static bool gguf_fread_str_cur(FILE * file, struct gguf_str * p, size_t * offset) {
    p->n    = 0;
    p->data = NULL;

    bool ok = true;

also
case GGUF_FILE_VERSION_V1: return "GGUF V1 (support until nov 2023)";

@jllllll
Contributor

jllllll commented Aug 27, 2023

Why on earth would they do that?
The whole point of GGUF was to eliminate the need for format deprecation:
https://github.com/philpax/ggml/blob/gguf-spec/docs/gguf.md#specification

If they keep doing that, people just won't bother with GGUF. They could get away with frequent format deprecation in the past because there weren't all that many llama-based models back then. Now, there are too many models using GGMLv3 for people to put up with constantly redownloading and reconverting them.

@berkut1
Contributor

berkut1 commented Aug 27, 2023

@Ph0rk0z Yes, I noticed that too when I checked the code. My first thoughts were based on their initial discussion.

@jllllll I hope that is the last breaking change; it looks like they just forgot about 64-bit.

@Ph0rk0z
Contributor

Ph0rk0z commented Aug 27, 2023

Yea.. so people will upload GGUFv1 models for the next 2 months, for what? It has only been a couple of days. Why not convert them now, while there are few? But instead there's no mention of this and we have to go looking for it.

I have low bandwidth, so I can't just re-download every 70B.

Had this situation before, when GPU offloading was first created. All the GGML I had downloaded had to be requantized from scratch or re-downloaded. You couldn't even use a script to convert them.

And I can't just skip GGUF, because some good PRs got merged after the switch.

@mechanicmuthu

They are saying in this discussion that it's a simple command-line conversion from GGUF v1 to GGUF v2. The deprecation of GGUF v1 is just for simplicity rather than there being a technical reason to stop supporting it.

@Ph0rk0z
Contributor

Ph0rk0z commented Aug 27, 2023

Right! Very simple! Little chance of failure.

I don't need --allow-requantize or --leave-output-tensor, right?

It will probably work for the non-k-quant types, but I'm pretty sure k-quants won't work. (There were also some changes to the decisions k-quants makes for LLaMA2 70B models, so in that particular case it wouldn't pass through all the tensors even if the other issues were dealt with.)

'spose I'll see how it goes for the current quants and turning them into GGUF.. but will they be GGUFv1 or v2 when I use the script now?

@Dharmavineta

Dharmavineta commented Sep 5, 2023

After days of troubleshooting and literally having a brain freeze over the "can't load the model" error, I only came across this thread now (RIP me). I burned about 30 GB downloading all kinds of models and none of them are GGUF (talk about being unlucky). Will the GGUF version at least stay for a while now? I don't want to end up with unsupported versions and then break my head over the same issue again in a week's time.

Also, which versions does langchain work with? Does the GPTQ version work, or is it only GGUF?

@Ph0rk0z
Contributor

Ph0rk0z commented Sep 5, 2023

Convert the GGML to GGUF. If you got them recently, it will work.

@github-actions

This issue has been closed due to inactivity for 6 weeks. If you believe it is still relevant, please leave a comment below. You can tag a developer in your comment.
