
Release v2.1.2 #209

Closed
wants to merge 58 commits

Conversation

@amakropoulos (Collaborator) commented Aug 16, 2024

@ltoniazzi (Contributor) commented Aug 21, 2024

I was trying to check that the adapters work using the test gguf files from llama.cpp (generated by running test-lora-conversion-inference.sh; you can also find the gguf files directly here).

These models are overfitted to return the same sentence for the same initial word, but I am struggling to make them work on the release/v2.1.2 branch. I think it's because the prompt is templated as a chat, so a user saying Hello sends

"<|user|>\nHello<|end|>\n<|assistant|>\n"

instead of (as in the llama.cpp tests)

"<bos>Hello"

Would it be helpful if I train similarly small overfitted models to test that different adapters respond correctly in a chat?
They could then be used to test hot-swapping as well.

Or is there a mode where the user input is not sent within a chat template?

@amakropoulos (Collaborator, Author)

Yes, you can use Complete("<bos>Hello") instead, which doesn't use the template.
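As a rough illustration, here is a minimal sketch of how that could look in a Unity script, assuming the LLMUnity namespace and an LLMCharacter component with an awaitable Complete method; apart from Complete itself, the names here are assumptions and not verified against the release/v2.1.2 API:

```csharp
using UnityEngine;
using LLMUnity;

// Sketch only (assumed API): send a raw prompt without the chat template
// by using Complete instead of Chat, so the adapter sees "<bos>Hello" verbatim.
public class AdapterCompletionTest : MonoBehaviour
{
    public LLMCharacter llmCharacter;  // assumed component exposing Complete()

    async void Start()
    {
        // Complete sends the prompt as-is, with no "<|user|>"/"<|assistant|>" wrapping
        string reply = await llmCharacter.Complete("<bos>Hello");
        Debug.Log(reply);
    }
}
```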

@ltoniazzi (Contributor)

> Yes, you can use Complete("<bos>Hello") instead, which doesn't use the template.

Nice, I tested it and the adapter is working correctly!

I am planning to test what happens when multiple adapters are loaded, because in that case one should probably use the --lora-init-without-apply parameter when spinning up the server for the base model, to be able to swap between adapters.

Not sure what happens now if two LLMs use the same base but different adapters. Do two different servers spin up? Have you already looked into this?

@ElevenGameStudios commented Aug 22, 2024

I tried this branch in Unity via the GitHub URL and it loads Llama 3.1 and Gemma models fine, but only in CPU mode. Using CUDA via the numGPULayers variable crashes the Unity Editor for me right now, whereas the Asset Store version does not. I'm using the latest Unity 6 preview 15f1 on Win10, and I tried running both without and with the full library installed via the Extras button.
Just wanted to let you know. Thanks for this llama.cpp Unity port; overall it works quite nicely.

@amakropoulos (Collaborator, Author)

@ElevenGameStudios thanks for reporting.
What GPU do you have?
Could you send me the Editor.log file from when you run the scene and it crashes?
It would also be very helpful if you could join the Discord channel so I can send you another build to try.

@amakropoulos (Collaborator, Author)

> Yes, you can use Complete("<bos>Hello") instead, which doesn't use the template.
>
> Nice, I tested it and the adapter is working correctly!
>
> I am planning to test what happens when multiple adapters are loaded, because in that case one should probably use the --lora-init-without-apply parameter when spinning up the server for the base model, to be able to swap between adapters.

You can use multiple adapters at the same time; they are all initialised with scale 1.
Then you can use the SetLoraScale function to adjust the scales as you want.
I'm adapting the code to make it possible to set the weights before the LLM starts.
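For illustration, a sketch of what adjusting adapter scales could look like, assuming the LLM component exposes SetLoraScale(path, scale) as named above; the exact signature and the adapter paths are assumptions, not taken from this thread:

```csharp
using UnityEngine;
using LLMUnity;

// Sketch only: two adapters loaded on the same base model, both starting at scale 1.
// SetLoraScale is named in this thread; its exact signature is assumed here.
public class LoraSwapExample : MonoBehaviour
{
    public LLM llm;  // assumed LLM component with both adapters configured

    void SwitchToAdapterB()
    {
        // Effectively "swap" adapters by zeroing one scale and keeping the other at 1
        llm.SetLoraScale("adapters/adapter_a.gguf", 0f);  // hypothetical paths
        llm.SetLoraScale("adapters/adapter_b.gguf", 1f);
    }
}
```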

> Not sure what happens now if two LLMs use the same base but different adapters. Do two different servers spin up? Have you already looked into this?

Yes, each different LLM object starts a new LLM server.

@ElevenGameStudios

> @ElevenGameStudios thanks for reporting. What GPU do you have? Could you send me the Editor.log file from when you run the scene and it crashes?

I have an Nvidia RTX 3070 8GB. It seems to crash before the "Using architecture: ..." log message appears. The 2.1.1 version runs CUDA just fine.
I joined the Discord and will send the Editor.log file there.

@amakropoulos (Collaborator, Author)

Closing in favor of #220 because it is not a minor release anymore :)
