
Running llama-cpp-python OpenAI compatible server #140

Open
abasu0713 opened this issue Apr 24, 2024 · 7 comments
@abasu0713

Requesting a little help here. I'm trying to test out Copilot-style completion with llama-cpp-python and this extension. Below are my configuration settings.

{
    "[python]": {
        "editor.formatOnType": true
    },
    "cmake.configureOnOpen": true,
    "llm.backend": "openai",
    "llm.configTemplate": "Custom",
    "llm.url": "http://192.X.X.X:12080/v1/chat/completions",
    "llm.fillInTheMiddle.enabled": false,
    "llm.fillInTheMiddle.prefix": "<PRE> ",
    "llm.fillInTheMiddle.middle": " <MID>",
    "llm.fillInTheMiddle.suffix": " <SUF>",
    "llm.requestBody": {
        "parameters": {
            "max_tokens": 60,
            "temperature": 0.2,
            "top_p": 0.95
        }
    },
    "llm.contextWindow": 4096,
    "llm.tokensToClear": [
        "<EOS>"
    ],
    "llm.tokenizer": null,
    "llm.tlsSkipVerifyInsecure": true,
    "llm.modelId": "",
}

I can see that inference is happening on the server:

(Screenshot, 2024-04-23 11:10 PM: server output showing inference activity)

So I am not entirely sure what I am missing. Additionally, I am trying to view the extension logs for the worker calls, but I don't see anything. Could you give me some guidance or a step-by-step explanation of how this can be done?

Thank you so much

@zikeji

zikeji commented May 23, 2024

Not sure if it's the same, but I'm using koboldcpp - perhaps try using v1/completions, not v1/chat/completions?
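
For reference, /v1/completions takes a plain prompt string rather than a messages array. A minimal sketch of such a request body, reusing the sampling parameters from the config above (the prompt here is just an illustrative placeholder):

{
    "prompt": "def fibonacci(n):",
    "max_tokens": 60,
    "temperature": 0.2,
    "top_p": 0.95
}

POSTing that body to http://192.X.X.X:12080/v1/completions should return a response whose choices[0].text contains the completion, which is a quick way to confirm the server side works independently of the extension.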

@McPatate
Member

Hi, it's indeed the /v1/completions endpoint and not /v1/chat/completions. Also, you shouldn't need to add the path anymore, you can set llm.url as http[s]://{hostname}
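
A minimal sketch of the relevant settings following that advice, keeping the host, port, and request parameters from the config above (adjust to your own setup):

{
    "llm.backend": "openai",
    "llm.url": "http://192.X.X.X:12080",
    "llm.requestBody": {
        "parameters": {
            "max_tokens": 60,
            "temperature": 0.2,
            "top_p": 0.95
        }
    }
}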

github-actions bot
Contributor

This issue is stale because it has been open for 30 days with no activity.

github-actions bot added the stale label Jun 29, 2024
@abasu0713
Author

Hi, it's indeed the /v1/completions endpoint and not /v1/chat/completions. Also, you shouldn't need to add the path anymore, you can set llm.url as http[s]://{hostname}

I am going to give this a try tomorrow and report back. Sorry I didn't get back sooner; I just saw the GitHub notification. Thank you for the reply. I was using /v1/chat/completions since I was using the Llama instruct models. Which require that endpoint no?

github-actions bot removed the stale label Jul 1, 2024
@McPatate
Member

McPatate commented Jul 3, 2024

Which require that endpoint no?

They might, yes, but the extension doesn't support chat models at the moment. The model you use must be compatible with code completion, either with fill-in-the-middle (FIM) or without it (but I strongly advise using FIM, as it generates more relevant completions).
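
As a rough illustration (a sketch of the idea, not the extension's exact behavior): with llm.fillInTheMiddle.enabled set to true, a FIM request to /v1/completions places the code before the cursor after the <PRE> token and the code after the cursor after the <SUF> token, asking the model to generate what belongs at <MID>. Using the tokens and parameters from the config above (the exact special tokens depend on the model being served; the stop list here just mirrors llm.tokensToClear):

{
    "prompt": "<PRE> def add(a, b):\n    return <SUF>\n\nprint(add(1, 2)) <MID>",
    "max_tokens": 60,
    "temperature": 0.2,
    "top_p": 0.95,
    "stop": ["<EOS>"]
}

The text returned in choices[0].text is then the snippet that belongs between the prefix and the suffix.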

Contributor

github-actions bot commented Aug 3, 2024

This issue is stale because it has been open for 30 days with no activity.

github-actions bot added the stale label Aug 3, 2024
@dphov

dphov commented Sep 12, 2024

bump

github-actions bot removed the stale label Sep 13, 2024