
Running llama-cpp-python OpenAI compatible server #140

Open
abasu0713 opened this issue Apr 24, 2024 · 7 comments
@abasu0713

Requesting a little help here. I'm trying to test out Copilot-style completion with llama-cpp-python and this extension. Below are my configuration settings.

{
    "[python]": {
        "editor.formatOnType": true
    },
    "cmake.configureOnOpen": true,
    "llm.backend": "openai",
    "llm.configTemplate": "Custom",
    "llm.url": "http://192.X.X.X:12080/v1/chat/completions",
    "llm.fillInTheMiddle.enabled": false,
    "llm.fillInTheMiddle.prefix": "<PRE> ",
    "llm.fillInTheMiddle.middle": " <MID>",
    "llm.fillInTheMiddle.suffix": " <SUF>",
    "llm.requestBody": {
        "parameters": {
            "max_tokens": 60,
            "temperature": 0.2,
            "top_p": 0.95
        }
    },
    "llm.contextWindow": 4096,
    "llm.tokensToClear": [
        "<EOS>"
    ],
    "llm.tokenizer": null,
    "llm.tlsSkipVerifyInsecure": true,
    "llm.modelId": "",
}

I can see that inference is happening on the server:

(Screenshot, 2024-04-23 11:10 PM: server output showing inference activity)

So I am not entirely sure what I am missing. Additionally, I am trying to view the extension logs for the worker calls, but I don't see anything. Could you give me some guidance or a step-by-step explanation of how this can be done?

Thank you so much

@zikeji

zikeji commented May 23, 2024

Not sure if it's the same, but I'm using koboldcpp - perhaps try using v1/completions, not v1/chat/completions?
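
For reference, /v1/completions takes a plain prompt string rather than a messages array. A minimal sketch of such a request body, reusing the sampling parameters from the config above (the prompt here is just an illustrative placeholder):

{
    "prompt": "def fibonacci(n):",
    "max_tokens": 60,
    "temperature": 0.2,
    "top_p": 0.95
}

POSTing that body to http://192.X.X.X:12080/v1/completions should return a response whose choices[0].text contains the completion, which is a quick way to confirm the server side works independently of the extension.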

@McPatate
Member

Hi, it's indeed the /v1/completions endpoint and not /v1/chat/completions. Also, you shouldn't need to add the path anymore, you can set llm.url as http[s]://{hostname}
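
A minimal sketch of the relevant settings following that advice, keeping the host, port, and request parameters from the config above (adjust to your own setup):

{
    "llm.backend": "openai",
    "llm.url": "http://192.X.X.X:12080",
    "llm.requestBody": {
        "parameters": {
            "max_tokens": 60,
            "temperature": 0.2,
            "top_p": 0.95
        }
    }
}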

github-actions bot
Contributor

This issue is stale because it has been open for 30 days with no activity.

github-actions bot added the stale label Jun 29, 2024
@abasu0713
Author

Hi, it's indeed the /v1/completions endpoint and not /v1/chat/completions. Also, you shouldn't need to add the path anymore, you can set llm.url as http[s]://{hostname}

I am going to give this a try tomorrow and report back. Sorry I didn't get back sooner; I just saw the GitHub notification. Thank you for the reply. I was using /v1/chat/completions since I was using the Llama instruct models. Which require that endpoint no?

github-actions bot removed the stale label Jul 1, 2024
@McPatate
Member

McPatate commented Jul 3, 2024

Which require that endpoint no?

They might, yes, but the extension doesn't support chat models at the moment. The model you use must be compatible with code completion, either with fill-in-the-middle (FIM) or without it (but I strongly advise using FIM, as it generates more relevant completions).
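
As a rough illustration (a sketch of the idea, not the extension's exact behavior): with llm.fillInTheMiddle.enabled set to true, a FIM request to /v1/completions places the code before the cursor after the <PRE> token and the code after the cursor after the <SUF> token, asking the model to generate what belongs at <MID>. Using the tokens and parameters from the config above (the exact special tokens depend on the model being served; the stop list here just mirrors llm.tokensToClear):

{
    "prompt": "<PRE> def add(a, b):\n    return <SUF>\n\nprint(add(1, 2)) <MID>",
    "max_tokens": 60,
    "temperature": 0.2,
    "top_p": 0.95,
    "stop": ["<EOS>"]
}

The text returned in choices[0].text is then the snippet that belongs between the prefix and the suffix.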

Contributor

github-actions bot commented Aug 3, 2024

This issue is stale because it has been open for 30 days with no activity.

github-actions bot added the stale label Aug 3, 2024
@dphov

dphov commented Sep 12, 2024

bump

github-actions bot removed the stale label Sep 13, 2024